Some Notes on Error Analysis for Kernel Based Regularized Interpolation

Kernel based regularized interpolation is one of the most important methods for approximating functions. The theory behind kernel based regularized interpolation is the well-known Representer Theorem, which gives the form of the approximating function in a reproducing kernel Hilbert space. Because of its advantages, kernel based regularized interpolation is widely used in many mathematical and engineering applications, for example dimension reduction and dimension estimation. However, the performance of the approximation is not fully understood from a theoretical perspective; in other words, an error analysis for kernel based regularized interpolation is lacking. In this paper, error bounds in terms of the reproducing kernel Hilbert space norm and the Sobolev space norm are given to explain the behavior of the approximating function.


Introduction
Approximating functions in high dimensional spaces is one of the central problems in both mathematics and engineering. Many real-world problems can be viewed as function approximation problems. For example, the classification problem in engineering can be viewed as approximating a function whose values give the classes to which the inputs belong [7], and in image processing, the patch-based image denoising problem can be seen as approximating a function from noisy patches to clean pixels.
Mathematically speaking, the function approximation problem is to approximate an unknown continuous function f : X → R from the knowledge of some observations {(x_i, y_i)}_{i=1}^n ⊂ X × R, where X ⊂ R^d, d ≥ 1, is the input space and n ∈ N is the number of observations. One of the classical ways of approximating functions to explain real-world phenomena is Kriging (a.k.a. Gaussian process regression) [4]. However, Kriging requires a large number of observations, which can be expensive to obtain. When only a few samples are given, Kriging yields non-stationary behavior [10].
Another alternative for approximating multivariate functions is the inverse distance weighting (IDW) method, which was originally proposed in [9]. The key assumption of IDW is that points that are close to each other are more alike than those that are far apart. Therefore, IDW produces the prediction at a point by relying on observed points close to that point; in other words, measured values close to the prediction location have more effect on the predicted value than those far away. Based on this assumption, the approximation function can be represented as

f̂(x) = ( Σ_{i=1}^n ||x − x_i||^{−α} y_i ) / ( Σ_{i=1}^n ||x − x_i||^{−α} ),

where α is a parameter that usually takes the value 2. IDW produces good accuracy at points near the observations. However, at points far from the observations, IDW does not work well; this is one drawback of IDW. Another drawback of IDW is that the gradient of the approximation vanishes at the observations.
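As a concrete illustration, here is a minimal sketch of IDW in Python; the function name and the coincidence tolerance eps are our own choices for the sketch, not from [9].

```python
import numpy as np

def idw(x_query, X_obs, y_obs, alpha=2.0, eps=1e-12):
    """Inverse distance weighting (Shepard's method): the prediction is a
    weighted average of observed values with weights w_i = ||x - x_i||^(-alpha)."""
    d = np.linalg.norm(X_obs - x_query, axis=1)
    if np.min(d) < eps:
        # the query coincides with an observation; return its value directly
        return float(y_obs[np.argmin(d)])
    w = d ** (-alpha)
    return float(np.sum(w * y_obs) / np.sum(w))
```

At a point equidistant from two observations, the prediction is their plain average, which also illustrates the flat (vanishing-gradient) behavior of IDW near observations.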
Considering the drawbacks of these methods, the kernel based regularized interpolation (see (2.2) for details) was proposed. In one dimension, kernel based regularized interpolation is similar to spline interpolation.
It has several advantages. First of all, kernel based regularized interpolation works in any dimension and in many different situations [1,2]. It also enjoys a solid theoretical foundation, the Representer Theorem, which we state below. Furthermore, compared with other methods, one can achieve the same approximation accuracy with kernel based regularized interpolation by solving a better conditioned linear system. Kernel based regularized interpolation also works when the observations contain noise (see Section 2 for analysis); this is one of the biggest advantages that other interpolation methods do not have. Last but not least, we have much flexibility in choosing kernels when using kernel based regularized interpolation. Kernel based regularized interpolation is also widely used in other engineering problems, for example clustering, dimension reduction [1] and dimension estimation [11].
In view of these advantages, there is a considerable amount of work on applying kernel based regularized interpolation in different situations, as mentioned above. However, work on the error analysis of kernel based regularized interpolation is very limited. It is worthwhile to reflect on the nature of the error bounds needed to characterize the behavior of functions learned from observations. In this paper, we provide an error analysis for kernel based regularized interpolation from the function space point of view; in other words, we focus specifically on deriving Hilbert space type and Sobolev space type error estimates for kernel based regularized interpolation.

Preliminaries
Let us recall here some basic facts from kernel methods for our analysis. We consider a positive definite kernel K : X × X → R, i.e., for every n ∈ N, x_1, …, x_n ∈ X and a_1, …, a_n ∈ R,

Σ_{i,j=1}^n a_i a_j K(x_i, x_j) ≥ 0.

Associated with the kernel K, there exists a unique native Hilbert space H_K of functions from X to R. The elements of H_K are of the form

f = Σ_{i ∈ I} c_i K(·, x_i),

where I is a countable set. The norm on H_K is given by

||f||²_{H_K} = Σ_{i,j ∈ I} c_i c_j K(x_i, x_j),

because the kernel K acts as a reproducing kernel on H_K, i.e.,

f(x) = ⟨f, K(·, x)⟩_{H_K} for all f ∈ H_K, x ∈ X.

We require the kernel to be positive definite because linear systems are involved when we approximate functions. The positive definiteness guarantees that these linear systems are symmetric positive semidefinite, so that the approximation problem can be solved in H_K using the given observations {(x_i, y_i)}_{i=1}^n. Indeed, one can solve the function approximation problem by considering the regularization problem with regularization parameter λ > 0,

min_{g ∈ H_K} Σ_{i=1}^n (g(x_i) − y_i)² + λ ||g||²_{H_K}.   (2.1)

The solution of this regularization problem is characterized by the Representer Theorem [8], which states that the solution of (2.1) is of the form

g = Σ_{i=1}^n c_i K(·, x_i)   (2.2)

for some coefficients c_1, …, c_n ∈ R. This expression is called the kernel based regularized interpolation. In the rest of this paper, we use the notation I_X f for the kernel based regularized interpolation, i.e.,

I_X f = Σ_{i=1}^n c_i K(·, x_i).

To obtain an error analysis, we need a true function. We use the notation f for the true function, which is known only at the sampling locations, i.e., we only have the information {(x_i, f(x_i))}_{i=1}^n about the true function f. The goal of this work is to obtain bounds for the error between the true function f and the kernel based regularized interpolation I_X f.
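Numerically, the coefficients c_i of I_X f are obtained from a single linear system: substituting (2.2) into (2.1) leads to (K + λI)c = y with K_ij = K(x_i, x_j). A minimal sketch, assuming a Gaussian kernel and this particular normalization of the regularization term (the paper fixes neither choice):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)), a positive definite kernel
    (an assumed choice; the analysis holds for any positive definite kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def regularized_interpolant(X, y, lam=1e-3, sigma=0.5):
    """Return I_X f = sum_i c_i K(., x_i), with c solving (K + lam I) c = y,
    the form guaranteed by the Representer Theorem."""
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return lambda Z: gaussian_kernel(Z, X, sigma) @ c
```

Note that for λ > 0 the fitted function does not exactly reproduce the data y; the residual at the sampling locations is λ(K + λI)^{-1} y, which is nonzero and grows monotonically (in the 2-norm) with λ.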
It is worth mentioning that the kernel based regularized interpolation is well defined for positive definite kernels, but it is not a pure interpolation. This means that we may not have

I_X f(x_i) = y_i, i = 1, …, n.

In the case of strictly positive definite kernels and λ → 0, the regularization problem (2.1) becomes an interpolation problem which exactly interpolates f at the observations, and hence the resulting kernel interpolant is a pure interpolation. However, there is no need to restrict ourselves to strictly positive definite kernels. Using a positive definite kernel to obtain the kernel based regularized interpolation is more general. This is because we have a tunable parameter λ in the regularization problem, which provides a trade-off between pointwise accuracy and stability. Another important reason is that in real-world applications the observations may be corrupted by noise, in which case a pure interpolation no longer makes sense.

Hilbert space type error analysis
In this section, we provide a special case of the error analysis for the kernel based regularized interpolation. The term "special case" is used here because we assume throughout this section that the true function f lies in the RKHS, while in general we cannot guarantee that the true function is within the RKHS. The error analysis for the general case is given in the next section.
Before proceeding to the error analysis, we first recall an important concept from numerical analysis: best approximation [5]. Given f ∈ H_K, define

ρ := inf_{p ∈ S} ||f − p||_{H_K},   (3.1)

where S ⊂ H_K. The number ρ is called the minimax error for the approximation of the function f by functions in S. In fact [5], there is a unique p̂ ∈ S such that

||f − p̂||_{H_K} = ρ.

The function p̂ ∈ S is called the best approximation of f with respect to the H_K-norm.
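Since ⟨f, K(·, x_i)⟩_{H_K} = f(x_i) by the reproducing property, the best approximation of f from S = span{K(·, x_i)}_{i=1}^n is computable by solving a Gram system. Below is a small numerical check of the minimality property, where for illustration we assume a Gaussian kernel and take f = K(·, z), so that every H_K-norm has a closed form in terms of kernel evaluations; both choices are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(6, 2))      # centers x_1, ..., x_n defining S
z = np.array([0.3, 0.7])                    # f = K(., z) is an element of H_K

def k(a, b, sigma=0.5):
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2)))

G = np.array([[k(xi, xj) for xj in X] for xi in X])   # Gram matrix of S
kz = np.array([k(xi, z) for xi in X])                 # <f, K(., x_i)> = K(x_i, z)

def err2(c):
    """||f - sum_i c_i K(., x_i)||_{H_K}^2, expanded via the reproducing property."""
    return float(k(z, z) - 2.0 * c @ kz + c @ G @ c)

c_hat = np.linalg.solve(G, kz)              # normal equations: G c = k_z
rho2 = err2(c_hat)                          # squared minimax error rho^2
```

Because err2 is a positive definite quadratic in the coefficients, c_hat is its unique minimizer, matching the uniqueness of the best approximation p̂.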
Now, let us look at the inner product of the RKHS. By the Cauchy–Schwarz inequality [3], we have

⟨f, g⟩_{H_K} ≤ ||f||_{H_K} ||g||_{H_K}, f, g ∈ H_K.   (3.2)

Based on this, we can claim from (3.1) that there exists a constant 0 < β ≤ 1 such that

⟨f, I_X f⟩_{H_K} ≥ β ||I_X f||²_{H_K}.   (3.3)

We then have the first error estimate.
Theorem 3.1. Assume that the true function f ∈ H_K, and let I_X f be the kernel based regularized interpolation. Then we have the error bound

||f − I_X f||_{H_K} ≤ (C/β) ρ,

where C > 1 and 0 < β ≤ 1 are two constants and ρ is the minimax error for the approximation of the function f by functions in S in terms of the H_K-norm.
Proof. First of all, let us consider the trivial case that the true function f ∈ S ⊂ H_K. In this case, the kernel based regularized interpolation is exact, meaning that I_X f = f. Then both ||f − I_X f||_{H_K} and ρ are zero, and the conclusion holds trivially.
Next, we assume that f ∈ H_K but f ∉ S. Then for any g ∈ S, the triangle inequality gives

||f − I_X f||_{H_K} ≤ ||f − g||_{H_K} + ||g − I_X f||_{H_K}.

Because f ≠ g, for the right-hand side (RHS) of the above inequality there exists a constant C > 1 such that

||g − I_X f||_{H_K} ≤ (C − 1) ||f − g||_{H_K}.

The last inequality is due to the Cauchy–Schwarz inequality.
Therefore, we have

||f − I_X f||_{H_K} ≤ C ||f − g||_{H_K} for all g ∈ S.

Then by (3.3), we can obtain

||f − I_X f||_{H_K} ≤ (C/β) ||f − g||_{H_K},

which implies, after taking the infimum over g ∈ S,

||f − I_X f||_{H_K} ≤ (C/β) ρ.

This completes the proof.
In fact, this error bound can be further tightened. The improvement is due to the observation that if the true function f is within H_K, the kernel based regularized interpolation is actually a projection. Let S = span{K(·, x_i)}_{i=1}^n ⊂ H_K as defined in Theorem 3.1. Then, if f ∈ H_K, the kernel based regularized interpolation defines a projection ϕ : H_K → S. The most important property of a projection is idempotence. If a function is already in S, the projection does nothing but keep the function, i.e., ϕ(g) = g for all g ∈ S. This implies the idempotence of the projector:

ϕ ∘ ϕ = ϕ.

If we define ||ϕ|| to be the operator norm, which is given by

||ϕ|| = sup_{f ∈ H_K, f ≠ 0} ||ϕ(f)||_{H_K} / ||f||_{H_K},

then we have the identity

||I − ϕ|| = ||ϕ||,

where I is the identity operator and ϕ ≠ 0, I. This conclusion is due to a result given in [6]. Using the observation that the kernel based regularized interpolation is a projection if f ∈ H_K, we then have the following tightened error bound.
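The identity ||I − ϕ|| = ||ϕ|| holds for any projection ϕ ≠ 0, I on a Hilbert space, including oblique (non-orthogonal) ones. A quick numerical sanity check on a rank-one oblique projector in R³, using the spectral norm as operator norm (the concrete vectors are our own illustrative choice):

```python
import numpy as np

u = np.array([[1.0], [0.0], [0.0]])
v = np.array([[1.0], [1.0], [0.0]])          # not parallel to u, so the projector is oblique
P = (u @ v.T) / float(v.T @ u)               # rank-one projector onto span{u}

norm_P = np.linalg.norm(P, 2)                # operator (spectral) norm ||phi||
norm_IP = np.linalg.norm(np.eye(3) - P, 2)   # ||I - phi||
```

Here ||P|| = ||u|| ||v|| / |⟨u, v⟩| = √2 > 1, so an oblique projector can expand norms, yet the identity still forces ||I − P|| = ||P||.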
Theorem 3.2. Suppose the kernel based regularized interpolation is given by I_X f = ϕ(f), where ϕ is a projection from H_K onto S. Then we have the error bound

||f − I_X f||_{H_K} ≤ (1/β) ρ,

where 0 < β ≤ 1 is a constant and ρ is the minimax error for the approximation of the function f by functions in S in terms of the H_K-norm.
Proof. Since ϕ is a projection, we have ||ϕ|| = ||I − ϕ||. Based on this, we can derive that for all g ∈ S,

||f − I_X f||_{H_K} = ||(I − ϕ)(f)||_{H_K} = ||(I − ϕ)(f − g)||_{H_K} ≤ ||I − ϕ|| ||f − g||_{H_K} = ||ϕ|| ||f − g||_{H_K},

where we used ϕ(g) = g. To obtain the error bound, it remains to show that ||ϕ|| ≤ 1/β. First, it is straightforward from (3.3) and the Cauchy–Schwarz inequality that if f ∈ H_K, we have

β ||ϕ(f)||²_{H_K} ≤ ⟨f, ϕ(f)⟩_{H_K} ≤ ||f||_{H_K} ||ϕ(f)||_{H_K}.

Since 0 < β ≤ 1, we get 1/β ≥ 1. So for f ∈ H_K,

||ϕ(f)||_{H_K} ≤ (1/β) ||f||_{H_K}.

Therefore, we can obtain

||f − I_X f||_{H_K} ≤ (1/β) ||f − g||_{H_K} for all g ∈ S,

and taking the infimum over g ∈ S gives the desired bound.

Sobolev space type error analysis
In the previous section, we introduced a special case of the error analysis for kernel based regularized interpolation, where we assumed that the true function f lives in the reproducing kernel Hilbert space H_K. However, in general we do not have this guarantee: the true function is usually in a larger function space. In this section, we consider the case that the true function f is in a Sobolev space, that is, a space consisting of all f ∈ L^p(X) with certain weak-differentiability properties. The formal definition of the Sobolev space is given as follows.
Definition 4.1. Let k be a nonnegative integer and p ∈ [1, ∞]. The Sobolev space W^{k,p}(X) is the set of all functions f ∈ L^p(X) such that for each multi-index α with |α| ≤ k, the αth weak derivative ∂^α f exists and ∂^α f ∈ L^p(X). The norm of the Sobolev space is defined as

||f||_{W^{k,p}(X)} = ( Σ_{|α| ≤ k} ||∂^α f||^p_{L^p(X)} )^{1/p}, 1 ≤ p < ∞,

with the usual modification (a maximum over |α| ≤ k) when p = ∞. When p = 2, we usually write W^{k,2}(X) := H^k(X). For simplicity, we replace ||f||_{W^{k,p}(X)} by ||f||_{k,p,X} when no confusion arises, and we omit p when p = 2.
Besides the standard norm defined for the Sobolev space, we also have a seminorm.
The standard seminorm over the space W^{k,p}(X) is given by

|f|_{W^{k,p}(X)} = ( Σ_{|α| = k} ||∂^α f||^p_{L^p(X)} )^{1/p}.

Similarly, we write |f|_{W^{k,p}(X)} as |f|_{k,p,X} when no confusion arises, and omit p when p = 2.
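To illustrate the difference between the norm and the seminorm, one can approximate both by finite differences for f(x) = sin(2πx) on X = (0, 1), where the exact values are ||f||²_{L²} = 1/2 and |f|²_{1,X} = ||f'||²_{L²} = 2π². The discretization below is our own illustration:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
h = x[1] - x[0]
f = np.sin(2.0 * np.pi * x)
df = np.gradient(f, x)                 # finite-difference approximation of f'

l2_sq = np.sum(f ** 2) * h             # ||f||_{L^2(X)}^2,  exact value 1/2
semi_sq = np.sum(df ** 2) * h          # |f|_{1,X}^2,       exact value 2 pi^2
norm_sq = l2_sq + semi_sq              # ||f||_{1,X}^2 for k = 1, p = 2
```

The seminorm only sees the highest-order derivatives, so it vanishes on polynomials of degree below k, which is exactly the property exploited in the proof below.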
We now define the continuous embedding between two Banach spaces. Let W, V be two Banach spaces with V ⊂ W. We say that the space V is continuously embedded in W, and write V → W, if there exists a constant c > 0 such that ||v||_W ≤ c ||v||_V for all v ∈ V.
With these concepts from the theory of Sobolev spaces, we can then state the error bound for the kernel based regularized interpolation. We assume that the true function f is within H^{k+1}(X).
When the positive definite kernel is chosen to be the polynomial kernel of degree d and k + 1 > d, we have

||f − I_X f||_{C(X)} ≤ c |f|_{k+1,X}

for some constant c > 0.

Proof. Since k > 0, we have H^{k+1}(X) → C(X), where C(X) is the set of all continuous functions defined on X. So f ∈ H^{k+1}(X) is continuous and I_X f is well defined. Besides, there exists a constant c_1 > 0 such that

||I_X f||_{C(X)} ≤ c_1 ||f||_{C(X)}.

Since H^{k+1}(X) → C(X), there exists a constant c_2 > 0 such that ||f||_{C(X)} ≤ c_2 ||f||_{k+1,X}.
Therefore, there exists some c_3 > 0 such that

||f − I_X f||_{C(X)} ≤ c_3 ||f||_{k+1,X}.   (4.1)

Now, for all f ∈ H^{k+1}(X) and g ∈ S, noting that f − I_X f = (f + g) − I_X (f + g), we can obtain from (4.1) that there exists some constant c > 0 such that

||f − I_X f||_{C(X)} ≤ c ||f + g||_{k+1,X},

by which we have

||f − I_X f||_{C(X)} ≤ c inf_{g ∈ S} ||f + g||_{k+1,X}.

We now consider the case that the positive definite kernel K is the polynomial kernel. First, we have the following inequality from the norm equivalence theorem [12]:

||f||_{k+1,X} ≤ c_4 ( |f|_{k+1,X} + Σ_{|α| ≤ k} |∫_X ∂^α f dx| ) ∀f ∈ H^{k+1}(X).   (4.2)
Since the kernel is a polynomial kernel and k + 1 > d, we have that for all g ∈ S, ∂ α g = 0 for |α| = k + 1.
Replacing f by f + g in (4.2) gives, for all f ∈ H^{k+1}(X) and g ∈ S,

||f + g||_{k+1,X} ≤ c_4 ( |f|_{k+1,X} + Σ_{|α| ≤ k} |∫_X ∂^α (f + g) dx| ),

where we used |f + g|_{k+1,X} = |f|_{k+1,X}, since ∂^α g = 0 for |α| = k + 1. For the term ∫_X ∂^α (f + g) dx, we note that for any f ∈ H^{k+1}(X) one can always find a g ∈ S such that for |α| ≤ k,

∫_X ∂^α (f + g) dx = 0.   (4.3)

This is because setting |α| = k, k − 1, k − 2, … produces a set of linear equations from (4.3), which can be solved for the coefficients of the polynomial g ∈ S. Moreover, this set of equations is always solvable because we assumed k + 1 > d. The g constructed in this way annihilates the integrals ∫_X ∂^α (f + g) dx for |α| ≤ k.
Therefore, we have

inf_{g ∈ S} ||f + g||_{k+1,X} ≤ c_4 |f|_{k+1,X}

for some constant c_4 > 0. Then, by the conclusion obtained in the first part, there exists some c > 0 such that

||f − I_X f||_{C(X)} ≤ c |f|_{k+1,X}.

This completes the proof of the result.
From this error bound, we can see that if we consider the polynomial kernel and the true function is in the RKHS H_K, we will have

||f − I_X f||_{C(X)} ≤ c ||f||_{H_K}

for some constant c > 0. This is because the RKHS H_K is continuously embedded in H^{k+1}(X), so that |f|_{k+1,X} ≤ c ||f||_{H_K}.
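A small numerical illustration of the polynomial kernel case: with the inhomogeneous polynomial kernel K(x, z) = (1 + xz)^d and a target that is itself a polynomial of degree d (hence in H_K), the regularized interpolant recovers the target up to a small λ-dependent error, consistent with the bound above. The specific kernel form, degree, and parameters are our own choices for the sketch:

```python
import numpy as np

def poly_kernel(A, B, d=2):
    """K(x, z) = (1 + x z)^d, the (inhomogeneous) polynomial kernel of degree d."""
    return (1.0 + np.outer(A, B)) ** d

def regularized_interpolant(X, y, lam=1e-8, d=2):
    K = poly_kernel(X, X, d)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return lambda Z: poly_kernel(Z, X, d) @ c

X = np.linspace(-1.0, 1.0, 7)
target = lambda t: 1.0 - 2.0 * t + 3.0 * t ** 2   # degree-2 polynomial, lies in H_K
f = regularized_interpolant(X, target(X))

Z = np.linspace(-1.0, 1.0, 50)
max_err = float(np.max(np.abs(f(Z) - target(Z))))  # small, driven by lam
```

Here the Gram matrix is rank-deficient (H_K is only 3-dimensional), which is exactly the positive definite but not strictly positive definite situation of Section 2; the regularization term λI keeps the linear system solvable.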

Conclusion
In this paper, some error bounds for the kernel based regularized interpolation were provided. We derived these error bounds in terms of the reproducing kernel Hilbert space norm and the Sobolev space norm. We also presented an error bound for a commonly used kernel, the polynomial kernel. These error bounds provide a theoretical understanding of the approximation performance.

Conflicts of Interest:
The author(s) declare that there are no conflicts of interest regarding the publication of this paper.