Article

Principle of Duality in Cubic Smoothing Spline

Ruixue Du and Hiroshi Yamada *
1 Graduate School of Social Sciences, Hiroshima University, 1-2-1 Kagamiyama, Higashi-Hiroshima 739-8525, Japan
2 School of Informatics and Data Science, Hiroshima University, 1-2-1 Kagamiyama, Higashi-Hiroshima 739-8525, Japan
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(10), 1839; https://doi.org/10.3390/math8101839
Submission received: 20 September 2020 / Revised: 12 October 2020 / Accepted: 15 October 2020 / Published: 19 October 2020
(This article belongs to the Section Probability and Statistics)

Abstract

Fitting a cubic smoothing spline is a typical smoothing method. This paper reveals a principle of duality in the penalized least squares regressions relating to the method. We also provide a number of results derived from these regressions, some of which are illustrated by a real data example.

1. Introduction

Fitting a cubic smoothing spline, which was developed by [1,2] and others, is a typical smoothing method. The cubic smoothing spline fitted to a scatter plot of ordered pairs $(x_i, y_i)$ for $i = 1, \ldots, n$ is a function such that
$$\hat{f}(x) = \arg\min_{f \in W} \sum_{i=1}^{n} \{ y_i - f(x_i) \}^2 + \lambda \int_a^b \{ f''(x) \}^2 \, dx, \qquad (1)$$
where $x_1, \ldots, x_n$ are points satisfying $a < x_1 < \cdots < x_n < b$, $W$ denotes a function space that contains all functions whose second derivative is square integrable over $[a, b]$, and $\lambda$ is a positive smoothing/tuning parameter, which controls the trade-off between goodness of fit and smoothness.
Let $\hat{f} = [\hat{f}(x_1), \ldots, \hat{f}(x_n)]'$. Then, given that $\hat{f}(x)$ is a natural cubic spline whose knots are $x_1, \ldots, x_n$ (see, e.g., [3,4]), it follows that
$$\hat{f} = \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda f' C' R^{-1} C f \qquad (2)$$
$$= (I_n + \lambda C' R^{-1} C)^{-1} y, \qquad (3)$$
where $y = [y_1, \ldots, y_n]'$, $I_n$ denotes the $n \times n$ identity matrix, and $C$ and $R$ are explicitly presented later. Then, as shown in [3], $\hat{f}(x)$ in (1) is uniquely determined by $\hat{f} \in \mathbb{R}^n$ in (3). Thus, estimating $\hat{f}(x)$ is equivalent to estimating $\hat{f}$.
Let $\Pi = [\iota_n, x] \in \mathbb{R}^{n \times 2}$, where $\iota_n = [1, \ldots, 1]' \in \mathbb{R}^n$ and $x = [x_1, \ldots, x_n]'$. Note that since $x_1 < \cdots < x_n$, $\iota_n$ and $x$ are linearly independent and thus $\Pi$ is of full column rank. Let
$$\hat{\tau} = \Pi (\Pi' \Pi)^{-1} \Pi' y. \qquad (4)$$
Denote the difference between $\hat{f}$ and $\hat{\tau}$ (resp. $y$ and $\hat{f}$) by $\hat{c}$ (resp. $\hat{u}$):
$$\hat{c} = \hat{f} - \hat{\tau}, \qquad \hat{u} = y - \hat{f}. \qquad (5)$$
Accordingly, we have
$$y = \hat{\tau} + \hat{c} + \hat{u}. \qquad (6)$$
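As a quick illustration of (4)–(6), the following minimal R sketch (with hypothetical simulated data; the variable names are ours, not the paper's) computes $\hat{\tau}$ as the fitted values of an ordinary least squares regression of $y$ on $\Pi = [\iota_n, x]$. Once $\hat{f}$ is available from (3), the remaining components follow as $\hat{c} = \hat{f} - \hat{\tau}$ and $\hat{u} = y - \hat{f}$.

```r
## Minimal sketch of (4): tau_hat is the OLS fit of y on Pi = [iota_n, x].
## Hypothetical data; variable names are ours.
set.seed(1)
n  <- 30
x  <- sort(runif(n))                           # a < x_1 < ... < x_n < b
y  <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
Pi <- cbind(1, x)                              # Pi = [iota_n, x], full column rank
tau_hat <- Pi %*% solve(crossprod(Pi), crossprod(Pi, y))          # (4)
## Equivalently, tau_hat is the fitted value vector of a simple linear regression:
all.equal(as.numeric(tau_hat), as.numeric(fitted(lm(y ~ x))))     # TRUE
## With f_hat from (3), c_hat <- f_hat - tau_hat and u_hat <- y - f_hat
## give the decomposition y = tau_hat + c_hat + u_hat in (6).
```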
In this paper, we present a comprehensive list of penalized least squares regressions relating to (6). One such example is the ridge regression [5] that leads to $\hat{c}$. Then, we reveal a principle of duality in these regressions. In addition, based on them, we provide a number of theoretical results, for example, $\iota_n' \hat{c} = 0$.
This paper is organized as follows. Section 2 fixes some notation and gives key preliminary results used to derive the main results in the paper. Section 3 provides a comprehensive list of penalized least squares regressions relating to (6), and reveals a principle of duality in them. Section 4 shows some results that are obtainable from the regressions shown in Section 3. Section 5 illustrates some of the results provided in Section 3 and Section 4 by a real data example. Section 6 deals with the case in which other right-inverse matrices are used. Section 7 concludes the paper.

2. Preliminaries

In this section, we give key preliminary results used to derive the main results of this paper. Before stating them, we fix some notations.

2.1. Notations

Let $\hat{f}_i$ (resp. $\hat{\tau}_i$) denote the $i$th entry of $\hat{f}$ (resp. $\hat{\tau}$) for $i = 1, \ldots, n$; let $\delta_i = x_{i+1} - x_i$, which is positive by definition, for $i = 1, \ldots, n-1$; let $\Delta = \operatorname{diag}(\delta_1, \ldots, \delta_{n-1}) \in \mathbb{R}^{(n-1) \times (n-1)}$; and, for a full-row-rank matrix $M \in \mathbb{R}^{m \times n}$, let $M'(MM')^{-1} \in \mathbb{R}^{n \times m}$, which is a right-inverse matrix of $M$, be denoted by $M_r^{-1}$. For a full-column-rank matrix $W \in \mathbb{R}^{n \times p}$, let $S(W)$ (resp. $S^{\perp}(W)$) denote the column space of $W$ (resp. the orthogonal complement of $S(W)$) and let $P_W$ (resp. $Q_W$) denote the orthogonal projection matrix onto the space $S(W)$ (resp. $S^{\perp}(W)$). Explicitly, they are $P_W = W(W'W)^{-1}W'$ and $Q_W = I_n - P_W$. $D^{(i)} \in \mathbb{R}^{(n-i) \times (n-i+1)}$ is a Toeplitz matrix whose first (resp. last) row is $[-1, 1, 0, \ldots, 0]$ (resp. $[0, \ldots, 0, -1, 1]$), and we define matrices $C \in \mathbb{R}^{(n-2) \times n}$ and $R \in \mathbb{R}^{(n-2) \times (n-2)}$ as follows:
$$C = \begin{bmatrix} \delta_1^{-1} & -\delta_1^{-1} - \delta_2^{-1} & \delta_2^{-1} & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \delta_{n-2}^{-1} & -\delta_{n-2}^{-1} - \delta_{n-1}^{-1} & \delta_{n-1}^{-1} \end{bmatrix} \qquad (7)$$
and
$$R = \begin{bmatrix} \frac{1}{3}(\delta_1 + \delta_2) & \frac{1}{6}\delta_2 & 0 & \cdots & 0 \\ \frac{1}{6}\delta_2 & \frac{1}{3}(\delta_2 + \delta_3) & \ddots & & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & \frac{1}{6}\delta_{n-2} \\ 0 & \cdots & 0 & \frac{1}{6}\delta_{n-2} & \frac{1}{3}(\delta_{n-2} + \delta_{n-1}) \end{bmatrix}. \qquad (8)$$
Finally, we denote the eigenvalues of $R$ by $\omega_1, \ldots, \omega_{n-2}$, arranged in descending order.

2.2. Key Preliminary Results

Lemma 1.
(i) 
$C$ can be factorized as $C = D^{(2)} \Delta^{-1} D^{(1)}$.
(ii) 
We have the following inequalities:
$$\omega_{n-2} \geq \min\left\{ \tfrac{1}{3}\delta_1 + \tfrac{1}{6}\delta_2, \; \tfrac{1}{6}(\delta_2 + \delta_3), \; \ldots, \; \tfrac{1}{6}(\delta_{n-3} + \delta_{n-2}), \; \tfrac{1}{6}\delta_{n-2} + \tfrac{1}{3}\delta_{n-1} \right\} > 0.$$
Proof of Lemma 1.
(i)
Let $w = [w_1, \ldots, w_n]'$ be an $n$-dimensional column vector. Then, by the definition of $C$, it follows that
$$C w = \begin{bmatrix} -\frac{w_2 - w_1}{\delta_1} + \frac{w_3 - w_2}{\delta_2} \\ \vdots \\ -\frac{w_{n-1} - w_{n-2}}{\delta_{n-2}} + \frac{w_n - w_{n-1}}{\delta_{n-1}} \end{bmatrix} = D^{(2)} \begin{bmatrix} \frac{w_2 - w_1}{\delta_1} \\ \vdots \\ \frac{w_n - w_{n-1}}{\delta_{n-1}} \end{bmatrix} = D^{(2)} \Delta^{-1} \begin{bmatrix} -w_1 + w_2 \\ \vdots \\ -w_{n-1} + w_n \end{bmatrix} = D^{(2)} \Delta^{-1} D^{(1)} w \in \mathbb{R}^{n-2},$$
which leads to $C = D^{(2)} \Delta^{-1} D^{(1)}$. (A numerical check of this factorization is sketched just after the proof.)
(ii)
The first inequality follows by applying the Gershgorin circle theorem, and the second inequality holds because $\delta_i > 0$ for $i = 1, \ldots, n-1$.
 □
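The factorization in Lemma 1 (i) can be checked numerically. The R sketch below (ours, with hypothetical knots) builds $C$ entry by entry from (7) and compares it with $D^{(2)} \Delta^{-1} D^{(1)}$.

```r
## Numerical check of C = D^(2) Delta^{-1} D^(1) (Lemma 1 (i)); hypothetical knots.
set.seed(1)
n <- 12
x <- sort(runif(n))
d <- diff(x)                                   # delta_1, ..., delta_{n-1} (all positive)
## C built directly from (7)
C <- matrix(0, n - 2, n)
for (i in 1:(n - 2)) {
  C[i, i]     <-  1 / d[i]
  C[i, i + 1] <- -1 / d[i] - 1 / d[i + 1]
  C[i, i + 2] <-  1 / d[i + 1]
}
## First-difference matrices: D^(1) is (n-1) x n, D^(2) is (n-2) x (n-1)
D1 <- diff(diag(n))
D2 <- diff(diag(n - 1))
Delta <- diag(d)
all.equal(C, D2 %*% solve(Delta) %*% D1)       # TRUE
```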
Remark 1.
In Appendix A, we give some remarks on the special case in which $x = [1, \ldots, n]'$.
Lemma 2.
(i) 
$S(C')$ equals $S^{\perp}(\Pi)$ and
(ii) 
$S(C_r^{-1})$ equals $S^{\perp}(\Pi)$.
Proof of Lemma 2.
(i)
Given that $\delta_i > 0$ for $i = 1, \ldots, n-1$, both $\Pi$ and $C'$ are of full column rank. In addition, $[\Pi, C']$ is a square matrix. Thus, if $(C')'\Pi = C\Pi = 0$, then it follows that $S(C') = S^{\perp}(\Pi)$. From $D^{(1)} \iota_n = 0$, we have $C \iota_n = D^{(2)} \Delta^{-1} D^{(1)} \iota_n = 0$. Likewise, from $\Delta^{-1} D^{(1)} x = \Delta^{-1} \Delta \iota_{n-1} = \iota_{n-1}$ and $D^{(2)} \iota_{n-1} = 0$, we obtain $C x = D^{(2)} \Delta^{-1} D^{(1)} x = 0$. Accordingly, we have $C \Pi = 0$, which completes the proof.
(ii)
Recall that $C_r^{-1} = C'(CC')^{-1}$. It is clear that $C_r^{-1}$ is a full-column-rank matrix such that $[\Pi, C_r^{-1}]$ is a square matrix. In addition, $(C_r^{-1})'\Pi = (CC')^{-1} C \Pi = 0$. Thus, it follows that $S(C_r^{-1}) = S^{\perp}(\Pi)$.
 □
Denote the spectral decomposition of $R$ by $V \Omega V'$ and let $R^{-1/2} = V \Omega^{-1/2} V'$, where $\Omega^{-1/2} = \operatorname{diag}(1/\sqrt{\omega_1}, \ldots, 1/\sqrt{\omega_{n-2}})$. Then, $R^{-1/2}$ is a positive definite matrix such that $R^{-1/2} R^{-1/2} = R^{-1}$. Define
$$D = R^{-1/2} C. \qquad (9)$$
Then, given that $C$ is of full row rank and $R^{-1/2}$ is nonsingular, $D \in \mathbb{R}^{(n-2) \times n}$ is also of full row rank. In addition, we have
$$D' D = C' R^{-1} C. \qquad (10)$$
(We provide Matlab/GNU Octave and R functions for calculating $C$, $R$, and $D$ in Appendix A.)
Lemma 3.
(i) 
$S(D')$ equals $S^{\perp}(\Pi)$ and
(ii) 
$S(D_r^{-1})$ equals $S^{\perp}(\Pi)$.
Proof of Lemma 3.
Both (i) and (ii) may be proved similarly to Lemma 2 (ii). For example, given $C\Pi = 0$, we have $(D')'\Pi = D\Pi = R^{-1/2} C \Pi = 0$. □
Denote the eigenvalues of $C' R^{-1} C$ by $g_1, \ldots, g_n$ in ascending order and the spectral decomposition of $C' R^{-1} C$ by $U G U'$, where $U = [u_1, \ldots, u_n]$ and $G = \operatorname{diag}(g_1, \ldots, g_n)$. Let $T = [u_1, u_2] \in \mathbb{R}^{n \times 2}$, $E = [u_3, \ldots, u_n] \in \mathbb{R}^{n \times (n-2)}$, and $S = \operatorname{diag}(g_3, \ldots, g_n) \in \mathbb{R}^{(n-2) \times (n-2)}$.
Lemma 4.
(i) 
$S(T)$ equals $S(\Pi)$,
(ii) 
$S(E)$ equals $S^{\perp}(\Pi)$, and
(iii) 
$S((E')_r^{-1})$ equals $S^{\perp}(\Pi)$.
Proof of Lemma 4.
(i) Since $C' R^{-1} C \in \mathbb{R}^{n \times n}$ is a nonnegative definite matrix whose rank is $n-2$, we have $0 = g_1 = g_2 < g_3 \leq \cdots \leq g_n$. In addition, given $C\Pi = 0$, it follows that $C' R^{-1} C \, \Pi = 0 \cdot \Pi$, which completes the proof. (ii) and (iii) may be proved similarly to Lemma 2 (ii). □
Given $g_1 = g_2 = 0$, we have
$$E S E' = C' R^{-1} C. \qquad (11)$$
Define
$$F = S^{1/2} E', \qquad (12)$$
where $S^{1/2} = \operatorname{diag}(\sqrt{g_3}, \ldots, \sqrt{g_n}) \in \mathbb{R}^{(n-2) \times (n-2)}$. Then, we have
$$F' F = C' R^{-1} C. \qquad (13)$$
Lemma 5.
(i) 
$S(F')$ equals $S^{\perp}(\Pi)$ and
(ii) 
$S(F_r^{-1})$ equals $S^{\perp}(\Pi)$.
Proof of Lemma 5.
Both (i) and (ii) may be proved similarly to Lemma 2 (ii). For example, given $E'\Pi = 0$, we have $(F')'\Pi = F\Pi = S^{1/2} E' \Pi = 0$. □
Lemma 6.
There exists an orthogonal matrix $\Upsilon \in \mathbb{R}^{(n-2) \times (n-2)}$ such that $F' = D' \Upsilon$.
Proof of Lemma 6.
Recall that both $D' \in \mathbb{R}^{n \times (n-2)}$ and $F' \in \mathbb{R}^{n \times (n-2)}$ are of full column rank and $S(D') = S(F')$. Accordingly, there exists a nonsingular matrix $\Upsilon \in \mathbb{R}^{(n-2) \times (n-2)}$ such that $F' = D' \Upsilon$. Given that $D'D = F'F$, we have $D'(I_{n-2} - \Upsilon \Upsilon') D = 0$. Then, from $(D_r^{-1})' D' (I_{n-2} - \Upsilon \Upsilon') D D_r^{-1} = I_{n-2} - \Upsilon \Upsilon' = 0$, we have $\Upsilon' = \Upsilon^{-1}$. □
Let (i) $A \in \{D, F\}$, (ii) $(B, Q) \in \{(C, R), (E', S^{-1})\}$, (iii) $\mathcal{D} \in \{C, D, E', F\}$, and (iv) $P \in \{\Pi, T\}$. From the results above, we immediately obtain the following results:
Proposition 1.
(i) 
$C' R^{-1} C = A' A = B' Q^{-1} B$,
(ii) 
$\mathcal{D} P = (\mathcal{D}_r^{-1})' P = 0$,
(iii) 
both $[P, \mathcal{D}']$ and $[P, \mathcal{D}_r^{-1}]$ are nonsingular, and
(iv) 
$P_{\mathcal{D}'} = P_{\mathcal{D}_r^{-1}} = Q_P$.
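Parts of Proposition 1 can be verified numerically. The following R sketch (ours; hypothetical knots; $A = D$ and $P = \Pi$) constructs $C$, $R$, and $D = R^{-1/2}C$, and checks (10) together with $D\Pi = 0$.

```r
## Check D'D = C'R^{-1}C (10) and D %*% Pi = 0 (Proposition 1 (ii)); hypothetical knots.
set.seed(2)
n <- 15
x <- sort(runif(n)); d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))   # C = D^(2) Delta^{-1} D^(1), Lemma 1 (i)
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
eR <- eigen(R, symmetric = TRUE)                            # R = V Omega V'
D  <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors) %*% C   # D = R^{-1/2} C, (9)
Pi <- cbind(1, x)
max(abs(t(D) %*% D - t(C) %*% solve(R) %*% C))   # zero up to rounding: (10)
max(abs(D %*% Pi))                               # zero up to rounding: D Pi = 0
```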

3. Several Regressions Relating to (6) and Principle of Duality in Them

In this section, we provide a comprehensive list of penalized least squares regressions relating to (6), and reveal a principle of duality in them. The penalized regressions are, more precisely, those to compute $\hat{c}$, $\hat{u}$, $\hat{\tau}$, $\hat{\tau} + \hat{c}$, $\hat{c} + \hat{u}$, and $\hat{\tau} + \hat{u}$.

3.1. Penalized Regressions to Compute $\hat{\tau} + \hat{c}$

Concerning $\hat{\tau} + \hat{c}$, we have the following results:
Lemma 7.
It follows that
$$\hat{\tau} + \hat{c} = \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda \| A f \|^2 = (I_n + \lambda A' A)^{-1} y \qquad (14)$$
$$= \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda f' B' Q^{-1} B f = (I_n + \lambda B' Q^{-1} B)^{-1} y. \qquad (15)$$
Proof of Lemma 7.
From Proposition 1, we have $C' R^{-1} C = A' A = B' Q^{-1} B$. Then, (2) and (3) can be represented as follows:
$$\hat{f} = \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda \| A f \|^2 = (I_n + \lambda A' A)^{-1} y = \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda f' B' Q^{-1} B f = (I_n + \lambda B' Q^{-1} B)^{-1} y.$$
In addition, by the definition of $\hat{c}$, we have $\hat{f} = \hat{\tau} + \hat{c}$. Hence, we obtain (14) and (15). □

3.2. Penalized Regressions to Compute $\hat{c}$

Concerning $\hat{c}$, we have the following results:
Lemma 8.
Consider the following penalized regressions:
$$\hat{\gamma} = \arg\min_{\gamma \in \mathbb{R}^{n-2}} \| y - A_r^{-1} \gamma \|^2 + \lambda \| \gamma \|^2 = ((A_r^{-1})' A_r^{-1} + \lambda I_{n-2})^{-1} (A_r^{-1})' y, \qquad (16)$$
$$\hat{\kappa} = \arg\min_{\kappa \in \mathbb{R}^{n-2}} \| y - B_r^{-1} \kappa \|^2 + \lambda \kappa' Q^{-1} \kappa = ((B_r^{-1})' B_r^{-1} + \lambda Q^{-1})^{-1} (B_r^{-1})' y. \qquad (17)$$
Then, we have
$$\hat{c} = A_r^{-1} \hat{\gamma} = B_r^{-1} \hat{\kappa}. \qquad (18)$$
Proof of Lemma 8.
Let $K = [P, A_r^{-1}]$. From Proposition 1, it follows that $A P = 0$, $(A_r^{-1})' P = 0$, and $K$ is nonsingular. Accordingly, given that $K'K = \operatorname{diag}(P'P, (A_r^{-1})' A_r^{-1})$ and $A K = [A P, A A_r^{-1}] = [0, I_{n-2}]$, it follows that
$$\hat{f} = K (K'K + \lambda K' A' A K)^{-1} K' y = [P, A_r^{-1}] \begin{bmatrix} (P'P)^{-1} & 0 \\ 0 & ((A_r^{-1})' A_r^{-1} + \lambda I_{n-2})^{-1} \end{bmatrix} \begin{bmatrix} P' \\ (A_r^{-1})' \end{bmatrix} y = P(P'P)^{-1}P'y + A_r^{-1} ((A_r^{-1})' A_r^{-1} + \lambda I_{n-2})^{-1} (A_r^{-1})' y = \hat{\tau} + A_r^{-1} \hat{\gamma},$$
from which we have $\hat{f} - \hat{\tau} = A_r^{-1} \hat{\gamma}$. Given $\hat{f} - \hat{\tau} = \hat{c}$, we thus obtain $\hat{c} = A_r^{-1} \hat{\gamma}$. Similarly, we can obtain $\hat{c} = B_r^{-1} \hat{\kappa}$. □
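As a numerical illustration of Lemma 8 (ours; hypothetical data; $A = D$ and $P = \Pi$), the sketch below computes $\hat{c}$ both as $\hat{f} - \hat{\tau}$ and through the ridge regression (16), and confirms (18).

```r
## c_hat from (5) versus c_hat = A_r^{-1} gamma_hat from (16)/(18), with A = D.
set.seed(3)
n <- 40; x <- seq(0, 1, length.out = n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
eR <- eigen(R, symmetric = TRUE)
A  <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors) %*% C   # A = D = R^{-1/2} C
lambda  <- 10
Pi      <- cbind(1, x)
f_hat   <- solve(diag(n) + lambda * crossprod(A), y)                     # (3)/(14)
tau_hat <- Pi %*% solve(crossprod(Pi), crossprod(Pi, y))                 # (4)
Ar      <- t(A) %*% solve(A %*% t(A))                                    # A_r^{-1} = A'(AA')^{-1}
gamma_hat <- solve(crossprod(Ar) + lambda * diag(n - 2), crossprod(Ar, y))   # (16)
max(abs((f_hat - tau_hat) - Ar %*% gamma_hat))   # zero up to rounding: (18)
```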
Lemma 9.
$\hat{c}$ can be calculated by the following penalized regressions:
$$\hat{c} = \arg\min_{c \in \mathbb{R}^n} \| (y - \hat{\tau}) - c \|^2 + \lambda \| A c \|^2 = (I_n + \lambda A' A)^{-1} (y - \hat{\tau}) \qquad (19)$$
$$= \arg\min_{c \in \mathbb{R}^n} \| (y - \hat{\tau}) - c \|^2 + \lambda c' B' Q^{-1} B c = (I_n + \lambda B' Q^{-1} B)^{-1} (y - \hat{\tau}). \qquad (20)$$
Proof of Lemma 9.
Given (14), $\hat{f} = \hat{\tau} + \hat{c}$, and $A P = 0$, we have
$$y = (I_n + \lambda A' A) \hat{f} = (I_n + \lambda A' A)(\hat{\tau} + \hat{c}) = \hat{\tau} + (I_n + \lambda A' A) \hat{c},$$
which leads to (19). Similarly, we can obtain (20). □
Remark 2.
We add some more exposition about (16). Let $K = [P, A_r^{-1}]$ as before. In addition, let $\theta = [\beta', \gamma']' \in \mathbb{R}^n$ be a vector such that $f = K\theta = P\beta + A_r^{-1}\gamma$. Then, it follows that $A f = A(P\beta + A_r^{-1}\gamma) = \gamma$. Given that $f = P\beta + A_r^{-1}\gamma$ and $A f = \gamma$, the minimization problem in (14) can be represented as follows:
$$\min_{\beta \in \mathbb{R}^2, \, \gamma \in \mathbb{R}^{n-2}} \| y - P\beta - A_r^{-1}\gamma \|^2 + \lambda \| \gamma \|^2. \qquad (21)$$
It is noteworthy that $\beta$ is not penalized in (21) and $(A_r^{-1})' P = 0$. Thus, the minimization problem (21) can be decomposed into (16) and (40). Moreover, (21) gives the best linear unbiased predictors of $\beta$ and $\gamma$ of the following linear mixed model:
$$y = P\beta + A_r^{-1}\gamma + u, \qquad [u', \gamma']' \sim N\left(0, \operatorname{diag}(\sigma_u^2 I_n, \sigma_v^2 I_{n-2})\right), \qquad (22)$$
where $\lambda = \sigma_u^2 / \sigma_v^2$.
Remark 3.
By using $C_r^{-1}$, ref. [6] derived the following expressions in our notation:
$$\hat{f} = \hat{\tau} + C_r^{-1} \hat{\kappa}, \qquad \hat{\kappa} = ((C_r^{-1})' C_r^{-1} + \lambda R^{-1})^{-1} (C_r^{-1})' y. \qquad (23)$$
Here, we make the following remarks on (23).
(i) 
First, $\hat{\kappa}$ is the solution of the following penalized regression:
$$\min_{\kappa \in \mathbb{R}^{n-2}} \| y - C_r^{-1} \kappa \|^2 + \lambda \kappa' R^{-1} \kappa. \qquad (24)$$
(ii) 
Moreover, (23) is a special case of $\hat{c} = B_r^{-1} \hat{\kappa}$ in (18).

3.3. Penalized Regressions to Compute $\hat{u}$

Concerning $\hat{u}$, we have the following results:
Lemma 10.
Consider the following penalized regressions:
$$\hat{\eta} = \arg\min_{\eta \in \mathbb{R}^{n-2}} \| y - A' \eta \|^2 + \lambda^{-1} \| \eta \|^2 = (A A' + \lambda^{-1} I_{n-2})^{-1} A y, \qquad (25)$$
$$\hat{\upsilon} = \arg\min_{\upsilon \in \mathbb{R}^{n-2}} \| y - B' \upsilon \|^2 + \lambda^{-1} \upsilon' Q \upsilon = (B B' + \lambda^{-1} Q)^{-1} B y. \qquad (26)$$
Then, it follows that
$$\hat{u} = A' \hat{\eta} = B' \hat{\upsilon}. \qquad (27)$$
Proof of Lemma 10.
Applying the matrix inversion lemma to $(I_n + \lambda A' A)^{-1}$, we have
$$(I_n + \lambda A' A)^{-1} = I_n - A'(A A' + \lambda^{-1} I_{n-2})^{-1} A. \qquad (28)$$
Postmultiplying (28) by $y$ yields $y - \hat{f} = A' \hat{\eta}$. Given $y - \hat{f} = \hat{u}$, we thus have $\hat{u} = A' \hat{\eta}$. Similarly, we can obtain $\hat{u} = B' \hat{\upsilon}$. □
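Lemma 10 can be illustrated in the same way; the sketch below (ours; hypothetical data; $A = D$) computes $\hat{u}$ both as $y - \hat{f}$ and as $A'\hat{\eta}$ from (25).

```r
## u_hat = y - f_hat versus u_hat = A' eta_hat from (25)/(27), with A = D.
set.seed(4)
n <- 40; x <- seq(0, 1, length.out = n); y <- cos(3 * pi * x) + rnorm(n, sd = 0.2)
d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
eR <- eigen(R, symmetric = TRUE)
A  <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors) %*% C
lambda  <- 10
f_hat   <- solve(diag(n) + lambda * crossprod(A), y)                      # (14)
eta_hat <- solve(A %*% t(A) + (1 / lambda) * diag(n - 2), A %*% y)        # (25)
max(abs((y - f_hat) - t(A) %*% eta_hat))         # zero up to rounding: (27)
```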
Lemma 11.
$\hat{u}$ can be calculated by the following penalized regressions:
$$\hat{u} = \arg\min_{u \in \mathbb{R}^n} \| (y - \hat{\tau}) - u \|^2 + \lambda^{-1} \| (A_r^{-1})' u \|^2 = (I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')^{-1} (y - \hat{\tau}) \qquad (29)$$
and
$$\hat{u} = \arg\min_{u \in \mathbb{R}^n} \| (y - \hat{\tau}) - u \|^2 + \lambda^{-1} u' B_r^{-1} Q (B_r^{-1})' u = (I_n + \lambda^{-1} B_r^{-1} Q (B_r^{-1})')^{-1} (y - \hat{\tau}). \qquad (30)$$
Proof of Lemma 11.
Given (34), $\hat{g} = \hat{\tau} + \hat{u}$, and $(A_r^{-1})' P = 0$, we have
$$y = (I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})') \hat{g} = (I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')(\hat{\tau} + \hat{u}) = \hat{\tau} + (I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})') \hat{u},$$
which leads to (29). Similarly, we can obtain (30). □
Remark 4.
In [2] and ([3], p. 20), there are equations expressed in our notation as follows:
$$(R + \lambda C C') \phi = C y, \qquad \hat{f} = y - \lambda C' \phi. \qquad (31)$$
Here, we make the following remarks on (31).
(i) 
First, these lead to a penalized least squares problem. Given that $\hat{u} = y - \hat{f}$, removing $\phi$ from the above equations leads to
$$\hat{u} = y - \hat{f} = \lambda C'(R + \lambda C C')^{-1} C y = C'(C C' + \lambda^{-1} R)^{-1} C y = C' \hat{\upsilon}, \qquad (32)$$
where
$$\hat{\upsilon} = \arg\min_{\upsilon \in \mathbb{R}^{n-2}} \| y - C' \upsilon \|^2 + \lambda^{-1} \upsilon' R \upsilon. \qquad (33)$$
(ii) 
Moreover, (32) is a special case of $\hat{u} = B' \hat{\upsilon}$ in (27).

3.4. Penalized Regression to Compute $\hat{\tau} + \hat{u}$

Concerning $\hat{\tau} + \hat{u}$, we have the following results:
Lemma 12.
Let $\hat{g} = \hat{\tau} + \hat{u}$. Then, it follows that
$$\hat{g} = \arg\min_{g \in \mathbb{R}^n} \| y - g \|^2 + \lambda^{-1} \| (A_r^{-1})' g \|^2 = (I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')^{-1} y \qquad (34)$$
$$= \arg\min_{g \in \mathbb{R}^n} \| y - g \|^2 + \lambda^{-1} g' B_r^{-1} Q (B_r^{-1})' g = (I_n + \lambda^{-1} B_r^{-1} Q (B_r^{-1})')^{-1} y. \qquad (35)$$
Proof of Lemma 12.
Let $J = [P, A']$. From Proposition 1, it follows that $A P = 0$, $(A_r^{-1})' P = 0$, and $J$ is nonsingular. Accordingly, given that $J'J = \operatorname{diag}(P'P, A A')$ and $(A_r^{-1})' J = [(A_r^{-1})' P, (A_r^{-1})' A'] = [0, I_{n-2}]$, it follows that
$$(I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')^{-1} y = J (J'J + \lambda^{-1} J' A_r^{-1} (A_r^{-1})' J)^{-1} J' y = [P, A'] \begin{bmatrix} (P'P)^{-1} & 0 \\ 0 & (A A' + \lambda^{-1} I_{n-2})^{-1} \end{bmatrix} \begin{bmatrix} P' \\ A \end{bmatrix} y = P(P'P)^{-1}P' y + A'(A A' + \lambda^{-1} I_{n-2})^{-1} A y = \hat{\tau} + A' \hat{\eta}.$$
Given $\hat{u} = A' \hat{\eta}$, we obtain (34). Similarly, we can obtain (35). □
Remark 5.
Similarly to Remark 2, we add some more exposition about (25). Let $\xi = [\beta', \eta']' \in \mathbb{R}^n$ be such that $g = J\xi = P\beta + A'\eta$. As stated, $(A_r^{-1})' J = [0, I_{n-2}]$. Then, it follows that
$$(A_r^{-1})' g = (A_r^{-1})' J \xi = \eta. \qquad (36)$$
Given $g = P\beta + A'\eta$ and $(A_r^{-1})' g = \eta$, the minimization problem (34) can be represented as follows:
$$\min_{\beta \in \mathbb{R}^2, \, \eta \in \mathbb{R}^{n-2}} \| y - P\beta - A'\eta \|^2 + \lambda^{-1} \| \eta \|^2. \qquad (37)$$
Again, it is noteworthy that $\beta$ is not penalized in (37). Moreover, it follows that $(A')' P = A P = 0$. Thus, the minimization problem (37) can be decomposed into (25) and (40).

3.5. Ordinary Regressions to Compute $\hat{c} + \hat{u}$ and $\hat{\tau}$

Concerning $\hat{c} + \hat{u}$ and $\hat{\tau}$, we have the following results:
Lemma 13.
(i) 
Let $\hat{h} = \mathcal{D}' \hat{\alpha}$, where
$$\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^{n-2}} \| y - \mathcal{D}' \alpha \|^2 = (\mathcal{D} \mathcal{D}')^{-1} \mathcal{D} y. \qquad (38)$$
Then, it follows that
$$\hat{c} + \hat{u} = \hat{h}. \qquad (39)$$
(ii) 
It follows that $\hat{\tau} = P \hat{\beta}$, where
$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^2} \| y - P \beta \|^2 = (P'P)^{-1} P' y. \qquad (40)$$
Proof of Lemma 13.
Given Proposition 1, both results are easily obtainable. For example, the former result can be proved as follows:
$$\hat{h} = \mathcal{D}' \hat{\alpha} = P_{\mathcal{D}'} y = Q_P y = y - \hat{\tau} = \hat{c} + \hat{u}.$$
 □
Remark 6.
From Proposition 1, we also have $\hat{h} (= \hat{c} + \hat{u}) = \mathcal{D}_r^{-1} \hat{\rho}$, where
$$\hat{\rho} = \arg\min_{\rho \in \mathbb{R}^{n-2}} \| y - \mathcal{D}_r^{-1} \rho \|^2 = ((\mathcal{D}_r^{-1})' \mathcal{D}_r^{-1})^{-1} (\mathcal{D}_r^{-1})' y. \qquad (41)$$

3.6. Principle of Duality in the Penalized Regressions

See the second columns of Table 1 and Table 2, in which the penalized regressions shown above are arranged in pairs that mirror one another. We now reveal a principle of duality in these penalized regressions. For example, (D1) is obtainable by replacing $A'$ and $\lambda$ in (P1) by $A_r^{-1}$ and $\lambda^{-1}$, respectively. Likewise, for example, (D6) in Table 2 is obtainable by replacing $B'$, $Q$, and $\lambda^{-1}$ in (P6) by $B_r^{-1}$, $Q^{-1}$, and $\lambda$, respectively. In Table 1 and Table 2, we may observe five other pairs of regressions that are duals of each other. From the seven dual pairs shown in Table 1 and Table 2, we observe that the following principle holds:
Proposition 2 (Principle of duality).
The regressions labeled with the letter D in Table 1 and Table 2, for example, (D1), are obtainable by replacing each occurrence of $A'$, $B'$, $\mathcal{D}'$, $Q$, $Q^{-1}$, $\lambda$, $\lambda^{-1}$ in the regressions labeled with the letter P, for example, (P1), by $A_r^{-1}$, $B_r^{-1}$, $\mathcal{D}_r^{-1}$, $Q^{-1}$, $Q$, $\lambda^{-1}$, $\lambda$, respectively.
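The duality can also be seen numerically. The following R sketch (ours; hypothetical data; $A = D$, $P = \Pi$) evaluates the dual pair (P1)/(D1) and confirms that they return $\hat{f} = \hat{\tau} + \hat{c}$ and $\hat{g} = \hat{\tau} + \hat{u}$, respectively, so that $\hat{f} + \hat{g} = \hat{\tau} + y$.

```r
## (P1): f_hat = (I_n + lambda A'A)^{-1} y;  (D1): g_hat = (I_n + lambda^{-1} A_r^{-1}(A_r^{-1})')^{-1} y.
set.seed(5)
n <- 50; x <- seq(0, 1, length.out = n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
eR <- eigen(R, symmetric = TRUE)
A  <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors) %*% C   # A = D
Ar <- t(A) %*% solve(A %*% t(A))                                         # A_r^{-1}
Pi <- cbind(1, x); lambda <- 10
tau_hat <- Pi %*% solve(crossprod(Pi), crossprod(Pi, y))                 # (4)
f_hat <- solve(diag(n) + lambda * crossprod(A), y)                       # (P1): tau_hat + c_hat
g_hat <- solve(diag(n) + (1 / lambda) * Ar %*% t(Ar), y)                 # (D1): tau_hat + u_hat
max(abs(g_hat - (tau_hat + (y - f_hat))))        # zero up to rounding: g_hat = tau_hat + u_hat
max(abs(f_hat + g_hat - (tau_hat + y)))          # zero up to rounding: f_hat + g_hat = tau_hat + y
```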

4. Results That Are Obtainable from the Regressions

In this section, we show how the regressions listed in the previous section are of use to obtain a deeper understanding of fitting a cubic smoothing spline. Before proceeding, recall $\hat{f} = \hat{\tau} + \hat{c}$ and so on.
First, given that (16) is a ridge regression, it immediately follows that $\lim_{\lambda \to \infty} \hat{\gamma} = 0$, which leads to $\lim_{\lambda \to \infty} \hat{c} = A_r^{-1} \lim_{\lambda \to \infty} \hat{\gamma} = 0$, and at the same time we have
$$\lim_{\lambda \to \infty} \hat{f} = \hat{\tau} + \lim_{\lambda \to \infty} \hat{c} = \hat{\tau}, \qquad (42)$$
$$\lim_{\lambda \to \infty} \hat{u} = y - \hat{\tau} - \lim_{\lambda \to \infty} \hat{c} = y - \hat{\tau}, \qquad (43)$$
$$\lim_{\lambda \to \infty} \hat{g} = \hat{\tau} + \lim_{\lambda \to \infty} \hat{u} = \hat{\tau} + (y - \hat{\tau}) = y. \qquad (44)$$
Second, since (25) is again a ridge regression, we have $\lim_{\lambda \to 0} \hat{\eta} = 0$, which yields $\lim_{\lambda \to 0} \hat{u} = A' \lim_{\lambda \to 0} \hat{\eta} = 0$, and accordingly we obtain
$$\lim_{\lambda \to 0} \hat{f} = y - \lim_{\lambda \to 0} \hat{u} = y, \qquad (45)$$
$$\lim_{\lambda \to 0} \hat{c} = y - \hat{\tau} - \lim_{\lambda \to 0} \hat{u} = y - \hat{\tau}, \qquad (46)$$
$$\lim_{\lambda \to 0} \hat{g} = \hat{\tau} + \lim_{\lambda \to 0} \hat{u} = \hat{\tau}. \qquad (47)$$
Third, from (19) and $\hat{u} = y - \hat{\tau} - \hat{c}$, we have
$$\hat{c} = (I_n + \lambda A' A)^{-1} (y - \hat{\tau}), \qquad (48)$$
$$\hat{u} = \left[ I_n - (I_n + \lambda A' A)^{-1} \right] (y - \hat{\tau}). \qquad (49)$$
Thus, $\hat{f}$ can be represented as
$$\hat{f} = \hat{\tau} + (I_n + \lambda A' A)^{-1} (y - \hat{\tau}). \qquad (50)$$
Here, we remark that, given that $(I_n + \lambda A' A)^{-1}$ is a smoother matrix, the second term on the right-hand side of (50) represents a low-frequency part of $y - \hat{\tau}$. In addition, from (49), $\hat{u}$ represents a high-frequency part of $y - \hat{\tau}$. Thus, $\hat{c}$ is generally smoother than $\hat{u}$.
Fourth, given $A P = 0$, $(A_r^{-1})' P = 0$, $\hat{c} = A_r^{-1} \hat{\gamma}$, and $\hat{u} = A' \hat{\eta}$, we have
$$\hat{\zeta}' \hat{\tau} = 0, \qquad \hat{\zeta} = \hat{c}, \hat{u}, \hat{h}. \qquad (51)$$
Fifth, given $A P = 0$, $(A_r^{-1})' P = 0$, (28), and
$$(I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')^{-1} = I_n - A_r^{-1} ((A_r^{-1})' A_r^{-1} + \lambda I_{n-2})^{-1} (A_r^{-1})', \qquad (52)$$
if $y \in S(P)$, or in other words, if $y = P\psi$, then we have
$$\hat{\tau} = y, \quad \hat{f} = y, \quad \hat{g} = y, \quad \hat{c} = 0, \quad \hat{u} = 0, \quad \hat{h} = 0. \qquad (53)$$
Sixth, given $\iota_n \in S(P)$, we have
$$P_P \iota_n = \iota_n, \qquad (54)$$
$$(I_n + \lambda A' A)^{-1} \iota_n = \iota_n, \qquad (55)$$
$$(I_n + \lambda^{-1} A_r^{-1} (A_r^{-1})')^{-1} \iota_n = \iota_n, \qquad (56)$$
$$A_r^{-1} ((A_r^{-1})' A_r^{-1} + \lambda I_{n-2})^{-1} (A_r^{-1})' \iota_n = 0, \qquad (57)$$
$$A' (A A' + \lambda^{-1} I_{n-2})^{-1} A \iota_n = 0, \qquad (58)$$
$$P_{\mathcal{D}'} \iota_n = 0. \qquad (59)$$
Note that $(I_n + \lambda A' A)^{-1} \iota_n = \iota_n$, for example, indicates that the sum of the entries in each row of the hat matrix of $\hat{f}$ equals unity.
Seventh, given (54)–(59), we have
$$\tfrac{1}{n} \iota_n' \hat{\zeta} = \bar{y}, \qquad \hat{\zeta} = \hat{\tau}, \hat{f}, \hat{g}, \qquad (60)$$
$$\tfrac{1}{n} \iota_n' \hat{\zeta} = 0, \qquad \hat{\zeta} = \hat{c}, \hat{u}, \hat{h}, \qquad (61)$$
where $\bar{y} = \tfrac{1}{n} \sum_{i=1}^{n} y_i$. $\tfrac{1}{n} \iota_n' \hat{f} = \bar{y}$, for example, shows that $\tfrac{1}{n} \sum_{i=1}^{n} \hat{f}_i = \bar{y}$.
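The following R sketch (ours; hypothetical data) illustrates (60) and (61): $\hat{\tau}$, $\hat{f}$, and $\hat{g}$ all average to $\bar{y}$, while $\hat{c}$, $\hat{u}$, and $\hat{h}$ average to zero.

```r
## Column means of the components, cf. (60) and (61); hypothetical data.
set.seed(6)
n <- 60; x <- seq(0, 1, length.out = n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
lambda  <- 10
Pi      <- cbind(1, x)
f_hat   <- solve(diag(n) + lambda * t(C) %*% solve(R) %*% C, y)          # (3)
tau_hat <- drop(Pi %*% solve(crossprod(Pi), crossprod(Pi, y)))           # (4)
c_hat <- f_hat - tau_hat; u_hat <- y - f_hat; h_hat <- c_hat + u_hat
round(c(mean(tau_hat), mean(f_hat), mean(tau_hat + u_hat), mean(y)), 10) # all equal y-bar: (60)
round(c(mean(c_hat), mean(u_hat), mean(h_hat)), 10)                      # all zero: (61)
```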

5. Illustrations of Some Results

In this section, we illustrate some of the results in the previous sections by a real data example.
Panel A of Figure 1 shows a scatter plot of North Pacific sea surface temperature (SST) anomalies (1891–2018). SST is an essential climate variable and has been used for the detection of climate change; see, for example, Høyer and Karagali [7] and the references therein. We obtained the data from the website of the Japan Meteorological Agency. The solid line in the panel plots $(x_i, \hat{\tau}_i)$ for $i = 1, \ldots, n$, where $\hat{\tau} = [\hat{\tau}_1, \ldots, \hat{\tau}_n]'$ is given by (4) and $n = 128$. Panel B of Figure 1 depicts a scatter plot of $(x_i, y_i - \hat{\tau}_i)$ for $i = 1, \ldots, n$. The solid line in the panel plots $(x_i, \hat{c}_i)$ for $i = 1, \ldots, n$, where $\hat{c} = [\hat{c}_1, \ldots, \hat{c}_n]'$ is calculated by (18) with $\lambda = 10^3$. The solid line in Panel C denotes $(x_i, \hat{f}_i)$, where $\hat{f} = [\hat{f}_1, \ldots, \hat{f}_n]'$ is calculated by (14) with $\lambda = 10^3$. Panel D illustrates a scatter plot of $(x_i, y_i - \hat{\tau}_i)$ for $i = 1, \ldots, n$. The solid line in the panel plots $(x_i, \hat{u}_i)$ for $i = 1, \ldots, n$, where $\hat{u} = [\hat{u}_1, \ldots, \hat{u}_n]'$ is calculated by (27) with $\lambda = 10^3$. Figure 2, Figure 3 and Figure 4 correspond to the cases where $\lambda = 10^5$, $10^{10}$, and $10^{-10}$, respectively.
Recall that, concerning $y$, $\hat{\tau}$, $\hat{c}$, $\hat{f}$, and $\hat{u}$, the following equations hold:
$$\hat{\tau} + \hat{c} = \hat{f}, \quad \hat{c} + \hat{u} = y - \hat{\tau}, \quad \lim_{\lambda \to \infty} \hat{c} = 0, \quad \lim_{\lambda \to \infty} \hat{f} = \hat{\tau}, \quad \lim_{\lambda \to \infty} \hat{u} = y - \hat{\tau}, \quad \lim_{\lambda \to 0} \hat{c} = y - \hat{\tau}, \quad \lim_{\lambda \to 0} \hat{f} = y, \quad \lim_{\lambda \to 0} \hat{u} = 0.$$
These theoretical results are well illustrated in Figure 1, Figure 2, Figure 3 and Figure 4. For example, from Panel D in Figure 4, we can observe that $\hat{u}$ is almost equal to $0$ when $\lambda = 10^{-10}$.

6. The Case in Which Other Right-Inverse Matrices Are Used

In this section, we illustrate what happens if other right-inverse matrices are used.
Let $M \in \mathbb{R}^{m \times n}$ be of full row rank. Recall that, in this paper, $M_r^{-1}$ denotes $M'(MM')^{-1}$, which is a right-inverse matrix of $M$. Define a set of matrices
$$\Gamma_M = \{ \Xi \in \mathbb{R}^{n \times m} : M \Xi = I_m \}. \qquad (62)$$
$\Gamma_M$ denotes the set of right-inverse matrices of $M$ and accordingly $M_r^{-1}$ belongs to $\Gamma_M$.
Lemma 14.
$N = M_r^{-1}$ if and only if $N \in \Gamma_M$ and $S(N) = S(M')$.
Proof of Lemma 14.
It is clear that if $N = M_r^{-1}$, then $N \in \Gamma_M$ and $S(N) = S(M')$. Conversely, suppose that $N \in \Gamma_M$ and $S(N) = S(M')$. Then, $M N = I_m$ and there exists a nonsingular matrix $\Sigma \in \mathbb{R}^{m \times m}$ such that $N = M' \Sigma$. By removing $N$ from these equations, we have $\Sigma = (M M')^{-1}$, which leads to $N = M'(M M')^{-1} = M_r^{-1}$. □
From Lemma 14, if $N \neq M_r^{-1}$, then $N \notin \Gamma_M$ or $S(N) \neq S(M')$. Accordingly, we have the following result:
Proposition 3.
If $N \in \Gamma_M \setminus \{ M_r^{-1} \}$, then $S(N) \neq S(M')$.
Based on this result, we illustrate what happens if other right-inverse matrices are used. We give an example. Let $Z \in \Gamma_D \setminus \{ D_r^{-1} \}$. Then, from Proposition 3 and Lemma 3, it follows that $S(Z) \neq S(D_r^{-1}) = S^{\perp}(\Pi)$. Accordingly, letting $L = [\Pi, Z]$, it follows that $Z' \Pi \neq 0$ and $D L = [D \Pi, D Z] = [0, I_{n-2}]$. In addition, given that $D \Pi = 0$, $D Z = I_{n-2}$, and $\Pi$ is of full column rank, $L$ is nonsingular. Thus, from [8], for example, we have
$$\hat{f} = L (L'L + \lambda L' D' D L)^{-1} L' y = \Pi \hat{\pi} + Z \hat{\varepsilon}, \qquad (63)$$
where
$$\hat{\pi} = \arg\min_{\pi \in \mathbb{R}^2} \| (y - Z \hat{\varepsilon}) - \Pi \pi \|^2 = (\Pi' \Pi)^{-1} \Pi' (y - Z \hat{\varepsilon}) \qquad (64)$$
and
$$\hat{\varepsilon} = \arg\min_{\varepsilon \in \mathbb{R}^{n-2}} \| Q_\Pi y - Q_\Pi Z \varepsilon \|^2 + \lambda \| \varepsilon \|^2 = (Z' Q_\Pi Z + \lambda I_{n-2})^{-1} Z' Q_\Pi y, \qquad (65)$$
which shows that we may obtain (penalized) regressions relating to the cubic smoothing spline even if we use other right-inverse matrices of $D$ such that $Z \in \Gamma_D \setminus \{ D_r^{-1} \}$. Nevertheless, as illustrated here, they are more complex than those shown in Table 1 and Table 2.
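As an illustration, the R sketch below (ours; hypothetical data) forms a right inverse $Z \in \Gamma_D \setminus \{D_r^{-1}\}$ by adding a component lying in $S(\Pi)$ to $D_r^{-1}$, and then checks that (63)–(65) reproduce $\hat{f}$ from (3).

```r
## A right inverse Z of D other than D_r^{-1}, and the regressions (63)-(65).
set.seed(7)
n <- 30; x <- seq(0, 1, length.out = n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
d <- diff(x)
C <- diff(diag(n - 1)) %*% diag(1 / d) %*% diff(diag(n))
R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3)
for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
eR <- eigen(R, symmetric = TRUE)
D  <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors) %*% C
Pi <- cbind(1, x)
Dr <- t(D) %*% solve(D %*% t(D))                           # D_r^{-1}
Z  <- Dr + Pi %*% matrix(1, 2, n - 2)                      # another element of Gamma_D
max(abs(D %*% Z - diag(n - 2)))                            # zero up to rounding: D Z = I_{n-2}
lambda  <- 10
Q_Pi    <- diag(n) - Pi %*% solve(crossprod(Pi), t(Pi))
eps_hat <- solve(t(Z) %*% Q_Pi %*% Z + lambda * diag(n - 2), t(Z) %*% Q_Pi %*% y)   # (65)
pi_hat  <- solve(crossprod(Pi), crossprod(Pi, y - Z %*% eps_hat))                   # (64)
f_hat   <- solve(diag(n) + lambda * crossprod(D), y)                                # (3)
max(abs(Pi %*% pi_hat + Z %*% eps_hat - f_hat))            # zero up to rounding: (63)
```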

7. Concluding Remarks

In this paper, we provided a comprehensive list of penalized least squares regressions relating to the cubic smoothing spline, and then revealed a principle of duality in them. This is the main contribution of this study. Such penalized regressions are tabulated in Table 1 and Table 2 and the principle of duality revealed is stated in Proposition 2. In addition, we also provided a number of results derived from them, most of which are also tabulated in Table 1 and Table 2 and some of which are illustrated in Figure 1, Figure 2, Figure 3 and Figure 4.

Author Contributions

R.D. wrote an initial draft under the supervision of H.Y. and then H.Y. edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The Japan Society for the Promotion of Science supported this work through KAKENHI Grant Number 16H03606.

Acknowledgments

We thank Kazuhiko Hayakawa and three anonymous referees for their valuable comments on an earlier version of this paper. The usual caveat applies.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Some Remarks on the Special Case in Which x = [1, …, n]′

(i)
If $x = [1, \ldots, n]'$, then $C = D^{(2)} D^{(1)} \in \mathbb{R}^{(n-2) \times n}$, which is a Toeplitz matrix whose first (resp. last) row is $[1, -2, 1, 0, \ldots, 0]$ (resp. $[0, \ldots, 0, 1, -2, 1]$).
(ii)
If $x = [1, \ldots, n]'$, then $(I_n + \lambda C' R^{-1} C)^{-1}$ is bisymmetric (i.e., symmetric centrosymmetric), which may be proved as in Yamada [20].
(iii)
If $x = [1, \ldots, n]'$, then $R$ in (8) is not only a symmetric tridiagonal matrix but also a Toeplitz matrix. In this case, we have
$$\omega_k = \frac{2}{3} + \frac{1}{3} \cos\left( \frac{k \pi}{n-1} \right), \qquad k = 1, \ldots, n-2, \qquad (A1)$$
and thus $\omega_{n-2}$, which is the smallest eigenvalue of $R$, satisfies the following inequality (see, e.g., [9]):
$$\omega_{n-2} = \frac{2}{3} + \frac{1}{3} \cos\left( \frac{n-2}{n-1} \pi \right) > \frac{1}{3}. \qquad (A2)$$
(iv)
If $x = [1, \ldots, n]'$ and $R = I_{n-2}$ in (2) and (3), then (2) and (3) reduce to
$$\hat{f} = \arg\min_{f \in \mathbb{R}^n} \| y - f \|^2 + \lambda \| D^{(2)} D^{(1)} f \|^2 = \left( I_n + \lambda (D^{(2)} D^{(1)})' (D^{(2)} D^{(1)}) \right)^{-1} y. \qquad (A3)$$
This is a type of the Whittaker–Henderson (WH) method of graduation, which was developed by Bohlmann [10], Whittaker [11] and others. See Weinert [12] for a historical review of the WH method of graduation. (A3) is also referred to as the Hodrick–Prescott (HP) [13] filtering in econometrics. For more details about the HP filtering, see, for example, Schlicht [14], Kim et al. [15], Paige and Trindade [16], and Yamada [17,18,19,20,21].
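For the equally spaced case in (iv), a minimal R sketch of (A3) (ours; hypothetical data) is as follows; the second-difference matrix $D^{(2)}D^{(1)}$ is obtained directly with `diff()`.

```r
## Whittaker-Henderson / HP-type smoother (A3): f_hat = (I_n + lambda K'K)^{-1} y, K = D^(2) D^(1).
set.seed(8)
n <- 100
y <- cumsum(rnorm(n))                    # hypothetical series observed at x = [1, ..., n]'
K <- diff(diag(n), differences = 2)      # second-difference matrix, (n-2) x n
lambda <- 1600                           # a value commonly used for quarterly data in the HP literature
f_hat  <- solve(diag(n) + lambda * crossprod(K), y)
## plot(y); lines(f_hat, col = "red")    # smoothed trend versus data
```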

Appendix A.2. User-Defined Functions

Appendix A.2.1. A Matlab/GNU Octave Function to Make C in (7)

[The code listing appears as an image in the published article.]

Appendix A.2.2. A Matlab/GNU Octave Function to Make R in (8)

[The code listing appears as an image in the published article.]

Appendix A.2.3. A Matlab/GNU Octave Function to Make D in (9)

[The code listing appears as an image in the published article.]

Appendix A.2.4. A R Function to Make C in (7)

[The code listing appears as an image in the published article.]
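Since the listing is available only as an image, the following is a minimal R sketch consistent with (7); the function name `make_C` is ours, not necessarily the authors'.

```r
## Build C in (7) from a vector of strictly increasing knots x (sketch).
make_C <- function(x) {
  n <- length(x)
  d <- diff(x)                           # delta_i = x_{i+1} - x_i, assumed positive
  C <- matrix(0, n - 2, n)
  for (i in 1:(n - 2)) {
    C[i, i]     <-  1 / d[i]
    C[i, i + 1] <- -1 / d[i] - 1 / d[i + 1]
    C[i, i + 2] <-  1 / d[i + 1]
  }
  C
}
```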

Appendix A.2.5. A R Function to Make R in (8)

[The code listing appears as an image in the published article.]
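Likewise, a minimal R sketch consistent with (8) (the name `make_R` is ours):

```r
## Build R in (8) from a vector of strictly increasing knots x (sketch).
make_R <- function(x) {
  n <- length(x)
  d <- diff(x)
  R <- diag((d[1:(n - 2)] + d[2:(n - 1)]) / 3, nrow = n - 2)
  if (n > 4) for (i in 1:(n - 3)) R[i, i + 1] <- R[i + 1, i] <- d[i + 1] / 6
  R
}
```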

Appendix A.2.6. A R Function to Make D in (9)

[The code listing appears as an image in the published article.]
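And a minimal R sketch consistent with (9), using `make_C` and `make_R` above (names are ours):

```r
## Build D = R^{-1/2} C in (9) from a vector of strictly increasing knots x (sketch).
make_D <- function(x) {
  C  <- make_C(x)
  R  <- make_R(x)
  eR <- eigen(R, symmetric = TRUE)       # R = V Omega V'
  V  <- eR$vectors
  R_inv_sqrt <- V %*% diag(1 / sqrt(eR$values), nrow = nrow(R)) %*% t(V)
  R_inv_sqrt %*% C
}
```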

References

  1. Schoenberg, I.J. Spline functions and the problem of graduation. Proc. Natl. Acad. Sci. USA 1964, 52, 947–950. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Reinsch, C. Smoothing by spline functions. Numer. Math. 1967, 10, 177–183. [Google Scholar] [CrossRef]
  3. Green, P.J.; Silverman, B.W. Nonparametric Regression and Generalized Linear Models: A roughness Penalty Approach; Chapman and Hall/CRC: Boca Raton, FL, USA, 1994. [Google Scholar]
  4. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017. [Google Scholar]
  5. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  6. Verbyla, A.P.; Cullis, B.R.; Kenward, M.G.; Welham, S.J. The analysis of designed experiments and longitudinal data by using smoothing splines. J. R. Stat. Soc. 1999, 48, 269–311. [Google Scholar] [CrossRef]
  7. Høyer, J.L.; Karagali, I. Sea surface temperature climate data record for the North Sea and Baltic Sea. J. Clim. 2016, 29, 2529–2541. [Google Scholar] [CrossRef]
  8. Yamada, H. The Frisch–Waugh–Lovell theorem for the lasso and the ridge regression. Commun. Stat. Theory Methods 2017, 46, 10897–10902. [Google Scholar] [CrossRef]
  9. Pesaran, M.H. Exact maximum likelihood estimation of a regression equation with a first-order moving-average error. Rev. Econ. Stud. 1973, 40, 529–535. [Google Scholar] [CrossRef]
  10. Bohlmann, G. Ein Ausgleichungsproblem. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse 1899, 1899, 260–271. [Google Scholar]
  11. Whittaker, E.T. On a new method of graduation. Proc. Edinb. Math. Soc. 1923, 41, 63–75. [Google Scholar] [CrossRef] [Green Version]
  12. Weinert, H.L. Efficient computation for Whittaker–Henderson smoothing. Comput. Stat. Data Anal. 2007, 52, 959–974. [Google Scholar] [CrossRef]
  13. Hodrick, R.J.; Prescott, E.C. Postwar U.S. business cycles: An empirical investigation. J. Money Credit. Bank. 1997, 29, 1–16. [Google Scholar] [CrossRef]
  14. Schlicht, E. Estimating the smoothing parameter in the so-called Hodrick–Prescott filter. J. Jpn. Stat. Soc. 2005, 35, 99–119. [Google Scholar] [CrossRef] [Green Version]
  15. Kim, S.; Koh, K.; Boyd, S.; Gorinevsky, D. ℓ1 trend filtering. SIAM Rev. 2009, 51, 339–360. [Google Scholar] [CrossRef]
  16. Paige, R.L.; Trindade, A.A. The Hodrick–Prescott filter: A special case of penalized spline smoothing. Electron. J. Stat. 2010, 4, 856–874. [Google Scholar] [CrossRef]
  17. Yamada, H. Ridge regression representations of the generalized Hodrick–Prescott filter. J. Jpn. Stat. Soc. 2015, 45, 121–128. [Google Scholar] [CrossRef] [Green Version]
  18. Yamada, H. Why does the trend extracted by the Hodrick–Prescott filtering seem to be more plausible than the linear trend? Appl. Econ. Lett. 2018, 25, 102–105. [Google Scholar] [CrossRef]
  19. Yamada, H. Several least squares problems related to the Hodrick–Prescott filtering. Commun. Stat. Theory Methods 2018, 47, 1022–1027. [Google Scholar] [CrossRef]
  20. Yamada, H. A note on Whittaker–Henderson graduation: Bisymmetry of the smoother matrix. Commun. Stat. Theory Methods 2020, 49, 1629–1634. [Google Scholar] [CrossRef]
  21. Yamada, H. A smoothing method that looks like the Hodrick–Prescott filter. Econom. Theory 2020. [Google Scholar] [CrossRef]
Figure 1. Panel A shows a scatter plot of North Pacific sea surface temperature anomalies (1891–2018). The solid line in the panel plots $(x_i, \hat{\tau}_i)$ for $i = 1, \ldots, n$, where $\hat{\tau} = [\hat{\tau}_1, \ldots, \hat{\tau}_n]'$ is given by (4) and $n = 128$. Panel B depicts a scatter plot of $(x_i, y_i - \hat{\tau}_i)$ for $i = 1, \ldots, n$. The solid line in the panel plots $(x_i, \hat{c}_i)$ for $i = 1, \ldots, n$, where $\hat{c} = [\hat{c}_1, \ldots, \hat{c}_n]'$ is calculated by (18) with $\lambda = 10^3$. The solid line in Panel C denotes $(x_i, \hat{f}_i)$, where $\hat{f} = [\hat{f}_1, \ldots, \hat{f}_n]'$ is calculated by (14) with $\lambda = 10^3$. Panel D illustrates a scatter plot of $(x_i, y_i - \hat{\tau}_i)$ for $i = 1, \ldots, n$. The solid line in the panel plots $(x_i, \hat{u}_i)$ for $i = 1, \ldots, n$, where $\hat{u} = [\hat{u}_1, \ldots, \hat{u}_n]'$ is calculated by (27) with $\lambda = 10^3$.
Figure 2. This figure corresponds to the case where $\lambda = 10^5$. For the other explanations, see Figure 1.
Figure 3. This figure corresponds to the case where $\lambda = 10^{10}$. For the other explanations, see Figure 1.
Figure 4. This figure corresponds to the case where $\lambda = 10^{-10}$. For the other explanations, see Figure 1.
Table 1. Most of the main results (I).
Regressions Relating to the Cubic Smoothing Spline | Average | $\lambda \to \infty$ | $\lambda \to 0$ | $y \in S(P)$ | Sum
(P1) $\hat{f}\,(= \hat{\tau} + \hat{c}) = \arg\min_{f \in \mathbb{R}^n} \|y - f\|^2 + \lambda \|A f\|^2 = (I_n + \lambda A'A)^{-1} y$ | $\bar{y}$ | $\hat{\tau}$ | $y$ | $y$ | 1
(D1) $\hat{g}\,(= \hat{\tau} + \hat{u}) = \arg\min_{g \in \mathbb{R}^n} \|y - g\|^2 + \lambda^{-1} \|(A_r^{-1})' g\|^2 = (I_n + \lambda^{-1} A_r^{-1}(A_r^{-1})')^{-1} y$ | $\bar{y}$ | $y$ | $\hat{\tau}$ | $y$ | 1
(P2) $\hat{u} = A'\hat{\eta}$, where $\hat{\eta} = \arg\min_{\eta \in \mathbb{R}^{n-2}} \|y - A'\eta\|^2 + \lambda^{-1}\|\eta\|^2 = (AA' + \lambda^{-1} I_{n-2})^{-1} A y$ | 0 | $y - \hat{\tau}$ | 0 | 0 | 0
(D2) $\hat{c} = A_r^{-1}\hat{\gamma}$, where $\hat{\gamma} = \arg\min_{\gamma \in \mathbb{R}^{n-2}} \|y - A_r^{-1}\gamma\|^2 + \lambda\|\gamma\|^2 = ((A_r^{-1})'A_r^{-1} + \lambda I_{n-2})^{-1} (A_r^{-1})' y$ | 0 | 0 | $y - \hat{\tau}$ | 0 | 0
(P3) $\hat{c} = \arg\min_{c \in \mathbb{R}^n} \|(y - \hat{\tau}) - c\|^2 + \lambda\|Ac\|^2 = (I_n + \lambda A'A)^{-1}(y - \hat{\tau})$ | 0 | 0 | $y - \hat{\tau}$ | 0 | 0
(D3) $\hat{u} = \arg\min_{u \in \mathbb{R}^n} \|(y - \hat{\tau}) - u\|^2 + \lambda^{-1}\|(A_r^{-1})'u\|^2 = (I_n + \lambda^{-1} A_r^{-1}(A_r^{-1})')^{-1}(y - \hat{\tau})$ | 0 | $y - \hat{\tau}$ | 0 | 0 | 0
(P4) $\hat{h}\,(= \hat{c} + \hat{u}) = \mathcal{D}'\hat{\alpha}$, where $\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^{n-2}} \|y - \mathcal{D}'\alpha\|^2 = (\mathcal{D}\mathcal{D}')^{-1}\mathcal{D} y$ | 0 | $y - \hat{\tau}$ | $y - \hat{\tau}$ | 0 | 0
(D4) $\hat{h}\,(= \hat{c} + \hat{u}) = \mathcal{D}_r^{-1}\hat{\rho}$, where $\hat{\rho} = \arg\min_{\rho \in \mathbb{R}^{n-2}} \|y - \mathcal{D}_r^{-1}\rho\|^2 = ((\mathcal{D}_r^{-1})'\mathcal{D}_r^{-1})^{-1}(\mathcal{D}_r^{-1})' y$ | 0 | $y - \hat{\tau}$ | $y - \hat{\tau}$ | 0 | 0
$\hat{\tau} = P\hat{\beta}$, where $\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^2} \|y - P\beta\|^2 = (P'P)^{-1}P' y$ | $\bar{y}$ | $\hat{\tau}$ | $\hat{\tau}$ | $y$ | 1
Notes: $y = \hat{\tau} + \hat{c} + \hat{u}$. $A \in \{D, F\}$. $\mathcal{D} \in \{C, D, E', F\}$. $M_r^{-1} = M'(MM')^{-1}$ for $M = A, \mathcal{D}$. $P \in \{\Pi, T\}$. $\lambda > 0$ is a smoothing/tuning parameter. $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. $S(P)$ denotes the column space of $P$. 'Sum' denotes the sum of the entries in each row of the hat matrices. ∘ indicates that the corresponding component belongs to the orthogonal complement of $S(P)$.
Table 2. Most of the main results (II).
Regressions Relating to the Cubic Smoothing Spline | Average | $\lambda \to \infty$ | $\lambda \to 0$ | $y \in S(P)$ | Sum
(P5) $\hat{f}\,(= \hat{\tau} + \hat{c}) = \arg\min_{f \in \mathbb{R}^n} \|y - f\|^2 + \lambda f'B'Q^{-1}Bf = (I_n + \lambda B'Q^{-1}B)^{-1} y$ | $\bar{y}$ | $\hat{\tau}$ | $y$ | $y$ | 1
(D5) $\hat{g}\,(= \hat{\tau} + \hat{u}) = \arg\min_{g \in \mathbb{R}^n} \|y - g\|^2 + \lambda^{-1} g'B_r^{-1}Q(B_r^{-1})'g = (I_n + \lambda^{-1} B_r^{-1}Q(B_r^{-1})')^{-1} y$ | $\bar{y}$ | $y$ | $\hat{\tau}$ | $y$ | 1
(P6) $\hat{u} = B'\hat{\upsilon}$, where $\hat{\upsilon} = \arg\min_{\upsilon \in \mathbb{R}^{n-2}} \|y - B'\upsilon\|^2 + \lambda^{-1}\upsilon'Q\upsilon = (BB' + \lambda^{-1}Q)^{-1} B y$ | 0 | $y - \hat{\tau}$ | 0 | 0 | 0
(D6) $\hat{c} = B_r^{-1}\hat{\kappa}$, where $\hat{\kappa} = \arg\min_{\kappa \in \mathbb{R}^{n-2}} \|y - B_r^{-1}\kappa\|^2 + \lambda\kappa'Q^{-1}\kappa = ((B_r^{-1})'B_r^{-1} + \lambda Q^{-1})^{-1} (B_r^{-1})' y$ | 0 | 0 | $y - \hat{\tau}$ | 0 | 0
(P7) $\hat{c} = \arg\min_{c \in \mathbb{R}^n} \|(y - \hat{\tau}) - c\|^2 + \lambda c'B'Q^{-1}Bc = (I_n + \lambda B'Q^{-1}B)^{-1}(y - \hat{\tau})$ | 0 | 0 | $y - \hat{\tau}$ | 0 | 0
(D7) $\hat{u} = \arg\min_{u \in \mathbb{R}^n} \|(y - \hat{\tau}) - u\|^2 + \lambda^{-1} u'B_r^{-1}Q(B_r^{-1})'u = (I_n + \lambda^{-1} B_r^{-1}Q(B_r^{-1})')^{-1}(y - \hat{\tau})$ | 0 | $y - \hat{\tau}$ | 0 | 0 | 0
Notes: $y = \hat{\tau} + \hat{c} + \hat{u}$. $(B, Q) \in \{(C, R), (E', S^{-1})\}$. $B_r^{-1} = B'(BB')^{-1}$. $\lambda > 0$ is a smoothing/tuning parameter. $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. $S(P)$ denotes the column space of $P$, where $P \in \{\Pi, T\}$. 'Sum' denotes the sum of the entries in each row of the hat matrices. ∘ indicates that the corresponding component belongs to the orthogonal complement of $S(P)$.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
