Fitting parameters on spline curves produces more parsimonious models while maintaining fit quality. Smoothing the splines reduces predictive variance. Individual splines are often fit by type of variable, e.g., in age-period-cohort models. Linear and cubic splines are the most common. Several smoothing criteria have been used for parameter curves; recently, cubic splines smoothed by constraining the integral of the squared second derivative have been popular. For linear splines, constraining the sum of squared second differences is analogous. The degree of smoothing is generally selected by cross-validation. Because spline dummy-variable (basis) matrices are known, splines can be estimated by regression, with smoothing imposed via constrained regression. Smoothing criteria based on sums of squared or absolute parameter values, as in ridge regression or LASSO, improve predictive accuracy and produce splines similar to those smoothed by second-derivative constraints. Variables with very low t-statistics mark points where the curve's shape barely changes; eliminating those variables leaves knots concentrated where the spline's shape does change. A Bayesian version of this approach puts shrinkage priors on the spline parameters. It yields realistic joint parameter distributions, avoids the problems associated with using cross-validation for parameter estimation, and extends readily to non-linear models, such as interactions among variable types. Regularized regression and Bayesian spline methods are compared on two example datasets.
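To make the regression formulation above concrete, the following minimal Python sketch (not from the paper; the data, knot grid, and use of scikit-learn's LassoCV are illustrative assumptions) builds a linear-spline dummy-variable matrix of hinge functions and smooths it with a cross-validated LASSO penalty. Coefficients shrunk exactly to zero correspond to knots with negligible slope change, mimicking the elimination of low-t-statistic variables described in the abstract.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical data: noisy observations of a smooth curve on [0, 10].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.3 * rng.normal(size=x.size)

# Linear-spline dummy-variable matrix: one hinge basis max(x - k, 0)
# per candidate knot k. Each coefficient is the slope change at its
# knot, so penalizing coefficient size penalizes second differences.
knots = np.linspace(0.5, 9.5, 30)
X = np.maximum(x[:, None] - knots[None, :], 0.0)

# LASSO shrinks slope-change coefficients toward zero; 5-fold
# cross-validation selects the penalty, i.e., the degree of smoothing.
fit = LassoCV(cv=5).fit(X, y)

# Knots whose coefficients are zeroed out drop from the model, leaving
# knots concentrated where the curve's shape actually changes.
kept = knots[np.abs(fit.coef_) > 1e-8]
print(f"{kept.size} of {knots.size} candidate knots retained")
```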
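For the Bayesian version, a shrinkage prior replaces the penalty term. The sketch below is one possible rendering in PyMC (again an assumption, not the paper's implementation): a Laplace prior on the slope-change parameters is the Bayesian analogue of the LASSO penalty, and the prior scale of 0.1 is an arbitrary illustrative choice that could instead be given a hyperprior.

```python
import numpy as np
import pymc as pm

# Same hypothetical data and hinge basis as in the LASSO sketch.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.3 * rng.normal(size=x.size)
knots = np.linspace(0.5, 9.5, 30)
X = np.maximum(x[:, None] - knots[None, :], 0.0)

with pm.Model() as model:
    # Laplace (double-exponential) shrinkage prior on slope changes:
    # the Bayesian counterpart of the LASSO's absolute-value penalty.
    b = pm.Laplace("b", mu=0.0, b=0.1, shape=knots.size)
    a0 = pm.Normal("a0", 0.0, 10.0)        # intercept
    sigma = pm.HalfNormal("sigma", 1.0)    # residual scale
    pm.Normal("obs", mu=a0 + pm.math.dot(X, b), sigma=sigma, observed=y)

    # Posterior sampling yields joint parameter distributions directly,
    # with no separate cross-validation step for the penalty level.
    idata = pm.sample(1000, tune=1000, chains=2)
```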
Subject: Computer Science and Mathematics - Probability and Statistics