Regularization is a fundamental technique in machine learning for controlling model complexity and improving generalization. By adding constraints, penalties, or noise to the learning process, regularization reduces overfitting — the tendency of models to memorize training data rather than learning generalizable patterns. My research on regularization spans classical penalty methods, Bayesian approaches, neural network-specific techniques, and the connection between adversarial robustness and regularization.
Classical Regularization Methods
L1 and L2 Regularization
L1 (lasso) and L2 (ridge) regularization add penalty terms to the loss function that constrain the magnitude of the model parameters. L1 regularization encourages sparsity (many parameters exactly equal to zero), making it useful for feature selection. L2 regularization shrinks all parameters towards zero without enforcing sparsity, and has a natural Bayesian interpretation as a zero-mean Gaussian prior on the parameters.
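The contrast between sparsity and shrinkage can be seen in the simplest possible setting. The sketch below assumes a one-dimensional quadratic loss per coefficient (equivalent to an orthonormal design), where both penalized problems have closed-form solutions. The coefficient values and the penalty strength lam are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, lam):
    """L1 solution of argmin_w 0.5*(w - z)**2 + lam*abs(w)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """L2 solution of argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2."""
    return z / (1.0 + lam)


# Hypothetical unregularized coefficients, chosen only for illustration.
ols_coefficients = [3.0, -0.4, 0.1, -2.0]
lam = 0.5

lasso_coefficients = [soft_threshold(z, lam) for z in ols_coefficients]
ridge_coefficients = [ridge_shrink(z, lam) for z in ols_coefficients]

print("lasso:", lasso_coefficients)
print("ridge:", ridge_coefficients)
```

In this toy setting the L1 solution sets the two small coefficients exactly to zero, while the L2 solution rescales every coefficient by the same factor and leaves none at zero.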
Research on how L1 and L2 penalties affect the solutions found when training probabilistic graphical models, evolutionary algorithms, and neural networks, and on combining multiple regularization strategies to improve generalization.
Parsimony in Genetic Programming
Application of regularization concepts to genetic programming through parsimony pressure: penalizing the fitness of programs in proportion to their complexity (number of nodes, depth, etc.). Parsimony pressure controls bloat — the tendency of genetic programs to grow in size without improving fitness — and encourages the evolution of compact, generalizable programs.
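A minimal sketch of linear parsimony pressure, assuming programs are encoded as nested tuples and that lower fitness is better. The encoding, the function names, and the penalty coefficient alpha are hypothetical and are not taken from any particular genetic programming system.

```python
def tree_size(tree):
    """Count nodes in an expression tree encoded as nested tuples.

    A leaf is any non-tuple value; an internal node is (op, child, ...).
    """
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])


def penalized_fitness(raw_error, tree, alpha=0.01):
    """Lower is better: raw error plus a linear size penalty."""
    return raw_error + alpha * tree_size(tree)


# Two hypothetical programs with the same error on the training cases.
compact = ("add", "x", 1)
bloated = ("add", ("mul", "x", 1), ("sub", 1, ("mul", 0, "x")))

print(tree_size(compact), tree_size(bloated))
print(penalized_fitness(0.20, compact), penalized_fitness(0.20, bloated))
```

With equal raw error, the penalty breaks the tie in favor of the smaller program. Depth or another complexity measure could be substituted for node count in the same way.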
Bayesian Regularization
Prior Distributions as Regularizers
In the Bayesian framework, regularization arises naturally through the specification of prior distributions over model parameters. Under maximum a posteriori (MAP) estimation, a zero-mean Gaussian prior on the weights of a regression model corresponds to L2 regularization (ridge regression), and a zero-mean Laplace prior corresponds to L1 regularization (lasso). Research on how different choices of prior distribution shape the generalization properties of Bayesian models.
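The correspondence between priors and penalties can be made explicit with a short derivation. The sketch below assumes a linear regression model with Gaussian noise of variance \sigma^2 and maximum a posteriori estimation; the symbols \sigma, \tau, b, and \lambda are introduced here for illustration only.

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \; p(w \mid \mathcal{D})
  = \arg\min_{w} \; \bigl[ -\log p(\mathcal{D} \mid w) - \log p(w) \bigr]

% Gaussian likelihood: y_i = w^\top x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
-\log p(\mathcal{D} \mid w)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2 + \text{const}

% Gaussian prior w \sim \mathcal{N}(0, \tau^2 I) gives the ridge objective
\hat{w}_{\mathrm{MAP}}
  = \arg\min_{w} \; \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2 + \lambda \lVert w \rVert_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}

% Laplace prior p(w_j) = \frac{1}{2b} \exp(-\lvert w_j \rvert / b) gives the lasso objective
\hat{w}_{\mathrm{MAP}}
  = \arg\min_{w} \; \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2 + \lambda \lVert w \rVert_1,
  \qquad \lambda = \frac{2\sigma^2}{b}
```

In both cases a tighter prior (smaller \tau or b) yields a larger \lambda and therefore stronger regularization.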
Bayesian Information Criterion for Structure Learning
Use of the Bayesian information criterion (BIC) as a regularized score for learning the structure of probabilistic graphical models. The BIC penalizes model complexity (the number of free parameters, scaled by the logarithm of the sample size) to prevent overfitting during structure learning, balancing goodness of fit against parsimony. Research on how the BIC penalty affects the structures learned by Bayesian network structure learning algorithms.
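As a rough illustration of the trade-off, the sketch below computes the BIC in its standard form, k ln(n) - 2 ln(L), for two candidate structures. The log-likelihoods and parameter counts are hypothetical numbers, not results from any experiment.

```python
import math


def bic(log_likelihood, num_parameters, num_samples):
    """BIC = k * ln(n) - 2 * ln(L); lower values are preferred."""
    return num_parameters * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical candidate structures scored on the same n = 1000 samples.
n = 1000
sparse_bic = bic(log_likelihood=-2310.0, num_parameters=12, num_samples=n)
dense_bic = bic(log_likelihood=-2300.0, num_parameters=40, num_samples=n)

print(f"sparse structure: {sparse_bic:.1f}")
print(f"dense structure:  {dense_bic:.1f}")
```

Here the denser structure fits slightly better, but its extra parameters cost more than the gain in likelihood, so the sparser structure scores better. Structure learning work often uses the equivalent score ln(L) - (k/2) ln(n), where higher is better.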
Neural Network Regularization
Dropout and Stochastic Regularization
Investigation of dropout and other stochastic regularization techniques for neural networks, including their connection to Bayesian approximation. Research on how dropout can be interpreted as approximate Bayesian inference and used to provide uncertainty estimates for neural network predictions, particularly in the context of semi-supervised learning and adversarial robustness.
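One common way to obtain uncertainty estimates from dropout is to keep the dropout masks active at prediction time and summarize several stochastic forward passes (often called Monte Carlo dropout). The sketch below assumes a tiny one-hidden-layer network with fixed, hypothetical weights in place of a trained model; the function names and default values are illustrative.

```python
import math
import random
import statistics


def forward(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic pass through a one-hidden-layer tanh network.

    Inverted dropout stays active at prediction time, so each call
    uses a different random mask over the hidden units.
    """
    keep = 1.0 - drop_prob
    output = 0.0
    for w_h, w_o in zip(w_hidden, w_out):
        h = math.tanh(w_h * x)
        mask = 1.0 if rng.random() < keep else 0.0
        output += w_o * h * mask / keep
    return output


def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.5, samples=200, seed=0):
    """Return the mean and standard deviation over stochastic passes."""
    rng = random.Random(seed)
    draws = [forward(x, w_hidden, w_out, drop_prob, rng) for _ in range(samples)]
    return statistics.mean(draws), statistics.stdev(draws)


# Hypothetical fixed weights standing in for a trained network.
w_hidden = [0.5, -1.2, 0.8, 2.0]
w_out = [1.0, 0.7, -0.3, 0.4]

mean, spread = mc_dropout_predict(1.0, w_hidden, w_out)
print(f"prediction {mean:.3f} +/- {spread:.3f}")
```

The spread across passes serves as a rough uncertainty estimate. Setting drop_prob to zero removes the randomness and recovers a single deterministic prediction.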
Architecture-Based Regularization in NAS
Regularization through architecture design: research on how the choice of neural network architecture (layer types, connectivity patterns, activation functions) implicitly regularizes the model. In neural architecture search, this translates to the design of search spaces that naturally favor generalizable architectures over ones that are highly expressive but prone to overfitting.
Regularization in Physics-Informed Neural Networks
Regularization of physics-informed neural networks (PINNs) through the physics loss term. The physics loss acts as a strong regularizer by constraining the network to satisfy the governing partial differential equations, significantly reducing the risk of overfitting to noisy data. Research on how to balance the data loss and physics loss terms for optimal generalization.
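The balance between the two loss terms can be sketched without a neural network. The toy example below assumes the governing equation du/dt + u = 0 and replaces the network with a two-parameter model u(t) = a * exp(b * t), whose derivative is known analytically. The observations, collocation points, and physics_weight values are hypothetical.

```python
import math


def composite_loss(a, b, data, collocation_points, physics_weight):
    """Data misfit plus weighted squared ODE residual for u(t) = a*exp(b*t).

    The assumed governing equation is du/dt + u = 0, so the residual is
    a*b*exp(b*t) + a*exp(b*t).
    """
    data_loss = sum((a * math.exp(b * t) - u_obs) ** 2 for t, u_obs in data) / len(data)
    residuals = [a * (b + 1.0) * math.exp(b * t) for t in collocation_points]
    physics_loss = sum(r * r for r in residuals) / len(residuals)
    return data_loss + physics_weight * physics_loss, data_loss, physics_loss


# Hypothetical noisy observations of exp(-t) and collocation points on [0, 1].
data = [(0.0, 1.05), (0.5, 0.58), (1.0, 0.40)]
collocation_points = [0.0, 0.25, 0.5, 0.75, 1.0]

for weight in (0.0, 1.0, 10.0):
    total, data_loss, physics_loss = composite_loss(1.05, -0.9, data, collocation_points, weight)
    print(f"weight={weight}: total={total:.4f} data={data_loss:.4f} physics={physics_loss:.4f}")
```

With a weight of zero the model is free to chase the noise in the observations. Larger weights increasingly penalize any departure from the assumed equation, which vanishes only at b = -1.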
Adversarial Regularization
Adversarial Training as Regularization
Adversarial training — including adversarially perturbed examples in the training set — can be viewed as a form of data augmentation and regularization that improves robustness to adversarial attacks and out-of-distribution inputs. Research on the regularization effects of adversarial training, including its impact on the geometry of the model's decision boundaries and its generalization to unseen adversarial perturbations.
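A minimal sketch of one adversarial training step, assuming a logistic regression model and the fast gradient sign method (FGSM) as the perturbation, which is one of several possible attack choices. The weights, the example, epsilon, and the learning rate are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm_example(w, b, x, y, epsilon):
    """Perturb x by epsilon in the sign direction of the input gradient.

    For logistic regression the gradient of the loss with respect to x
    is (p - y) * w.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + epsilon * sign((p - y) * wi) for wi, xi in zip(w, x)]


def adversarial_sgd_step(w, b, x, y, epsilon, learning_rate):
    """One SGD step on the average of clean and adversarial gradients."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for example in (x, fgsm_example(w, b, x, y, epsilon)):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, example)) + b)
        grad_w = [g + 0.5 * (p - y) * xi for g, xi in zip(grad_w, example)]
        grad_b += 0.5 * (p - y)
    new_w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
    return new_w, b - learning_rate * grad_b


# Hypothetical model and training example.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm_example(w, b, x, y, epsilon=0.1)
new_w, new_b = adversarial_sgd_step(w, b, x, y, epsilon=0.1, learning_rate=0.1)

print("clean loss:", log_loss(w, b, x, y))
print("adversarial loss:", log_loss(w, b, x_adv, y))
```

The perturbed example has a higher loss than the clean one under the same model, and the update lowers the loss on both. Stronger multi-step attacks could replace FGSM without changing the structure of the step.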
Adversarial Perturbations and Generalization
Investigation of the relationship between adversarial vulnerability and generalization in neural networks. Research shows that the features that make a model accurate on the training distribution may be non-robust to adversarial perturbations, suggesting a fundamental tension between accuracy and robustness that regularization methods must address.
Implicit Regularization
Implicit Regularization in Optimization Algorithms
Research on the implicit regularization effects of optimization algorithms used for training neural networks. Different optimizers (SGD, Adam, second-order methods) impose different implicit regularization through their update rules, affecting the solutions found even without explicit regularization terms. This is particularly relevant for over-parameterized neural networks where the optimization landscape contains many global minima with different generalization properties.
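A well-known small case of implicit regularization: on an underdetermined least-squares problem, plain gradient descent started from zero converges to the interpolating solution with minimum Euclidean norm, even though the loss contains no penalty term. The sketch below assumes a single equation in two unknowns; the data, step count, and learning rate are hypothetical.

```python
def gradient_descent(x, y, steps=500, learning_rate=0.05):
    """Minimize 0.5 * (w . x - y)**2 starting from w = 0."""
    w = [0.0] * len(x)
    for _ in range(steps):
        error = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
    return w


# One equation, two unknowns: every w with w[0] + 2*w[1] = 1 fits exactly.
x, y = [1.0, 2.0], 1.0
w_gd = gradient_descent(x, y)

# Minimum Euclidean norm interpolant: x * y / ||x||^2.
norm_sq = sum(xi * xi for xi in x)
w_min_norm = [xi * y / norm_sq for xi in x]

print("gradient descent:", w_gd)
print("minimum norm:    ", w_min_norm)
```

Other interpolating solutions, such as w = [1, 0], fit the data equally well but have a larger norm. A different initialization or update rule would generally select a different one of these minima.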
Regularization in Probabilistic Graphical Models
Regularization of probabilistic graphical models through structure constraints (e.g., limiting the maximum number of parents in a Bayesian network), parameter smoothing (e.g., Laplace smoothing for conditional probability tables), and Bayesian regularization (e.g., Dirichlet priors on categorical distributions). Research on how these regularization choices affect the quality and interpretability of the learned models.
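Parameter smoothing for a single row of a conditional probability table can be sketched as follows, assuming a symmetric Dirichlet prior with concentration alpha on each state. The function name and the counts are hypothetical.

```python
def smoothed_cpt_row(counts, alpha=1.0):
    """Posterior-mean estimate under a symmetric Dirichlet(alpha) prior.

    alpha = 1 gives Laplace (add-one) smoothing; alpha = 0 gives the
    maximum-likelihood estimate.
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


# Hypothetical counts of a three-state child for one parent configuration.
counts = [8, 2, 0]
print(smoothed_cpt_row(counts, alpha=0.0))  # maximum likelihood
print(smoothed_cpt_row(counts, alpha=1.0))  # Laplace smoothing
```

The maximum-likelihood row assigns probability zero to the unseen state, while the smoothed row keeps it small but positive. Larger values of alpha pull the row further towards the uniform distribution.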
Selected Publications