Learning algorithm parameters
Parameters for a learning algorithm determine how the model is fitted to the data. A common example is the "C" or "slack" parameter of the soft-margin SVM, which defines the penalty associated with misclassifying individuals – i.e., whether the model allows no errors within the CV1 folds and fits the decision boundary very tightly, or whether it allows errors in order to improve generalizability. Parameters such as these can therefore greatly affect model performance, and it is common practice in machine learning to optimize the parameters for your specific problem and data.
NeuroMiner was developed to optimize performance across pre-defined parameter ranges within the nested cross-validation framework using a grid search, which protects against overfitting because the trained models are applied to held-out data. Default parameter ranges are provided in NeuroMiner based on the literature and empirical testing, but we strongly recommend that the parameters are defined for your study and problem.
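For illustration only (this is not NeuroMiner code), the same logic can be sketched in Python with scikit-learn: hyperparameters are tuned on inner (CV1-like) folds, and the selected configuration is evaluated on held-out outer (CV2-like) folds.

```python
# Conceptual sketch of a nested cross-validation grid search (illustration only):
# hyperparameters are tuned on inner folds and the winning model is evaluated on
# held-out outer folds, which protects against overfitting.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10]}, cv=inner_cv)

# Each outer fold re-runs the inner grid search, so held-out data never informs tuning.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(outer_scores.mean())
```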
As previously stated, NeuroMiner uses a dynamic menu configuration that changes based on previous input. This is also true for the learning algorithm parameters, whereby the menu options will change based on what you have selected in the "Classification algorithm" section.
SUPPORT VECTOR MACHINES (LIBSVM)
Here we show an example for an RBF/Gaussian kernel SVM classifier with LIBSVM. The options may vary depending on the configuration of the model.
1 | Define Slack/Regularization parameter(s)
2 | Define RBF/Gaussian kernel parameter(s)
3 | Enable regularization of model selection
4 | Criterion for cross-parameter model selection
5 | Define weight (lambda) of SV ratio
6 | Define non-linearity (big gamma) of SV ratio
7 | Specify cross-parameter model selection process
1 | Define Slack/Regularization parameter(s) The Slack/Regularization parameter, also known as the C parameter, is a hyperparameter specific to SVMs. In NeuroMiner, a range of values can be specified here and it will be optimized during model training within the defined cross-validation structure.
2 | Define RBF/Gaussian kernel parameter(s) The kernel parameter (gamma) controls the width of the RBF/Gaussian kernel. As with the slack parameter, a range of values can be specified here and optimized within the defined cross-validation structure.
3 | Enable regularization of model selection This option regularizes the selection of models by taking into account the variability of performance across the CV1 test folds. This approach aims to improve model selection by factoring in the consistency of performance metrics obtained from cross-validation.
4 | Criterion for cross-parameter model selection The criterion used to choose the optimal model when multiple hyperparameter combinations are tested. The options are model complexity, ensemble diversity, model performance, or complexity and ensemble diversity.
5 | Define weight (lambda) of SV ratio The weight controls the trade-off between maximizing the margin and minimizing the classification errors of the SVM.
6 | Define non-linearity (big gamma) of SV ratio Determines the shape of the decision boundary. Higher gamma values create a more complex (highly non-linear) decision boundary that can fit intricate patterns in the data. This can be helpful for highly non-linear datasets but may also lead to overfitting.
7 | Specify cross-parameter model selection process This option controls how many models (parameter combinations) are selected: either one optimal model based on the criteria and regularization defined previously, or an ensemble of the top-performing models. You can select between (1) selecting a single optimum parameter node, which returns one optimal model based on the criteria; (2) generating a cross-node ensemble by aggregating base learners above a predefined percentile, which results in an ensemble of the top-performing models; or (3) automatically determining the optimal percentile for optimum cross-node ensemble performance, which depends on the regularization.
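As a sketch of what this parameter grid corresponds to, the following Python snippet builds a comparable C/gamma grid with scikit-learn's SVC, which wraps LIBSVM; the ranges shown are illustrative and not NeuroMiner's defaults.

```python
# Sketch of defining slack (C) and RBF kernel (gamma) ranges for an RBF-kernel SVM;
# the log-spaced grids below are illustrative only.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

C_range = np.logspace(-2, 3, 6)      # slack/regularization parameter(s)
gamma_range = np.logspace(-4, 1, 6)  # RBF/Gaussian kernel parameter(s)

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": C_range, "gamma": gamma_range},
    cv=StratifiedKFold(n_splits=5),   # stands in for the CV1 folds
    scoring="balanced_accuracy",      # one possible model-selection criterion
)
# grid.fit(X_train, y_train) would evaluate every C/gamma combination on the folds.
```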
SUPPORT VECTOR MACHINES (LIBLINEAR)
Here we show an example of a linear SVM using the LIBLINEAR library.
1 | Define Slack/Regularization parameter(s)
2 | Define Weighting exponents
3 | Enable regularization of model selection using CV1 test performance variance
4 | Specify cross-parameter model selection process
1 | Define Slack/Regularization parameter(s) The Slack/Regularization parameter, also known as the C parameter, is a hyperparameter specific to SVMs. In NeuroMiner, a range of values can be specified here and it will be optimized during model training within the defined cross-validation structure.
2 | Define Weighting exponents Configures the exponent(s) used to weight samples from the different classes, which is useful for imbalanced data.
3 | Enable regularization of model selection This option regularizes the selection of models by taking into account the variability of performance across the CV1 test folds. This approach aims to improve model selection by factoring in the consistency of performance metrics obtained from cross-validation.
4 | Specify cross-parameter model selection process This option controls how many models (parameter combinations) are selected: either one optimal model based on the criteria and regularization defined previously, or an ensemble of the top-performing models. You can select between (1) selecting a single optimum parameter node, which returns one optimal model based on the criteria; (2) generating a cross-node ensemble by aggregating base learners above a predefined percentile, which results in an ensemble of the top-performing models; or (3) automatically determining the optimal percentile for optimum cross-node ensemble performance, which depends on the regularization.
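A comparable setup can be sketched with scikit-learn's LinearSVC, which wraps LIBLINEAR; the C range is illustrative, and class_weight is only a rough analogue of NeuroMiner's weighting exponents.

```python
# Sketch of a linear SVM with a slack (C) range and class weighting for imbalanced
# data; values are illustrative only.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC  # scikit-learn's LinearSVC wraps LIBLINEAR

param_grid = {
    "C": np.logspace(-3, 2, 6),          # slack/regularization parameter(s)
    "class_weight": [None, "balanced"],  # weight classes for imbalanced data
}

grid = GridSearchCV(LinearSVC(max_iter=5000),
                    param_grid=param_grid,
                    cv=StratifiedKFold(n_splits=5))
# grid.fit(X_train, y_train) would evaluate every combination on the CV1 folds.
```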
GLMNET (Hastie’s library for LASSO/Elastic-net regularized GLMs)
The configuration options for the GLMNET models are the following:
1 | Define GLMNET parameters
1 | Define GLMNET parameters This option allows the user to access another menu where 5 GLMNET parameters can be configured.
Inside option 1, the following parameters appear:
1 | Define mixing factor (alpha) range (0 =ridge <-> 1 =lasso)
2 | Define minimum lambda range (e.g. if N_cases>N_feats: 0.0001, otherwise 0.01)
3 | Define number of lambda optimization steps
4 | Define maximum number(s) of variables in the model
5 | Standardize input matrix to unit variance prior to training the elastic net
1 | Define mixing factor (alpha) range
Alpha is the elastic-net mixing parameter, with range alpha ∈ [0, 1]: alpha = 1 corresponds to lasso regression (the default) and alpha = 0 to ridge regression.
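For reference, in the standard glmnet formulation the penalty that alpha mixes is lambda * [ (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 ], so alpha = 1 leaves only the L1 (lasso) term and alpha = 0 only the L2 (ridge) term.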
2 | Define minimum lambda range
Specifies the minimum value of lambda for regularization. A smaller value applies lighter regularization, while a larger value applies stronger regularization, particularly useful in high-dimensional settings.
3 | Define number of lambda optimization steps
Determines the number of lambda values over which the model will be optimized. A larger number gives a finer-grained search over the regularization path but increases computation time.
4 | Define maximum number(s) of variables in the model Sets a limit on the number of variables that can be selected by the model. This controls the model’s complexity by constraining the number of features it can use.
5 | Standardize input matrix to unit variance prior to training the elastic net Specifies whether to standardize the input data before fitting the model. When ‘Yes’, the features are scaled to unit variance.
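As a rough analogue (not the GLMNET library that NeuroMiner calls), the following sketch sets up an elastic-net-regularized GLM with an alpha and lambda grid in scikit-learn, including the standardization of option 5; parameter values are illustrative only.

```python
# Rough scikit-learn analogue of an elastic-net GLM with an alpha (mixing) and
# lambda (strength) grid; standardization to unit variance mirrors option 5.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

alphas = [0.1, 0.5, 1.0]          # mixing factor: 0 = ridge, 1 = lasso
lambdas = np.logspace(-4, 0, 10)  # regularization strengths along the path

pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(penalty="elasticnet", solver="saga",
                                        max_iter=5000))
param_grid = {
    "logisticregression__l1_ratio": alphas,
    "logisticregression__C": 1.0 / lambdas,  # sklearn's C is roughly 1/lambda
}
grid = GridSearchCV(pipe, param_grid, cv=5)
# Note: a hard cap on the number of selected variables (option 4) has no direct
# equivalent here; glmnet itself enforces it via its dfmax/pmax arguments.
```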
RANDOM FORESTS (sklearn RandomForestClassifier)
The configuration options for the RF classifier are originally defined in the scikit-learn documentation of the RandomForestClassifier class. We will give a brief description of each parameter here.
1 | Define Number of decision trees
2 | Define Maximum number of features
3 | Define Function to measure the quality of a split
4 | Define Maximum depth of the tree
5 | Define Minimum number of samples to split
6 | Define Minimum number of samples to be at a leaf
7 | Define Minimum weighted fraction of the sum total of weights at a leaf
8 | Define Maximum number of leaf nodes
9 | Define Minimum decrease of impurity
10 | Define Bootstrap samples yes/no
11 | Define Out-of-bag samples yes/no
12 | Define Class weights
13 | Define Complexity parameter for Minimal Cost-Complexity Pruning
14 | Define Number of samples to draw from X to train base estimators (if bootstrap)
1 | Define Number of decision trees
Sets the number of decision trees (estimators) to be used in the random forest model. A higher number of trees generally improves accuracy but increases computation time.
2 | Define Maximum number of features
Specifies the maximum number of features to consider when looking for the best split at each node. Common settings include ‘sqrt’ (square root of the total number of features), ‘log2’, or an integer value.
3 | Define Function to measure the quality of a split
Chooses the criterion for splitting nodes in the trees. Options include ‘Gini impurity’, which measures the likelihood of misclassification, or ‘Entropy’, which measures the information gain.
4 | Define Maximum depth of the tree
Limits the maximum depth of each tree in the forest. Setting this prevents trees from growing too deep and potentially overfitting, while ‘No max. depth defined’ allows trees to grow until all leaves are pure or contain fewer than the minimum samples.
5 | Define Minimum number of samples to split
Determines the minimum number of samples required to split an internal node. A higher number can prevent overfitting by requiring larger segments of data to create a split.
6 | Define Minimum number of samples to be at a leaf
Specifies the minimum number of samples required to be at a leaf node. Increasing this value makes the model more conservative and prevents overly specific rules in the trees.
7 | Define Minimum weighted fraction of the sum total of weights at a leaf
Sets the minimum weighted fraction of the input data required to be at a leaf node. This parameter is particularly useful in datasets with varying sample weights.
8 | Define Maximum number of leaf nodes
Limits the maximum number of leaf nodes per tree. A smaller number results in simpler models, while ‘No max. N defined’ allows trees to grow fully based on the other criteria.
9 | Define Minimum decrease of impurity
Specifies the minimum decrease in impurity required for a node to be split. Setting this threshold helps prevent splitting nodes that do not significantly improve the model.
10 | Define Bootstrap samples yes/no
Indicates whether bootstrap samples (random samples with replacement) are used when building trees. ‘Yes’ enables bootstrapping, which is a common approach in random forests.
11 | Define Out-of-bag samples yes/no
Determines whether out-of-bag samples (samples not included in the bootstrap sample) are used to estimate the model’s performance. ‘Yes’ allows for internal performance evaluation without the need for a separate validation set.
12 | Define Class weights
Assigns weights to classes to handle imbalanced datasets. If ‘All equal’, each class is treated equally, but you can adjust weights to give more importance to underrepresented classes.
13 | Define Complexity parameter for Minimal Cost-Complexity Pruning
Specifies the threshold used for pruning the tree by considering the cost complexity. Higher values lead to more aggressive pruning.
14 | Define Number of samples to draw from X to train base estimators (if bootstrap)
Determines the number of samples to draw from the input dataset when bootstrapping is enabled. If set to ‘0’, all available samples are drawn for each base estimator.
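As a sketch, the menu options above map onto the following scikit-learn arguments; the values shown are illustrative, not NeuroMiner's defaults.

```python
# Illustrative mapping of the menu options onto RandomForestClassifier arguments.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,             # 1  | number of decision trees
    max_features="sqrt",          # 2  | maximum number of features per split
    criterion="gini",             # 3  | split-quality function ("gini" or "entropy")
    max_depth=None,               # 4  | maximum tree depth (None = no limit)
    min_samples_split=2,          # 5  | minimum samples required to split a node
    min_samples_leaf=1,           # 6  | minimum samples required at a leaf
    min_weight_fraction_leaf=0.0, # 7  | minimum weighted fraction at a leaf
    max_leaf_nodes=None,          # 8  | maximum number of leaf nodes
    min_impurity_decrease=0.0,    # 9  | minimum decrease of impurity to split
    bootstrap=True,               # 10 | draw bootstrap samples
    oob_score=False,              # 11 | use out-of-bag samples for evaluation
    class_weight=None,            # 12 | class weights (e.g. "balanced")
    ccp_alpha=0.0,                # 13 | minimal cost-complexity pruning parameter
    max_samples=None,             # 14 | samples drawn per tree when bootstrapping
)
# rf.fit(X_train, y_train) would then train the forest.
```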
GRADIENT BOOSTING (sklearn GradientBoostingClassifier)
The configuration options for the Gradient boosting classifier are originally defined in the scikit-learn documentation of the GradientBoostingClassifier class. We will give a brief description of each configuration parameter included in NeuroMiner here.
1 | Define maximum number of boosting iterations
2 | Select type of loss (exponential/deviance for classification, squared/absolute/huber/quantile for regression)
3 | Define learning rate (0<->1)
4 | Define subsampling factor (0<->1)
5 | Define maximum tree depth
1 | Define maximum number of boosting iterations
Sets the number of boosting stages (iterations) to be run. More iterations can improve model performance but may also lead to overfitting.
2 | Select type of loss (exponential/deviance for classification, squared/absolute/huber/quantile for regression)
Specifies the loss function used to measure model performance. For classification, options include ‘log_loss’ (deviance) and ‘exponential’; for regression, options include ‘squared_error’, ‘absolute_error’, ‘huber’, and ‘quantile’.
3 | Define learning rate (0<->1)
Sets the step size for each boosting iteration. A smaller learning rate requires more boosting iterations but can lead to a more accurate model.
4 | Define subsampling factor (0<->1)
Specifies the fraction of samples used for fitting each individual tree. Values less than 1 can help prevent overfitting by introducing randomness.
5 | Define maximum tree depth
Limits the depth of each individual tree in the boosting process. Shallower trees are less likely to overfit, while deeper trees can capture more complex patterns.
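As a sketch, these options map onto the following scikit-learn arguments; the values shown are illustrative only.

```python
# Illustrative mapping of the menu options onto GradientBoostingClassifier arguments.
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    n_estimators=200,   # 1 | maximum number of boosting iterations
    loss="log_loss",    # 2 | loss function ("log_loss" or "exponential")
    learning_rate=0.1,  # 3 | learning rate (0 < rate <= 1)
    subsample=0.8,      # 4 | subsampling factor (< 1 adds randomness)
    max_depth=3,        # 5 | maximum depth of each tree
)
# gb.fit(X_train, y_train) would then fit the boosted ensemble.
```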
MLP PERCEPTRON (sklearn MLPClassifier)
The configuration options for the MLP classifier are originally defined in the scikit-learn documentation of the MLPClassifier class. Please refer to the original scikit-learn documentation for an exhaustive definition of each parameter. We will provide a brief description of each parameter here.
1 | Define MLPERC parameters
2 | Enable regularization of model selection using CV1 test performance variance
3 | Specify cross-parameter model selection process
1 | Define MLPERC parameters This option allows the user to access another menu where MLP parameters can be configured. By default, NeuroMiner allows tuning of 4 MLP parameters (see below). Using expert mode (initialize NeuroMiner with "nm expert"), 21 MLP parameters can be configured.
2 | Enable regularization of model selection This option regularizes the selection of models by taking into account the variability of performance across the CV1 test folds. This approach aims to improve model selection by factoring in the consistency of performance metrics obtained from cross-validation.
3 | Specify cross-parameter model selection process This option controls how many models (parameter combinations) are selected: either one optimal model based on the criteria and regularization defined previously, or an ensemble of the top-performing models. You can select between (1) selecting a single optimum parameter node, which returns one optimal model based on the criteria; (2) generating a cross-node ensemble by aggregating base learners above a predefined percentile, which results in an ensemble of the top-performing models; or (3) automatically determining the optimal percentile for optimum cross-node ensemble performance, which depends on the regularization.
Inside option 1, the following parameters appear:
1 | Define MLP structures: hidden layers (length of row) and sizes (values)
2 | Select activation function
3 | Select solver method
4 | Define alpha parameter
5 | Select batch size (expert mode)
6 | Select learning rate strategy (expert mode)
7 | Define initial learning rate (expert mode)
8 | Define power parameter for learning rate decay (expert mode)
9 | Set maximum number of iterations (expert mode)
10 | Set random seed (expert mode)
11 | Define tolerance for optimization (expert mode)
12 | Use warm start (expert mode)
13 | Set momentum parameter (expert mode)
14 | Use Nesterov’s momentum (expert mode)
15 | Set early stopping (expert mode)
16 | Define validation data fraction (expert mode)
17 | Define beta_1 parameter (expert mode)
18 | Define beta_2 parameter (expert mode)
19 | Define epsilon parameter for numerical stability (expert mode)
20 | Set number of iterations with no improvement before stopping (expert mode)
21 | Set maximum function evaluations (expert mode)
1 | Define MLP structures: hidden layers (length of row) and sizes (values)
Specifies the architecture(s) of the neural network including the number and size of hidden layers. Each row of the defined matrix will be used as a different network architecture.
For example, ‘[100 100; 200 100]’ will train two neural networks: one with a 100x100 structure (two hidden layers, each with 100 neurons) and one with a 200x100 structure (two hidden layers, the first with 200 neurons and the second with 100 neurons). If you need to define structures with different numbers of hidden layers, pad the shorter rows with ‘0’. For example, to test a 100x100x100 and a 100x100 structure, input ‘[100, 100, 100; 100, 100, 0]’ (a sketch of how these rows translate into network architectures is given after this parameter list).
2 | Select activation function
Chooses the activation function used in the neurons. Common options include ‘relu’ (rectified linear unit), ‘tanh’, and ‘logistic’ (the sigmoid function).
3 | Select solver method
Determines the optimization algorithm used for training. Options include ‘adam’ (Adaptive Moment Estimation), ‘sgd’ (Stochastic Gradient Descent), and ‘lbfgs’ (Limited-memory Broyden-Fletcher-Goldfarb-Shanno).
4 | Define alpha parameter
Sets the regularization strength to prevent overfitting. A higher ‘alpha’ value increases regularization, helping to control model complexity.
5 | Select batch size
Specifies the number of samples used in each iteration of training. Options include ‘auto’ (which chooses a default value) or a specific integer value.
6 | Select learning rate strategy
Defines the learning rate schedule. Options include ‘constant’ (fixed learning rate), ‘invscaling’ (decreases learning rate as ‘1 / pow(t, power_t)’), and ‘adaptive’ (learning rate adapts based on performance).
7 | Define initial learning rate
Sets the starting learning rate for the training process. This parameter controls the step size during gradient descent.
8 | Define power parameter for learning rate decay
Specifies the power of the inverse scaling learning rate decay. Used when ‘learning_rate’ is set to ‘invscaling’.
9 | Set maximum number of iterations
Limits the number of iterations for training. This parameter helps control the time and resources spent on training.
10 | Set random seed
Specifies the seed for the random number generator to ensure reproducibility of results.
11 | Define tolerance for optimization
Sets the tolerance for stopping criteria. Training will stop when the improvement of the optimization is less than this tolerance.
12 | Use warm start
Indicates whether to reuse the solution of the previous call to fit as initialization for further training. Setting this to ‘yes’ allows incremental training.
13 | Set momentum parameter
Specifies the momentum for the ‘sgd’ solver. Momentum helps accelerate convergence and smooth out training dynamics.
14 | Use Nesterov’s momentum
Enables or disables Nesterov’s Accelerated Gradient, which improves the convergence speed by considering the gradient of the anticipated future position.
15 | Set early stopping
Determines whether to use early stopping to halt training when the validation score is not improving. This helps prevent overfitting.
16 | Define validation data fraction
Sets the fraction of training data used for validation in early stopping. This helps monitor performance during training.
17 | Define beta_1 parameter
Specifies the exponential decay rate for the first moment estimates in the ‘adam’ optimizer. Typically set to a value like ‘0.9’.
18 | Define beta_2 parameter
Specifies the exponential decay rate for the second moment estimates in the ‘adam’ optimizer. Typically set to a value like ‘0.999’.
19 | Define epsilon parameter for numerical stability
Sets a small constant added to prevent division by zero in the ‘adam’ optimizer and other algorithms.
20 | Set number of iterations with no improvement before stopping
Specifies the number of iterations with no improvement in the validation score before stopping the training (early stopping).
21 | Set maximum function evaluations
Limits the number of function evaluations during the optimization process. This helps control the computational cost of training.
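As a sketch of how the structure matrix from option 1 and the main parameters above could map onto scikit-learn's MLPClassifier (values are illustrative, and the zero-padding convention follows the example given for option 1):

```python
# Illustrative translation of a NeuroMiner-style structure matrix into
# MLPClassifier architectures; zeros pad rows with fewer hidden layers.
from sklearn.neural_network import MLPClassifier

# '[100, 100, 100; 100, 100, 0]' from the example above
structures = [[100, 100, 100], [100, 100, 0]]

models = []
for row in structures:
    hidden = tuple(size for size in row if size > 0)  # drop the zero padding
    models.append(MLPClassifier(
        hidden_layer_sizes=hidden,  # 1 | hidden layer sizes
        activation="relu",          # 2 | activation function
        solver="adam",              # 3 | solver method
        alpha=1e-4,                 # 4 | L2 regularization strength
    ))
# Each model would then be trained and compared within the CV1 folds.
```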