Visualize classifiers#

A standard NeuroMiner analysis produces the predictive accuracy of models trained on the preprocessed data (e.g., the scaled PCA components). This means that the model weights (e.g., from an SVM) are only available within this space (e.g., one weight per PCA component). For accurate interpretation, these weights need to be projected back into the original data space; in NeuroMiner, this is called "visualization". The term fits neuroimaging data best, where the weights are projected back to the original voxel space and can then be displayed as a brain map, but it also applies to matrix data, where the result for each feature (e.g., each questionnaire item) can be shown in a graph. Running the visualization routine in NeuroMiner produces three different types of results that represent how the classifier relates to the original data: mean results, grand mean results, and z/p values associated with the individual feature weights. Visualization in NeuroMiner also covers the calculation of model-based permutation p-values, which can be selected in the "Visualization" menu.
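
To make the back-projection concrete, here is a minimal, hypothetical sketch using a scikit-learn-style PCA and linear SVM on synthetic data (an illustration of the idea, not NeuroMiner's actual code): the learned weights live in component space and are mapped back to one weight per original feature via the PCA loadings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

# Hypothetical data: 100 subjects x 5000 voxels/features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))
y = rng.integers(0, 2, 100)

# Reduce to 20 PCA components and train a linear SVM on them
pca = PCA(n_components=20).fit(X)
Z = pca.transform(X)
svm = LinearSVC(dual=False).fit(Z, y)

# The SVM weights live in component space (one weight per component).
# Projecting them through the PCA loadings yields one weight per original
# feature, which can then be visualized (e.g., as a brain map).
w_components = svm.coef_.ravel()                # shape (20,)
w_features = pca.components_.T @ w_components   # shape (5000,)
```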

Mean results#

Mean

This image represents the median of the weights across all models selected during the CV1 cycle. For example, with a 10x10 cross-validation cycle where the winning model is selected for each fold, the median is computed across 100 models.

SE

The standard error of the weights across all models selected during the CV1 cycle.

CVratio

This is the sum of the median weights across all CV1 folds divided by their standard deviation. It is a measure of stability (see the sketch below).
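
As a rough illustration of these three maps, the hypothetical NumPy sketch below aggregates a stack of back-projected weight vectors, one per winning CV1 model; the exact denominators NeuroMiner uses may differ.

```python
import numpy as np

# Hypothetical stack of back-projected weights, one row per winning CV1
# model: e.g., 100 models (10 permutations x 10 folds) x 5000 features.
rng = np.random.default_rng(0)
W = rng.standard_normal((100, 5000))

mean_map = np.median(W, axis=0)       # "Mean": median weight per feature
sd = np.std(W, axis=0, ddof=1)
se_map = sd / np.sqrt(W.shape[0])     # "SE": standard error across models
cvratio_map = mean_map / sd           # "CVratio": stability of each weight
```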

Grand mean results#

Mean-GrM

Within a nested cross-validation framework, the grand median (GrM) in NeuroMiner is defined as the sum of the median weights of selected CV1 models for each CV2 fold, divided by the number of CV2 folds.

SE-GrM

The grand median standard error is the sum of the standard error values of selected CV1 models for each CV2 fold, divided by the number of CV2 folds.

CVratio-GrM

The grand median CV ratio is the sum across CV2 folds of the selected CV1 median weights divided by the selected CV1 standard error. This value is then divided by the number of CV2 folds.
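
Writing $\tilde{w}_k$ and $\mathrm{SE}_k$ for the median and standard error of the selected CV1 models in CV2 fold $k$ (of $K$ folds), the three grand median maps above can be restated compactly as follows (a paraphrase of the text, with notation introduced here, not NeuroMiner's literal code):

$$
\text{Mean-GrM} = \frac{1}{K}\sum_{k=1}^{K} \tilde{w}_k,
\qquad
\text{SE-GrM} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{SE}_k,
\qquad
\text{CVratio-GrM} = \frac{1}{K}\sum_{k=1}^{K} \frac{\tilde{w}_k}{\mathrm{SE}_k}
$$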

Mean-gr-SE-GrM

Across selected models for CV2 folds, this is the grand median thresholded so that only voxels where the median is greater than the standard error are displayed.
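
In NumPy terms, this thresholding could look like the hypothetical snippet below (an illustration of the rule described above, not NeuroMiner's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 5000))   # hypothetical model x feature weights

grand_median = np.median(W, axis=0)
grand_se = np.std(W, axis=0, ddof=1) / np.sqrt(W.shape[0])

# Mean-gr-SE-GrM: keep only features whose median magnitude exceeds its
# standard error; everything else is zeroed out (hidden) in the map.
thresholded = np.where(np.abs(grand_median) > grand_se, grand_median, 0.0)
```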

Prob95CI-GrM

Across selected models for CV2 folds, this is the number of times that the median value has fallen above or below the 95% confidence interval.

Sign-Based-Consistency

Sign-based consistency mapping for features follows the logic outlined in Gomez-Verdejo et al., 2019. The importance of each input variable is calculated as the number of times that the sign of the feature's weight (i.e., a positive "+" or negative "-" sign) is consistent within an ensemble (e.g., the number of times that it is consistently positive), multiplied by the number of times that the variable was non-zero (e.g., in an L1-regularized SVM or regression, weights can be set to exactly zero, and the more often this happens the lower the importance of the feature). The measure ranges from 0 to 1, with 1 representing perfect consistency within the ensemble and 0 when the weights are equally often positive and negative, or when the feature is always omitted with a zero weight.

A p-value is then calculated by defining a hypothesis test for the importance score, with a null hypothesis of 0 and an alternative hypothesis of not 0: a z-score is computed as the importance divided by the square root of the variance of the importance scores, and a p-value is obtained from the normal cumulative distribution function using the right-tailed significance. P-values are finally corrected using the false-discovery rate.
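
A hypothetical NumPy/SciPy sketch of this procedure, following the description above rather than the exact implementation in NeuroMiner or Gomez-Verdejo et al. (the binomial-style variance term in particular is an assumption):

```python
import numpy as np
from scipy.stats import norm, false_discovery_control

# Hypothetical ensemble of weight vectors: 100 models x 5000 features,
# with some weights zeroed out to mimic L1 sparsity.
rng = np.random.default_rng(0)
W = rng.standard_normal((100, 5000))
W[rng.random(W.shape) < 0.3] = 0.0

M = W.shape[0]
pos = (W > 0).mean(axis=0)        # fraction of models with a positive weight
neg = (W < 0).mean(axis=0)        # fraction with a negative weight
nonzero = (W != 0).mean(axis=0)   # fraction where the feature survived

# Importance in [0, 1]: 1 = perfectly consistent sign across the ensemble,
# 0 = signs balance out or the feature is always dropped.
importance = np.abs(pos - neg) * nonzero

# Right-tailed test of H0: importance = 0 (variance formula is an assumption)
var = importance * (1 - importance) / M
z = np.divide(importance, np.sqrt(var),
              out=np.zeros_like(importance), where=var > 0)
p = norm.sf(z)                         # right-tailed p-values
p_fdr = false_discovery_control(p)     # Benjamini-Hochberg FDR correction
```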

Z and P score images

Visualization can also include the derivation of Z or P scores for models as a whole (i.e., not for the individual features) using permutation testing. These are outlined in the visualization options.
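
As a generic illustration of model-level permutation testing (the standard recipe, not NeuroMiner's specific routine), the observed performance is compared against a null distribution obtained by retraining on label-permuted data. The `fit_score` callable below is a hypothetical stand-in for the full cross-validated training and evaluation step.

```python
import numpy as np

def permutation_pvalue(fit_score, X, y, n_perm=1000, seed=0):
    """Right-tailed permutation p-value for an observed model score.

    fit_score(X, y) is assumed to train and evaluate a model under the
    analysis's cross-validation scheme and return its accuracy.
    """
    rng = np.random.default_rng(seed)
    observed = fit_score(X, y)
    null = np.array([fit_score(X, rng.permutation(y)) for _ in range(n_perm)])
    # The +1 correction keeps the p-value away from exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```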