NeuroMiner tutorial

NeuroMiner tutorial#

This NeuroMiner tutorial provides a fundamental introduction to key NeuroMiner functions required to execute your initial machine learning analysis. This includes data entry, training, visualization, interpretation and external application.

Please note that NeuroMiner requires MATLAB version R2022b for proper functionality on all operating systems (R2023b or newer for Mac OS with M1 or M2 chip).

Begin by adding the NeuroMiner directory, which you can download from here, to the MATLAB search path.

Data entry#

NeuroMiner supports various data types. In this tutorial, we will import matrix data from a text file using simulated brain parcellation data. Example data is included in the NeuroMiner installation under NeuroMiner_1.2/example_data_simulated/.

Start by loading the covars.mat file into MATLAB before initiating NeuroMiner (See here for more information) One way to load the data is by executing

load((PATH/TO/NeuroMiner_1.2/example_data_simulated/covars.mat))

Now, you should see two variables in the workspace ‘covars’ and ‘covarsNames’.

Then, open NeuroMiner by typing:

>> nm

NeuroMiner has a text-based interface, you navigate it by typing the numbers offered on the menu, or the letter Q in case you want to go back to the previous menu/exit.

In the NeuroMiner text-based interface, navigate using the provided menu options. Begin with

>> 1 (Load data for model discovery and cross-validation). T

>> 2 (Select data input format)

>> 4 Matrix data (2D). 

>> 3 (Define data provenance)

>> 4 (from text file). 
[Select the file example_roidata.csv from the /example_data_simulated/ directory]

Selecting the file updates the NM workspace manager and you’ll see more options to choose from. Complete the required fields, specifying the column delimiter, label column as group, column for case identifiers as ID, and provide a data description. For the example data these settings are:

Choose “comma” as the delimiter.
Identify the label column as “group” (representing the variable that distinguishes between patients and controls) and press the enter button.
Specify the column in the CSV file that contains case identifiers (ID) as “ID” (indicating individual cases) and press the enter button.
Provide a description for the entered data, such as “NeuroMiner example data,” and press the enter button.

Then:

>>10 (IMPORT matrix) 

We could now add another data modality now.

In this tutorial, we will instead continue with adding covariates by choosing

>> 5 (Add covariate(s) to NM workspace)

Enter “covars” as the variable name, then press “v” and enter “CovarsNames”, note that these correspond to the variable names in our MATLAB workspace. Review the summary and choose

>> 7 (Finish data import for discovery & cross-validation analysis).

In the MAIN INTERFACE of NeuroMiner, select

>> 4 (Save NeuroMiner structure) 

This will save your progress. Save the NM structure regularly.

Parameter entry#

After completing the data import, it is time to define your model pipeline by navigating to

>> 2 (Set up NM parameter workspace). 

As long as not all the necessary settings are taken care of, you will not be able to train an model. You can read about all the different options implemented in detail here. For now, focus on the cross-validation settings, preprocessing pipeline, classification algorithm, and model saving options.

When you see [ … ], this means that, if not changed, default settings will be used. The symbol [ ??? ] in front of cross-validation settings (1) and model saving options (8), means they still require user input.

1. Cross-validation setup#

NeuroMiner defaults to using repeated nested-cross validation (pooled cross-validation, where CV folds are randomly created), although other frameworks are also available (more information). For now, let’s stick with the default settings, including the default number of inner and outer permutations (1) and folds (10). Therefore, choose:

>> 1 (Cross-validation settings) 

>> 7 (Build Cross-Validation structure). 

>> Q or 9 (to go back to the previous menu)

Back, in the Define parameter template menu. You can see that the [???] behind the cv settings has disappeared.

2. Preprocessing pipeline setup#

Preprocessing is essential in machine learning, and within NeuroMiner it describes all steps that need to be applied to the data before the learning algorithm. These steps comprise scaling the data, dimensionality reduction, filtering, etc (see the preprocessing section in the manual). Note that preprocessing is happening within the cross-validation, thus, preprocessing pipeline and learning algorithm are jointly optimized and together make up your model.

To move to the preprocessing menu, type:

>> 2 (Preprocessing pipeline)

Review the default preprocessing settings. To get an overview of other available options, type:

>> 1 (Add preprocessing step)

But in this example, we use the two default options, so navigate back to the Define parameter template menu.

3. Learning algorithm setup#

Configure learning algorithm options:

>> 3 (Classification algorithm)

If you wanted to select another learning algorithm except the default Support Vector Machine (SVM), type:

>> 2 (Select learning algorithm)

For now, we will use the default settings. Navigate back to the Define parameter template menu.

4. Model saving options#

The model saving options are still undefined. Go to:

>> 10 (Model saving options)

Then, press Enter to accept default suggestions and to return to the main menu.

Initialize analysis#

To create an analysis with your parameter template settings, which we can then train, you need to intialize it. This tells NM that we are happy with the settings defined in the parameter template. To do so, from the main interface:

>> 3 (Initialize analysis)

>> 1 (Define analysis identifier) 

>> CV10x10_defaults

>> 2 (Provide analysis description)

>> My first NeuroMiner analysis!

[Press Enter key to go to a new line]

>> . [The dot symbol tells NeuroMiner that you have finished]

>> 3 (Specify parent directory of the analysis) [Specify the directory using the pop-up window.]

If you have filled all the required information, you will now see option 4 PROCEED. Choose it.

>> 4 (PROCEED)

Upon selecting “PROCEED,” NeuroMiner initializes the analysis, preserving all the defined settings, such as the specific cross-validation structure, preprocessing steps, and learning algorithm. Any subsequent changes made in the parameter template menu won’t impact the already initialized analysis. NeuroMiner also generates a directory on your disk, set to be filled with result files later on. Verify that the analysis folder was created in the specified directory, identifiable by its name containing an ID number and the analysis identifier.

For now, save your NM structure again, just as you did after data entry.

Training#

Now that your settings are initialized, you can preprocess your data and train your model. Here, we will train the model directly without running only the preprocessing first. Preprocessing the data is done ‘on the fly’ but processed data is not stored to disk.

Note

Preprocessing only can be useful to reduce computational costs and training time, e.g., when you are working with high dimensional data, such as neuroimaging, or you want to compare the performance of different learning algorithms but using the exact same preprocessing pipeline.

To start training, from the “MAIN INTERFACE” menu:

>> 5 (Train supervised classifiers)

>> 3 (Select CV2 partitions to operate on)

>> 1 (Select all CV2 positions)
[Press enter key to return to the previous menu]

>> 4 (PROCEED)

After clicking “PROCEED,” a pop-up window called the NM Optimization viewer will appear. This viewer displays the real-time performance of your model during the training process. As we retained the default setting of optimizing for “accuracy” as the performance metric, the graph primarily exhibits the accuracy of training, CV1, and CV2 folds.

Note for macOS users.

If you are using NeuroMiner for the first time you might encounter an error here caused by additional security settings. Please see our manual on how to proceed in that case.

Once the training is finished, you will see the training menu in the Matlab command window again. To view your results, from the main interface, choose

>> 8 (View Result Viewer)

The different plots are described in the section.

Visualization#

Following the training analysis, an option is available to visualize the model. This allows for gaining deeper insights into the input features (pre-preprocessing) that contributed to the predictions made by the model.

Go to the main interface, then:

>> 6 (Visualize Classifiers)

>> 7 (PROCEED)

To check the results, go back to the main interface and press:

>> 8 (Open NM Results Viewer (cross-validation results))

The different plots are explained in more detail here.

At this point, save your NM structure again.

Interpretation#

TO DO

External validation#

Another important concept is external validation, which assesses the generalizability of your model to external data.

In recognizing that model validation should function as a test of generalizability rather than a tool to optimize model performance, NeuroMiner strictly separates the training and external validation phases. Once you enter the external validation phase, there’s no option to revert to the training phase. This precaution is in place to safeguard the credibility of claims regarding the model’s generalizability.

To initiate external validation, lock the NeuroMiner structure containing your trained analyses:

>> 9 (Lock analyses and start NM application mode)

>> y[es]
[the available options in the menu should now change]

Now, close NeuroMiner to ready the external validation data located in the example_simulated_data folder. Load the “validation_data_covars.mat” variable into the MATLAB workspace. It’s crucial to exit NM before loading this data.

In this straightforward analysis example, we are currently not utilizing covariates for preprocessing. You can skip this step in this instance. However, if you plan to implement additional processing steps, such as a site correction method like combat in your pipeline later on, it becomes essential to avoid encountering errors. Then, restart NeuroMiner.

>> nm

>> 2 (Load data for model application)

In the newly opened pop-up window, you can create external validation data containers by selecting “Create New…”. Specify whether the labels in this validation set are known. If labels are present, you can calculate external validation performance metrics; otherwise, you will only receive prediction scores. For now, check the box “Target labels known” and click “Create New”. This action generates a field in the middle blue center box resembling the following: [1] (0 cases); date: DATE; labels known: yes.

Now that we have an empty container for our validation data, the next step is to populate it. Press “add/modify data in dataset”, which directs you to the entry menu of NeuroMiner, similar to the initial data entry before training. NeuroMiner provides an overview of the modalities entered for the training data at the top of the menu, along with information about whether validation data has been loaded for these modalities. In our example, only Modality 1 is present. To load the data,

>> 1 (Add/Modify OOCV data)

>> 1 (Define data provenance)
[choose simulated_validation_ROIdata.csv through the pop-up window]

>> 1 [to choose the first sheet]

>> 6 (IMPORT matrix)

>> 5 (Add covariate data in OOCV data)

>> validation_covars

Warning

Since the initial data was input from a text file, NeuroMiner mandates a text file for the external validation data as well. Additionally, NeuroMiner expects identical variable names for cases, groups, and features; otherwise, it will result in an error.

Note

If you’ve loaded multiple modalities during the training phase, NeuroMiner does not require loading all of them for external validation datasets. It’s crucial, however, to ensure that you load all the modalities used for the specific models you intend to validate; if you fail to do so, this may lead to errors.

The same principle applies to covariates; there’s no need to load them here. Yet, if the model you later validate on this data incorporates them during a preprocessing step, it will result in an error.

Return to the main interface and:

>> 3 (Set up parameters for model application)

TO DO: do these options actually change anything???

If you want the application of your external model to have a similar logic to the original model training, select:

>> 1 (Ensemble mode)

>> 2 (Compute mean decisions of CV1 partition ensembles before aggregating). 

Then, to accept the default in option 2 (Model retraining mode), exit to the main interface, and:

>> 4 (Apply classifiers to independent data)

By selecting this, using the winning parameters from your discovery analyses, the models will now be retrained in the discovery data before being applied to the external validation data.

Once done, you can view the results in the NM result viewer:

>> 6 (Open Result Viewer (cross-validation & independent test results))

You will then be taken to the Results Display as before. After selecting the analysis you chose for the model application, you can click on the second drop-down box which shows an option to select the external validation data that you entered, as opposed to the default “Training/CV data”.