Preprocess features#

Once the data has been initialized, the user has the option to preprocess the features. In NeuroMiner nomenclature, this refers to any steps that are conducted within the cross-validation folds prior to training a model. For example, the data within each fold could be scaled and then imputed, or a principal components analysis could be conducted.

When this option is selected, the user will be presented with the following:

1 | Select analysis to operate on [Analysis 1]

2 | Overwrite existing preprocessed data files [ no ]

3 | Use existing preprocessing parameter files, if available [ no ]

4 | Write out computed parameters to disk (may require A LOT of disk space) [ no ]

5 | Select CV2 partitions to operate on [ 0 selected ]

1 | Select analysis to operate on Here, you choose which analysis or sequence of analyses you want to preprocess. You can select one of the analyses from the summary list by entering the corresponding number or type S (Sequence) and, in the next step, specify all the analyses you want to preprocess after each other (again by entering the corresponding numbers, e.g. ‘1 2 3’).

Note

If you are choosing to preprocess several analyses at once, be aware that you might not be able to select the CV2 partitions to oeprate on (see option 5 above). This will only be possible if the selected analyses use the same CV structure. If the analyses use different CV structures, by default all CV2 partitions for each analysis respectively will be preprocessed.

2 | Overwrite existing preprocessed data files

NeuroMiner will save pre processed data files and selecting this option will overwrite them.

3 | Use existing preprocessing parameter files, if available

The user has the option to save parameter files (e.g., betas from regression of age from a feature) and this option will allow them to use the existing files.

4 | Write out computer parameters to disk

This option will write the parameters (e.g., beta coefficients) to disk. This is not recommended due to space limitations.

5 | Select CV2 partition to operate on

This option will only appear if an analysis has been selected in step 1. It allows the user to select the folds that the preprocessing analysis will be applied to (see Box: Selecting CV folds in NeuroMiner).

Selecting CV-folds in Neurominer

Because NeuroMiner was developed with nested, double cross-validation as the gold standard, there is an option to only select the outer, CV2, folds that you want to process. For example, you could select only the first permutation in order to check that everything is working before running the rest of the permutations.

When the option is selected, the user will see the CV2 cross-validation folds in a block of permutations and folds:

e.g., Perm 1: * * * * . A star ”” indicates that the fold is selected, whereas the letter ”o” indicates the fold is not selected; e.g., Perm 1: o o o o o. Please note that all folds are selected by default in NeuroMiner, and thus you would need to deselect all folds first if you wanted to select specific perms or folds.

The user can select and deselect the folds in single positions or ranges. The folds can be referenced by first indicating the permutation (e.g., 1st permutation) and then the fold position (e.g., 1st fold). For example, the second fold of the first permutation would be [1,2]. A fold range can also be selected in the following format [perm start, perm end, fold start, fold end]. For example, you may want to select the first two permutations and the first three folds in the following way: [1,2,1,3].