Data Fusion

Data Fusion#

Data fusion is the process of integrating different datasets or classifiers together (see wiki), ultimately with the hope that it will make your prediction better. For example, neuroimaging and clinical data could be fused together and a classifier could be trained on these data. Alternatively, the researcher could conduct multiple analyses with different classifiers and then fuse the decision scores together. For theory of decision-based data fusion see the extensive work of Robi Polikar (e.g., Polikar, 2006) and for a practical example see Cabral et al., 2016.

For data fusion to work in NeuroMiner, the user must enter different datasets that will be processed (called ’modalities’, see data entry). When different data modalities are entered, the first option in the parameter template will be: 1 | Define data fusion options. Selecting this option will lead to the following menu:

_images/NM_paramtemp_datafusion_options.png

Fig. 6 NeuroMiner’s data fusion options.#

No fusion#

Don’t fuse the modalities. Analyses will be run on one modality only (can be be specified in the next step).

Early fusion#

Modality concatenation BEFORE feature preprocessing

This is the most basic method of data fusion and means that the feature matrices will be concatenated and then processing is conducted as normal.

In the next step, you can select the modalities that you would like to fuse. Modalities can be selected by entering single numbers (e.g., 1,2) or ranges (e.g., 1:3). You can select as many modalities as you want.

Note

Think about which type of fusion fits your situation best! Some modalities, e.g. neuroimaging data may require different preprocessing steps than another modality, e.g. matrix data. In such cases intermediate fusion (see below) may be more suitable.

Intermediate fusion#

Modality concatenation AFTER preprocessing

The feature matrices are concatenated after they are preprocessed and then training is conducted.

After selecting this fusion option, you can select the modalities that you would like to fuse. Modalities can be selected by entering single numbers (e.g., 1,2) or ranges (e.g., 1:3). You can select as many modalities as you want.

Note

Keep in mind that the free parameters defined in the separate processing of each modality will expand the overall model optimization process conducted in the training stage.

Late fusion#

Decision-based data fusion

Late fusion is when completely separate pipelines (preprocessing + learning algorithm) are run on each data modality, decision scores are produced, and only then the scores are fused together. This decision score is based on an average of the individual decision scores. This is the most basic strategy for combining modalities at the classifier level and is known as bagging.

After selecting this fusion option, you can select the modalities that you would like to fuse. Modalities can be selected by entering single numbers (e.g., 1,2) or ranges (e.g., 1:3). You can select as many modalities as you want.

Important note for intermediate and late fusion

If either intermediate or late fusion is selected, then it will be necessary to establish different settings (preprocessing, training, or both) for each of the different analyses. For this purpose, another menu item will be introduced into the parameter template called 2 | Set active modality for configuration. Selecting this option will display the modalities that you chose for fusion. By choosing the number associated with a modality listed here, you can switch between the modalities’ respective parameter templates. It is necessary to specify the preprocessing/training settings for all modalities. Following this process, you can then initialize the analyses and proceed as normal.

Important

It is important to note that fusing data will result in more complicated results that may not be as interpretable as a single analysis without fusion.