A Mathematical Review on EEG Channel Selection Techniques for Motor Imagery Classification

Submitted: 15 January 2025
Posted: 16 January 2025


Abstract
Brain-Computer Interface (BCI) technology has recently come into the spotlight as the performance of Artificial Intelligence (AI) has increased drastically. Although there are several instruments for measuring brain activity, the electroencephalography (EEG) signal is in the limelight because it offers high temporal resolution, is non-invasive, and is extremely portable. However, raw EEG signals cannot be used directly in BCI systems, as they contain many artifacts and comprise numerous channels. Therefore, selecting only the necessary channels and rejecting those contaminated by strong noise is crucial for improving classification performance. Motor Imagery EEG (MI-EEG) is a type of EEG recorded while subjects imagine moving their body. MI-EEG channel selection techniques can be divided into two broad groups: Common Spatial Pattern (CSP) based models and non-CSP based models. In this paper, we therefore introduce the models classified according to this criterion in detailed and rigorous mathematical terms and compare the accuracy of each approach.
Subject: Engineering - Bioengineering

1. Introduction

Brain-Computer Interface (BCI) systems measure the activity of the brain in order to perform specific tasks, such as interpreting one's intention to speak or move. While there are numerous methods for measuring brain activity, such as electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG), BCI systems based on scalp EEG (henceforth simply EEG) are trending because EEG has high temporal resolution, is non-invasive, and is extremely portable.
Classification of EEG signals follows the procedure shown in Figure 1 [1,2].
EEG signals are measured by attaching electrodes to the scalp according to a montage, which is a standardized arrangement of electrodes and selection of channel pairs. Montages are divided into two groups, referential and bipolar, according to how channels are formed. In a referential montage, the voltage difference between each electrode and one reference electrode is recorded and forms a channel. In a bipolar montage, the voltage difference between two consecutive electrodes is recorded and forms a channel. The recorded signals then go through an artifact and noise removal process to eliminate undesired components [3].
Channel selection then reduces the number of channels used in feature extraction and classification and ensures that only meaningful channels are included in the subsequent steps. After channel selection, only the necessary channels' data need to be acquired from subjects, which helps shorten the time-consuming setup process and reduces inconvenience for subjects. Reducing the number of channels also lowers computational complexity and increases classification accuracy [2]. Channel selection usually follows the procedure shown in Figure 2.
Motor Imagery EEG (MI-EEG) is the measurement of EEG signals while the subject imagines a movement without actually performing it. MI-EEG based BCIs have therefore been suggested as a substitute control method, especially for people who are disabled or suffer from locked-in syndrome [4]. It is widely accepted that imagining a movement activates the same neural pathways as the real movement [5] and can even lead to muscle strength gains [6]. Traditional statistical filters used for MI-EEG analysis can be divided into CSP (Common Spatial Pattern) based and non-CSP based filters [2]. Accordingly, the following chapters discuss the CSP based filters and non-CSP based statistical filters used in MI-EEG classification.

2. Common Spatial Pattern(CSP) Based Filters

2.1. Original CSP Filter

The Common Spatial Pattern algorithm is a spatial filter that is effective for EEG analysis: it maximizes one class's variance while minimizing the other class's variance along n orthogonal bases, as illustrated in Figure 3, and can be derived mathematically as follows.

Derivation[7,8,9]

Let $X \in \mathbb{R}^{C \times N \times T}$ and $X(t) \in \mathbb{R}^{C \times N}$, where $C$ is the number of channels, $N$ is the number of samples per epoch, and $T$ is the number of epochs; $X(t)$ denotes the $t$-th epoch's signal. In this paper, $X$ consists of only two classes, $cl_1$ (class 1) and $cl_2$ (class 2), for simplicity. The normalized covariance matrix of each class, $C_1$ and $C_2$, is then defined as
$$C_i = \sum_{j \in cl_i} \frac{X(j)X(j)^T}{\mathrm{tr}\left(X(j)X(j)^T\right)} \tag{1}$$
The covariance matrix is normalized by the sum of the diagonal elements of $X(j)X(j)^T$ to remove variance among trials.
  • Since $X(j)X(j)^T$ is symmetric, $C_i$ is symmetric; therefore, if we define $C \triangleq C_1 + C_2$, it can be orthogonally diagonalized as
$$C = C_1 + C_2 = U \Lambda U^T \tag{2}$$
where $\Lambda$ is a diagonal matrix of eigenvalues and $U$ is an orthonormal matrix whose columns are the eigenvectors of $C$, with $U^T U = I$ since $U^T = U^{-1}$. Therefore,
$$U^T (C_1 + C_2) U = U^T C U = \Lambda \tag{3}$$
By applying the whitening transform and simultaneously diagonalizing with the method proposed in [9], we get
$$\bar{U}^T (C_1 + C_2) \bar{U} = I \tag{4}$$
where $\bar{U} = U \Lambda^{-\frac{1}{2}}$. Letting $S_1 \triangleq \bar{U}^T C_1 \bar{U}$ and $S_2 \triangleq \bar{U}^T C_2 \bar{U}$, $S_1$ and $S_2$ can be simultaneously diagonalized as
$$S_1 = V \Lambda_1 V^T, \quad S_2 = V \Lambda_2 V^T \tag{5}$$
sharing the same eigenvectors and satisfying
$$\Lambda_1 + \Lambda_2 = I \tag{6}$$
by the properties of simultaneous diagonalization.
  • Equations (5) and (6) express an important property: in the basis spanned by $V$, the larger one class's variance is along an axis, the smaller the other class's variance is along that same axis.
  • By transforming $\Lambda_1$ and $\Lambda_2$ back into equations in $C_1$ and $C_2$, we get
$$Q^T C_1 Q + Q^T C_2 Q = I \tag{7}$$
where $Q = U \Lambda^{-\frac{1}{2}} V$. For an epoch $X \in \mathbb{R}^{C \times N}$, its CSP feature vector is
$$X_{\mathrm{CSP}} = \frac{\mathrm{diag}\left(\bar{Q}^T X X^T \bar{Q}\right)}{\mathrm{sum}\left(\mathrm{diag}\left(\bar{Q}^T X X^T \bar{Q}\right)\right)} \tag{8}$$
where $\bar{Q}$ is a $C \times 2k$ matrix composed of the $k$ columns of $Q$ that make class 1's variance largest and the $k$ columns of $Q$ that make class 2's variance largest. □
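To make the derivation concrete, the following is a minimal NumPy sketch of the CSP computation; solving the generalized eigenproblem $C_1 v = \lambda (C_1 + C_2) v$ is equivalent to the whitening and simultaneous diagonalization used above. The array layout (trials x channels x samples) and the function names are illustrative assumptions, not part of [7,8,9].

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(epochs, labels, k=3):
    """Compute the CSP projection and keep the k best columns per class.

    epochs: array of shape (T, C, N) -- trials x channels x samples
    labels: array of shape (T,) with values in {1, 2}
    Returns Q_bar of shape (C, 2k): k filters maximizing class-2 variance
    followed by k filters maximizing class-1 variance.
    """
    def normalized_cov(x):
        s = x @ x.T
        return s / np.trace(s)                      # trace-normalized covariance (1)

    C1 = np.mean([normalized_cov(e) for e in epochs[labels == 1]], axis=0)
    C2 = np.mean([normalized_cov(e) for e in epochs[labels == 2]], axis=0)

    # Generalized eigenproblem C1 v = lambda (C1 + C2) v: eigenvalues lie in
    # [0, 1]; small values favor class 2, large values favor class 1.
    eigvals, eigvecs = eigh(C1, C1 + C2)
    order = np.argsort(eigvals)
    return np.hstack([eigvecs[:, order[:k]], eigvecs[:, order[-k:]]])

def csp_features(epoch, Q_bar):
    """Normalized variance features of one epoch, as in (8)."""
    z = Q_bar.T @ epoch
    var = np.var(z, axis=1)
    return var / var.sum()
```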

2.2. Filter Bank CSP[10]

The frequency band typically analyzed in MI-EEG is 8–30 Hz [11,12]. However, since EEG signals vary across trials and subjects, performance can be enhanced by selecting specific frequency bands per subject [13]. The authors therefore proposed dividing the frequency range into small bands (4 Hz wide in their paper) and applying a CSP filter to each band. This technique allows meaningful frequency bands to be selected automatically for each subject.
Although mutual information is not commonly used in MI-EEG classification, MIBIF (Mutual Information-based Best Individual Feature) and MIRSR (Mutual Information-based Rough Set Reduction) were used in this paper to select the features to be trained on; these methods compute the mutual information carried by each feature about the class label and select the k features that best represent the data.
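A hedged sketch of an FBCSP-style pipeline is given below, reusing `csp_filters` and `csp_features` from the Section 2.1 sketch. The band edges, filter order, and the use of scikit-learn's `mutual_info_classif` in place of the exact MIBIF/MIRSR procedures are simplifying assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def filter_bank_csp_features(epochs, labels, fs=250.0,
                             bands=((4, 8), (8, 12), (12, 16), (16, 20),
                                    (20, 24), (24, 28), (28, 32)), k=2):
    """Band-pass each 4 Hz band, apply CSP per band, concatenate features."""
    feats = []
    for lo, hi in bands:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band_epochs = filtfilt(b, a, epochs, axis=-1)       # zero-phase filtering
        Q_bar = csp_filters(band_epochs, labels, k=k)        # Section 2.1 sketch
        feats.append(np.array([csp_features(e, Q_bar) for e in band_epochs]))
    X = np.hstack(feats)                                     # (T, n_bands * 2k)

    # MIBIF-style step: keep the features carrying the most mutual
    # information with the class labels.
    selector = SelectKBest(mutual_info_classif, k=4).fit(X, labels)
    return selector.transform(X), selector
```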

2.3. Sparse Common Spatial Pattern (SCSP)

The CSP coefficient matrix $Q$ computed above is dense, meaning that most of its entries are non-negligible and only a few have negligible effect. However, the matrix $\bar{Q}$ containing the selected channels' coefficients keeps only some of the columns of $Q$; therefore, the original signal cannot be transformed into the representation that discriminates the classes most clearly. The authors therefore sparsified the CSP spatial filters, which can increase performance because the entries removed by sparsification carry relatively little information compared with the original CSP filters.
The CSP algorithm's projection matrix $W$ can be written as the solution of
$$\min_{W} \sum_{i=1}^{m} W_i C_1 W_i^T + \sum_{i=m+1}^{2m} W_i C_2 W_i^T \tag{9}$$
subject to $W_i (C_1 + C_2) W_i^T = 1$ for $i \in \{1, 2, \dots, 2m\}$ and $W_i (C_1 + C_2) W_j^T = 0$ for $i, j \in \{1, 2, \dots, 2m\}$, $i \neq j$, where $W_i$ denotes the $i$-th row of $W$.
The sparsity of $W$ can be increased with several methods, including the $L_0$ norm, the $L_p$ norm, and the $L_1/L_2$ norm, where the $L_p$ norm of a vector $X$ is defined as
$$\|X\|_p = \left( \sum_{i=1}^{n} |X_i|^p \right)^{1/p} \tag{10}$$
where $|\cdot|$ denotes the magnitude of its argument, making the $L_0$ norm the number of non-zero entries of $X$ and the $L_1$ norm the sum of the magnitudes of the entries, also known as the Manhattan distance.
According to [15], the $L_1/L_2$ norm is more suitable as a measure of non-sparsity than the $L_1$ and $L_p$ norms because it is bounded within $(0, n)$, while the $L_1$ and $L_p$ norms have no upper bound; moreover, the $L_1/L_2$ norm does not depend on the scale of the vector, while the $L_1$ and $L_p$ norms do.
Therefore, the $L_1/L_2$ norm is added to (9) with scaling factors $(1-r)$ and $r$, giving
$$\min_{W} \; (1-r)\left( \sum_{i=1}^{m} W_i C_1 W_i^T + \sum_{i=m+1}^{2m} W_i C_2 W_i^T \right) + r \sum_{i=1}^{2m} \frac{\|w_i\|_1}{\|w_i\|_2} \tag{11}$$
subject to $W_i (C_1 + C_2) W_i^T = 1$ for $i \in \{1, 2, \dots, 2m\}$ and $W_i (C_1 + C_2) W_j^T = 0$ for $i, j \in \{1, 2, \dots, 2m\}$, $i \neq j$.
The original CSP algorithm is the special case $r = 0$.
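As a small illustration, the snippet below evaluates the $L_1/L_2$ non-sparsity measure and the value of the SCSP objective (11) for a candidate filter matrix. The row convention for $W$ is an assumption, and the orthogonality constraints are left to whatever optimizer is used.

```python
import numpy as np

def l1_l2_ratio(w):
    """||w||_1 / ||w||_2: the bounded non-sparsity measure used by SCSP."""
    return np.linalg.norm(w, 1) / np.linalg.norm(w, 2)

def scsp_objective(W, C1, C2, r):
    """Value of the SCSP objective (11) for a candidate filter matrix W.

    W: (2m, C) array; the first m rows target class 1, the last m rows
    target class 2.  The orthogonality constraints on W are assumed to be
    handled by the optimizer and are not enforced here.
    """
    m = W.shape[0] // 2
    var_term = (np.trace(W[:m] @ C1 @ W[:m].T)
                + np.trace(W[m:] @ C2 @ W[m:].T))
    sparsity_term = sum(l1_l2_ratio(w) for w in W)
    return (1 - r) * var_term + r * sparsity_term
```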

2.4. Robust Sparse Common Spatial Pattern (RSCSP)

The RSCSP algorithm is a derivative of the SCSP algorithm developed to reduce the effect of outliers in the EEG signal. The non-robust SCSP algorithm computes the covariance matrix of class $i$, $C_i$, as in (1). As the EEG signals are regularized, $C_w$ can be written in the form
$$C_w = \frac{1}{(t \times n_w) - 1} E_w E_w^T \tag{12}$$
where $w = 1, 2$ and $E_w \in \mathbb{R}^{C \times (t \cdot n_w)}$ denotes the EEG signals of class $w$ concatenated along channels. Since outliers can distort covariance matrices, the MCD (Minimum Covariance Determinant) estimator is used. MCD can be described as finding a subset $H$ of the original data $X = [c_1, c_2, \dots, c_n]$ such that its determinant $\det(\Sigma_H)$ is minimized, i.e., the optimization problem
$$\min_{H \subset X,\ |H| = h} \det(\Sigma_H) \tag{13}$$
with the covariance then computed over the retained samples as
$$C_w = \frac{1}{(\alpha \times t \times n_w) - 1} E_w E_w^T \tag{14}$$
where $1 - \alpha$ is the breakdown point. The FASTMCD algorithm can be adopted to decrease the computational cost. FASTMCD first selects $h$ points at random, then ranks the data by their Mahalanobis distances and keeps the $h$ points with the smallest distances to form a new subset. If the new subset's covariance matrix has a smaller determinant, it becomes the new initial set, and this process is repeated until the covariance matrix converges.
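A minimal sketch of the robust covariance step is shown below; scikit-learn's `MinCovDet` implements FASTMCD, and feeding the resulting class covariances into the CSP computation of Section 2.1 gives an RSCSP-style pipeline. Concatenating trials along time and the chosen `support_fraction` value are assumptions, not the exact setup of [16].

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_class_covariance(epochs, support_fraction=0.75):
    """MCD estimate of one class's channel covariance from its epochs.

    epochs: (n_trials, C, N).  Samples are concatenated over trials and
    time, so the estimator sees (n_trials * N) observations of C channels.
    support_fraction plays the role of alpha: the fraction of samples kept.
    """
    E = np.concatenate([e.T for e in epochs], axis=0)   # (n_trials * N, C)
    mcd = MinCovDet(support_fraction=support_fraction).fit(E)
    return mcd.covariance_
```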

3. CSP Based Channel Selection Techniques

3.1. Bhattacharyya Bound of CSP Features

In [7], the authors reduced the number of channels by selecting a set of channels using a forward search. The Bhattacharyya bound, an upper bound on the Bayes error probability, was used as the evaluation criterion because computing the Bayes error probability directly is challenging. The statement that the Bhattacharyya bound is an upper bound on the Bayes error probability is proved below.
To summarize, the channels are selected by Algorithm 1.
  • Proof. The Bayes error probability $P_e$ can be written as
$$P_e = \int_X \min\left( P(C_1)\, p(x|C_1),\ P(C_2)\, p(x|C_2) \right) dx \tag{15}$$
where $P(C_1)$ and $P(C_2)$ are the prior probabilities of classes 1 and 2, respectively, $p(x|C_1)$ and $p(x|C_2)$ are the class-conditional probability density functions, and $x$ is an observed data point.
  • Since $\min(a, b) \le \sqrt{ab}$,
$$\min\left( P(C_1)\, p(x|C_1),\ P(C_2)\, p(x|C_2) \right) \le \sqrt{P(C_1) P(C_2)} \sqrt{p(x|C_1)\, p(x|C_2)} \tag{16}$$
and integrating both sides over $X$ gives
$$P_e \le \int_X \sqrt{P(C_1) P(C_2)} \sqrt{p(x|C_1)\, p(x|C_2)}\, dx = \sqrt{P(C_1) P(C_2)} \int_X \sqrt{p(x|C_1)\, p(x|C_2)}\, dx \tag{17}$$
By [17], (17) is the Bhattacharyya bound, a special case of the Chernoff bound with $\beta = 0.5$; under Gaussian class-conditional densities it yields
$$P_e \le \sqrt{P(C_1) P(C_2)}\, e^{-k(1/2)} \tag{18}$$
where
$$k(1/2) = \frac{1}{8} (\mu_2 - \mu_1)^T \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_1 + \Sigma_2}{2} \right|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}} \tag{19}$$
and $\mu_i$ and $\Sigma_i$ are the mean vector and covariance matrix of class $i$'s CSP feature vectors, respectively.
Algorithm 1. Bhattacharyya Bound Based Channel Selection (given as pseudocode in the original preprint; not reproduced here).
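Since the pseudocode itself is not reproduced here, the following is a hedged reconstruction of the forward search from the textual description: channels are added greedily, and each candidate set is scored by the Gaussian Bhattacharyya bound (18)-(19) computed on CSP features from the Section 2.1 sketch. Seeding the search with the best channel pair and using log-variance features are implementation assumptions.

```python
import numpy as np
from itertools import combinations

def bhattacharyya_bound(F1, F2, p1=0.5, p2=0.5):
    """Gaussian Bhattacharyya bound (18)-(19) on the Bayes error.

    F1, F2: CSP feature matrices (n_trials, d) for class 1 and class 2.
    """
    mu1, mu2 = F1.mean(axis=0), F2.mean(axis=0)
    ridge = 1e-8 * np.eye(F1.shape[1])              # keeps determinants finite
    S1 = np.cov(F1, rowvar=False) + ridge
    S2 = np.cov(F2, rowvar=False) + ridge
    S = (S1 + S2) / 2
    diff = mu2 - mu1
    k_half = (diff @ np.linalg.solve(S, diff) / 8
              + 0.5 * (np.linalg.slogdet(S)[1]
                       - 0.5 * (np.linalg.slogdet(S1)[1]
                                + np.linalg.slogdet(S2)[1])))
    return np.sqrt(p1 * p2) * np.exp(-k_half)

def forward_select_channels(epochs, labels, n_select):
    """Greedy forward search over channels, scored by the Bhattacharyya bound.

    Seeded with the best channel pair (one channel alone gives degenerate
    CSP features), then grown one channel at a time.
    """
    def bound_for(channel_set):
        sub = epochs[:, list(channel_set), :]
        Q_bar = csp_filters(sub, labels, k=1)        # Section 2.1 sketch
        # log-variance features avoid the singular covariance that the
        # sum-to-one normalized features of (8) would produce
        F = np.log([csp_features(e, Q_bar) for e in sub])
        return bhattacharyya_bound(F[labels == 1], F[labels == 2])

    n_channels = epochs.shape[1]
    selected = list(min(combinations(range(n_channels), 2), key=bound_for))
    while len(selected) < n_select:
        remaining = [c for c in range(n_channels) if c not in selected]
        best = min(remaining, key=lambda c: bound_for(selected + [c]))
        selected.append(best)
    return selected
```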

3.2. CSP-Rank

CSP-Rank is based on the idea that the matrix $\bar{Q}$ in (8) encodes the importance of each channel for classification, since it acts as a weight on the original signal. The authors therefore set $k = 1$ and selected two vectors, $SF_1$ and $SF_2$, where $SF_i$ maximizes class $i$'s variance. The channels with the largest entries in $SF_1, SF_2, SF_3, \dots$ are then selected consecutively until the model trained with the selected channels reaches the desired accuracy.
The results are statistically significant at the $p < 0.05$ level compared to random selection. Moreover, the number of channels needed to reach 90% accuracy was 8–38 for CSP-Rank, whereas SVM-RFE (support vector machine recursive feature elimination, explained in Section 4.1) required 12–28 channels, which the authors report as an advantage over SVM-RFE.

3.3. $L_1$ Norm of CSP Filter

Since it is widely accepted that the effectiveness of the CSP algorithm decreases as the number of channels increases due to overfitting, the authors first applied the CSP filter to all channels and scored each channel by its $L_1$ norm. The channels with high scores were then selected and used to train the model. The following score is used to compute each channel's weight:
$$S_C(i) = \frac{\|\bar{Q}_i\|_1}{\|\bar{Q}\|_1} \tag{20}$$
where $\bar{Q}$ is the CSP projection matrix from (8) and $\bar{Q}_i$ contains the coefficients of the $i$-th channel; the scores can be sorted in decreasing order to select the desired number of channels.
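A one-function sketch of the scoring rule (20), under the row convention of the Section 2.1 sketch (channel $i$'s coefficients in row $i$ of $\bar{Q}$):

```python
import numpy as np

def l1_channel_scores(Q_bar):
    """Channel scores (20): the L1 mass of each channel's CSP coefficients.

    Q_bar: (C, 2k) CSP filter matrix, channel i's coefficients in row i.
    Returns the channels ranked best-first and their scores.
    """
    scores = np.abs(Q_bar).sum(axis=1) / np.abs(Q_bar).sum()
    return np.argsort(scores)[::-1], scores
```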

3.4. Cohen’s d Effect Size CSP(E-CSP)

The E-CSP algorithm aims to remove channels that contain redundant information. It is divided into two major parts: part 1, in which noisy trials are removed, and part 2, in which the remaining channels are evaluated to determine whether they are meaningful.
First, to remove trials with large artifacts, the z-score computed by (21) is used:
$$Z_{lij} = \frac{|c_{lij} - \bar{c}_{li}|}{\sigma} \tag{21}$$
where $i$, $j$, and $l$ denote the channel, the trial, and the class, respectively, and $\sigma$ is the standard deviation of the $l$-th class's $i$-th channel. The z-score is adopted because it measures how far the signal deviates from the average. The $i$-th channel's $j$-th trial is therefore considered noisy if $Z_{lij} > \alpha$, where $\alpha$ is the noise criterion. Let
$$s_{ijl} = \begin{cases} 1, & Z_{lij} > \alpha \\ 0, & \text{otherwise} \end{cases}$$
and define the frequency with which the $l$-th class's $j$-th trial is noisy as
$$f_{lj} = \frac{\sum_{i=1}^{N} s_{ijl}}{N} \times 100 \tag{22}$$
where $N$ here denotes the number of channels. If a trial's frequency exceeds the threshold $\beta\,\%$, it is considered a noisy trial. The set of noisy trials can therefore be defined as
$$K_l = \{\, j \mid f_{lj} > \beta \,\} \tag{23}$$
After removing the noisy trials, the $i$-th channel's mean for the $l$-th class is computed as
$$\bar{c}_{li} = \frac{1}{|J_s^l|} \sum_{j \in J_s^l} c_{lij} \tag{24}$$
where $J_s^l$ denotes the set of non-noisy trials.
Second, channel $i$ is evaluated as meaningful if its effect size $d_i$ satisfies
$$d_i = \frac{|\bar{c}_{1i} - \bar{c}_{2i}|}{\sigma} > \gamma \tag{25}$$
where
$$\sigma = \frac{\sigma_{1i} + \sigma_{2i}}{2}, \quad j \in J_s^l \tag{26}$$
$\sigma_{li}$ is the standard deviation of the $l$-th class's $i$-th channel over the non-noisy trials, and $\gamma$ is the criterion below which a channel is considered meaningless because the difference between the two classes is small.
Finally, the remaining channels $L$ are selected and the CSP filter is applied.
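The sketch below strings the two E-CSP stages (21)-(26) together. The per-trial, per-channel statistic $c_{lij}$ is taken here to be the trial variance of that channel, and the thresholds $\alpha$, $\beta$, $\gamma$ are illustrative defaults, not the values used in [20].

```python
import numpy as np

def ecsp_select_channels(epochs, labels, alpha=3.0, beta=25.0, gamma=0.2):
    """Sketch of the two-stage E-CSP selection.

    epochs: (T, C, N), labels in {1, 2}.  The per-trial statistic c_lij is
    taken as the channel's trial variance (an assumption).
    Returns indices of the channels kept for CSP.
    """
    stats = epochs.var(axis=2)                              # (T, C)
    clean_means, clean_stds = {}, {}

    for cls in (1, 2):
        s = stats[labels == cls]                            # (T_cls, C)
        z = np.abs(s - s.mean(axis=0)) / s.std(axis=0)      # z-scores (21)
        noisy_freq = 100.0 * (z > alpha).mean(axis=1)       # per-trial frequency (22)
        clean = s[noisy_freq <= beta]                       # drop noisy trials (23)
        clean_means[cls] = clean.mean(axis=0)               # channel means (24)
        clean_stds[cls] = clean.std(axis=0)

    pooled = (clean_stds[1] + clean_stds[2]) / 2            # pooled sigma (26)
    d = np.abs(clean_means[1] - clean_means[2]) / pooled    # effect size (25)
    return np.where(d > gamma)[0]                           # keep meaningful channels
```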

4. Non-CSP Based Methods

4.1. Support-Vector Machine Recursive Feature Elimination (SVM-RFE)[21,22]

Hard-Margin Support Vector Machine (SVM) Algorithm

The support vector machine is an algorithm that finds a hyperplane maximizing the margin between the given classes. Let there be $N$ data points $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, where $x_i \in \mathbb{R}^d$ ($d$ is the dimension of a point) and $y_i \in \{-1, 1\}$ is the label. The hyperplane can be written as $f(x) = w^T x + b$, where $w$ is the weight vector and $b$ is the bias. The optimization problem for $w$ and $b$ that maximizes the margin is
$$J = \min_{w, b} \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1, \ \forall i \tag{27}$$
and this problem can be solved using Lagrange multipliers.
  • Define the Lagrangian
$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right) \tag{28}$$
where $\alpha_i \ge 0$ are the Lagrange multipliers. Taking the partial derivative with respect to $w$ gives
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0 \ \Rightarrow\ w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{29}$$
Taking the partial derivative with respect to $b$ gives
$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0 \ \Rightarrow\ \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{30}$$
Substituting these back, we obtain the dual Lagrangian
$$L_D(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \tag{31}$$
turning (27) into
$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \ \alpha_i \ge 0 \tag{32}$$

Soft Margin SVM Algorithm

However, since real datasets are generally not linearly separable, slack variables $\xi_i$ are introduced, changing the objective (27) into
$$J = \min_{w, b} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ \forall i \tag{33}$$
which can be reduced to the same dual form as (32) by the same procedure, with the additional box constraint $0 \le \alpha_i \le C$.

SVM-RFE Algorithm

The SVM-RFE algorithm aims to find a subset of size $r$ among the $n$ variables ($r < n$) that maximizes the classifier's performance, using backward sequential selection.
From previous research [22,23], it is well known that $\Delta J$ can be used as an elimination criterion. The change in the objective (33) caused by removing feature $i$ can be expanded as
$$\Delta J(i) = \frac{\partial J}{\partial w_i} \Delta w_i + \frac{\partial^2 J}{\partial w_i^2} (\Delta w_i)^2 \tag{34}$$
using a Taylor series to second order. Since the objective function $J$ is at its optimum, $\frac{\partial J}{\partial w_i} \Delta w_i = 0$. Therefore,
$$\Delta J(i) = \frac{\partial^2 J}{\partial w_i^2} (\Delta w_i)^2 \tag{35}$$
When removing the $i$-th feature, its weight must become 0, so $\Delta w_i = -w_i$. The criterion for removing the $i$-th feature can therefore be approximated as
$$c_i = \Delta J(i) = (w_i)^2 \tag{36}$$
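A compact sketch of the resulting recursive elimination loop is given below; ranking individual features by $w_i^2$ follows (36), and scikit-learn's `RFE` with a linear `SVC` estimator implements essentially the same loop.

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe_rank(X, y):
    """Rank features by recursive elimination with a linear SVM.

    The elimination criterion is c_i = w_i**2, as in (36); the feature
    with the smallest criterion is removed at every iteration.
    """
    remaining = list(range(X.shape[1]))
    ranking = []                                   # worst feature first
    while remaining:
        clf = SVC(kernel="linear", C=1.0).fit(X[:, remaining], y)
        w = clf.coef_.ravel()                      # linear-kernel weights
        worst = int(np.argmin(w ** 2))
        ranking.append(remaining.pop(worst))
    return ranking[::-1]                           # best feature first
```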

4.2. $\gamma^2$ Value Based Channel Selection

Similar to how the Bhattacharyya bound based CSP selection used the F-ratio to select the initial subset of channels, the authors of [19] utilized the $\gamma^2$ value as a control group. $\gamma^2$ is defined as
$$\gamma^2(i) = \frac{n_1 n_2}{n_1 + n_2} \cdot \frac{\overline{\|x_i^{(1)}\|^2} - \overline{\|x_i^{(2)}\|^2}}{\mathrm{stdev}\left(\|x_i^{(1,2)}\|^2\right)^2} \tag{37}$$
where $k \in \{1\ (\text{alias of } cl_1),\ 2\ (\text{alias of } cl_2)\}$, $x_i^{(k)} \in \mathbb{R}^T$ denotes the $i$-th channel of $X^{(k)}$, i.e., $X^{(k)} = [x_1^{(k)}, x_2^{(k)}, \dots, x_N^{(k)}]^T$, and $n_k$ denotes the number of epochs in class $k$.
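Under one possible reading of (37), in which $\|x_i^{(k)}\|^2$ is the per-trial power of channel $i$, the score can be computed as follows; this interpretation is an assumption, not a statement of the exact feature used in [19].

```python
import numpy as np

def gamma2_scores(epochs, labels):
    """Per-channel gamma^2 score (37), reading ||x_i^(k)||^2 as trial power.

    epochs: (T, C, N) array, labels in {1, 2}.
    """
    power = (epochs ** 2).sum(axis=2)              # (T, C) per-trial channel power
    p1, p2 = power[labels == 1], power[labels == 2]
    n1, n2 = len(p1), len(p2)
    num = p1.mean(axis=0) - p2.mean(axis=0)        # difference of class means
    den = power.std(axis=0) ** 2                   # stdev over both classes, squared
    return (n1 * n2 / (n1 + n2)) * num / den
```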

4.3. Correlation Based Channel Selection[24,25]

EEG has the limitation that, even when a person repeats the same task, different EEG signals are recorded because of other ongoing brain activity and thoughts, which makes the performed task hard to analyze. The authors assumed that channels related to motor imagery will contain similar information across trials of the same task, and thus show high correlation, while unrelated channels will not carry similar information and thus show low correlation.
Pearson's correlation, which measures the linear dependency between random variables, is used in both papers and is calculated as
$$\rho(X, Y) = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \frac{X_i - \bar{X}}{\sigma_X} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_Y} \right) \tag{38}$$
where $X$ and $Y$ are random variables, $\bar{X}$ and $\bar{Y}$ are their means, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. Correlations are computed between every pair of channels, forming a correlation matrix of size $C \times C$. By averaging each row, the most effective channel can be found. Repeating this procedure over all epochs, $N_s$ channels are selected by keeping the channels judged effective more than a certain number of times.
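A minimal sketch of the voting scheme described above; taking the absolute correlation and voting for a fixed number of top channels per epoch are simplifying assumptions.

```python
import numpy as np

def correlation_select(epochs, n_select, top_per_epoch=None):
    """Vote-based channel selection from per-epoch correlation matrices.

    For every epoch, the C x C Pearson correlation matrix is computed,
    each row is averaged, and the highest-scoring channels of that epoch
    receive a vote; the most frequently voted channels are returned.
    """
    top_per_epoch = top_per_epoch or n_select
    votes = np.zeros(epochs.shape[1])
    for epoch in epochs:                               # epoch: (C, N)
        corr = np.corrcoef(epoch)                      # (C, C) Pearson correlations
        mean_corr = np.abs(corr).mean(axis=1)          # row-wise average
        votes[np.argsort(mean_corr)[-top_per_epoch:]] += 1
    return np.argsort(votes)[-n_select:]
```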

4.4. Genetic Algorithm (GA) Based Selection

The Genetic Algorithm (henceforth GA) is a heuristic method for finding an optimal solution by mimicking Charles Darwin's theory of natural evolution: it imitates how genetic information is passed down through generations. Just as DNA is made of genes, a gene is the element that forms a chromosome, and a chromosome is a set of genes that encodes a candidate solution to the problem. A fitness function evaluates how well each chromosome solves the problem, and the resulting scores drive selection, the process of choosing the chromosomes that will create offspring; the higher the score, the higher the chance of being selected. As with DNA, an offspring is created by combining the genetic information of two chromosomes, with the crossover point chosen at random. For example, parents aaaa and bbbb can produce aaaa, abbb, aabb, aaab, or bbbb. Mutation occurs randomly at each position, flipping a value from 0 to 1 or from 1 to 0.
The genetic algorithm follows the flow chart in Figure 4; a small code sketch is given below.
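A generic GA over binary channel masks is sketched below. The fitness function is left abstract (in [26] it is the sum of maximum $r^2$ values over the selected channels, while cross-validated classifier accuracy is another common choice), and the selection, crossover, and mutation operators shown are standard textbook variants rather than the exact ones used in [26,27].

```python
import numpy as np

def genetic_channel_search(fitness, n_channels, pop_size=20, n_generations=50,
                           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Generic GA over boolean chromosomes (True = channel selected).

    `fitness` maps a boolean mask of length n_channels to a score.
    """
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_channels)) < 0.5          # random initial population
    for _ in range(n_generations):
        scores = np.array([fitness(ind) for ind in pop])
        # roulette-wheel selection: higher fitness -> higher selection probability
        probs = scores - scores.min() + 1e-9
        probs /= probs.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        children = parents.copy()
        # single-point crossover on consecutive parent pairs
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_channels)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # bit-flip mutation at each gene position
        flip = rng.random(children.shape) < mutation_rate
        children ^= flip
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(scores))]                       # best chromosome found
```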

Genetic Algorithm For Selecting Optimal Channels for P300 Based Classification [26,27]

EEG signals can be roughly divided into four categories: Visual Evoked Potentials (VEP), Sensorimotor Rhythms (SMR), Slow Cortical Potentials (SCP), and Event-Related Potentials (ERP). VEP is a signal acquired from three occipital electrodes, forming two channels with the middle electrode as the reference, and reflects the visual pathway from the optic nerve to the primary visual cortex. SMR is recorded over the sensorimotor cortex in the mu (7–13 Hz), beta (13–30 Hz), and gamma (30–200 Hz) frequency bands and is strongly modulated by simple movements and certain cognitive tasks. SCP is a slowly changing electrical signal with a frequency below 1 Hz. ERP does not refer to an EEG measured at a specific location, but to a fluctuation generated by neurons in response to some event, such as motor or speech imagery, at a specific time [26,28,29,30,31,32,33,34].
P300 is a particular ERP that appears as a positive peak roughly 300 ms after a stimulus. In [26], a 64-channel EEG was recorded and four channels were selected by defining the fitness function $f(X) = \sum_{k=1}^{n} A_k$, where $X$ is a chromosome consisting of four genes, which are channels in this case. $A_k$ is defined as the maximum $r^2$ of the $k$-th of the four channels, where $r^2$ is a value between 0 and 1; the higher the $r^2$, the larger the variance between classes.

5. Result

In this study, we utilized the EEG dataset from BCI Competition IV Dataset 2a, which is specifically designed for benchmarking motor imagery classification algorithms. The dataset comprises EEG recordings from nine healthy subjects, each performing four different motor imagery tasks: imagining movements of the left hand, right hand, both feet, and tongue. EEG signals were recorded using 22 Ag/AgCl electrodes placed according to the international 10-20 system at a sampling rate of 250 Hz [35].
For preprocessing, we applied a band-pass filter between 8 Hz and 30 Hz, a frequency band commonly used in previous research [11,12]. Baseline correction was performed by subtracting the mean of the 2-second pre-cue interval marked as the "fixation cross" (from -2 s to 0 s relative to cue onset) from the subsequent motor imagery data. The entire 4 seconds of motor imagery data (from 0 s to 4 s after cue onset) were used for classification.
Data were split into training and evaluation datasets using 5-fold cross-validation, ensuring that the classes were balanced in terms of the number of trials in each fold. Cross-validation was performed individually for each subject to account for inter-subject variability of EEG signals.
The accuracies of each algorithm were measured by training a support vector machine (SVM) classifier with a linear kernel using the data from the selected channels.
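For reference, a sketch of this evaluation protocol built on the Section 2.1 CSP functions is shown below; fitting the CSP filters inside each training fold (to avoid leaking evaluation data into the spatial filters) and the fixed random seed are implementation choices, not details stated above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def evaluate_channel_subset(epochs, labels, channels, k=2):
    """5-fold CV accuracy of a linear SVM on CSP features of the chosen channels."""
    labels = np.asarray(labels)
    sub = epochs[:, channels, :]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accs = []
    for train, test in cv.split(sub, labels):
        # CSP filters are fitted on the training fold only
        Q_bar = csp_filters(sub[train], labels[train], k=k)
        Xtr = np.array([csp_features(e, Q_bar) for e in sub[train]])
        Xte = np.array([csp_features(e, Q_bar) for e in sub[test]])
        clf = SVC(kernel="linear").fit(Xtr, labels[train])
        accs.append(clf.score(Xte, labels[test]))
    return float(np.mean(accs))
```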

5.1. CSP and SCSP Algorithm

The CSP algorithm can be regarded as a special case of the SCSP algorithm with $r = 0$. Since the accuracy of the SCSP algorithm depends on the value of $r$, the accuracies for $r = 0, 0.1, 0.2, 0.3, 0.4, 0.5$ are compared in Figure 5.

5.2. Accuracy of CSP Based Channel Selection Methods

The accuracy of the CSP based channel selection methods, CSP-Rank (Section 3.2), $L_1$-norm CSP-Rank (Section 3.3), SCSP-Rank (SCSP used instead of CSP in CSP-Rank), E-CSP (Section 3.4), and Bhattacharyya bound based CSP (Section 3.1), is reported in Table 1 and Figure 6.
Table 1. Comparison of Classification Performance for Various CSP Based EEG Channel Selection Methods Across Different Numbers of Channels. (Table not reproduced here.)

5.3. Accuracy of Non-CSP Based Statistical Channel Selection Methods

The accuracy of the non-CSP based channel selection methods, SVM-RFE (Section 4.1), $\gamma^2$ value based selection (Section 4.2), correlation based selection (Section 4.3), and genetic algorithm based selection (Section 4.4), is reported in Table 2 and Figure 7.
Table 2. Comparison of Classification Performance for Various Non-CSP Based Channel Selection Methods Across Different Numbers of Channels. (Table not reproduced here.)

References

  1. Alotaiby, T.; El-Samie, F.E.A.; Alshebeili, S.A.; Ahmad, I. A review of channel selection algorithms for EEG signal processing. EURASIP Journal on Advances in Signal Processing 2015, 2015, 1–21. [Google Scholar] [CrossRef]
  2. Baig, M.Z.; Aslam, N.; Shum, H.P. Filtering techniques for channel selection in motor imagery EEG applications: a survey. Artificial intelligence review 2020, 53, 1207–1232. [Google Scholar] [CrossRef]
  3. Electroencephalography (EEG): An Introductory Text and Atlas of Normal and Abnormal Findings in Adults, Children, and Infants.
  4. Choi, J.; Kaongoen, N.; Jo, S. Investigation on Effect of Speech Imagery EEG Data Augmentation with Actual Speech. In Proceedings of the 2022 10th International Winter Conference on Brain-Computer Interface (BCI); pp. 1–5. [CrossRef]
  5. Roth, M.; Decety, J.; Raybaudi, M.; Massarelli, R.; Delon-Martin, C.; Segebarth, C.; Morand, S.; Gemignani, A.; Décorps, M.; Jeannerod, M. Possible involvement of primary motor cortex in mentally simulated movement: a functional magnetic resonance imaging study. NeuroReport 1996, 7. [Google Scholar] [CrossRef] [PubMed]
  6. Yue, G.; Cole, K.J. Strength increases from the motor program: Comparison of training with maximal voluntary and imagined muscle contractions. Journal of Neurophysiology 1992, 67, 1114–1123. [Google Scholar] [CrossRef] [PubMed]
  7. Lin, H.; Zhuliang, Y.; Zhenghui, G.; Yuanqing, L. Bhattacharyya bound based channel selection for classification of motor imageries in EEG signals. In Proceedings of the 2009 Chinese Control and Decision Conference; pp. 2353–2356. [CrossRef]
  8. Ramoser, H.; Müller-Gerking, J.; Pfurtscheller, G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans Rehabil Eng 2000, 8, 441–6. [Google Scholar] [CrossRef]
  9. Fukunaga, K. Chapter 2 - RANDOM VECTORS AND THEIR PROPERTIES. In Introduction to Statistical Pattern Recognition, 2nd ed.; Fukunaga, K., Ed.; Academic Press: Boston, 1990; pp. 11–50. [Google Scholar] [CrossRef]
  10. Ang, K.K.; Chin, Z.Y.; Wang, C.C.; Guan, C.T.; Zhang, H.H. Filter bank common spatial pattern algorithm on BCI competition IV Datasets 2a and 2b. Frontiers in Neuroscience 2012, 6. [Google Scholar] [CrossRef] [PubMed]
  11. Blankertz, B.; Tomioka, R.; Lemm, S.; Kawanabe, M.; Müller, K.R. Optimizing Spatial Filters for Robust EEG Single-Trial Analysis. IEEE Signal Processing Magazine 2008, 25, 41–56. [Google Scholar] [CrossRef]
  12. Müller-Gerking, J.; Pfurtscheller, G.; Flyvbjerg, H. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clinical Neurophysiology 1999, 110, 787–798. [Google Scholar] [CrossRef] [PubMed]
  13. Blankertz, B.; Dornhege, G.; Krauledat, M.; Müller, K.R.; Curio, G. The non-invasive Berlin Brain–Computer Interface: Fast acquisition of effective performance in untrained subjects. NeuroImage 2007, 37, 539–550. [Google Scholar] [CrossRef]
  14. Arvaneh, M.; Guan, C.; Ang, K.K.; Quek, C. Optimizing the Channel Selection and Classification Accuracy in EEG-Based BCI. IEEE Transactions on Biomedical Engineering 2011, 58, 1865–1873. [Google Scholar] [CrossRef] [PubMed]
  15. Hurley, N.; Rickard, S. Comparing Measures of Sparsity. IEEE Transactions on Information Theory 2009, 55, 4723–4741. [Google Scholar] [CrossRef]
  16. Arvaneh, M.; Guan, C.; Ang, K.K.; Quek, C. Robust EEG channel selection across sessions in brain-computer interface involving stroke patients. In Proceedings of the The 2012 International Joint Conference on Neural Networks (IJCNN); pp. 1–6. [CrossRef]
  17. Duda, R.O.; Hart, P.E.; Stork, D.G. Chapter 2 - Bayesian Decision Theory. In Pattern Classification, 2nd ed.; Duda, R.O., Hart, P.E., Stork, D.G., Eds.; Wiley, 2000; pp. 1–32. [Google Scholar]
  18. Tam, W.K.; Ke, Z.; Tong, K.Y. Performance of common spatial pattern under a smaller set of EEG electrodes in brain-computer interface on chronic stroke patients: A multi-session dataset study. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; pp. 6344–6347. [CrossRef]
  19. Meng, J.; Liu, G.; Huang, G.; Zhu, X. Automated selecting subset of channels based on CSP in motor imagery brain-computer interface system. In Proceedings of the 2009 IEEE International Conference on Robotics and Biomimetics (ROBIO); pp. 2290–2294. [CrossRef]
  20. Das, A.K.; Suresh, S. An Effect-Size Based Channel Selection Algorithm for Mental Task Classification in Brain Computer Interface. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics; pp. 3140–3145. [CrossRef]
  21. Qifeng, Z.; Wencai, H.; Guifang, S.; Weiyou, C. A new SVM-RFE approach towards ranking problem. In Proceedings of the 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Vol. 4; pp. 270–273. [CrossRef]
  22. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning 2002, 46, 389–422. [Google Scholar] [CrossRef]
  23. LeCun, Y.; Denker, J.; Solla, S. Optimal brain damage. Advances in neural information processing systems 1989, 2. [Google Scholar]
  24. Jin, J.; Miao, Y.Y.; Daly, I.; Zuo, C.L.; Hu, D.W.; Cichocki, A. Correlation-based channel selection and regularized feature optimization for MI-based BCI. Neural Networks 2019, 118, 262–270. [Google Scholar] [CrossRef] [PubMed]
  25. Hsu, W.Y.; Cheng, Y.W. EEG-channel-temporal-spectral-attention correlation for motor imagery EEG classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2023, 31, 1659–1669. [Google Scholar] [CrossRef] [PubMed]
  26. Hasan, I.H.; Ramli, A.R.; Ahmad, S.A. Utilization of Genetic Algorithm for Optimal EEG Channel Selection in Brain-Computer Interface Application. In Proceedings of the 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology; pp. 97–102. [CrossRef]
  27. Kee, C.Y.; Kuppan Chetty, R.M.; Khoo, B.H.; Ponnambalam, S.G. Genetic Algorithm and Bayesian Linear Discriminant Analysis Based Channel Selection Method for P300 BCI. In Proceedings of the Trends in Intelligent Robotics, Automation, and Manufacturing; Ponnambalam, S.G.; Parkkinen, J.; Ramanathan, K.C., Eds. Springer Berlin Heidelberg; pp. 226–235.
  28. Brenner, R. CHAPTER 77 - INVESTIGATIONS IN MULTIPLE SCLEROSIS. In Neurology and Clinical Neuroscience; Schapira, A.H.V., Byrne, E., DiMauro, S., Frackowiak, R.S.J., Johnson, R.T., Mizuno, Y., Samuels, M.A., Silberstein, S.D., Wszolek, Z.K., Eds.; Mosby: Philadelphia, 2007; pp. 1031–1044. [Google Scholar] [CrossRef]
  29. Gibson, R.M.; Owen, A.M.; Cruse, D. Chapter 9 - Brain–computer interfaces for patients with disorders of consciousness. In Progress in Brain Research; Coyle, D., Ed.; Elsevier, 2016; Volume 228, pp. 241–291. [Google Scholar] [CrossRef]
  30. Samar, V.J. Evoked Potentials. In Encyclopedia of Language & Linguistics (Second Edition); Brown, K., Ed.; Elsevier: Oxford, 2006; pp. 326–335. [Google Scholar] [CrossRef]
  31. Baiano, C.; Zeppieri, M. Visual Evoked Potential, 2023 May 11.
  32. Schmidt, S.; Jo, H.G.; Wittmann, M.; Hinterberger, T. ‘Catching the waves’ – slow cortical potentials as moderator of voluntary action. Neuroscience & Biobehavioral Reviews 2016, 68, 639–650. [Google Scholar] [CrossRef]
  33. Cheyne, D.O. MEG studies of sensorimotor rhythms: A review. Experimental Neurology 2013, 245, 27–39. [Google Scholar] [CrossRef]
  34. Mudgal, S.K.; Sharma, S.K.; Chaturvedi, J.; Sharma, A. Brain computer interface advancement in neurosciences: Applications and issues. Interdisciplinary Neurosurgery 2020, 20, 100694. [Google Scholar] [CrossRef]
  35. Tangermann, M.; Müller, K.R.; Aertsen, A.; Birbaumer, N.; Braun, C.; Brunner, C.; Leeb, R.; Mehring, C.; Miller, K.J.; Mueller-Putz, G.; et al. Review of the BCI Competition IV. Frontiers in Neuroscience 2012, 6. [CrossRef] [PubMed]
Figure 1. Procedure of EEG Classification
Figure 2. Procedure of Channel Selection
Figure 3. Example of CSP filtering (n = 2)
Figure 4. Flow Chart of Genetic Algorithm
Figure 5. Performance Comparison of SCSP Algorithms With Different r Values Across Different Numbers of Channels
Figure 6. Performance of CSP Based Channel Selection Methods Across Different Numbers of Channels
Figure 7. Performance of Non-CSP Based Channel Selection Methods Across Different Numbers of Channels
Seungjun Lee is a senior student at the Korea Science Academy of KAIST. His research interest is bio-signal processing, especially EEG data.