Lab 3: Multivariate Methods#

Parameter Estimation#

Let \(X = (X_1,\dots,X_p)\) be a \(p\)-dimensional random vector. The mean vector is \(\mu=E(X)\) and the covariance matrix is \(\Sigma_{p\times p}\)

Two normal distributions with different means and covariance matrices#

library(MASS)

# Covariance matrix for the first class: variances 2, covariance 0.5
sigma1 = matrix(2,2,2)
sigma1[1,2] = sigma1[2,1] = 0.5
bvn1 = mvrnorm(100, mu=c(3,4), Sigma=sigma1)

# Covariance matrix for the second class: variances 1, covariance 0.5
sigma2 = matrix(1,2,2)
sigma2[1,2] = sigma2[2,1] = 0.5
bvn2 = mvrnorm(100, mu=c(7,8), Sigma=sigma2)

plot(bvn1,xlim=c(0,12),ylim=c(0,12),col="blue",xlab="X",ylab="Y")
points(bvn2,col="red")
[Figure: scatter plot of the two simulated classes, bvn1 (blue) and bvn2 (red)]

The mean vector \(\mu\) can be estimated by the sample average

bvn1_average = apply(bvn1,2,mean)
bvn2_average = apply(bvn2,2,mean)
print("the first group")
bvn1_average
print("the second group")
bvn2_average
[1] "the first group"
  1. 2.87161703821864
  2. 3.94169121104734
[1] "the second group"
  1. 7.11777570362168
  2. 7.99345609581267

The covariance matrix can be estimated by the sample covariance matrix

bvn1_cov = cov(bvn1)
bvn2_cov = cov(bvn2)

print("the first group")
bvn1_cov
print("the second group")
bvn2_cov
[1] "the first group"
A matrix: 2 × 2 of type dbl
2.11317420.3302313
0.33023131.7735052
[1] "the second group"
A matrix: 2 × 2 of type dbl
0.90372200.2699806
0.26998060.8289249
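The sample covariance divides by \(n-1\). As a quick sanity check (a sketch, reusing the simulated bvn1 above), the same matrix can be computed by hand:

# Manual sample covariance: center the columns, then S = t(Xc) %*% Xc / (n-1)
n = nrow(bvn1)
xc = scale(bvn1, center=TRUE, scale=FALSE)
t(xc) %*% xc / (n - 1)   # matches cov(bvn1)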

Two classes may have a common covariance matrix#

sigma = matrix(1,2,2)
sigma[1,2] = sigma[2,1]=0.5

bvn1 = mvrnorm(100, mu=c(3,4), sigma)
bvn2 = mvrnorm(100, mu=c(7,8), sigma)

plot(bvn1,xlim=c(0,12),ylim=c(0,12),col="blue",xlab="X",ylab="Y")
points(bvn2,col="red")
[Figure: scatter plot of the two classes sharing a common covariance matrix, bvn1 (blue) and bvn2 (red)]

The common covariance matrix is estimated by the sample covariance of the pooled data, with each class centered at its own mean

# Center each class at its own column means before pooling;
# mean(bvn1) alone would be a single grand mean over all entries
pooldata = rbind(scale(bvn1, center=TRUE, scale=FALSE),
                 scale(bvn2, center=TRUE, scale=FALSE))
pooled_cov = cov(pooldata)

print("The pooled covariance matrix")
pooled_cov
[1] "The pooled covariance matrix"
A matrix: 2 × 2 of type dbl
0.94756370.4548899
0.45488991.0334350
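Equivalently, the pooled estimate is a weighted average of the per-class sample covariance matrices, \(S = \frac{(n_1-1)S_1+(n_2-1)S_2}{n_1+n_2-2}\). A quick check on the simulated data (the divisor differs by one from that of cov(pooldata), so the two estimates agree only approximately):

# Pooled covariance as a weighted average of the per-class estimates
n1 = nrow(bvn1); n2 = nrow(bvn2)
((n1-1)*cov(bvn1) + (n2-1)*cov(bvn2)) / (n1 + n2 - 2)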

Diagonal covariance matrix#

In this case, the coordinate random variables \(X_1,\dots,X_p\) are independent, each normally distributed with its own variance

# Diagonal covariance: zero covariance, unequal variances (1 and 4)
sigma = matrix(1,2,2)
sigma[1,2] = sigma[2,1] = 0
sigma[2,2] = 4

bvn1 = mvrnorm(100, mu=c(3,4), sigma)
bvn2 = mvrnorm(100, mu=c(7,8), sigma)

plot(bvn1,xlim=c(0,12),ylim=c(0,12),col="blue",xlab="X",ylab="Y")
points(bvn2,col="red")
[Figure: scatter plot of the two classes with a diagonal covariance matrix, bvn1 (blue) and bvn2 (red)]

Independent random variables with a common variance#

# Spherical covariance: zero covariance, common variance 1
sigma = matrix(1,2,2)
sigma[1,2] = sigma[2,1] = 0

bvn1 = mvrnorm(100, mu=c(3,4), sigma)
bvn2 = mvrnorm(100, mu=c(7,8), sigma)

plot(bvn1,xlim=c(0,12),ylim=c(0,12),col="blue",xlab="X",ylab="Y")
points(bvn2,col="red")
[Figure: scatter plot of the two classes with independent coordinates and a common variance, bvn1 (blue) and bvn2 (red)]

Estimation of Missing Values#

Values of certain variables may be missing in the data. For example, suppose the first 10 values of the first column of bvn1 are missing; we mark them as NA

bvn1[1:10,1] = NA
bvn1
A matrix: 100 × 2 of type dbl (excerpt; full output omitted)
      [,1]     [,2]
 [1,]   NA 3.342038
 [2,]   NA 5.142741
 [3,]   NA 4.449138
 ...
[11,] 4.162045 3.286794
[12,] 2.423847 4.550390
 ...

We fill in the missing entries by estimating them, i.e., imputation. In mean imputation, missing values are replaced by the average of the available data

# Replace the missing entries with the mean of the observed values
bvn1[1:10,1] = mean(bvn1[,1],na.rm=T)
bvn1[1:10,1]
 [1] 3.054019 3.054019 3.054019 3.054019 3.054019 3.054019 3.054019 3.054019
 [9] 3.054019 3.054019

In imputation by regression, missing values are predicted by a linear regression fitted to the complete cases

# Fit the regression of x1 on x2 using the complete rows
x = bvn1[-(1:10),]
reg = lm(x[,1]~x[,2])
# Assign to column 1 only; bvn1[1:10,] would overwrite both columns
bvn1[1:10,1] = reg$coef[1]+bvn1[1:10,2]*reg$coef[2]
bvn1[1:10,1]
 [1] 2.966771 3.234527 3.131392 3.017473 3.156931 3.121631 3.062118 3.113634
 [9] 3.022938 3.048336

Multivariate Classification#

Let \(\{C_i: i=1,...,k\}\) be the \(k\) classes. The points in the class \(C_i\) follow the multivariate normal distribution with mean vector \(\mu_i\) and covariance matrix \(\Sigma_i\).

Given the training data \(X_i\) in class \(C_i\), the mean vector and covariance matrix can be estimated by the sample average \(\bar{X}_i\) and sample covariance matrix \(S_i\)

sigma1 = matrix(2,2,2)
sigma1[1,2] = sigma1[2,1]=0.5
bvn1 = mvrnorm(100, mu=c(3,4), Sigma=sigma1)

sigma2 = matrix(1,2,2)
sigma2[1,2] = sigma2[2,1] = 0.5
bvn2 = mvrnorm(100, mu=c(7,8), Sigma=sigma2)

bvn1_average = apply(bvn1,2,mean)
bvn2_average = apply(bvn2,2,mean)

bvn1_cov = cov(bvn1)
bvn2_cov = cov(bvn2)

Let \(P(C_i), i=1,\dots,k\), be the prior probabilities of the \(k\) classes. Given the training data, the probability \(P(C_i)\) can be estimated by the proportion of points in the class \(C_i\)
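A minimal sketch (here each simulated class contributes 100 of the 200 training points, so both estimated priors are 0.5):

# Estimate the priors by the class proportions in the training data
n1 = nrow(bvn1); n2 = nrow(bvn2)
c(n1, n2) / (n1 + n2)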

The Bayes classifier is based on the discriminant function \(g_i(x) = \log f(x|C_i) + \log P(C_i)\). We substitute the mean vector, covariance matrix, and prior probabilities with their estimates. The discriminant of the class \(C_i\) used here is

\[g_i(x) = -\frac{1}{2}(x-\bar{X}_i)^TS_i^{-1}(x-\bar{X}_i)+\hat{P}(C_i)\]

(the term \(-\frac{1}{2}\log|S_i|\) is dropped for simplicity, and with equal priors writing \(\hat{P}(C_i)\) in place of \(\log\hat{P}(C_i)\) shifts every discriminant by the same constant, so the comparison is unchanged)

The Bayes classification is \(x\in C_i\) if \(g_i(x) > g_j(x)\) for all \(j\ne i\), \(i,j = 1,\dots,k\)

x = rbind(bvn1,bvn2)

# Center x at each class mean; sweep subtracts the mean from every row
# (x - bvn1_average would recycle the vector down the columns instead)
xc1 = sweep(x, 2, bvn1_average)
xc2 = sweep(x, 2, bvn2_average)

# Quadratic discriminants; 0.5 is the estimated prior of each class
g1 = -0.5*diag(xc1%*%solve(bvn1_cov)%*%t(xc1)) + 0.5
g2 = -0.5*diag(xc2%*%solve(bvn2_cov)%*%t(xc2)) + 0.5
print("first class")
which(g1>g2)
print("second class")
which(g1<g2)

print("misclassified points from the second class")
# Offset by 100: rows 101:200 of x hold the second class
x[100 + which(g1[101:200]>g2[101:200]),]
[1] "first class"
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 92
  92. 94
  93. 95
  94. 96
  95. 97
  96. 98
  97. 99
  98. 100
  99. 126
  100. 133
  101. 140
  102. 146
  103. 166
  104. 170
  105. 172
  106. 180
[1] "second class"
  1. 91
  2. 93
  3. 101
  4. 102
  5. 103
  6. 104
  7. 105
  8. 106
  9. 107
  10. 108
  11. 109
  12. 110
  13. 111
  14. 112
  15. 113
  16. 114
  17. 115
  18. 116
  19. 117
  20. 118
  21. 119
  22. 120
  23. 121
  24. 122
  25. 123
  26. 124
  27. 125
  28. 127
  29. 128
  30. 129
  31. 130
  32. 131
  33. 132
  34. 134
  35. 135
  36. 136
  37. 137
  38. 138
  39. 139
  40. 141
  41. 142
  42. 143
  43. 144
  44. 145
  45. 147
  46. 148
  47. 149
  48. 150
  49. 151
  50. 152
  51. 153
  52. 154
  53. 155
  54. 156
  55. 157
  56. 158
  57. 159
  58. 160
  59. 161
  60. 162
  61. 163
  62. 164
  63. 165
  64. 167
  65. 168
  66. 169
  67. 171
  68. 173
  69. 174
  70. 175
  71. 176
  72. 177
  73. 178
  74. 179
  75. 181
  76. 182
  77. 183
  78. 184
  79. 185
  80. 186
  81. 187
  82. 188
  83. 189
  84. 190
  85. 191
  86. 192
  87. 193
  88. 194
  89. 195
  90. 196
  91. 197
  92. 198
  93. 199
  94. 200
[1] "two misclassified points"
A matrix: 8 × 2 of type dbl
4.2120894.406357
5.9868912.606797
3.4524282.750365
5.1753801.959169
3.2534164.969929
4.3986615.682706
1.9281653.643036
3.8565445.994505
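As a sanity check, MASS provides qda, which fits the same quadratic rule (it also keeps the \(-\frac{1}{2}\log|S_i|\) term omitted above, so its predictions may differ slightly). A sketch, assuming the rows of x are ordered as 100 points from each class:

# Cross-check with quadratic discriminant analysis from MASS
labels = factor(rep(c(1,2), each=100))
qda_fit = qda(x, grouping=labels)
table(predicted = predict(qda_fit, x)$class, truth = labels)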

Two classes have a common covariance matrix#

If two classes have a common covariance matrix \(S\), the posterior probability of the class \(C_i\) is

\[g_i(x) = -\frac{1}{2}(x-\bar{X}_i)^TS^{-1}(x-\bar{X}_i)+\hat{P}(C_i)\]

When \(g_i(x)\) is compared with \(g_j(x)\), the quadratic term \(x^TS^{-1}x\) cancels because it is common to all classes. The discriminant therefore becomes linear in \(x\)

\[g_i(x) = \bar{X}_i^TS^{-1}x -\frac{1}{2}\bar{X}_i^TS^{-1}\bar{X}_i + \hat{P}(C_i)\]

The Bayes classification is \(x\in C_i\) if \(g_i(x) > g_j(x)\) for all \(j\ne i\), \(i,j = 1,\dots,k\)

# Pool the data after centering each class at its own mean
pooldata = rbind(scale(bvn1, center=TRUE, scale=FALSE),
                 scale(bvn2, center=TRUE, scale=FALSE))
bvn1_cov = bvn2_cov = cov(pooldata)

# Constant terms 0.5 * t(Xbar_i) %*% S^-1 %*% Xbar_i
m1 = 0.5*bvn1_average%*%solve(bvn1_cov)%*%bvn1_average
m2 = 0.5*bvn2_average%*%solve(bvn2_cov)%*%bvn2_average

x = rbind(bvn1,bvn2)
# Linear discriminants; 0.5 is the estimated prior of each class
g1 = bvn1_average%*%solve(bvn1_cov)%*%t(x) - as.numeric(m1) + 0.5
g2 = bvn2_average%*%solve(bvn2_cov)%*%t(x) - as.numeric(m2) + 0.5
print("first class")
which(g1>g2)
print("second class")
which(g1<g2)

print("two misclassified points")
print(x[which(g1[101:200]>g2[101:200]),])
[1] "first class"
  1. 2
  2. 3
  3. 4
  4. 5
  5. 6
  6. 7
  7. 8
  8. 9
  9. 10
  10. 11
  11. 12
  12. 13
  13. 15
  14. 16
  15. 17
  16. 19
  17. 20
  18. 21
  19. 22
  20. 23
  21. 24
  22. 26
  23. 27
  24. 28
  25. 29
  26. 30
  27. 31
  28. 32
  29. 33
  30. 34
  31. 35
  32. 36
  33. 37
  34. 38
  35. 39
  36. 40
  37. 41
  38. 42
  39. 43
  40. 44
  41. 45
  42. 46
  43. 47
  44. 48
  45. 49
  46. 50
  47. 51
  48. 52
  49. 53
  50. 54
  51. 55
  52. 56
  53. 57
  54. 58
  55. 59
  56. 60
  57. 61
  58. 62
  59. 63
  60. 64
  61. 65
  62. 66
  63. 67
  64. 68
  65. 69
  66. 70
  67. 71
  68. 72
  69. 73
  70. 74
  71. 75
  72. 76
  73. 77
  74. 78
  75. 79
  76. 80
  77. 81
  78. 82
  79. 83
  80. 84
  81. 85
  82. 86
  83. 87
  84. 88
  85. 89
  86. 90
  87. 92
  88. 94
  89. 95
  90. 96
  91. 97
  92. 98
  93. 99
  94. 100
  95. 163
  96. 170
  97. 172
[1] "second class"
  1. 1
  2. 14
  3. 18
  4. 25
  5. 91
  6. 93
  7. 101
  8. 102
  9. 103
  10. 104
  11. 105
  12. 106
  13. 107
  14. 108
  15. 109
  16. 110
  17. 111
  18. 112
  19. 113
  20. 114
  21. 115
  22. 116
  23. 117
  24. 118
  25. 119
  26. 120
  27. 121
  28. 122
  29. 123
  30. 124
  31. 125
  32. 126
  33. 127
  34. 128
  35. 129
  36. 130
  37. 131
  38. 132
  39. 133
  40. 134
  41. 135
  42. 136
  43. 137
  44. 138
  45. 139
  46. 140
  47. 141
  48. 142
  49. 143
  50. 144
  51. 145
  52. 146
  53. 147
  54. 148
  55. 149
  56. 150
  57. 151
  58. 152
  59. 153
  60. 154
  61. 155
  62. 156
  63. 157
  64. 158
  65. 159
  66. 160
  67. 161
  68. 162
  69. 164
  70. 165
  71. 166
  72. 167
  73. 168
  74. 169
  75. 171
  76. 173
  77. 174
  78. 175
  79. 176
  80. 177
  81. 178
  82. 179
  83. 180
  84. 181
  85. 182
  86. 183
  87. 184
  88. 185
  89. 186
  90. 187
  91. 188
  92. 189
  93. 190
  94. 191
  95. 192
  96. 193
  97. 194
  98. 195
  99. 196
  100. 197
  101. 198
  102. 199
  103. 200
[1] "two misclassified points"
         [,1]     [,2]
[1,] 4.104538 2.824454
[2,] 4.398661 5.682706
[3,] 1.928165 3.643036
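The linear rule with a pooled covariance is what MASS's lda fits. A sketch, reusing the labels defined in the qda check above:

# Cross-check with linear discriminant analysis from MASS
lda_fit = lda(x, grouping=labels)
table(predicted = predict(lda_fit, x)$class, truth = labels)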

Regularized discriminant analysis#

Let \(S_i\) be the sample covariance matrix for class \(i\) and let \(S\) be the covariance matrix of the pooled data. The regularized covariance matrix is written as a weighted combination that interpolates between several special cases

\[w(\lambda) = \lambda S + (1-\lambda) S_i\]
\[v(\lambda,\gamma) = (1-\gamma)w(\lambda) + \gamma\frac{1}{p}\mathrm{tr}(w(\lambda))I\]

When \(\lambda=\gamma=0\), it is a quadratic classifier.

When \(\lambda=1\) and \(\gamma=0\), it is a linear classifier.

When \(\lambda=0\) and \(\gamma=1\), the covariance matrix of each class is spherical, \(\frac{1}{p}\mathrm{tr}(S_i)I\), and the rule reduces to the nearest mean classifier.

When \(\lambda=1\) and \(\gamma=1\), the covariance matrices are diagonal with the same variance.

The choice of \(\lambda\) and \(\gamma\) can be optimized by cross-validation
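Before turning to caret, here is a minimal sketch of the regularized covariance itself, reusing cov(pooldata) and cov(bvn1) from the simulations above (the helper name rda_cov is ours, for illustration only):

# v(lambda, gamma) = (1-gamma)*w + gamma*(tr(w)/p)*I, where w = lambda*S + (1-lambda)*S_i
rda_cov = function(S, S_i, lambda, gamma) {
  p = nrow(S_i)
  w = lambda*S + (1-lambda)*S_i
  (1-gamma)*w + gamma*(sum(diag(w))/p)*diag(p)
}
rda_cov(cov(pooldata), cov(bvn1), lambda=0.5, gamma=0.5)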

library(mlbench)
library(caret)
library(glmnet)
library(klaR)

data(Sonar)   # 208 sonar returns: 60 numeric predictors, class M (metal) or R (rock)
Sonar
Loading required package: ggplot2
Loading required package: lattice
Loading required package: Matrix
Loaded glmnet 4.1-8

A data.frame: 208 × 61 (predictors V1–V60 and the factor Class; output omitted)

We tune \(\gamma\) and \(\lambda\) by 5-fold cross-validation with caret:

set.seed(1337)
cv_5_grid = trainControl(method = "cv", number = 5)

fit_rda_grid = train(Class ~ ., data = Sonar, method = "rda", trControl = cv_5_grid)
fit_rda_grid

plot(fit_rda_grid)
Regularized Discriminant Analysis 

208 samples
 60 predictor
  2 classes: 'M', 'R' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 167, 166, 166, 167, 166 
Resampling results across tuning parameters:

  gamma  lambda  Accuracy   Kappa    
  0.0    0.0     0.6977933  0.3791172
  0.0    0.5     0.7644599  0.5259800
  0.0    1.0     0.7310105  0.4577198
  0.5    0.0     0.7885017  0.5730052
  0.5    0.5     0.8271777  0.6502693
  0.5    1.0     0.7988386  0.5939209
  1.0    0.0     0.6732869  0.3418352
  1.0    0.5     0.6780488  0.3527778
  1.0    1.0     0.6825784  0.3631626

Accuracy was used to select the optimal model using the largest value.
The final values used for the model were gamma = 0.5 and lambda = 0.5.
[Figure: cross-validated accuracy as a function of gamma and lambda]
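The search can be refined with a user-supplied grid; a sketch, reusing cv_5_grid (for method = "rda" the tuneGrid columns must be named gamma and lambda):

# Finer grid around the selected values gamma = lambda = 0.5
rda_grid = expand.grid(gamma = seq(0, 1, by = 0.25), lambda = seq(0, 1, by = 0.25))
fit_rda_fine = train(Class ~ ., data = Sonar, method = "rda",
                     trControl = cv_5_grid, tuneGrid = rda_grid)
plot(fit_rda_fine)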