-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTWOMBLI_PCA.Rmd
139 lines (102 loc) · 7.24 KB
/
TWOMBLI_PCA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: "Analysis on Matrix Characteristics"
output: html_notebook
---
This code is the analysis used to perform Principal Component Analysis (PCA) on TWOMBLI analyzed, RGB Trichrome stained mouse skin. The analysis was performed on five groups of data in two sets:
- p42 ctrl, 21d Wnt Act, 21d Wnt act Dpp4-/-
- p63 ctrl, 21d reversal
Here we are using each image as a datapoint. This is important to note because greater number of datapoints allows us to make better predictions, gives additional confidence, and is statistically stronger.
Ensure PCA is the right tool for the job. There are 5 assumptions to check:
1. Continuous variables √
2. Linear relationship - pearson correlation (checked in the covariance step) √
3. Large enough sample size 5-10 per case (Ideally we would want more, but this is the bare minimum) √
4. Data is suitable for reduction (high pearson correlation values) √
5. No significant outliers (removed from imported table) √
```{r Library the Necessary Packages}
# for correlation matrix
library(PerformanceAnalytics)
# import libraries necessary for simple PCA analysis
library(devtools)
library(ggbiplot)
```
IMPORTING THE DATA
Not very intensive, importing the data and then separating it into its two groups
```{r filename}
filename <- "/Users/studentaccount/Documents/Sakin_exp/RGB Trichrome/RGB_all (2)/TWOMBLI_Results_img.csv"
```
```{r Importing the Data}
# imports the main dataset containing p42 ctrl, 21d wnt act, 21d wnt act dpp4-/-, p63 ctrl, and 21d reversal mice
rgb <- read.csv(file=filename, header=TRUE, sep=",")
# data is then split into 2 groups, 'onset+dpp4 rescue' and 'reversal'
rgb_resc <- rgb[ rgb$Treatment=="p42 ctrl" | rgb$Treatment=="21d Wnt act" | rgb$Treatment=="21d Wnt act Dpp4-/-", ] # onset+rescue dataset
rgb_rev <- rgb[ rgb$Treatment=="p63 ctrl" | rgb$Treatment=="21d reversal", ] # reversal dataset
```
CHECK COVARIANCE/MULTICOLINEARITY BETWEEN DATA
Check covariance between variables. This is an important step because it can give us a sense of what to expect from PCA. Highly correlated vars do not need to be removed because PCA will handle this automatically. Instead, lower correlation values may be able to be removed
Greater multicolinearity allows for greater clustering in PCA. Also the number of variables should not be greater than the number of samples (general rule +/- 1 sample shouldn't make much of a difference)
We use 10 variables so the minimum number of samples should be 10.
```{r Check Covariance}
# outputs the correlation matrix values, but is difficult to read
cor(rgb[,c(2:8,19,20,22)], method="pearson")
# visualize correlation matrix between variables
rel_vars <- rgb[, c(2:8,19,20,22)]
chart.Correlation(rel_vars, histogram=TRUE, method="pearson")
```
PCA SETUP AND BIPLOT GENERATION
First set up PCA Analysis with all variables (excluding curvature. curvature should be looked at separately in Prism). Then graph in a biplot
Keep in mind the variation between images vs variation between mice
```{r PCA Setup}
# create PCA variables for analysis
# use with relative data - rgb_resc.pca <- prcomp( rgb_resc[ ,c(2,3,6,8,19,23) ], center=TRUE, scale.=TRUE ) # onset+rescue 8,6,19,23,2
rgb_resc.pca <- prcomp( rgb_resc[ ,c(2:8,19,20,22) ], center=TRUE, scale.=TRUE )
rgb_rev.pca <- prcomp( rgb_rev[ ,c(2:8,19,20,22) ], center=TRUE, scale.=TRUE ) # reversal
# summarize and view PCA variables
summary(rgb_resc.pca) # for onset + rescue
str(rgb_resc.pca)
summary(rgb_rev.pca) # for reversal
str(rgb_rev.pca)
# graphing both onset+rescue and reversal datasets
ggbiplot(rgb_resc.pca, ellipse=TRUE, groups=sort(rgb_resc$Treatment), var.axes=FALSE, labels=rgb_resc$Image) + theme_bw() # onset+rescue
ggbiplot(rgb_rev.pca, ellipse=TRUE, groups=sort(rgb_rev$Treatment), var.axes=FALSE, labels=rgb_rev$Image) + theme_bw() # reversal
```
FILTERING PCA CHARACTERISTICS
The previous PCA biplot was generated using all the vars that TWOMBLI outputs. Now, reduction in the number of variables can be done by crossing the variables from TWOMBLI with the variables in Mascharak et. al, 2019 (Longacker Lab collagen structure measurements)
The Longacker Lab measurements are listed below alongside their equivalent from TWOMBLI
LONGACKER LAB TWOMBLI OUTPUT | LONGACKER LAB TWOMBLI OUTPUT
Area - Area | Number of Fibers - Endpoints
Major Axis Length - NA | Fiber Length - Total Length
Minor Axis Length - NA | Fiber Width - Fiber Thickness
Eccentricity - NA | Fiber Persistence - NOT SURE
Convex Area - NA | Angle Randomness - Curvature?
Circularity - NA | Number of Branchpoints - Branchpoints
Filled Area - NA | NA - Lacunarity
Euler Number - NA | NA - Hyphal Growth Units
Equiv Diameter - NA | NA - Fractal Dimension
Solidity - NA | NA - Alignment
Extent - NA |
Perimeter - NA |
Mean Intensity - High Density Matrix |
Max Intensity - High Density Matrix |
Min Intensity - High Density Matrix |
Min Feret Diameter - NA |
Max Feret Diameter - NA |
Min Feret Angle - NA |
Max Feret Angle - NA |
To exclude variables, need to check 2 boxes
1. They are not included in Longacker Lab analysis
2. They have lower correlation values with other variables in the correlation matrix (< 0.5)
Vars with low multicolinearity from TWOMBLI (√ denotes absence from Longacker Lab model)
Alignment √
HGU √
Both can be removed from thick skin model
```{r Filtering PCA Characteristics}
# filtering characteristics for onset+rescue data
# Alignment and HGU contain low levels of multicolinearity and they are not included in the Longacker model, so they can be removed
rgb_resc_v2.pca <- prcomp( rgb_resc[ ,c(2:5,7,8,19,22) ], center=TRUE, scale.=TRUE ) # v2 w/o alignment and HGU
ggbiplot( rgb_resc_v2.pca, ellipse=TRUE, groups=rgb_resc$Treatment, var.axes=FALSE ) + theme_bw()
rgb_rev_v2.pca <- prcomp( rgb_rev[ ,c(2:5,7,8,19,22) ], center=TRUE, scale.=TRUE ) # v2 w/o alignment and HGU
ggbiplot( rgb_rev_v2.pca, ellipse=TRUE, groups=rgb_rev$Treatment, var.axes=FALSE ) + theme_bw()
```
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Cmd+Option+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Cmd+Shift+K* to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.