In Friday’s codeRclub, we had a problem that involved finding the row and column names for items in a matrix greater than a specified value (e.g. finding the names of the pairs of samples in a correlation matrix with a correlation coefficient greater than 0.5). The problem is that standard subsetting methods give you the locations or values of the cells within the matrix, but not the row or column names. We solved the problem using the arr.ind argument of the which() command in R, and wrote a function that returns the row and column names and the correlation coefficients in a data.frame.
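To illustrate the difference, here is a minimal sketch on a small made-up matrix (m, its sample names and the 0.95 threshold are illustrative only, not from the session). Logical subsetting returns just the values, plain which() returns linear indices, and which() with arr.ind = TRUE returns row and column positions that can be mapped back to names.

```r
# Hypothetical 2x2 example matrix; the names are made up for illustration
m <- matrix(c(1, 0.9, 0.9, 1), nrow = 2,
            dimnames = list(c("s1", "s2"), c("s1", "s2")))

vals <- m[m > 0.95]                      # values only, the names are lost
idx  <- which(m > 0.95)                  # linear (column-major) indices
pos  <- which(m > 0.95, arr.ind = TRUE)  # matrix with "row" and "col" columns

# Translate the positions back into row and column names
rn <- rownames(m)[pos[, "row"]]
cn <- colnames(m)[pos[, "col"]]
rn
cn
```

Here only the two diagonal cells exceed the threshold, so rn and cn both contain s1 and s2.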
First, simulate a correlation matrix (the correlation cut-off value is set below as the function's default, cutVal = 0.5):
# Simulate the 3x3 matrix and give it the sample names as row and column names
x <- matrix(c(1, .8, .2, .8, 1, .7, .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
Then make a function, which.names.matrix(), to return the row and column names of interest. Here x is a correlation matrix and cutVal is your correlation cut-off value.
which.names.matrix <- function(x, cutVal = 0.5){
  # Because it's a correlation matrix, we are only interested in one half of it,
  # so set the lower triangle to NA
  x[lower.tri(x)] <- NA
  # Set the diagonal to NA
  diag(x) <- NA
  # Find the locations of the cells in the matrix greater than cutVal
  locs <- which(x > cutVal, arr.ind = TRUE)
  # Get the scores of the cells greater than cutVal
  scores <- na.omit(x[x > cutVal])
  # Return the data.frame with the row and column names, plus the scores
  data.frame(row = rownames(x)[locs[,1]], col = colnames(x)[locs[,2]], value = scores)
}

which.names.matrix(x)
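For the example matrix, the pairs above the default cut-off are a/b (0.8) and b/c (0.7). As a cross-check, here is a sketch of a slightly different way to get the same result. This is a variation, not the function we wrote on Friday: the name which.names.matrix2 is made up, the lower triangle and diagonal are blanked in one step with lower.tri(x, diag = TRUE), and the scores are pulled by indexing the matrix with the position matrix instead of using na.omit().

```r
# Same example matrix as in the post
x <- matrix(c(1, .8, .2, .8, 1, .7, .2, .7, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Hypothetical variant of which.names.matrix (the name is made up here)
which.names.matrix2 <- function(x, cutVal = 0.5) {
  # Blank the lower triangle and the diagonal in one step
  x[lower.tri(x, diag = TRUE)] <- NA
  # Row/column positions of cells above the cut-off; NA cells are skipped
  locs <- which(x > cutVal, arr.ind = TRUE)
  # Index the matrix by the two-column position matrix to get the scores
  data.frame(row = rownames(x)[locs[, "row"]],
             col = colnames(x)[locs[, "col"]],
             value = x[locs])
}

res <- which.names.matrix2(x)
res
```

Indexing with locs keeps the scores tied to the positions they came from, rather than relying on two separate lookups returning cells in the same order.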