t-stat for feature selection

If you don’t want to worry about speed (and with 155 columns you probably don’t care) you can use the t.test function and apply it to every column.

First, simulate some data:

set.seed(1)
DF <- data.frame(y=rep(1:2, 50), x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
head(DF)

  y         x1          x2         x3
1 1 -0.6264538 -0.62036668  0.4094018
2 2  0.1836433  0.04211587  1.6888733
3 1 -0.8356286 -0.91092165  1.5865884
4 2  1.5952808  0.15802877 -0.3309078
5 1  0.3295078 -0.65458464 -2.2852355
6 2 -0.8204684  1.76728727  2.4976616

Then we can apply the t.test function to every column except the first (the group label y) using its formula interface.

group <- DF$y
lapply(DF[,-1], function(x) { t.test(x ~ group)$statistic })

This returns a list holding the test statistic for each column.
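
For feature selection it is usually handier to have the statistics as a named numeric vector that can be sorted. A minimal sketch, re-creating the simulated data above; the names tstats and ranked are illustrative only, and ranking by absolute value is one possible choice rather than the only sensible one:

```r
# Same simulated data as above
set.seed(1)
DF <- data.frame(y = rep(1:2, 50), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
group <- DF$y

# vapply returns a plain numeric vector (one t-statistic per column) instead of a list
tstats <- vapply(DF[, -1],
                 function(x) unname(t.test(x ~ group)$statistic),
                 numeric(1))

# Order the columns by absolute t-statistic, largest first
ranked <- sort(abs(tstats), decreasing = TRUE)
print(ranked)
```

The names of ranked are the column names, so names(ranked)[1:k] would give the k columns with the largest absolute statistics.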

t.test computes a lot of extra information that you don’t need (p-values, confidence intervals, degrees of freedom), so you could speed this up substantially by doing the computations directly, but it really isn’t necessary here.
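
If speed ever does matter, the direct computation can be sketched as follows. This assumes the default behaviour of t.test (Welch's unequal-variance statistic), exactly two groups, and no missing values; welch_t is a hypothetical helper written for this illustration, not a function from any package:

```r
# Same simulated data as above
set.seed(1)
DF <- data.frame(y = rep(1:2, 50), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
group <- DF$y

# Hypothetical helper: Welch t-statistic for one numeric vector x split by g
welch_t <- function(x, g) {
  lev <- sort(unique(g))
  stopifnot(length(lev) == 2)          # exactly two groups expected
  a <- x[g == lev[1]]
  b <- x[g == lev[2]]
  # difference in means divided by the unpooled standard error
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

direct <- vapply(DF[, -1], welch_t, numeric(1), g = group)

# Cross-check against t.test on the same columns
via_ttest <- vapply(DF[, -1],
                    function(x) unname(t.test(x ~ group)$statistic),
                    numeric(1))
print(all.equal(direct, via_ttest))
```

The cross-check at the end is only there to confirm that the shortcut agrees with t.test on this data; in real use you would keep just the welch_t call.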
