One thing I want from the EN model is the ability to evaluate the relative contributions of each epigenetic layer (miRNA, lncRNA, and DNA methylation). However, when all 3 are combined into a single predictor set, I’m concerned that my ability to parse their relative contributions (e.g., by examining the “type” of stably selected predictors) is unduly influenced by differences in dimensionality (e.g., having ~15,000 lncRNA predictors vs. ~50 miRNAs).
One option to supplement layer comparisons is to have the model try to predict expression from a single layer, e.g., from lncRNAs alone. However, the way the EN script is structured, passing in any empty data (e.g., passing in empty files for the miRNA and methylation fields so that only lncRNA is used as a predictor) breaks the code. For example, the code would try to variance-stabilize the empty methylation set, which would throw an error.
This should be a pretty simple fix, though, since the only real “break” arises during the data preprocessing, before the three layers are combined into a single predictor set.
Basically, I added an option to the script to specify layers (if desired), then added if statements throughout the data preprocessing so that an input df is only handled if its layer is included. To avoid a major script restructuring, the command still needs to take in all 3 predictor files as input, but only the specified layer(s) will be used (if the specification option is included).
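To make the idea concrete, here is a rough sketch of the pattern, written in Python purely for illustration (the actual EN script may be in a different language and is certainly organized differently). Every name here is hypothetical: the `--layers` flag, the layer labels, and the `parse_layers`, `preprocess`, and `build_predictors` helpers are not taken from the real script. The `preprocess` function is only a stand-in for the real per-layer steps such as variance stabilization.

```python
import argparse

import numpy as np
import pandas as pd

# Hypothetical layer labels; the real script may name these differently.
ALL_LAYERS = ("mirna", "lncrna", "methyl")


def parse_layers(argv=None):
    """Read an optional, comma-separated --layers flag (hypothetical name)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--layers", default=None,
                        help="comma-separated subset of: " + ", ".join(ALL_LAYERS))
    args = parser.parse_args(argv)
    if args.layers is None:
        # Option omitted: keep the original behavior and use every layer.
        return list(ALL_LAYERS)
    layers = [s.strip().lower() for s in args.layers.split(",") if s.strip()]
    unknown = set(layers) - set(ALL_LAYERS)
    if unknown or not layers:
        parser.error("--layers must name at least one of: " + ", ".join(ALL_LAYERS))
    return layers


def preprocess(df):
    """Placeholder for the real per-layer preprocessing (assumed: log2(x + 1))."""
    return np.log2(df + 1)


def build_predictors(inputs, layers):
    """Preprocess only the requested layers, then combine them column-wise.

    `inputs` maps layer name -> DataFrame (samples as rows, features as columns).
    Layers that were not requested are never touched, so empty inputs are safe.
    """
    parts = []
    for name in ALL_LAYERS:
        if name not in layers:
            continue  # skipped layers never reach the preprocessing step
        parts.append(preprocess(inputs[name]).add_prefix(name + ":"))
    return pd.concat(parts, axis=1)
```

With something like this, all three files are still read in, but a call such as `build_predictors(inputs, parse_layers(["--layers", "lncrna"]))` would only ever preprocess the lncRNA table, so an empty miRNA or methylation input never reaches the step that would otherwise error out.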