We’ve identified putative miRNA-mRNA interactions by using miRanda to predict miRNA-mRNA binding, then using Pearson’s correlation coefficient to evaluate coexpression of putatively binding miRNA-mRNA pairs.
This code performs functional enrichment analysis to investigate the function(s) of these putative miRNA targets
Code
Rendered Code
Output files
Results
miRNA targets with significant negative correlation:
GO ID | Term | Fisher p_val | Type |
---|---|---|---|
GO:0016980 | creatinase activity | 0.0035 | Molecular Function |
GO:0004126 | cytidine deaminase activity | 0.0175 | Molecular Function |
GO:0003924 | GTPase activity | 0.0218 | Molecular Function |
GO:0004966 | galanin receptor activity | 0.0415 | Molecular Function |
GO:0000272 | polysaccharide catabolic process | 0.038 | Biological Process |
GO:0000082 | G1/S transition of mitotic cell cycle | 0.046 | Biological Process |
miRNA targets with significant positive correlation:
GO ID | Term | Fisher p_val | Type |
---|---|---|---|
GO:0003924 | GTPase activity | 0.0218 | Molecular Function |
Code
mRNA-miRNA interactions functional enrichment
Code used below was created by Jill Ashey
, modified for use with A.pulchra datasets by Kathleen Durkin
1 Format topGO files
1.1 Read in and format annotation files
# Read in Apul annotations
<- read.delim("../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-IDmapping-2024_12_12.tab")
annot_locations # Remove unneeded columns
<- annot_locations %>% dplyr::select(-X, -V13)
annot_locations # Ensure there are no duplicate rows
<- annot_locations %>% distinct()
annot_locations
head(annot_locations)
## V1 V3 Protein.names
## 1 ntLink_4:1155-1537 P35061 Histone H2A
## 2 ntLink_4:2660-3441 P84239 Histone H3
## 3 ntLink_4:4515-6830 P35061 Histone H2A
## 4 ntLink_4:7096-7859 P84239 Histone H3
## 5 ntLink_4:8474-9669 P35061 Histone H2A
## 6 ntLink_4:11162-11925 P84239 Histone H3
## Organism Gene.Ontology..biological.process.
## 1 Acropora formosa (Staghorn coral)
## 2 Urechis caupo (Innkeeper worm) (Spoonworm)
## 3 Acropora formosa (Staghorn coral)
## 4 Urechis caupo (Innkeeper worm) (Spoonworm)
## 5 Acropora formosa (Staghorn coral)
## 6 Urechis caupo (Innkeeper worm) (Spoonworm)
## Gene.Ontology.IDs
## 1 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
## 2 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
## 3 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
## 4 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
## 5 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
## 6 GO:0000786; GO:0003677; GO:0005634; GO:0030527; GO:0046982
# Looks good!
This file shows each gene as it’s genomic location. We want to use gene IDs to associate genes, so add gene IDs to this annotation table
Read in file that associates each mRNA genomic location with corresponding gene ID
<- read.table("../output/15-Apul-annotate-UTRs/Apul-mRNA-FUNids.txt", header=FALSE, col.names=c("location", "type", "mRNA_ID", "gene_ID", "product"), sep="\t")
mRNA_FUNids
# Remove unwanted text from parent column
$gene_ID <- gsub("Parent=", "", mRNA_FUNids$gene_ID)
mRNA_FUNids# Only need to keep mRNA location and gene ID
<- mRNA_FUNids %>% dplyr::select(location, gene_ID) mRNA_FUNids
join with annotation file
# join
<- left_join(annot_locations, mRNA_FUNids, by = c("V1" = "location"))
annot
# ensure there are no duplicate rows
<- annot %>% distinct() annot
1.2 Set up gene2GO object
Want to isolate a list of GO terms per gene
<- annot %>% filter(!is.na(Gene.Ontology.IDs)) %>% dplyr::select(gene_ID, Gene.Ontology.IDs)
gene2go <- gene2go %>% dplyr::rename(GO.ID = Gene.Ontology.IDs)
gene2go
<- setNames(
gene2go_list strsplit(as.character(gene2go$GO.ID), ";"),
$gene_ID
gene2go )
Note: I think this means genes that had a Uniprot ID but no GO terms are excluded from this analysis
1.3 Define reference set
Define reference set of genes. This should be all genes found in our samples, NOT all genes in the A.pulchra genome. Some genes (e.g., reproduction pathways) may not be found/expected in our samples for valid biological reasons.
# Read in counts matrix
<- read.csv("../output/07-Apul-Hisat/Apul-gene_count_matrix.csv")
Apul_counts # Exclude genes with all 0 counts
<- Apul_counts[rowSums(Apul_counts[, 2:6]) != 0, ]
Apul_counts
# Select gene IDs of the genes present in our samples
<- Apul_counts$gene_id
all_genes length(all_genes)
## [1] 33624
So we have a reference set of 33624 genes present in our samples.
1.4 Read in PCC/miranda data
This is a table of all putative miRNA-mRNA binding predicted by miRanda, plus Pearsons correlation coefficients for coexpression of each putative binding pair.
<- read.csv("../output/09-Apul-mRNA-miRNA-interactions/miranda_PCC_miRNA_mRNA.csv")
data head(data)
## X miRNA mRNA PCC.cor p_value adjusted_p_value score energy
## 1 1 Cluster_5981 FUN_028147 0.6825537 0.2041707 0.9986496 146 -22.19
## 2 2 Cluster_15340 FUN_013332 0.6371070 0.2476393 0.9986496 158 -23.15
## 3 3 Cluster_5981 FUN_041253 -0.2250869 0.7158492 0.9986496 153 -20.50
## 4 4 Cluster_3366 FUN_010827 0.3671005 0.5433145 0.9986496 163 -22.14
## 5 5 Cluster_3367 FUN_010827 0.5369304 0.3507987 0.9986496 163 -22.14
## 6 6 Cluster_15340 FUN_003342 0.1096213 0.8607058 0.9986496 154 -20.65
## query_start_end subject_start_end total_bp_shared query_similar
## 1 2 21 185 209 21 66.67%
## 2 2 20 198 220 19 68.42%
## 3 2 21 699 719 19 73.68%
## 4 2 18 346 368 16 81.25%
## 5 2 18 346 368 16 81.25%
## 6 2 20 562 585 20 65.00%
## subject_similar
## 1 71.43%
## 2 84.21%
## 3 73.68%
## 4 93.75%
## 5 93.75%
## 6 80.00%
Set function to select genes of interest (ie those that have pvalue < 0.05)
<- function(allScore) {
topDiffGenes return(allScore < 0.05)}
2 FE of all targets, as predicted by miranda
TO DO