sRNA Expression Atlas (SEA) is a web application for searching known and novel small RNAs (sRNAs) across ten organisms using standardized search terms and ontologies. SEA contains re-analyzed sRNA expression data for over 4200 published samples, including many disease datasets, and over 769 novel, high-quality predicted miRNAs. SEA also stores sRNA differential expression results, sRNA-based classification results, pathogenic sRNA signatures from bacteria and viruses, and pathogen differential expression results. Furthermore, SEA contains the gene targets and diseases associated with each miRNA. The 4235 samples are systematically annotated with metadata. For instance, biological metadata include standardized information about the organism, cell line, cell type, tissue type, potential diseases and more. Additional annotations describe experimental details such as instrument models and library strategies. The raw data from all datasets were analysed with the Oasis 2 pipelines to make small RNA expression comparable across studies. In summary, SEA supports interactive visualization of results at all levels, from querying and displaying sRNA expression to the mapping and quality information of each sample.
Small RNAs
Small RNAs (sRNAs) are a class of short, non-coding RNAs (ncRNAs) with important biological functions in nearly all aspects of organismal development, in health and disease. sRNAs are ncRNAs shorter than 200 nucleotides (nt). Based on their biogenesis and biological functions, the major types of sRNAs include microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs).
MicroRNAs
MicroRNAs (miRNAs) are around 22 nt long and play an important role in gene regulation by targeting messenger RNAs (mRNAs) for cleavage or translational repression. miRNAs are the most abundant class of sRNAs, and they affect the regulation of many protein-coding genes.
PIWI-interacting RNAs
PIWI-interacting RNAs (piRNAs) are small non-coding RNAs that act as guardians of the genome. In the germline, they protect the genome from invasive transposable elements (DNA sequences that can change their position within the genome). piRNAs are around 24-32 nt long and are mostly expressed in the germline. They bind to PIWI proteins, which play a major role in maintaining genome stability in germline cells.
Small nucleolar RNAs
Small nucleolar RNAs (snoRNAs) are a class of sRNAs responsible for the post-transcriptional modification of ribosomal RNAs (rRNAs). They are usually 60-150 nt long. They are part of small nucleolar ribonucleoproteins (snoRNPs), RNA-protein complexes that carry out pseudouridylation and sequence-specific 2’-O-methylation of rRNA.
Small interfering RNAs
Small interfering RNAs (siRNAs) are double-stranded RNA molecules of around 20-25 nt that target mRNAs through perfect complementarity. The RNA-induced silencing complex (RISC) uses the antisense strand of the siRNA to cleave the target mRNA, thereby promoting its degradation.
Small nuclear RNAs
Small nuclear RNAs (snRNAs) are mostly found in eukaryotic cells and are also called U-RNAs. They play an important role in splicing introns out of primary transcripts. The average length of an snRNA is around 150 nt. Each snRNA associates with a set of proteins, and the resulting RNA-protein complex is called a small nuclear ribonucleoprotein (snRNP, or "snurp"). Prominent components of these complexes are the spliceosomal RNAs U1, U2, U4, U5 and U6, which play a major role in the maturation of eukaryotic precursor messenger RNA (pre-mRNA).
SEA can be searched for:
All searches for experiments, pathogens or sRNAs start from the search bar at the top of the SEA web pages. Search suggestions pop up as you type. The figure shows what this looks like when "Human herpesvirus 3" is entered into the search bar. Be aware that, because of this autocomplete functionality, only one term at a time can be pasted into the search bar. SEA shows suggestions from all categories. Overall, the following search categories are supported: sRNA ID, Pathogen, Organism, Cell type, Tissue, Cell line, Disease, Dataset.
Please note that SEA only allows searches based on suggested terms, and all terms that are suggested while typing are guaranteed to be in the database. If you are looking for a specific disease, tissue, organism, etc. and no matching terms are suggested, then there is no dataset matching your criteria in the database.
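The suggest-then-search behaviour can be pictured with the small sketch below. It assumes a hypothetical in-memory index that maps each category to its known terms, and it uses case-insensitive substring matching. The real SEA index, its matching rules and its ranking are not documented here and may differ. Because a search is accepted only for terms present in the index, anything that is never suggested cannot be searched.

```python
# Hypothetical term index: category -> known terms (toy data, not SEA's database).
TERM_INDEX: dict[str, list[str]] = {
    "Pathogen": ["Human herpesvirus 3", "Human herpesvirus 4"],
    "Tissue": ["muscle", "liver"],
    "Disease": ["breast cancer", "Alzheimer's disease"],
}


def suggest(typed: str, index: dict[str, list[str]], limit: int = 10) -> list[tuple[str, str]]:
    """Return (category, term) pairs whose term contains the typed text, ignoring case."""
    query = typed.strip().lower()
    if not query:
        return []
    hits = [
        (category, term)
        for category, terms in index.items()
        for term in terms
        if query in term.lower()
    ]
    # Terms starting with the typed text come first, then alphabetical order.
    hits.sort(key=lambda hit: (not hit[1].lower().startswith(query), hit[1].lower()))
    return hits[:limit]


def is_searchable(term: str, index: dict[str, list[str]]) -> bool:
    """A search is only accepted for a term that the index could have suggested."""
    return any(term in terms for terms in index.values())
```

For example, `suggest("herpes", TERM_INDEX)` returns both herpesvirus entries under the Pathogen category. A tissue that is not in the index is never suggested, and `is_searchable` rejects it.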
When using several search terms, datasets are found according to the following rules:
Each experiment in the SEA database is annotated with terms that come from ontologies. Simply put, an ontology is a set of relationships between terms. For example, if we take the words human and mammal, we can say that a human is a mammal. And not only humans are mammals: mice, dogs, dolphins and pigs are mammals too. But it does not end there. All mammals are also vertebrates, and all vertebrates are chordates. Ontologies are not restricted to organisms; many more ontologies have been defined by independent organisations. When you search in SEA, you get all datasets that match the search term and also all datasets that match any of its subterms as defined in the ontologies. For example, if you search for neurodegenerative disease, you will get results for Alzheimer's and Huntington's disease. If you search for Murinae, you will get datasets from mice as well as from rats. This way you can make your search as broad or as specific as you wish.
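One way to think about the subterm expansion is as a walk down the "is a" relations of an ontology. The minimal sketch below assumes a toy ontology and hypothetical dataset IDs (DS1 to DS3). It collects the search term plus all of its descendants, then returns every dataset annotated with any of those terms. SEA's actual ontologies and lookup mechanism are not shown here.

```python
from collections import defaultdict, deque

# Toy "is a" relations (child -> parents); real ontologies are far larger.
IS_A = {
    "Homo sapiens": ["Mammalia"],
    "Mus musculus": ["Murinae"],
    "Rattus norvegicus": ["Murinae"],
    "Murinae": ["Mammalia"],
    "Mammalia": ["Vertebrata"],
    "Vertebrata": ["Chordata"],
}

# Hypothetical dataset annotations (dataset ID -> organism term).
DATASETS = {"DS1": "Mus musculus", "DS2": "Rattus norvegicus", "DS3": "Homo sapiens"}


def children_of(is_a: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert child -> parents into parent -> children."""
    children: dict[str, set[str]] = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def expand(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return the term plus all of its descendants (breadth-first traversal)."""
    children = children_of(is_a)
    seen = {term}
    queue = deque([term])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def find_datasets(term: str, is_a: dict[str, list[str]], datasets: dict[str, str]) -> list[str]:
    """Datasets annotated with the term itself or with any of its subterms."""
    matches = expand(term, is_a)
    return sorted(ds for ds, annotation in datasets.items() if annotation in matches)
```

With this toy data, searching for "Murinae" finds the mouse and rat datasets. Searching for "Mammalia" also finds the human one, which mirrors the broad-versus-specific behaviour described above.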
When working with SEA, you will most likely be in one of the following situations:
Violin plots are shown for the expression of hsa-mir-xxx across all datasets. The figure shows that hsa-mir-xxx is expressed in 81 datasets. Each violin plot shows the reads per million (RPM) values of the requested sRNA in the samples that belong to the currently selected annotation (tissue in this case). For example, the top violin plot, muscle-GSE66334, shows the expression in 12 samples from that dataset. Zero expression values are excluded from the plots. The violins can be subset by organism and sorted by different metrics such as max RPM, median RPM, mean RPM, number of samples, name and dataset ID. Hovering over a violin shows a tooltip that summarizes its information. Clicking on a violin shows an overview of all samples of the dataset with the expression of the queried sRNA, along with metadata such as tissue, cell type, cell line, disease and other annotations. The detailed view also shows the output of the Oasis analysis of the raw sequencing data. If no ontology term such as tissue, cell type, cell line or disease is queried together with the sRNA, the violins are labeled by tissue by default, and the user can change the labels from the dropdown menu. If a dataset contains samples from different tissues (or, respectively, different values of the selected annotation), it appears several times, with one violin for each set of samples. Note: the same holds true for pathogen searches.
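The sketch below illustrates how RPM values and violins relate. It assumes that RPM is the sRNA's read count scaled by the sample's total read count; the exact read total used by Oasis is not stated here. The sample records, their field names and the dataset IDs DS1 and DS2 are hypothetical. Samples are grouped by (annotation value, dataset), zero values are dropped, and the groups are sorted by median RPM. A dataset with two tissues therefore yields two violins.

```python
from collections import defaultdict
from statistics import median


def rpm(srna_reads: int, total_reads: int) -> float:
    """Reads per million: the sRNA's read count scaled to one million library reads."""
    return srna_reads * 1_000_000 / total_reads


def violin_groups(samples: list[dict], annotation: str = "tissue") -> list[tuple[tuple[str, str], list[float]]]:
    """Group non-zero RPM values by (annotation value, dataset ID), one group per violin."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for s in samples:
        value = rpm(s["srna_reads"], s["total_reads"])
        if value > 0:  # zero expression values are excluded from the plots
            groups[(s[annotation], s["dataset"])].append(value)
    # Sort violins by median RPM, highest first (one of several possible sort keys).
    return sorted(groups.items(), key=lambda item: median(item[1]), reverse=True)


# Hypothetical samples for one sRNA.
samples = [
    {"dataset": "DS1", "tissue": "muscle", "srna_reads": 50, "total_reads": 1_000_000},
    {"dataset": "DS1", "tissue": "muscle", "srna_reads": 0, "total_reads": 2_000_000},
    {"dataset": "DS1", "tissue": "liver", "srna_reads": 10, "total_reads": 500_000},
    {"dataset": "DS2", "tissue": "muscle", "srna_reads": 5, "total_reads": 1_000_000},
]
```

Here DS1 contributes two violins (muscle and liver), and its zero-count muscle sample is left out.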
If the search contains two or more entities (sRNAs or pathogens), violin plots are shown in the same panel, with a distinct color for each sRNA or pathogen.
This table shows all diseases associated with an sRNA (only applicable if the sRNA is a microRNA). It shows disease names and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, such as the Ontology Lookup Service (OLS) for disease names and PubMed for associations. If multiple miRNAs are searched, a tab is shown for each miRNA.
This table is only shown when searching for a disease. It shows disease names, miRNA IDs and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, such as the Ontology Lookup Service (OLS) for disease names and PubMed for associations. A tab is shown for each searched disease. Since disease terms are connected to ontologies, all subtypes of the searched disease are included in the table (for example, carcinoma, breast cancer and others when searching for cancer). The table is limited to 50 entries (disease-miRNA combinations), so some associations might be missing. To get more results, search for a more specific term (e.g. breast cancer instead of cancer).
The search results offer several options for further analysis.
Overlap differential expression or classification results from different datasets
The tables of differential expression results and comparison results provide a checkbox for each experiment. Ticking a checkbox selects the experiment for the SEA overlap analysis. The SEA overlap analysis visualizes the overlap of the top differentially expressed pathogens or sRNAs across several differential expression analyses or, for sRNA classification, the overlap of the top features. It is started by clicking the corresponding button underneath the tables (for example "sRNA DE Overlap"). User datasets can be compared with public datasets, but classification datasets cannot be compared with differential expression comparisons. The following figure shows an example analysis of four sets of sRNAs that serve as features for classifying different samples. The top line still shows the sRNA and tissue that were searched, but this page shows all sRNAs that play a role in the selected experiments. The filter section provides fields to adjust the displayed results. For classification, you can set a minimum AUC of the model used and a minimum mean decrease in Gini; only sRNAs that reach these values in a classification analysis are considered for the overlap. For differential expression, you can filter by adjusted p-value, minimum absolute log2 fold change and direction of regulation. The first table on the page gives an overview of the selected experiments. Its first column, "Set Name", identifies each experiment in the other parts of the page. The table on the right lists all entities (in this case sRNAs) that play a role in at least one of the experiments according to the filter, and its second column shows the sets in which they play a role. The filter of this table is useful when looking for a specific sRNA or experiment. The plot on this page is an UpSet plot.
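The filters can be pictured as simple threshold checks that decide which sRNAs enter each set before the overlap is computed. The sketch below is a minimal illustration under stated assumptions. The record types, field names and default thresholds are hypothetical. The adjusted p-value is assumed to act as an upper cut-off, and the AUC and mean decrease in Gini are assumed to act as lower cut-offs.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """Hypothetical differential expression record for one sRNA."""
    srna: str
    padj: float
    log2fc: float


@dataclass
class ClassificationFeature:
    """Hypothetical classification feature record for one sRNA."""
    srna: str
    model_auc: float
    mean_decrease_gini: float


def filter_de(results: list[DEResult], max_padj: float = 0.05,
              min_abs_log2fc: float = 1.0, direction: str = "both") -> set[str]:
    """Keep sRNAs passing the adjusted p-value, |log2FC| and direction filters."""
    keep = set()
    for r in results:
        if r.padj > max_padj or abs(r.log2fc) < min_abs_log2fc:
            continue
        if direction == "up" and r.log2fc <= 0:
            continue
        if direction == "down" and r.log2fc >= 0:
            continue
        keep.add(r.srna)
    return keep


def filter_classification(features: list[ClassificationFeature],
                          min_auc: float = 0.8, min_gini: float = 1.0) -> set[str]:
    """Keep sRNAs from models with sufficient AUC and sufficient feature importance."""
    return {f.srna for f in features
            if f.model_auc >= min_auc and f.mean_decrease_gini >= min_gini}
```

Each experiment's filtered set then becomes one of the sets compared in the overlap.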
On the x-axis, all possible combinations of the four sets are displayed, and each combination is also encoded by the filled circles in the bottom part of the plot. Each blue bar indicates the number of sRNAs present in the experiments belonging to that intersection. Clicking on a bar displays the IDs of these sRNAs. Each sRNA is counted exactly once in this plot, in the intersection formed by all experiments it occurs in, which means that the first four bars represent the sRNAs unique to a single experiment. The black bars on the left are not clickable; they represent the total number of sRNAs that play a role in the corresponding experiment.
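The "counted exactly once" rule of an UpSet plot corresponds to grouping elements by their exact set membership. The sketch below shows this bookkeeping. The set names and sRNA IDs (A, B, C, m1 to m4) are hypothetical, and it is not SEA's plotting code. Each sRNA lands in the one intersection made of all sets that contain it, so the intersection sizes sum to the number of distinct sRNAs. The per-set totals correspond to the black bars.

```python
def exclusive_intersections(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Assign each element to the exact combination of sets that contain it."""
    membership: dict[str, set[str]] = {}
    for name, members in sets.items():
        for item in members:
            membership.setdefault(item, set()).add(name)
    intersections: dict[frozenset[str], set[str]] = {}
    for item, names in membership.items():
        intersections.setdefault(frozenset(names), set()).add(item)
    return intersections


def set_sizes(sets: dict[str, set[str]]) -> dict[str, int]:
    """Total number of elements per set (the black bars)."""
    return {name: len(members) for name, members in sets.items()}


# Hypothetical filtered sRNA sets from three experiments.
sets = {"A": {"m1", "m2", "m3"}, "B": {"m2", "m3", "m4"}, "C": {"m3"}}
```

In this example, m3 is counted only in the A-B-C intersection and not again in A-B. For this reason, the blue bars can be smaller than a naive pairwise overlap.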
Comparison details and resubmission to Oasis 2
Appendix
Small RNA identifiers
SEA uses standard small RNA identifiers for the search. Keep in mind that different types of small RNAs follow different identifier conventions. For instance, microRNA IDs usually start with the code of the species they are derived from; a human microRNA ID usually starts with hsa-. The situation is similar for Piwi-interacting RNAs (piRNAs), but instead of a dash the identifiers use an underscore: hsa_. Small nucleolar RNA (snoRNA) IDs tend to start with SNO, and ribosomal RNA (rRNA) IDs usually start with a lowercase r.
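The conventions above can be turned into a rough prefix heuristic for guessing an identifier's sRNA type. The sketch below is only an illustration derived from those conventions. It assumes a three-letter lowercase species code for miRNAs and piRNAs, it is not how SEA classifies identifiers, and real IDs will not always follow these patterns. Pattern order matters here: a rat miRNA such as rno-mir-1 starts with a lowercase r but is matched by the miRNA rule first.

```python
import re

# Heuristic prefix rules based only on the conventions described above (assumption).
PATTERNS = [
    ("miRNA", re.compile(r"^[a-z]{3}-")),   # species code + dash, e.g. hsa-
    ("piRNA", re.compile(r"^[a-z]{3}_")),   # species code + underscore, e.g. hsa_
    ("snoRNA", re.compile(r"^SNO")),        # snoRNA IDs tend to start with SNO
    ("rRNA", re.compile(r"^r")),            # rRNA IDs usually start with a lowercase r
]


def guess_srna_type(srna_id: str) -> str:
    """Return the first matching sRNA type, or 'unknown' if no rule applies."""
    for srna_type, pattern in PATTERNS:
        if pattern.match(srna_id):
            return srna_type
    return "unknown"
```

For example, `guess_srna_type("hsa-mir-21")` returns "miRNA", while an ID such as "U6" matches none of the rules and is reported as "unknown".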