sRNA Expression Atlas (SEA) is a web application for searching known and novel small RNAs across ten organisms using standardized search terms and ontologies. SEA contains re-analyzed sRNA expression information for over 4,200 published samples, including many disease datasets and over 769 novel, high-quality predicted miRNAs. SEA also stores sRNA differential expression, sRNA-based classification, pathogenic sRNA signatures from bacteria and viruses, and pathogen differential expression. Furthermore, SEA contains the gene targets and diseases associated with each miRNA. The 4,235 samples are systematically annotated with metadata such as standardized information about the organism, cell line, cell type, tissue type, potential diseases and more. Additional annotations include experimental details about instrument models and library strategies. The raw data from all datasets were analyzed with the Oasis 2 pipelines to achieve comparable small RNA expression across many studies. In summary, SEA supports interactive result visualization on all levels, from querying and displaying sRNA expression information to the mapping and quality information for each sample.
Small RNAs
Small RNAs (sRNAs) are a class of short, non-coding RNAs (ncRNAs) with important biological functions in nearly all aspects of organismal development, in both health and disease. sRNAs are the ncRNAs shorter than 200 nucleotides (nt). Based on their biogenesis and biological functions, the major types of sRNAs are: microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs).

MicroRNAs
MicroRNAs (miRNAs) are around 22 nt in length and play an important role in gene regulation by targeting messenger RNA (mRNA) for cleavage or translational repression. miRNAs are the most abundant class of sRNAs and they affect the regulation of many protein-coding genes.

PIWI-interacting RNAs
PIWI-interacting RNAs (piRNAs) are small (around 24-32 nt long) non-coding RNAs mostly expressed in the germline. Guided by piRNAs, PIWI proteins protect and maintain genomic stability in germline cells by binding to invasive transposable elements (transposons).

Small nucleolar RNAs
Small nucleolar RNAs (snoRNAs) are a class of sRNAs, usually 60-150 nt long, responsible for the post-transcriptional modification of ribosomal RNAs (rRNAs). snoRNAs are part of the small nucleolar ribonucleoproteins (snoRNPs), RNA-protein complexes that play a role in the pseudouridylation and in the sequence-specific 2'-O-methylation of rRNA.

Small interfering RNAs
Small interfering RNAs (siRNAs) are double-stranded RNA molecules of around 20-25 nt that can target mRNAs based on perfect complementarity. The RNA-induced silencing complex (RISC) of the RNA interference (RNAi) pathway uses the antisense strand of the siRNA for mRNA cleavage and hence promotes mRNA degradation.

Small nuclear RNAs
Small nuclear RNAs (snRNAs) are mostly found in eukaryotic cells and are also called U-RNAs. snRNAs have an important role in the splicing of introns from primary genomic transcripts. The average length of an snRNA is around 150 nt. Each snRNA associates with a set of proteins, and the resulting complex of snRNA and proteins is referred to as a small nuclear ribonucleoprotein (snRNP, or "snurp"). Prominent components of these complexes are the spliceosomal RNAs U1, U2, U4, U5 and U6, which play a major role in the maturation of eukaryotic precursor messenger RNA.
SEA can be searched for:
All searches for experiments or pathogen(s)/sRNA(s) start from the search bar at the top of the SEA web pages. Suggestions appear as you type in the search bar. The adjacent figure (Fig 2) shows what it looks like when the partial entry "human her" is entered into the search bar. Note that this autocomplete functionality restricts pasting into the search bar to one term at a time. SEA's search bar shows suggestions across all categories. Currently, the following search categories are supported: sRNA ID, Pathogen, Organism, Cell type, Tissue, Cell line, Disease, and Dataset.
Please note that SEA only allows searches based on suggested terms. All terms that are suggested while typing are guaranteed to be in the database. If you are looking for a specific disease, tissue, organism, etc. and no matching term is suggested, then there is no dataset matching your criteria in the database.
When using several search terms, datasets are found according to the following rules:
Each experiment in the SEA database is annotated with ontological terms. One can think of an ontology as a list of relationships between words. For example, if we take the words "human" and "mammal", we can say that a human is a mammal. Humans are not the only mammals: mice, dogs, dolphins and pigs are mammals too. Taking this further, all mammals are also vertebrates, all vertebrates are chordates, etc. Ontologies are not restricted to organisms, and many more ontologies have been defined by independent organisations. SEA's search finds all datasets that match the search term, as well as all of its subterms as defined in these ontologies. For example, the query "neurodegenerative disease" will yield results from Alzheimer's and Huntington's disease. Likewise, the search query "murinae" will return datasets from both mice and rats. SEA's ontological search thus lets users choose how specific a query is, from broad to fine-grained.
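The subterm expansion described above can be illustrated with a minimal sketch. It is not SEA's actual implementation: the toy ontology `IS_A`, the dataset ids in `DATASETS` and the function names are all hypothetical and only show how a query for a broad term can also match datasets annotated with more specific terms.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> parent term ("is a" relation).
IS_A = {
    "mouse": "murinae",
    "rat": "murinae",
    "murinae": "mammal",
    "human": "mammal",
    "mammal": "vertebrate",
}

# Hypothetical dataset annotations: dataset id -> organism term.
DATASETS = {"DS1": "mouse", "DS2": "rat", "DS3": "human"}


def subterms(term, is_a):
    """Return the term itself plus every term below it in the ontology."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term, datasets, is_a):
    """Return ids of datasets annotated with the term or any of its subterms."""
    wanted = subterms(term, is_a)
    return sorted(ds for ds, annotation in datasets.items() if annotation in wanted)
```

In this sketch, `search("murinae", DATASETS, IS_A)` matches the mouse and rat datasets but not the human one, while `search("mammal", DATASETS, IS_A)` matches all three.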
When working with SEA, users will most likely find themselves:
Fig 3: Violin plots are shown for the expression of hsa-miR-133a-3p across datasets with muscle annotation. The violin plots show the distribution of the reads per million (RPM) values for the requested sRNA in the samples that belong to the currently selected annotation (tissue in this case). For example, the top violin, muscle-GSE66334, shows the expression of 12 samples from that dataset. Zero expression values are excluded from the plots. The violins can be subset by organism and sorted by different metrics such as max RPM, median RPM, mean RPM, number of samples, name and dataset ID. Hovering over a violin summarizes its information in a tooltip. Clicking on a violin navigates the user to an overview of all samples in the dataset with the expression of the queried sRNA, along with the corresponding metadata such as tissue, cell type, cell line, disease and more annotations. The raw sequencing data analysis output from Oasis is also provided there. If the sRNA is queried without an ontological term such as tissue, cell type, cell line or disease, the tissue annotation is shown by default for the violins and the user can change the labels from the dropdown menu. If a dataset contains samples from different tissues (or from different values of the selected annotation), it will appear several times, i.e. one violin is shown for each set of samples. Note: The same holds true for pathogen searches.
Fig 4: If the search contains two or more entities (sRNAs or pathogens), a violin plot is shown with a distinct color per sRNA or pathogen in the same panel.
The disease association table shows all the diseases associated with an sRNA (only applicable if the sRNA is a miRNA). The table shows disease names and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, such as the Ontology Lookup Service (OLS) and PubMed. If multiple miRNAs were searched, a tab is shown for each miRNA.
This table only appears when searching with a disease ontology term. The table shows disease names, miRNA IDs and associations (the publications from which each relation was obtained). All of these values are clickable links to their sources, such as the Ontology Lookup Service (OLS) and PubMed. A tab is shown for each disease that was searched. Since disease terms are connected to ontologies, all subtypes of the searched disease are presented in the table. For example, searching for the term "cancer" also returns carcinoma, breast cancer and others. This table has a maximum size of 50 entries (disease-miRNA combinations), so some associations might be missing. To get more results, search for a more specific term (e.g. "breast cancer" instead of "cancer").
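As a rough illustration of the values behind the violins, the sketch below computes RPM values, drops zero values, and groups the rest by annotation and dataset, so that one dataset can contribute several violins. It assumes RPM is the read count scaled by the total number of reads in the sample; which read total SEA and Oasis use is not stated here, and all sample data, field names and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def rpm(count, total_reads):
    """Reads per million, assuming normalization by the sample's total reads."""
    return count * 1_000_000 / total_reads


def violin_groups(samples, srna, annotation="tissue"):
    """Group non-zero RPM values of one sRNA by (annotation value, dataset)."""
    groups = defaultdict(list)
    for sample in samples:
        value = rpm(sample["counts"].get(srna, 0), sample["total_reads"])
        if value > 0:  # zero expression values are excluded from the plots
            groups[(sample[annotation], sample["dataset"])].append(value)
    return dict(groups)


def sort_by_median(groups):
    """Order the violins by median RPM, highest first."""
    return sorted(groups, key=lambda key: median(groups[key]), reverse=True)


# Hypothetical samples; "miR-x" and the dataset ids are made up.
SAMPLES = [
    {"dataset": "GSE_A", "tissue": "muscle", "total_reads": 2_000_000, "counts": {"miR-x": 200}},
    {"dataset": "GSE_A", "tissue": "muscle", "total_reads": 1_000_000, "counts": {"miR-x": 0}},
    {"dataset": "GSE_A", "tissue": "heart", "total_reads": 1_000_000, "counts": {"miR-x": 50}},
    {"dataset": "GSE_B", "tissue": "muscle", "total_reads": 4_000_000, "counts": {"miR-x": 1000}},
]

GROUPS = violin_groups(SAMPLES, "miR-x")
```

Here the hypothetical dataset `GSE_A` yields two groups, one for muscle and one for heart, mirroring how a dataset with samples from several tissues appears as several violins.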
Presented with the search results are options to do further analysis:
Overlap differential expression or classification results from different datasets
The tables for differential expression results and comparison results provide a checkbox per experiment. Ticking a checkbox selects that experiment for the SEA overlap analysis. The SEA overlap analysis visualizes the overlap of the top differentially expressed pathogens or sRNAs between several differential expression analyses, or of the top features between several sRNA classification analyses. To run this analysis, click the corresponding button underneath the tables (for example "sRNA DE Overlap"). User datasets can be compared with public datasets; however, classification datasets cannot be compared with differential expression comparisons. The following figure shows an example analysis of four different sets of sRNAs that serve as features for the classification of different samples.
Fig 11. The top line still shows the sRNA and the tissue that were searched; however, this page shows all sRNAs that play a role in the selected experiments. The filter section provides fields to adjust the shown results. The minimum AUC of the model and the minimum mean decrease in Gini score can be set, and only sRNAs that reach these values in a classification analysis are considered for the overlap. In the case of differential expression, users can filter for minimum adjusted p-value, minimum (absolute) log2 fold change and the direction of regulation. The first table on the page gives an overview of the selected experiments and has the "Set Name" in the first column, which identifies the experiment in the other parts of the page. The table on the right shows all entities (in this case sRNAs) that play a role in at least one of the experiments according to the filter. Its second column shows in which sets each sRNA plays a role. The filter functionality of this table can be useful when looking for a specific sRNA or experiment.
The plot shown on this page is known as an UpSet plot. While it may look unfamiliar or complex, an UpSet plot is a way of visualizing Venn diagrams at larger scales. There are three main components of the UpSet plot: (a) the leftward vertical black bar plot, (b) the upward horizontal blue bar plot, and (c) the indicator grid. To avoid confusion, "vertical" bar plot here means that the bars are placed one above the other and extend horizontally. Part (a) shows the cardinality (size) of each set (experiment), i.e. how many elements the set contains in total. Part (b) shows how many elements (sRNAs) belong uniquely to the indicated region. Part (c) identifies the indicated region: a black circle indicates that the bar above it includes the corresponding set to the left. In other words, a column with one black circle holds the elements that are wholly unique to that set, whereas the column with all black circles holds the elements shared by all sets. Thus the indicator grid (part c) shows all possible regions of the Venn diagram corresponding to the four different sets displayed in this example. Clicking one of the blue bars in part (b) displays the IDs of the sRNAs that are present in all the experiments of that region. Each sRNA is shown exactly once in this plot. Note that the black bars on the left are not clickable.
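The regions counted in part (b) can be computed with a few lines of code. The following is a minimal sketch and not the code SEA uses; the set names, sRNA IDs and the function `upset_regions` are hypothetical. Each element is assigned to exactly one region, namely the combination of all sets that contain it, which is why every sRNA appears exactly once in the plot.

```python
from collections import defaultdict


def upset_regions(sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    membership = defaultdict(set)
    for name, elements in sets.items():
        for element in elements:
            membership[element].add(name)
    regions = defaultdict(set)
    for element, names in membership.items():
        regions[frozenset(names)].add(element)
    return dict(regions)


# Hypothetical experiments with their top sRNAs after filtering.
SETS = {
    "Set1": {"miR-a", "miR-b", "miR-c"},
    "Set2": {"miR-b", "miR-c", "miR-d"},
    "Set3": {"miR-c", "miR-e"},
}

REGIONS = upset_regions(SETS)
# Part (a): total size of each set.
SET_SIZES = {name: len(elements) for name, elements in SETS.items()}
# Part (b): number of elements in each region.
REGION_SIZES = {names: len(elements) for names, elements in REGIONS.items()}
```

In this example the region for all three sets contains only the hypothetical `miR-c`, and the region for `Set1` alone contains only `miR-a`, even though `Set1` has three elements in total.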
Comparison details and resubmission to Oasis 2
SEA is a publicly available data repository and web server that can be used without an account or login. Users who want to upload their own data and compare it to the data in SEA need to create an account. Users can sign in with their Google account, or they can register in the SEA system directly with a valid email address, choosing a username and password for their account. We have created User-DB to store account information as well as the sRNA-seq data uploaded by users. User-uploaded data is only accessible from the user's own account. Users have the option to include their data in SEA for 30 days. For data protection, security, and storage space reasons, we currently do not allow users to add data permanently to SEA.
In order to keep SEA up to date with current small RNA sequencing data and with future data published to GEO, SEA automatically searches the GEO and SRA databases once every two weeks (as GEO updates its repository regularly after two weeks). When performing these automatic searches for new data, SEA downloads the raw fastq files, submits these files to Oasis 2, and enqueues the new data in the semi-automatic annotation pipeline for tissue, cell line, cell type and other metadata available for each of the downloaded samples. Once the data are partially annotated, manual curators ensure accuracy and correct mistakes from the automatic annotation. To facilitate these human-curated annotations, we have developed a friendly user interface for the curators. Undoubtedly, this process of semi-automatic annotation can be improved. For example, when annotation is missing, curators must read the original articles to provide it, which is time-consuming and prone to human error. To overcome these limitations and keep updates of new sequencing data more uniform, we are actively exploring improvements to the automatic annotation using deep learning. Preliminary results of these efforts can be found online.
If you have questions that this documentation page does not answer, do not hesitate to contact us via email at firstname.lastname@example.org.