Omicseq Tutorial
1. Tutorial Video
2. What is Omicseq?

Omicseq is a web portal that serves as both an omics data explorer and a search engine. It retrieves and processes various types of omics data (RNA-seq, ChIP-seq, ATAC-seq) from multiple major genomic data repositories such as ENCODE and TCGA. Unlike most existing dataset search tools, which rely exclusively on metadata, our search engine is powered by a ranking algorithm that fully utilizes the numerical values in each dataset. When the query is a gene or pathway name, metadata is of limited help, but the numerical values in a dataset for that gene or pathway say a lot about whether the dataset is relevant to the query. The trackRank algorithm we developed harnesses the numerical information of all the genes or pathways to assign a rank to each dataset, and the top-ranked datasets are returned to the user.
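To illustrate why numerical values can rank datasets where metadata cannot, here is a minimal sketch in Python. It is not the trackRank algorithm, whose details are not given in this tutorial. It assumes a simple stand-in score (the query gene's percentile within each dataset), and the dataset IDs, values, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: dataset ID -> {gene name -> numerical value}.
DATASETS: Dict[str, Dict[str, float]] = {
    "ds_prostate_rnaseq": {"KLK3": 950.0, "TP53": 40.0, "GAPDH": 300.0, "ACTB": 280.0},
    "ds_liver_rnaseq": {"KLK3": 0.5, "TP53": 35.0, "GAPDH": 310.0, "ACTB": 290.0},
    "ds_chipseq_ar": {"KLK3": 12.0, "TP53": 3.0, "GAPDH": 15.0, "ACTB": 2.0},
}


def gene_percentile(scores: Dict[str, float], gene: str) -> float:
    """Percentage of genes in the dataset whose value is <= the query gene's value."""
    target = scores[gene]
    at_or_below = sum(1 for value in scores.values() if value <= target)
    return 100.0 * at_or_below / len(scores)


def rank_datasets(datasets: Dict[str, Dict[str, float]], gene: str) -> List[Tuple[str, float]]:
    """Order datasets by how prominent the query gene is inside each one."""
    scored = [
        (dataset_id, gene_percentile(scores, gene))
        for dataset_id, scores in datasets.items()
        if gene in scores  # skip datasets that never scored the gene
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_datasets(DATASETS, "KLK3")
for dataset_id, percentile in ranking:
    print(dataset_id, percentile)
```

In this toy data the prostate dataset comes first because KLK3 has the highest value of any gene in it, even though nothing in the dataset's metadata mentions KLK3.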

Omicseq provides two search services:

  • Gene Search

    A gene can be searched by its name or NCBI RefSeq number. Related datasets are returned and ranked.

  • Pathway Search

    A gene pathway can be searched by its name; datasets for the genes involved in the pathway are returned and ranked.

3.1 Gene Search Interface

1) Basic Interface

This figure shows the basic gene search interface, which is activated by clicking the "Gene" tab on the search bar. There are two required input parameters: 1. Gene name; 2. Database (human or mouse). The search bar suggests gene names as the first character(s) are entered.
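The name hints can be thought of as prefix matching over a sorted list of gene names. The sketch below shows one common way to do this with Python's standard `bisect` module. It is an illustration only, not the portal's implementation, and the gene list and the `suggest` function are hypothetical.

```python
import bisect
from typing import List

# Hypothetical, sorted, upper-case list of gene names known to the search bar.
GENE_NAMES: List[str] = sorted(["KLK2", "KLK3", "KLK4", "KRAS", "TP53"])


def suggest(prefix: str, names: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` names that start with `prefix` (case-insensitive)."""
    prefix = prefix.upper()
    # Binary search for the first name that is >= the prefix.
    start = bisect.bisect_left(names, prefix)
    hints: List[str] = []
    for name in names[start:]:
        # Names are sorted, so the first non-match ends the run of matches.
        if not name.startswith(prefix) or len(hints) >= limit:
            break
        hints.append(name)
    return hints


print(suggest("klk", GENE_NAMES))
```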

2) Advanced Settings

Advanced parameters can be specified by clicking the "Setting" button. They include:

  • Experiments Types: The datasets searched can be filtered by experiment type, such as ChIP-seq TF, Dnase-seq, GWAS, etc.

  • Data Source: The datasets searched can also be filtered by data source, such as ArrayExpress, TCGA, GEA, etc.
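The two filters above act as independent restrictions on the set of datasets that is searched. A minimal sketch of that behavior follows, assuming each dataset carries an experiment type and a data source; the `DatasetRecord` class, its field names, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class DatasetRecord:
    dataset_id: str   # hypothetical identifier
    data_type: str    # experiment type, e.g. "RNA-seq"
    study: str        # data source, e.g. "TCGA"


def filter_datasets(
    records: Iterable[DatasetRecord],
    experiment_types: Optional[Set[str]] = None,
    data_sources: Optional[Set[str]] = None,
) -> List[DatasetRecord]:
    """Keep records that pass both filters; a filter set to None is not applied."""
    return [
        record
        for record in records
        if (experiment_types is None or record.data_type in experiment_types)
        and (data_sources is None or record.study in data_sources)
    ]


records = [
    DatasetRecord("d1", "RNA-seq", "TCGA"),
    DatasetRecord("d2", "ChIP-seq TF", "ENCODE"),
    DatasetRecord("d3", "RNA-seq", "ArrayExpress"),
]
selected = filter_datasets(records, experiment_types={"RNA-seq"}, data_sources={"TCGA"})
print([record.dataset_id for record in selected])
```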

3.2 An Example of Gene Search Result

This figure shows the search results for the gene KLK3, which encodes the prostate-specific antigen (PSA) protein. Relevant datasets are returned and ranked as rows with the following fields:

  • Rank: This field shows the relevance of the gene in the dataset. The higher the rank, the more relevant the dataset is considered to be to the query gene.

  • DatasetID: The identifier of the dataset.

  • DataType: The experiment type on the dataset.

  • Sample: The experimental sample.

  • Tissue/status/factor: The experimental tissue.

  • Order/Total: The order of the query gene in this dataset / the total number of genes scored in this dataset.

  • Percentile: The percentile of the query gene among all genes in the genome in terms of the scores in this dataset.

  • Study: The datasource.

  • Lab: The university/institution to which the contributing lab(s) belong.

  • More Info: Metainformation and links to related information on other websites.
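As a rough illustration of how the Order/Total and Percentile fields relate to each other, the sketch below derives both from a dataset's per-gene scores. It rests on two assumptions that the tutorial does not state: order 1 means the highest score, and the percentile is the share of genes scoring at or below the query gene. The function name and the scores are hypothetical.

```python
from typing import Dict, Tuple


def gene_row_fields(scores: Dict[str, float], gene: str) -> Tuple[int, int, float]:
    """Return (order, total, percentile) for `gene` within one dataset."""
    target = scores[gene]
    total = len(scores)
    # Order 1 is the top-scoring gene (assumption); ties share the better order.
    order = 1 + sum(1 for value in scores.values() if value > target)
    # Share of genes scoring at or below the query gene (assumption).
    percentile = 100.0 * (total - order + 1) / total
    return order, total, percentile


# Hypothetical scores for a single dataset.
toy_scores = {"KLK3": 950.0, "GAPDH": 300.0, "ACTB": 280.0, "TP53": 40.0, "MYC": 10.0}

order, total, percentile = gene_row_fields(toy_scores, "KLK3")
print(f"Order/Total: {order}/{total}, Percentile: {percentile}")
```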

In this example, the vast majority of the top-ranked datasets are RNA-seq data collected from prostate cancer patients in the TCGA study. In almost all of these datasets, KLK3 is the most highly expressed gene in the entire genome, which speaks volumes about its prominence as the biomarker for prostate cancer.

Note: Colors are used in the Data Type and Study columns to indicate categories. Please refer to the following color table.

Color Data Types
ChIP-seq
RNA-seq
CNV
Methylation
Microarray
Dnase-seq
Summary Track
Somatic Mutations
Color Studies
ENCODE
TCGA
TCGA Firebrowse
ICGC
SRA
Epigenome Roadmap
GEO
CCLE
SUMMARY
ArrayExpress
4.1 Pathway Search Interface

1) Basic Interface

This figure shows the basic pathway search interface, which is activated by clicking the "Pathway" button on the search bar. Datasets related to a pathway of interest can be searched by pathway name against the hg19 database. The search bar also suggests pathway names as the first character(s) are entered.

2) Advanced Settings

Advanced parameters can be specified by clicking the "Setting" button. They include:

  • Experiments Types: The datasets searched can be filtered by experiment type, such as ChIP-seq TF, Dnase-seq, GWAS, etc.

  • Data Source: The datasets searched can also be filtered by data source, such as ArrayExpress, TCGA, GEA, etc.

4.2 An Example of Pathway Search Result

This figure shows the search results for the apoptotic program pathway. Relevant datasets are returned and ranked as rows with the following fields:

  • Rank: This field shows the relevance of the pathway in the dataset. The higher the rank, the more relevant the dataset is considered to be to the query pathway.

  • DatasetID: The identifier of the dataset.

  • DataType: The experiment type on the dataset.

  • Sample: The experimental sample.

  • Tissue/status/factor: The experimental tissue.

  • Average: Average score of all the genes in this pathway.

  • Cumulative: Total score of all the genes in this pathway.

  • Percentile: Percentile of this pathway among all pathways considered in terms of the cumulative scores.

  • Study: The datasource.

  • Lab: The university/institution to which the contributing lab(s) belong.

  • More Info: Metainformation and links to related information on other websites.
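The Average, Cumulative, and Percentile fields can be related with a short sketch. It assumes that only genes with a score in the dataset contribute to a pathway, and that the percentile is the share of pathways whose cumulative score is at or below this pathway's. The gene sets, scores, and function name are hypothetical toy values, not the portal's pathway definitions.

```python
from typing import Dict, List


def pathway_stats(
    gene_scores: Dict[str, float], pathways: Dict[str, List[str]]
) -> Dict[str, Dict[str, float]]:
    """Compute average, cumulative, and percentile per pathway for one dataset."""
    stats: Dict[str, Dict[str, float]] = {}
    for name, genes in pathways.items():
        # Only genes that were scored in this dataset contribute (assumption).
        scored = [gene_scores[g] for g in genes if g in gene_scores]
        cumulative = sum(scored)
        average = cumulative / len(scored) if scored else 0.0
        stats[name] = {"average": average, "cumulative": cumulative}
    n_pathways = len(stats)
    for row in stats.values():
        # Share of pathways with a cumulative score at or below this one.
        at_or_below = sum(
            1 for other in stats.values() if other["cumulative"] <= row["cumulative"]
        )
        row["percentile"] = 100.0 * at_or_below / n_pathways
    return stats


# Hypothetical per-gene scores and pathway memberships for one dataset.
toy_gene_scores = {"CASP3": 8.0, "CASP9": 6.0, "BAX": 4.0, "GAPDH": 2.0, "ACTB": 1.0}
toy_pathways = {
    "toy apoptosis set": ["CASP3", "CASP9", "BAX"],
    "toy housekeeping set": ["GAPDH", "ACTB"],
}

stats = pathway_stats(toy_gene_scores, toy_pathways)
for name, row in stats.items():
    print(name, row)
```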

Note: Colors are used in the Data Type and Study columns to indicate categories. Please refer to the following color table.

Color Data Types
ChIP-seq
RNA-seq
CNV
Methylation
Microarray
Dnase-seq
Summary Track
Somatic Mutations
Color Studies
ENCODE
TCGA
ICGC
SRA
Epigenome Roadmap
GEO
CCLE