
Ultra Long-Read Genome De Novo Sequencing and Assembly
Animal, Bacterial, Genome, Gene component, GC-Depth, GC-Content, Coverage Depth, Annotation, Prediction, Synteny, Evolution, Pathway, .....

Sample sequence showing how a sequence assembler would take fragments and match by overlaps. Image also shows the potential problem of repeats in the sequence.
Genome Assembly (Sequence Assembly)
In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology cannot read whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or gene transcripts (ESTs).

De-novo vs. mapping assembly
In sequence assembly, two different types can be distinguished:

  1. de-novo: assembling short reads to create full-length (sometimes novel) sequences, without using a template (see de novo sequence assemblers, de novo transcriptome assembly)
  2. mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence
In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read, an operation that has a naive time complexity of O(n²).
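The all-against-all comparison can be illustrated with a minimal Python sketch. It is not how production assemblers work (they use indexes, k-mers, or minimizers to avoid the quadratic scan); the function names, the toy reads, and the `min_len` threshold are assumptions chosen for illustration.

```python
from itertools import permutations


def overlap_length(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def all_pairs_overlaps(reads: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Compare every ordered pair of reads: n * (n - 1) comparisons, i.e. O(n^2)."""
    overlaps = {}
    for i, j in permutations(range(len(reads)), 2):
        size = overlap_length(reads[i], reads[j], min_len)
        if size:
            overlaps[(i, j)] = size
    return overlaps


# Toy reads (hypothetical): read 0 overlaps read 1, and read 1 overlaps read 2.
reads = ["ATGCGT", "CGTTAC", "TACGGA"]
print(all_pairs_overlaps(reads))  # {(0, 1): 3, (1, 2): 3}
```

Doubling the number of reads roughly quadruples the number of pairs examined, which is why the naive approach does not scale to real sequencing runs.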

GC-Depth, GC-Content Distribution

In molecular biology and genetics, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases on a DNA or RNA molecule that are either guanine or cytosine (from a possibility of four different ones, also including adenine and thymine in DNA and adenine and uracil in RNA).[1] This may refer to a certain fragment of DNA or RNA, or to the whole genome. When it refers to a fragment of the genetic material, it may denote the GC-content of a section of a gene (domain), a single gene, a group of genes (or gene clusters), or even a non-coding region.
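As a rough illustration of the definition, the following Python sketch computes GC-content for a whole sequence and for consecutive windows. The function names and the choice to ignore ambiguous symbols such as N are assumptions made for this example.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the A, C, G, T/U bases of `sequence`.

    Ambiguous symbols (for example N) are ignored; an empty input gives 0.0.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    return 100.0 * gc / total if total else 0.0


def windowed_gc(sequence: str, window: int) -> list[float]:
    """GC-content of consecutive, non-overlapping windows of `window` bases."""
    return [gc_content(sequence[i:i + window])
            for i in range(0, len(sequence), window)]


print(gc_content("ATGC"))            # 50.0
print(windowed_gc("AAAAGGGG", 4))    # [0.0, 100.0]
```

The windowed form is the kind of calculation behind per-fragment GC figures, although real analyses use much larger windows (for example 100 kb).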
Genomic content
Within-genome variation
The GC ratio within a genome is found to be markedly variable. These variations in GC ratio within the genomes of more complex organisms result in a mosaic-like formation with islet regions called isochores.[11] This results in the variations in staining intensity in the chromosomes.[12] GC-rich isochores include many protein-coding genes, and thus determining the GC ratio of these specific regions contributes to mapping gene-rich regions of the genome.[13][14]

Coding sequences

Within a long region of genomic sequence, genes are often characterised by having a higher GC-content in contrast to the background GC-content for the entire genome. Comparisons of GC ratio with the length of a gene's coding region have shown that the length of the coding sequence is directly proportional to G+C content.[15] This has been attributed to the fact that the stop codon has a bias towards A and T nucleotides, and, thus, the shorter the sequence the higher the AT bias.[16]

Comparison of more than 1,000 orthologous genes in mammals showed marked within-genome variations of the third-codon position GC content, with a range from less than 30% to more than 80%.[17]

Among-genome variation

GC content varies between organisms; this variation is thought to result from differences in selection, mutational bias, and biased recombination-associated DNA repair.[18]

Nucleotide bonds showing AT and GC pairs. Arrows point to the hydrogen bonds.

Influence of GC Content on Mean Read Depth in WGS

The average GC-content in human genomes ranges from 35% to 60% across 100-kb fragments, with a mean of 46.1%.[17] The GC-content of yeast (Saccharomyces cerevisiae) is 38%,[19] and that of another common model organism, thale cress (Arabidopsis thaliana), is 36%.[20] Because of the nature of the genetic code, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. However, a species with an extremely low GC-content is Plasmodium falciparum (GC% = ~20%),[21] and it is common to refer to such examples as being AT-rich instead of GC-poor.[22]

Several mammalian species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in the GC-content of their genes. These GC-content changes are correlated with species life-history traits (e.g., body mass or longevity) and genome size,[17] and might be linked to a molecular phenomenon called the GC-biased gene conversion.[23]

An overlap of the product of three sequencing runs, with the read depth at each point indicated.

Sequence depth (Coverage)
Coverage (or depth) in DNA sequencing is the number of unique reads that include a given nucleotide in the reconstructed sequence.[1][2] Deep sequencing refers to the general concept of aiming for a high number of unique reads of each region of a sequence.[3]
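To make the definition concrete, here is a small Python sketch that counts how many reads cover each position of a reference. It assumes reads are already aligned and given as half-open `(start, end)` intervals; the function name and the three example reads are hypothetical.

```python
def depth_per_position(reads: list[tuple[int, int]], length: int) -> list[int]:
    """Per-base depth for aligned reads given as half-open (start, end) intervals.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the number of reads covering each base.
    """
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for change in diff[:length]:
        running += change
        depth.append(running)
    return depth


# Three overlapping reads on a 10-base reference (toy coordinates).
depth = depth_per_position([(0, 5), (2, 8), (4, 10)], 10)
print(depth)                    # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # 1.7 (mean depth)
```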

Even though the sequencing accuracy for each individual nucleotide is very high, the very large number of nucleotides in the genome means that if an individual genome is only sequenced once, there will be a significant number of sequencing errors. Furthermore, many positions in a genome contain rare single-nucleotide polymorphisms (SNPs). Hence to distinguish between sequencing errors and true SNPs, it is necessary to increase the sequencing accuracy even further by sequencing individual genomes a large number of times.
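The benefit of repeated sequencing can be sketched with a simple binomial model. The assumptions here are strong and purely illustrative: errors are independent between reads, and the per-base error rate (1% in the example) is a hypothetical value, not a figure from any particular platform.

```python
from math import comb


def prob_at_least_k_errors(n: int, k: int, error_rate: float) -> float:
    """P(at least k of n independent reads are wrong at one position)."""
    return sum(
        comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k, n + 1)
    )


# With a single read, a wrong base call is as likely as the error rate itself.
print(prob_at_least_k_errors(1, 1, 0.01))
# With 30 reads, seeing 10 or more erroneous calls at one site is very unlikely,
# so a variant supported by that many reads is probably real.
print(prob_at_least_k_errors(30, 10, 0.01))
```

Under this model, a base observed in a large fraction of the reads at a site is far more likely to be a true variant than a repeated sequencing error.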

Ultra-deep sequencing
The term "ultra-deep" can sometimes also refer to higher coverage (>100-fold), which allows for detection of sequence variants in mixed populations.[4][5][6] In the extreme, error-corrected sequencing approaches such as Maximum-Depth Sequencing can make it so that coverage of a given region approaches the throughput of a sequencing machine, allowing coverages of >10^8.[7]

Transcriptome sequencing
Deep sequencing of transcriptomes, also known as RNA-Seq, provides both the sequence and frequency of RNA molecules that are present at any particular time in a specific cell type, tissue or organ.[8] Counting the number of mRNAs that are encoded by individual genes provides an indicator of protein-coding potential, a major contributor to phenotype.[9] Improving methods for RNA sequencing is an active area of research both in terms of experimental and computational methods.[10]

Ten steps to get started in Genome Assembly and Annotation

DNA (Genome) Annotation

DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it.[1]

For DNA annotation, a previously unknown sequence representation of genetic material is enriched with information relating genomic position to intron-exon boundaries, regulatory sequences, repeats, gene names and protein products. This annotation is stored in genomic databases such as Mouse Genome Informatics, FlyBase, and WormBase. Educational materials on some aspects of biological annotation from the 2006 Gene Ontology annotation camp and similar events are available at the Gene Ontology website.[2]

The National Center for Biomedical Ontology develops tools for automated annotation[3] of database records based on the textual descriptions of those records.

As a general method, dcGO [4] has an automated procedure for statistically inferring associations between ontology terms and protein domains or combinations of domains from the existing gene/protein-level annotations.

Genome annotation consists of three main steps:[5]

  1. identifying portions of the genome that do not code for proteins
  2. identifying elements on the genome, a process called gene prediction
  3. attaching biological information to these elements

Automatic annotation tools attempt to perform these steps via computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

A simple method of gene annotation relies on homology-based search tools, like BLAST, to search for homologous genes in specific databases; the resulting information is then used to annotate genes and genomes.[6] However, as information is added to the annotation platform, manual annotators become capable of deconvoluting discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on curated data sources as well as a range of different software tools in their automated genome annotation pipeline.[7]

Structural annotation consists of the identification of genomic elements.

  • ORFs and their localization
  • gene structure
  • coding regions
  • location of regulatory motifs

Functional annotation consists of attaching biological information to genomic elements.

  • biochemical function
  • biological function
  • involved regulation and interactions
  • expression

These steps may involve both biological experiments and in silico analysis. Proteogenomics based approaches utilize information from expressed proteins, often derived from mass spectrometry, to improve genomics annotations.[8]

Gene (ORF) Prediction
In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

Structure of a eukaryotic gene

Gene structure is the organisation of specialised sequence elements within a gene. Genes contain the information necessary for living cells to survive and reproduce.[1][2] In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional.[2] This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.

Much of gene structure is broadly similar between eukaryotes and prokaryotes. These common elements largely result from the shared ancestry of cellular life in organisms over 2 billion years ago.[3] Key differences in gene structure between eukaryotes and prokaryotes reflect their divergent transcription and translation machinery.[4][5] Understanding gene structure is the foundation of understanding gene annotation, expression, and function.[6]

The structure of a eukaryotic protein-coding gene. Regulatory sequence controls when and where expression occurs for the protein coding region (red). Promoter and enhancer regions (yellow) regulate the transcription of the gene into a pre-mRNA which is modified to remove introns (light grey) and add a 5' cap and poly-A tail (dark grey). The mRNA 5' and 3' untranslated regions (blue) regulate translation into the final protein product.[17]

Empirical methods
In empirical (similarity, homology or evidence-based) gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of known expressed sequence tags, messenger RNA (mRNA), protein products, and homologous or orthologous sequences. Given an mRNA sequence, it is trivial to derive a unique genomic DNA sequence from which it had to have been transcribed. Given a protein sequence, a family of possible coding DNA sequences can be derived by reverse translation of the genetic code. Once candidate DNA sequences have been determined, it is a relatively straightforward algorithmic problem to efficiently search a target genome for matches, complete or partial, and exact or inexact. Given a sequence, local alignment algorithms such as BLAST, FASTA and Smith-Waterman look for regions of similarity between the target sequence and possible candidate matches. The success of this approach is limited by the contents and accuracy of the sequence database.
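As an illustration of local alignment, the sketch below computes a Smith-Waterman score with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions for the example; real tools use substitution matrices, affine gaps, and heuristics for speed.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            score = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                        # start a new local alignment here
                previous[j - 1] + score,  # extend with a match or mismatch
                previous[j] + gap,        # gap in b
                current[j - 1] + gap,     # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# The shared core "ACGT" scores 4 matches x 2 = 8 despite different flanks.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```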
Ab initio methods
Ab Initio gene prediction is an intrinsic method based on gene content and signal detection. Because of the inherent expense and difficulty in obtaining extrinsic evidence for many genes, it is also necessary to resort to ab initio gene finding, in which the genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes. These signs can be broadly categorized as either signals, specific sequences that indicate the presence of a gene nearby, or content, statistical properties of the protein-coding sequence itself. Ab initio gene finding might be more accurately characterized as gene prediction, since extrinsic evidence is generally required to conclusively establish that a putative gene is functional.
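A signal-based search can be sketched in a few lines of Python. This is a deliberately naive ORF scan, not a gene predictor: it uses only start and stop codons as signals, reads the forward strand only, ignores introns, and applies no content statistics. The function name and the `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Find ATG...stop open reading frames in the three forward frames.

    Returns (start, end, orf) tuples; `end` is exclusive and the ORF string
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, seq[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATGATT"))  # [(2, 11, 'ATGAAATGA')]
```

Any hit from a scan like this is only a candidate; as noted above, extrinsic evidence is generally needed to establish that a putative gene is functional.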

Searching for new ORFs along a stretch of the chromosome

Searching for genes in a particular pathway

Searching for components of a particular cellular structure

Gene Function Annotations


A variety of software tools have been developed to permit scientists to view and share genome annotations; for example, MAKER.

Genome annotation remains a major challenge for scientists investigating the human genome, now that the genome sequences of more than a thousand human individuals (The 100,000 Genomes Project, UK) and several model organisms are largely complete.[9][10] Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism.[6] Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".[11]

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means.

At Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.[12]

Non-Coding RNA Annotation

A non-coding RNA (ncRNA) is an RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.

The number of non-coding RNAs within the human genome is unknown; however, recent transcriptomic and bioinformatic studies suggest that there are thousands of them.[1][2][3][4][5][6] Many of the newly identified ncRNAs have not been validated for their function.[7] It is also likely that many ncRNAs are non-functional (sometimes referred to as junk RNA), and are the product of spurious transcription.[8][9]

Non-coding RNAs contribute to diseases including cancer and Alzheimer's.

The roles of non-coding RNAs in the central dogma of molecular biology: Ribonucleoproteins are shown in red, non-coding RNAs in blue

Synteny between human and mouse chromosomes. Colors indicate homologous regions. For instance, sequences homologous to mouse chromosome 1 are primarily on human chromosomes 1 and 2, but also 6, 8, and 18. The X chromosome is almost completely syntenic in both species.[1]
Synteny (Structural Variation)
In classical genetics, synteny describes the physical co-localization of genetic loci on the same chromosome within an individual or species.
Today, however, biologists usually refer to synteny as the conservation of blocks of order within two sets of chromosomes that are being compared with each other. This concept can also be referred to as shared synteny.
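The idea of conserved blocks of order can be sketched as follows. This toy Python example assumes each chromosome is given as an ordered list of unique gene identifiers shared between the two genomes; it only finds runs that are strictly adjacent and in the same orientation, whereas real synteny tools tolerate gaps, inversions, and duplications. All names and the gene lists are hypothetical.

```python
def collinear_blocks(order_a: list[str], order_b: list[str],
                     min_genes: int = 2) -> list[list[str]]:
    """Runs of genes that are adjacent and in the same order in both lists."""
    position_b = {gene: index for index, gene in enumerate(order_b)}
    blocks: list[list[str]] = []
    current: list[str] = []
    for gene in order_a:
        extends = (
            bool(current)
            and gene in position_b
            and position_b[gene] == position_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
            continue
        # The run is broken: keep it if long enough, then start a new one.
        if len(current) >= min_genes:
            blocks.append(current)
        current = [gene] if gene in position_b else []
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "x", "g1", "g2", "g3"]
print(collinear_blocks(genome_a, genome_b))
# [['g1', 'g2', 'g3'], ['g4', 'g5', 'g6']]
```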

Structural variation detection using next-generation sequencing data: A comparative technical review
Structural variation (also genomic structural variation) is the variation in structure of an organism's chromosome. It consists of many kinds of variation in the genome of one species, and usually includes microscopic and submicroscopic types, such as deletions, duplications, copy-number variants, insertions, inversions and translocations. Originally, a structural variant was defined as affecting a sequence length of about 1 kb to 3 Mb, which is larger than SNPs and smaller than chromosome abnormalities (though the definitions have some overlap).[1] However, the operational range of structural variants has widened to include events >50 bp.[2] The definition of structural variation does not imply anything about frequency or phenotypical effects. Many structural variants are associated with genetic diseases, but many are not.[3][4] Recent research indicates that SVs are more difficult to detect than SNPs. Approximately 13% of the human genome is defined as structurally variant in the normal population, and there are at least 240 genes that exist as homozygous deletion polymorphisms in human populations, suggesting these genes are dispensable in humans.[4] Rapidly accumulating evidence indicates that structural variations can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

Species Evolution (Phylogenetic Tree)

Speciation is the evolutionary process by which populations evolve to become distinct species. The biologist Orator F. Cook coined the term in 1906 for cladogenesis, the splitting of lineages, as opposed to anagenesis, phyletic evolution within lineages.[1][2][3] Charles Darwin was the first to describe the role of natural selection in speciation in his 1859 book The Origin of Species.[4] He also identified sexual selection as a likely mechanism, but found it problematic.

There are four geographic modes of speciation in nature, based on the extent to which speciating populations are isolated from one another: allopatric, peripatric, parapatric, and sympatric. Speciation may also be induced artificially, through animal husbandry, agriculture, or laboratory experiments. Whether genetic drift is a minor or major contributor to speciation is the subject matter of much ongoing discussion.
Rapid sympatric speciation can take place through polyploidy, such as by doubling of chromosome number; the result is progeny which are immediately reproductively isolated from the parent population. New species can also be created through hybridisation followed, if the hybrid is favoured by natural selection, by reproductive isolation.

Structure of a gene regulatory network

Control process of a gene regulatory network
Gene Regulatory Network (Genetic Pathway)

A gene (or genetic) regulatory network (GRN) is a collection of molecular regulators that interact with each other and with other substances in the cell to govern the gene expression levels of mRNA and proteins. These play a central role in morphogenesis, the creation of body structures, which in turn is central to evolutionary developmental biology (evo-devo).

The regulator can be DNA, RNA, protein and complexes of these. The interaction can be direct or indirect (through transcribed RNA or translated protein). In general, each mRNA molecule goes on to make a specific protein (or set of proteins). In some cases this protein will be structural, and will accumulate at the cell membrane or within the cell to give it particular structural properties. In other cases the protein will be an enzyme, i.e., a micro-machine that catalyses a certain reaction, such as the breakdown of a food source or toxin. Some proteins, though, serve only to activate other genes, and these are the transcription factors that are the main players in regulatory networks or cascades. By binding to the promoter region at the start of other genes they turn them on, initiating the production of another protein, and so on. Some transcription factors are inhibitory.[1]
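A cascade of this kind is often illustrated with a Boolean network, in which each gene is simply on or off. The sketch below is a hypothetical three-gene loop invented for this example (A activates B, B activates C, C inhibits A) with synchronous updates; it is not a model of any real pathway.

```python
def step(state: dict[str, bool]) -> dict[str, bool]:
    """One synchronous update of the hypothetical three-gene network."""
    return {
        "A": not state["C"],  # C is an inhibitory transcription factor for A
        "B": state["A"],      # A turns B on
        "C": state["B"],      # B turns C on
    }


def simulate(initial: dict[str, bool], steps: int) -> list[dict[str, bool]]:
    """Return the list of states from the initial state through `steps` updates."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


start = {"A": True, "B": False, "C": False}
for t, state in enumerate(simulate(start, 6)):
    print(t, "".join(gene for gene, is_on in state.items() if is_on) or "-")
```

In this toy loop the activation spreads from A to B to C, the inhibitor then switches A off, and the system returns to its starting state after six updates.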

Bacterial regulatory networks

Regulatory networks allow bacteria to adapt to almost every environmental niche on earth.[24][25] A network of interactions among diverse types of molecules including DNA, RNA, proteins and metabolites, is utilised by the bacteria to achieve regulation of gene expression. In bacteria, the principal function of regulatory networks is to control the response to environmental changes, for example nutritional status and environmental stress.[26] A complex organization of networks permits the microorganism to coordinate and integrate multiple environmental signals.[24]

Structure and evolution

Citric acid cycle with aconitate 2

Overview of the Calvin Cycle pathway
Pathways databases

  • KEGG Pathway database is a popular pathway search database highly used by biologists.
  • WikiPathways is a community curated pathway database using the "wiki" concept. All pathways have an open license and can be freely used.
  • Reactome is a free and manually curated online database of biological pathways.
  • NCI-Nature Pathway Interaction Database is a free biomedical database of human cellular signaling pathways (new official name: NCI Nature Pathway Interaction Database: Pathway, synonym: PID).
  • PhosphoSitePlus is a database of observed post-translational modifications in human and mouse proteins; an online systems biology resource providing comprehensive information and tools for the study of protein post-translational modifications (PTMs) including phosphorylation, ubiquitination, acetylation and methylation.
  • BioCyc database collection is an assortment of organism-specific Pathway/Genome Databases.
  • Human Protein Reference Database is a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome (the last release was #9 in 2010).
  • PANTHER (Protein ANalysis THrough Evolutionary Relationships) is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products.
  • TRANSFAC (TRANScription FACtor database) is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles (provided by geneXplain GmbH).
  • MiRTarBase is a curated database of MicroRNA-Target Interactions.
  • DrugBank is a comprehensive, high-quality, freely accessible, online database containing information on drugs and drug targets.
  • esyN is a network viewer and builder that allows users to import pathways from the BioModels database or from BioGRID, FlyBase, and PomBase, and to see what drugs interact with the proteins in their network.
  • Comparative Toxicogenomics Database (CTD) is a public website and research tool that curates scientific data describing relationships between chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, GO annotations, pathways, and interaction modules; CTD illuminates how environmental chemicals affect human health.
  • Pathway Commons is a project and database that uses the BioPAX language to convert, integrate and query other biological pathway and interaction databases.

Nanopore Sequencing

Nanopore sequencing is a third-generation[1] approach used in the sequencing of biopolymers, specifically polynucleotides in the form of DNA or RNA.

Using nanopore sequencing, a single molecule of DNA or RNA can be sequenced without the need for PCR amplification or chemical labeling of the sample. At least one of these steps is necessary in any previously developed sequencing approach. Nanopore sequencing has the potential to offer relatively low-cost genotyping, high mobility for testing, and rapid processing of samples with the ability to display results in real time. Publications on the method outline its use in rapid identification of viral pathogens,[2] monitoring Ebola,[3] environmental monitoring,[4] food safety monitoring, human genome sequencing,[5] plant genome sequencing,[6] monitoring of antibiotic resistance,[7] haplotyping[8] and other applications.

Principles for detection and base identification
Nanopore sequencing uses electrophoresis to transport an unknown sample through an orifice about 10^-9 meters (1 nm) in diameter. A nanopore system always contains an electrolytic solution; when a constant electric field is applied, an electric current can be observed in the system. The magnitude of the electric current density across a nanopore surface depends on the nanopore's dimensions and the composition of the DNA or RNA that is occupying the nanopore. Sequencing is made possible because, when close enough to nanopores, samples cause characteristic changes in electric current density across nanopore surfaces. The total charge flowing through a nanopore channel is equal to the surface integral of electric current density flux across the nanopore unit normal surfaces between times t1 and t2.
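The last statement can be written as a formula. The symbols are assumptions introduced here, since the text names only t1 and t2: Q is the total charge, J the current density, S the nanopore cross-section, and n-hat its unit normal.

```latex
Q = \int_{t_1}^{t_2} \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, dA \, dt
```

The inner surface integral is the instantaneous current I(t) through the pore, so a base that partially blocks the pore lowers I(t) and therefore the charge accumulated over the interval.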

On the left is a drawing of the complex formed between alpha-hemolysin and dsDNA with linkage through an oligomer. On the right, movement of this complex in relation to a nanopore channel is shown sequentially in two steps (I) and (II). Once the complex is inserted into the nanopore, the alpha-hemolysin protein will be functional in the newly formed hybrid, biological and solid state, nanopore system.
  • Biological
    Biological nanopore sequencing relies on the use of transmembrane proteins, called porins, that are embedded in lipid membranes so as to create size-dependent porous surfaces, with nanometer-scale "holes" distributed across the membranes. Sufficiently low translocation velocity can be attained through the incorporation of various proteins that facilitate the movement of DNA or RNA through the pores of the lipid membranes.[9]
    • Alpha hemolysin
    • Alpha hemolysin (αHL), a nanopore from bacteria that causes lysis of red blood cells, has been studied for over 15 years.[10] To this point, studies have shown that all four bases can be identified using ionic current measured across the αHL pore.[11][12] The structure of αHL is advantageous for identifying specific bases moving through the pore. The αHL pore is ~10 nm long, with two distinct 5 nm sections. The upper section consists of a larger, vestibule-like structure and the lower section consists of three possible recognition sites (R1, R2, R3), and is able to discriminate between each base.[11][12]
    • ..... A recent study has pointed to the ability of αHL to detect nucleotides at two separate sites in the lower half of the pore.[15] The R1 and R2 sites enable each base to be monitored twice as it moves through the pore, creating 16 different measurable ionic current values instead of 4. This method improves upon the single read through the nanopore by doubling the sites that the sequence is read per nanopore.
      Mycobacterium smegmatis porin A (MspA) is the second biological nanopore currently being investigated for DNA sequencing. The MspA pore has been identified as a potential improvement over αHL due to a more favorable structure.[16] The pore is described as a goblet with a thick rim and a diameter of 1.2 nm at the bottom of the pore.[17] A natural MspA, while favorable for DNA sequencing because of shape and diameter, has a negative core that prohibits single-stranded DNA (ssDNA) translocation. The natural nanopore was modified to improve translocation by replacing three negatively charged aspartic acids with neutral asparagines.[18]
  •  Solid state
    Solid state nanopore sequencing approaches, unlike biological nanopore sequencing, do not incorporate proteins into their systems. Instead, solid state nanopore technology uses various metal or metal alloy substrates with nanometer sized pores that allow DNA or RNA to pass through. These substrates most often serve integral roles in the sequence recognition of nucleic acids as they translocate through the channels along the substrates.[20]

Comparison between types
Comparison of Biological and Solid State Nanopore Sequencing Systems Based on Major Constraints
Constraints compared for biological and solid state systems: low translocation velocity, dimensional reproducibility, stress tolerance, ease of fabrication.

Alpha-hemolysin pore (made up of 7 identical subunits in 7 colors) and 12-mer single-stranded DNA (in white) on the same scale to illustrate DNA effects on conductance when moving through a nanopore. Below is an orthogonal view of the same molecules.

Figure showing the theoretical movement of ssDNA through a tunneling current nanopore system. Detection is made possible by the incorporation of electrodes along the nanopore channel walls, perpendicular to the ssDNA's velocity vector.

The only technology that offers:

  • Direct DNA/RNA sequencing
  • True real-time analysis
  • No capital cost required
  • Ultra-long reads, up to 2 Mb
  • Scalable to portable or desktop
  • Simple & rapid, or automated, library prep
  • High yields for large genomes

Applications: advantages in all areas of research

  • Microbiology
  • Environmental research
  • Basic genome research
  • Human genetics
  • Cancer research
  • Clinical research
  • Plant research
  • Transcriptome analysis
  • Population-scale genomics
  • Animal research
Delivering a range of biological analysis techniques.....

Whole genome sequencing
    De novo assembly
    Scaffolding and finishing
    Variant analysis: structural variation
    Variant analysis: SNVs, phasing

Targeted sequencing
    Panels – amplicons, sequence capture, exome
    Variant analysis: structural variation
    Variant analysis: SNVs, phasing
    16S rRNA analysis

RNA sequencing
    Splice variant analysis
    Transcriptome / gene expression
    Fusion transcript analysis

    Real-time, unbiased analysis of mixed samples

    Histone modification
    Non-coding RNA activity

Leading with Quality, Performance and Cost
We work locally and internationally with our partners, outstanding academic and industry experts dedicated to biochip testing, genetic sequencing, and bioinformatics research. Our core offering is sample analysis services: de novo sequencing, bioinformatics analysis, and related biotechnology R&D and application services.
We provide professional services for genetic technology applications, as well as support for research and development in genomics and the life sciences. In particular, we offer R&D and application integration services for de novo genome sequencing and assembly, including but not limited to the following:

Genome De Novo Assembly
I. Animal Genome

I-I: Standard Bioinformatics Analysis

    1 Data filtering
        1.1 Filtering adapters and low quality data;
        1.2 Data production and quality control.

    2 Genome assembly
        2.1 Assembly
        2.2 Analysis of GC-Depth distribution
        2.3 Analysis of GC-Content distribution
        2.4 Analysis of sequence depth
        2.5 Evaluation of autosomal regional coverage (BAC or Fosmid sequence should be provided)
        2.6 Evaluation of gene region coverage (EST or transcriptome data should be provided)

    3 Annotation
        3.1 Repeat annotation
        3.2 Gene prediction (ORF prediction)
        3.3 Gene function Annotation
        3.4 Non-coding RNA annotation (tRNA, snoRNA, etc)

I-II: Advanced Bioinformatics Analysis
    1.1 Structural Variation (Synteny analysis)
    1.2 Species Evolution (phylogenetic tree)

II. Bacterial de novo sequencing

II-I: Bioinformatics Analysis
    1 Data Statistics
        1.1 Filtering adapters and low quality data;
        1.2 Data production and quality control.

    2 Summary of assembly

    3 Genome Component
        3.1 Gene component
        3.2 Repeat Sequence
        3.3 Non-coding RNA

    4 Gene function
        4.1 Gene functional annotation (GO, KEGG, COG, etc)
        4.2 Pathway annotation

II-II: Advanced
    5 Comparative genomic analysis
        5.1 Structural Variation (Synteny analysis)
        5.2 Evolution analysis

GridION X5
Now, with Nanopore technology, we provide genome assembly data of higher quality.

• Direct DNA/RNA sequencing

• True real-time analysis: real-time reading of base modifications

• Ultra-long reads, up to 2 Mb: the world's longest DNA sequencing read length is 2,272,580 bp

• Scalable to portable or desktop: test and/or stop at any time, with flexible capacity

• Simple & rapid, or automated, library prep: from DNA to sequencing reads in 30 minutes or less

• High yields for large genomes

