DNA sequencing is a technique for determining the nucleotide sequence of deoxyribonucleic acid (DNA). The nucleotide sequence is the most basic level of knowledge about a gene or genome.

It is the blueprint that carries the instructions for creating an organism, and no knowledge of genetic function or evolution would be complete without this information.

First-Generation Sequencing Technology

The Maxam-Gilbert method, named after American molecular biologists Allan M. Maxam and Walter Gilbert, and the Sanger method (or dideoxy method), developed by English biochemist Frederick Sanger, were among the first-generation sequencing technologies to emerge in the 1970s.

The Sanger method, which became the more widely used of the two, synthesized DNA chains on a template strand. Chain growth stopped whenever one of four possible dideoxy nucleotides was incorporated: because these nucleotides lack a 3' hydroxyl group, no further nucleotide could be added.

The resulting molecules were sorted by size using a technique known as electrophoresis, and the nucleotide sequence was then inferred by computer from the order of the fragments. Later, automated sequencing machines separated the truncated DNA molecules, labeled with fluorescent tags, by size within thin glass capillaries and detected them by laser excitation.
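The chain-termination logic can be illustrated with a small, idealized simulation. This is a sketch, not a model of real chemistry: it assumes that every position of the newly synthesized strand yields exactly one terminated fragment, and the function names are hypothetical. Each of the four reactions collects the lengths of fragments ending in one dideoxy base; sorting all fragments by length, as electrophoresis does, recovers the sequence.

```python
def chain_termination_fragments(synthesized_strand):
    """Idealized Sanger reactions: for each dideoxy base, list the
    lengths of the fragments that terminate with that base."""
    reactions = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand, start=1):
        reactions[base].append(length)
    return reactions


def read_sequence(reactions):
    """Sort all fragments by length (the role of electrophoresis) and
    read off the terminal base of each fragment, shortest first."""
    labeled = [
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(labeled))


fragments = chain_termination_fragments("ATGCCGTA")
print(fragments["C"])            # fragment lengths ending in ddC
print(read_sequence(fragments))  # sequence read from sorted fragments
```

In practice the fragments come from many template copies and termination is random, so the reading relies on a population of molecules rather than one fragment per position.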

Next-Generation Sequencing Technology

Next-generation (massively parallel, or second-generation) sequencing technologies have essentially replaced first-generation technology. These newer methodologies allow for the sequencing of a large number of DNA fragments (often millions) at once, and they are less expensive and quicker than first-generation technology.

Advances in bioinformatics expanded the usability of next-generation technologies by providing greater data storage and making it easier to analyze and manipulate very large data sets, often in the gigabase range (1 gigabase = 1,000,000,000 base pairs of DNA).

Applications of DNA Sequencing Technologies

Knowing the sequence of a DNA fragment has several applications. First, it may be used to identify genes, which are DNA segments that code for a certain protein or trait. A portion of DNA that has been sequenced can be examined for gene-specific characteristics.

For example, open reading frames (ORFs) indicate a protein-coding region. An ORF is a long sequence that begins with a start codon and is uninterrupted by stop codons, save for one at its end. (A codon is three adjacent nucleotides; the sequence of a codon determines which amino acid is added to the protein.)
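A minimal sketch of an ORF scan follows. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA), looks only at the three forward-strand reading frames, and uses a hypothetical function name and a hypothetical `min_codons` cutoff that counts both the start and the stop codon. A real gene finder would also scan the reverse complement and apply a much larger length threshold.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, orf) tuples for forward-strand ORFs.
    Indices are 0-based and `end` is exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                # Extend codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
# [(2, 14, 'ATGAAATTTTAG')]
```

The example sequence contains one ORF in the third reading frame: ATG, two coding codons (AAA, TTT), and the stop codon TAG.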

Furthermore, human genes are often located next to so-called CpG islands, regions in which cytosine followed directly by guanine (a CpG dinucleotide) occurs unusually often; cytosine and guanine are two of the four nucleotides that make up DNA. If a gene with a known phenotype (such as a disease gene in humans) is known to lie in the sequenced chromosomal region, the unassigned genes in that region become candidates for that function.
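The CpG island signal can be made concrete with a simple window test. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the default thresholds (length of at least 200 bases, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly cited criteria that are not taken from this text and should be treated as adjustable. The expected CpG count is estimated as (number of C × number of G) / window length.

```python
def cpg_stats(window):
    """Return (GC content, observed/expected CpG ratio) for a window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")  # C immediately followed by G
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(window, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content, and CpG-ratio thresholds."""
    gc_content, obs_exp = cpg_stats(window)
    return (
        len(window) >= min_len
        and gc_content > min_gc
        and obs_exp > min_obs_exp
    )


print(cpg_stats("CG" * 100))              # (1.0, 2.0)
print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

Sliding such a window along a sequenced region and flagging the windows that pass is one simple way to mark places where a gene may lie nearby.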

The applications of next-generation sequencing technologies are numerous, thanks to their low cost and large-scale high-throughput capability. Using these tools, scientists have been able to efficiently sequence complete genomes (whole genome sequencing) of organisms, find genes involved in disease, and get a better understanding of genomic organization and diversity among species in general.