Sequence analysis is the study and interpretation of the order of monomers in biological polymers. In molecular biology it most commonly refers to determining the order of nucleotides in DNA or RNA and the order of amino acids in peptides and proteins. Practical work begins with a sample and yields raw outputs from instruments; those outputs are then processed computationally and interpreted by researchers. For a general introduction see sequence analysis resources.
What is analyzed and why
The basic units differ by molecule: for nucleic acids the units are nucleotides carrying the bases adenine, cytosine, guanine, and thymine (in DNA) or uracil (in RNA), while for polypeptides they are amino acids. Sequence analysis reveals composition, order and variation within these polymers and enables identification, annotation, and comparison. For basic definitions and nucleotide concepts see nucleotides, and for the broader concept of a nucleic acid see nucleic acids.
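As a minimal sketch of what "composition" means in practice, the following Python snippet counts the symbols in a sequence and reports GC content for DNA. The function names `composition` and `gc_content` are illustrative choices, not part of any particular library, and the input is assumed to be a plain string of one-letter codes.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the fraction of each symbol (base or residue) in a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    counts = Counter(seq)
    return {symbol: count / len(seq) for symbol, count in counts.items()}


def gc_content(dna: str) -> float:
    """Return the fraction of G and C bases in a DNA string."""
    if not dna:
        raise ValueError("sequence must not be empty")
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


print(composition("ATGCGCTA"))  # {'A': 0.25, 'T': 0.25, 'G': 0.25, 'C': 0.25}
print(gc_content("ATGCGCTA"))   # 0.5
```

The same `composition` function works for protein sequences, since it makes no assumption about the alphabet.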
Common methods and analytical steps
- Data generation: laboratory methods produce raw data, such as reads from classical Sanger sequencing or high-throughput (next-generation) sequencing, or the raw output of protein sequencing approaches.
- Preprocessing: base-calling, quality trimming and filtering of raw instrument output.
- Assembly and alignment: building contiguous sequences from reads or aligning sequences to references to detect differences.
- Annotation and interpretation: identifying genes, functional sites, variants, or motifs and placing results in biological context.
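The preprocessing and alignment steps above can be sketched in a few lines of Python. This is a simplified illustration rather than a production method: it assumes quality strings use the Phred+33 ASCII encoding common in FASTQ files, uses an arbitrary threshold of 20, and replaces real alignment algorithms with a naive ungapped scan. The function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(bases: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Remove trailing bases whose quality is below min_q."""
    if len(bases) != len(quality):
        raise ValueError("bases and quality must have the same length")
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return bases[:end], quality[:end]


def best_ungapped_offset(read: str, reference: str) -> tuple[int, int]:
    """Slide the read along the reference without gaps.

    Returns (offset, mismatches) for the first placement with the
    fewest mismatches.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than reference")
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


read, qual = trim_3prime("ACGTACGT", "IIIIII##")
print(read)                                      # ACGTAC
print(best_ungapped_offset(read, "TTACGTACGG"))  # (2, 0)
```

Real pipelines typically use dedicated trimmers and gapped, index-based aligners; the scan here only shows the idea of placing a cleaned read against a reference.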
Protein sequence work uses related but distinct techniques: mass spectrometry and peptide sequencing support analysis of amino acid order and post-translational modifications. For peptide-level approaches see peptide sequencing and for protein-level resources see protein analysis.
Analysis produces several practical outputs: lists of variants (single-nucleotide polymorphisms, insertions/deletions), gene models, predicted protein sequences, and similarity scores used for database searches or phylogenetic inference.
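To make two of these outputs concrete, the sketch below lists variants and computes a simple similarity score from a pair of sequences that are assumed to be already aligned, with `-` marking gaps. The tuple layout and the function names are illustrative assumptions, and percent identity is defined here as matching columns divided by alignment length, which is only one of several conventions in use.

```python
def list_variants(ref_aln: str, sample_aln: str) -> list[tuple[int, str, str, str]]:
    """List differences between two aligned sequences.

    Each entry is (reference position, kind, reference symbol, sample symbol).
    Positions are 1-based; an insertion is reported at the position of the
    reference base that precedes it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, s in zip(ref_aln, sample_aln):
        if r != "-":
            ref_pos += 1
        if r == s:
            continue
        if r == "-":
            variants.append((ref_pos, "insertion", "-", s))
        elif s == "-":
            variants.append((ref_pos, "deletion", r, "-"))
        else:
            variants.append((ref_pos, "substitution", r, s))
    return variants


def percent_identity(ref_aln: str, sample_aln: str) -> float:
    """Percentage of alignment columns where both sequences agree on a base."""
    if not ref_aln or len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must be non-empty and equal length")
    matches = sum(r == s and r != "-" for r, s in zip(ref_aln, sample_aln))
    return 100.0 * matches / len(ref_aln)


ref = "ACGT-ACGT"
sample = "ACCTGAC-T"
print(list_variants(ref, sample))
# [(3, 'substitution', 'G', 'C'), (4, 'insertion', '-', 'G'), (7, 'deletion', 'G', '-')]
print(round(percent_identity(ref, sample), 1))  # 66.7
```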
Historically, methods evolved from labor-intensive chemical and enzymatic techniques to automated machines and massively parallel instruments. Computational tools and databases grew alongside laboratory advances, making sequence analysis a heavily bioinformatics-driven field that bridges wet lab data and biological interpretation.
Applications are broad: reconstructing evolutionary relationships, diagnosing genetic disease, tracking pathogens in public health, guiding conservation genetics, and informing biotechnology and drug discovery. Limitations include sequencing errors, assembly ambiguity in repetitive regions, and challenges in interpreting noncoding variation or complex structural changes. Combining laboratory rigor with transparent computational workflows is key to reliable results.
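The assembly ambiguity mentioned among the limitations can be illustrated with a small, hedged example: when a substring of length k (a k-mer) occurs more than once, reads no longer than that substring cannot be placed uniquely. The helper `repeated_kmers` below is a hypothetical name, and the value of k is chosen only for demonstration.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Return k-mers that occur more than once, with their 0-based start positions."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


print(repeated_kmers("ATGCATGCAT", 4))
# {'ATGC': [0, 4], 'TGCA': [1, 5], 'GCAT': [2, 6]}
```

Longer reads that span an entire repeat and reach unique flanking sequence reduce this ambiguity.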