Molecular genetics aims to study the structure, regulation and function of genes at the molecular level. Besides determining patterns of inheritance, molecular genetics helps us understand the molecular causes of diseases. Modern genetics traces its roots back to Gregor Mendel, one of the first scientists to study the segregation of heritable traits, which he did in pea plants in the mid-19th century. Since then there has been an explosion of knowledge in the field of genetics. Purely observational research has grown into a profound understanding of the underlying molecules (DNA, RNA, proteins) and of cellular information processing. Today, molecular genetic research has a wide array of investigative methods available for various objectives, ranging from the selective analysis of single point mutations to whole genome scans.
DNA isolation is an essential and routine procedure for collecting nucleic acids for subsequent molecular analysis. The nucleic acid must first be released from the cell. For genetic analysis, peripheral white blood cells are the most widely used source of DNA; however, any other material containing nucleic acids, e.g. saliva, cultured cells or biopsy material, can be used as well. The three basic steps are:
- Cells are “lysed” or broken apart to make DNA accessible.
- Detergents are added to remove membrane lipids, and proteins are subsequently digested by adding a protease.
- The DNA is isolated as a precipitate in alcohol. This step also removes alcohol-soluble salts.
RNA extraction is complicated by the ubiquitous presence of ribonuclease (RNase) enzymes in cells and tissues, which can rapidly degrade RNA; it therefore requires special precautions.
The polymerase chain reaction (PCR) is a simple technique for selectively amplifying one or a few copies of a DNA sequence across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence. To allow selective amplification, some prior sequence information about the target is required to design two oligonucleotide primers, optimally about 18-25 nucleotides long, that are specific for the sequences flanking the target.
A PCR typically requires a reaction mixture containing the DNA template, primers, deoxynucleoside triphosphates (dNTPs), a heat-stable DNA polymerase (e.g. Taq polymerase from Thermus aquaticus), a buffer solution and divalent cations (usually Mg2+). The PCR consists of a series of cycles of three successive steps performed in a thermal cycler: denaturation of the DNA (+93-95°C), primer annealing (typically +50-65°C, depending on the primers) and DNA synthesis (+70-75°C). Since the first reports describing this technology in the mid-1980s, PCR has been extensively modified for a large array of applications, e.g. touchdown and nested PCR to increase the specificity of a PCR reaction, multiplex PCR to target multiple regions of interest in a single reaction, reverse transcription PCR (RT-PCR), which first converts isolated RNA into cDNA, and quantitative real-time PCR to amplify and simultaneously quantify a targeted DNA molecule (see also Gene dosage analysis).
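The primer requirements above can be checked with a few lines of Python. This is a minimal sketch. The Wallace rule used here (2 °C per A/T and 4 °C per G/C) gives only a rough melting temperature for short oligonucleotides. The example primer sequence is hypothetical and is not taken from any real assay.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2 per A/T, 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer: str, min_len: int = 18, max_len: int = 25) -> dict:
    """Summarize basic design properties of a candidate PCR primer."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    return {
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_content": round(gc_content(primer), 2),
        "tm_wallace": wallace_tm(primer),
    }


# Hypothetical 21-nt primer
print(check_primer("AGCTGACCTGAAGTTCATCTG"))
# {'length_ok': True, 'gc_content': 0.48, 'tm_wallace': 62}
```

Dedicated primer design tools use nearest-neighbour thermodynamics and also check for primer dimers and hairpins. The sketch only illustrates the length and base-composition criteria.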
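The exponential character of PCR can be illustrated with an idealized model. In this model, each cycle multiplies the number of target copies by (1 + E), where E is the amplification efficiency and E = 1 means perfect doubling. This is a simplified sketch. Real reactions fall short of perfect efficiency and eventually reach a plateau as reagents are depleted, which the model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    Each cycle multiplies the number of target molecules by (1 + efficiency);
    the plateau phase of a real reaction is not modelled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Starting from a single template molecule, perfect vs. 90% efficiency
for n in (10, 20, 30):
    print(n, f"{pcr_copies(1, n):.3g}", f"{pcr_copies(1, n, efficiency=0.9):.3g}")
```

With perfect doubling, 20 cycles already turn a single template into about one million copies, which is consistent with the range quoted above. At 90% efficiency the same 20 cycles yield roughly 1.9^20, or about 3.8 × 10^5 copies.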
Gel electrophoresis is a procedure for separating a mixture of molecules. In molecular genetics it is mainly used to separate negatively charged DNA (e.g. PCR products) or RNA molecules by driving them through a stationary matrix (typically an agarose gel) in an electric field. The gel matrix acts as a sieve: large molecules have difficulty passing through the pores of the matrix, whereas small molecules move through them easily, resulting in a size-dependent separation.
All individuals carry genetic differences, many of which have important consequences (e.g. determining phenotypic features or disease susceptibility). In clinical research, the focus is on screening known disease-associated genes for mutations and on identifying new disease-causing genes. Several methods for determining the genotype are outlined below.
Sequencing is a technique that allows us to read the exact nucleotide sequence of a DNA molecule. Conventional (first-generation) DNA sequencing uses an enzymatic method, chain-termination sequencing, first developed by Fred Sanger.
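Within a gel's resolving range, migration distance is roughly linear in the logarithm of fragment size. The size of an unknown band is therefore commonly estimated from a size-marker ladder run on the same gel. The sketch below fits this semi-logarithmic standard curve by least squares. The ladder distances are hypothetical, and the linear relationship holds only approximately.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size in bp) against migration distance.

    `ladder` is a list of (distance_mm, size_bp) pairs for the marker bands.
    Returns (slope, intercept) of log10(size) = intercept + slope * distance.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Interpolate the size (bp) of a band from its migration distance."""
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical ladder: band positions (mm) for fragments of known size (bp)
ladder = [(10.0, 1000), (20.0, 500), (30.0, 250)]
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(25.0, slope, intercept)))  # a band between 500 and 250 bp
```

The slope comes out negative because larger fragments travel a shorter distance. Estimates are only meaningful for bands that fall within the size range of the ladder.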
Here we describe the modified protocol used in the widely used automated capillary DNA sequencing method. In essence, the DNA is provided in single-stranded form and acts as a template on which, starting from a primer, a DNA polymerase synthesizes a new complementary strand. The sequencing reaction contains the four dNTPs plus a small proportion of the four analogous dideoxynucleotides (ddNTPs). These molecules act as chain terminators because they lack the 3'-hydroxyl group that is needed for further strand elongation. In addition, each of the four ddNTPs is labelled with a different fluorescent dye. Because ddNTPs are incorporated at random positions in the growing complementary strand, the sequencing reaction generates a collection of fluorescence-labelled DNA fragments of different sizes, each ending in a labelled ddNTP.
In the subsequent electrophoresis run, the labelled fragments migrate through very long, thin glass capillaries filled with gel, shortest fragments first. A laser beam is focused on the gel at a fixed position. As the individual DNA fragments migrate past this position, the laser excites the dye to fluoresce. Because each of the four dyes has its emission maximum at a different wavelength, the terminal base of each fragment can be identified. The information is stored electronically, and the output can be translated into a nucleotide sequence as shown below.
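The logic of chain termination can be simulated in a few lines. This sketch makes three simplifying assumptions:

- The template is written 3'→5' starting immediately after the primer, so the new strand is simply its complement read in the same order.
- The primer itself is ignored.
- Every possible termination position is listed exactly once. In a real reaction, each position is represented by many molecules that terminated at random.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_products(template_3to5: str) -> list[tuple[int, str]]:
    """List (fragment_length, terminal_ddNTP_base) for every possible stop.

    The new strand is synthesized 5'->3', complementary to the template read
    3'->5'. A ddNTP can be incorporated at any position, so each prefix of the
    new strand can occur as a fragment ending in a dye-labelled base.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5.upper())
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(products: list[tuple[int, str]]) -> str:
    """Order fragments by length, as capillary electrophoresis does, and read
    the dye (base) of each one from shortest to longest."""
    return "".join(base for _, base in sorted(products))


products = termination_products("TACGGA")  # hypothetical template, 3'->5'
random.seed(0)
random.shuffle(products)  # fragments are produced in no particular order
print(read_sequence(products))  # ATGCCT, the complementary strand read 5'->3'
```

Sorting by fragment length plays the role of the electrophoretic separation. The label of each fragment then directly gives the base at that position.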
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation. A SNP is a single base-pair variation at a specific locus, usually with two alleles, the rarer of which has a frequency of at least 1% in the population. A subset of SNPs creates or abolishes the recognition site of a restriction enzyme; these are called restriction site polymorphisms (RSPs). RSPs can be typed by restriction digestion, which yields DNA fragments of different sizes depending on the presence or absence of the restriction site, followed by gel electrophoresis and Southern blot hybridization.
A PCR-based approach first selectively amplifies the target DNA sequence containing the RSP. The amplified product is then cut with the appropriate restriction enzyme and size-fractionated by agarose gel electrophoresis. With this approach, no blotting is necessary.
Variable number of tandem repeat (VNTR) polymorphisms arise from instability in an array of tandem repeats, which causes the number of repeat units to change. Depending on their size, they are subclassified into mini- and microsatellites. Minisatellites form clusters up to 20 kb in length, with repeat units of up to 25 bp; microsatellite clusters are shorter, usually <150 bp, and the repeat unit is usually 13 bp or less. These polymorphisms can be typed in a manner analogous to RSPs, by a size-based PCR or Southern blot approach.
High resolution melting (HRM) is a time- and cost-efficient method for genotyping double-stranded DNA molecules. Applications include SNP genotyping and point mutation detection (e.g. for screening a large cohort of patients before detailed Sanger sequencing). After selective PCR amplification of the DNA template, the amplicon is heated in precisely controlled steps from +50°C to +95°C. At some point during this process the melting temperature of the amplicon is reached, and the two strands of DNA separate ("melt"). The reaction is performed in the presence of an intercalating fluorescent dye that fluoresces brightly only when bound to double-stranded DNA. Owing to this property, the point of strand separation can be precisely determined as a drop in the fluorescence signal. A sequence variant alters base pairing and thus the stability of the DNA duplex, which changes the shape or position of the melting curve.
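The PCR-based restriction analysis can be sketched as an in-silico digest. The example assumes EcoRI, which recognizes GAATTC and cuts between G and A. It uses short, simplified hypothetical amplicons in which a SNP abolishes the site. Only one strand is scanned, which is sufficient for palindromic sites such as this one.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting at every occurrence of a recognition site.

    `cut_offset` is the number of bases of the site that stay on the left
    fragment (1 for EcoRI, G^AATTC). Only the given strand is scanned.
    """
    sequence, site = sequence.upper(), site.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 256-bp amplicons differing at one SNP position (C vs T)
allele_site = "A" * 100 + "GAATTC" + "A" * 150
allele_no_site = "A" * 100 + "GAATTT" + "A" * 150
print(digest(allele_site, "GAATTC", 1))     # [101, 155]
print(digest(allele_no_site, "GAATTC", 1))  # [256]
```

On the gel, a homozygote for the site allele shows two bands and a homozygote for the other allele shows one uncut band. A heterozygote shows all three bands.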
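To illustrate how the point of strand separation is read from the data, the sketch below takes the melting temperature (Tm) as the temperature at which fluorescence falls most steeply, i.e. the peak of the negative derivative -dF/dT. The curves are synthetic logistic functions with hypothetical Tm values, not measured data. Real HRM software typically also normalizes the curves and compares their shapes rather than relying on Tm alone.

```python
import math


def melting_curve(temps, tm, width=0.5):
    """Synthetic, idealized melting curve: normalized fluorescence falls
    sigmoidally around `tm` as the double strand separates."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature of the steepest fluorescence drop,
    i.e. the maximum of -dF/dT, using central differences."""
    best_t, best_rate = None, -math.inf
    for i in range(1, len(temps) - 1):
        rate = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if rate > best_rate:
            best_t, best_rate = temps[i], rate
    return best_t


temps = [50.0 + 0.1 * i for i in range(451)]  # +50 to +95 °C in 0.1 °C steps
wild_type = melting_curve(temps, tm=78.3)     # hypothetical amplicon
variant = melting_curve(temps, tm=77.8)       # hypothetical destabilizing variant
print(round(melting_temperature(temps, wild_type), 1),
      round(melting_temperature(temps, variant), 1))
```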
Genomic linkage analysis pursues two major goals: (1) the localisation and identification of new disease-causing genes and (2) the study of the genetic heterogeneity of diseases. To this end, linkage analysis tests for cosegregation of chromosomal regions, that is, in principle, it determines how often two loci are separated by meiotic recombination. This frequency, called the recombination fraction, is a measure of genetic distance. If two loci are on different chromosomes, they segregate independently. In contrast, if two loci lie together on one chromosome, they might be expected always to segregate together.
However, because of crossing over during meiosis, recombination occurs even between loci on the same chromosome, with a frequency (recombination fraction) that depends on the physical distance between them. Recombination will therefore rarely separate loci that lie very close together on one chromosome. Such sets of alleles on the same small chromosome segment tend to be transmitted together as a block through a pedigree and constitute a haplotype. Through the use of naturally occurring DNA sequence polymorphisms as genetic markers, it is possible to create a human genetic map and systematically trace the transmission of chromosomal regions in families.
Microsatellites, a subtype of VNTR polymorphisms, are very frequent in the human genome and occur every few thousand base pairs. One of the most important attributes of microsatellite loci is their high level of allelic diversity, which makes them valuable as genetic markers. The unique sequences bordering the microsatellite motifs provide templates for specific primers to amplify the polymorphism by PCR. Allelic differences at these loci, referred to as simple sequence length polymorphisms (SSLPs), usually result from variable numbers of repeat units within the microsatellite and can therefore be readily analyzed by PCR.
In general, SNPs are distributed throughout the genome, and most of them are located in non-coding regions. Although SNPs in non-coding regions do not alter encoded proteins, they serve as important genetic or physical markers for genomic studies. To perform a genome-wide scan in a single analysis, an approach is needed in which a very large number of SNPs can be analyzed simultaneously. DNA microarrays and chips allow the parallel detection of thousands of SNPs based on the concept of massively parallel hybridization. To this end, a large number of DNA probes, each with a unique nucleotide sequence designed to bind to a specific target subsequence, are immobilized at defined positions on a solid surface. The array is incubated with fragmented and labeled target DNA to allow hybridization to take place. Under suitably stringent conditions, an oligonucleotide will hybridize with another DNA molecule only if it forms a completely base-paired structure with that molecule. If there is a single mismatch, i.e. a single position within the oligonucleotide that does not form a base pair, hybridization does not occur. Multiple probes differing at a single position are used to analyze each SNP, which increases genotyping accuracy.
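Evidence for linkage is conventionally expressed as a LOD score. This is the base-10 logarithm of the likelihood of the data at a recombination fraction θ relative to free recombination (θ = 0.5). A LOD score of 3 or more is the conventional threshold for significant linkage. The sketch below assumes the simplest situation: phase-known, fully informative meioses that can each be scored as recombinant or non-recombinant. Real pedigree analysis requires dedicated software that handles unknown phase and missing genotypes. The counts in the example are hypothetical.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a single recombinant excludes complete linkage
    log_l_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Maximum LOD score over the grid theta = 0.00, 0.01, ..., 0.50."""
    return max((lod_score(recombinants, nonrecombinants, i / 100), i / 100)
               for i in range(51))


# Hypothetical family data: 1 recombinant among 10 informative meioses
lod, theta = max_lod(1, 9)
print(f"max LOD = {lod:.2f} at theta = {theta}")
```

A family of this size stays below the LOD 3 threshold. This is why linkage studies often combine several families, whose LOD scores at the same θ can be summed.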
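Recombination fractions are not additive over longer distances, because double crossovers between two loci restore the parental combination and go unobserved. Building a genetic map therefore uses a map function to convert θ into an additive distance in centimorgans (cM). The sketch below uses Haldane's map function, which assumes that crossovers occur independently of one another (no interference). Other map functions, such as Kosambi's, make different assumptions.

```python
import math


def haldane_cm(theta: float) -> float:
    """Map distance (cM) from recombination fraction, Haldane: d = -1/2 ln(1 - 2 theta) Morgans."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(cm: float) -> float:
    """Inverse of haldane_cm: expected recombination fraction for a distance in cM."""
    if cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


for theta in (0.01, 0.1, 0.2, 0.4):
    print(theta, round(haldane_cm(theta), 2))
```

For small θ the map distance is close to 100 × θ cM. At θ = 0.2, however, it is already about 25.5 cM.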
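Alleles differ by whole repeat units. The repeat number of each allele can therefore be inferred from the PCR fragment size once the length of the flanking sequence included in the amplicon is known. The marker, its 100-bp flanking length and the fragment sizes below are hypothetical. In practice, fragment sizes measured by electrophoresis are usually binned to the nearest expected allele size rather than required to match exactly.

```python
def repeat_number(fragment_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Number of repeat units in an allele, given the PCR fragment size.

    flank_bp: total non-repetitive sequence in the amplicon (primers plus the
    unique sequence between primers and repeat); unit_bp: length of the motif.
    """
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp:
        raise ValueError("fragment size does not correspond to whole repeat units")
    return repeat_bp // unit_bp


def genotype(fragment_sizes, flank_bp, unit_bp):
    """Repeat numbers of the alleles observed in one individual, sorted."""
    return tuple(sorted(repeat_number(s, flank_bp, unit_bp) for s in fragment_sizes))


# Hypothetical (CA)n marker with 100 bp of flanking sequence
print(genotype([136, 130], flank_bp=100, unit_bp=2))  # (15, 18)
```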
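The all-or-nothing hybridization rule described above can be expressed as a simple sequence comparison. A probe pairs antiparallel with its target, so it must equal the reverse complement of the target stretch. This is an idealization, because real mismatch discrimination depends on hybridization thermodynamics and stringency. The probe and target sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(probe: str, target: str) -> int:
    """Number of probe positions that cannot base-pair with the target stretch."""
    if len(probe) != len(target):
        raise ValueError("probe and target stretch must have equal length")
    return sum(a != b for a, b in zip(probe.upper(), reverse_complement(target)))


def hybridizes(probe: str, target: str) -> bool:
    """Idealized stringent conditions: only a perfect match gives a signal."""
    return mismatches(probe, target) == 0


# Hypothetical allele-specific probes for a G/A SNP at target position 5
allele_g, allele_a = "AACCGGTA", "AACCAGTA"
probe_g, probe_a = reverse_complement(allele_g), reverse_complement(allele_a)
for name, target in (("G allele", allele_g), ("A allele", allele_a)):
    print(name, hybridizes(probe_g, target), hybridizes(probe_a, target))
```

In this model, each allele lights up only its own probe. On a real array, the relative signals of the allele-specific probes are compared to call homozygous and heterozygous genotypes.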
The oligonucleotides that have hybridized to the target DNA are identified by scanning the surface of the array and recording the positions at which the label, usually a fluorescent dye, emits a detectable signal. The fluorescent signal is detected by laser scanning or, more routinely, by fluorescence confocal microscopy.
Hybridization approaches have been implemented on high-throughput platforms such as the new Affymetrix Genome-Wide Human SNP Array 6.0 featuring 1.8 million genetic markers, including more than 906,600 single nucleotide polymorphisms (SNPs) in a single analysis.
A widely used application of genome-wide SNP arrays is homozygosity mapping. In kindreds with an apparently recessive disorder, particularly where parental consanguinity is suspected, this approach can be used to map regions of extended homozygosity with high resolution and essentially complete genomic coverage. All homozygous tracts that segregate with the disease are identified, and all heterozygous regions and non-segregating homozygous tracts are excluded. One can therefore be confident that the region harbouring the genetic lesion underlying the disease has been identified. Whether this technique identifies a single region or multiple regions of interest, and how large these regions are, depends on several factors: the degree of parental consanguinity, the number of informative family members, and the stochastic nature of recombination.
A segment of DNA that differs in copy number from the reference sequence is called a copy number variant (CNV). The segment may range in size from a single exon to several exons, entire genes or even large chromosomal regions. CNVs can be caused by genomic rearrangements such as deletions, duplications, inversions and translocations, and they are a frequent cause of various human diseases. Examples are hereditary neuropathy with liability to pressure palsies (HNPP) and hereditary motor and sensory neuropathy type 1a (CMT1a), which are caused by deletion and duplication of the PMP22 gene, respectively. Several methods are available in human genetics to measure copy number.
In quantitative real-time PCR (qPCR), a target DNA molecule is amplified and simultaneously quantified in a single reaction. It therefore enables both the detection and the quantification of one or more specific sequences in a DNA sample, either as an absolute number of copies or as a relative amount when normalized to DNA input or to additional normalizing genes. The basic principle of quantitative real-time PCR has not changed since its first application in the early 1990s. After every PCR cycle, the fluorescence of an intercalating dye (e.g. SYBR Green) is measured; this signal is proportional to the amount of amplified DNA. Quantification is based on comparison with a defined standard (normally reference genes that are constantly expressed and not regulated). qPCR thus measures the kinetics of product accumulation in each PCR tube.
MLPA (multiplex ligation-dependent probe amplification) is used to establish the copy number of up to 45 nucleic acid sequences in a single multiplex reaction. The method can be used for genomic DNA (for both copy number detection and methylation quantification) as well as for mRNA profiling. Each MLPA probe consists of two oligonucleotides that recognise immediately adjacent target sites on the DNA (figure below - step 1). One probe oligonucleotide contains the sequence recognised by the forward primer, the other the sequence recognised by the reverse primer. Only when both probe oligonucleotides are hybridised to their adjacent targets can they be ligated during the ligation reaction (figure below - step 2). Only ligated probes are exponentially amplified during the subsequent PCR (figure below - step 3), so the number of probe ligation products is a measure of the number of target sequences in the sample. All ligated probes are amplified simultaneously with a single primer pair, resulting in a mixture of amplification products in which the product of each MLPA probe has a unique length.
One PCR primer is fluorescently or isotopically labelled so that the MLPA reaction products can be visualized when electrophoresed on a capillary sequencer or a gel. The resulting electropherograms show size-separated fragments ranging from 130 to 490 bp (figure below - step 4). The peak area or peak height of each amplification product reflects the relative copy number of the corresponding target sequence. Comparing the electrophoresis profile of the tested sample with that of a control sample enables the detection of deletions or duplications of the genomic regions of interest.
High-density single nucleotide polymorphism (SNP) arrays are a recently introduced technology that can genotype more than 10,000 human SNPs on a single array (see above). It has been shown that SNP arrays can be used not only for genotyping but also to detect DNA copy number (DCN) aberrations. By measuring the locus-specific hybridization intensity, the copy number of individual loci can be determined.
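The first step of homozygosity mapping is to find tracts of consecutive homozygous SNP calls in an affected individual. It can be sketched as follows. The genotype coding (AA/AB/BB) and the minimum run length are assumptions for illustration. Real analyses usually define runs by physical length, tolerate occasional genotyping errors, and then compare the runs across affected and unaffected family members.

```python
def homozygous_runs(genotypes, min_markers):
    """Return (first_index, last_index) of runs of consecutive homozygous calls.

    `genotypes` are ordered SNP calls along one chromosome ("AA", "AB", "BB");
    any call other than "AA" or "BB" (heterozygous or missing) ends a run.
    Only runs spanning at least `min_markers` SNPs are reported.
    """
    runs, start = [], None
    for i, call in enumerate(genotypes):
        if call in ("AA", "BB"):
            if start is None:
                start = i
            continue
        if start is not None and i - start >= min_markers:
            runs.append((start, i - 1))
        start = None
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes) - 1))
    return runs


# Hypothetical calls for 10 consecutive SNPs in an affected child
calls = ["AB", "AA", "BB", "AA", "AA", "BB", "AB", "AA", "AB", "BB"]
print(homozygous_runs(calls, min_markers=4))  # [(1, 5)]
```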
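One common way to turn qPCR data into a relative copy number is the comparative threshold-cycle (2^-ΔΔCt) method. In this method, the threshold cycle (Ct) of the target is normalized first to a reference gene and then to a control sample of known copy number. The sketch assumes roughly 100% amplification efficiency for both assays, so that each additional cycle needed to reach the threshold corresponds to half as much starting template. It also assumes a diploid (two-copy) control. The Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample: float, ct_ref_sample: float,
                         ct_target_control: float, ct_ref_control: float,
                         control_copies: int = 2) -> float:
    """Copy number of a target estimated with the comparative Ct (2^-ddCt) method.

    Assumes ~100% efficiency (one cycle = twofold difference) for target and
    reference assays, and a control carrying `control_copies` target copies.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample     # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                 # normalize to control sample
    return control_copies * 2.0 ** (-dd_ct)


# Hypothetical Ct values: the patient needs one extra cycle for the target
print(relative_copy_number(26.0, 24.0, 25.0, 24.0))  # 1.0, i.e. a heterozygous deletion
```

If amplification efficiencies differ markedly from 100%, an efficiency-corrected calculation is needed instead of the simple power of two.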
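The comparison of peak areas between test and control samples is often expressed as a dosage quotient per probe. The sketch below works in two steps:

1. Each probe's peak area is normalized to the summed signal of reference probes assumed to be present in two copies.
2. The normalized sample value is divided by the normalized control value.

A quotient near 1 then indicates two copies, near 0.5 a heterozygous deletion, and near 1.5 a duplication. Probe names, peak areas and decision thresholds are hypothetical. Validated analysis software applies its own normalization and cut-offs.

```python
def dosage_quotients(sample: dict, control: dict, reference_probes: list) -> dict:
    """Dosage quotient per probe from peak areas (or heights) of a sample and a control.

    Each probe is first normalized to the summed signal of the reference probes
    within the same run; the sample value is then divided by the control value.
    """
    sample_ref = sum(sample[p] for p in reference_probes)
    control_ref = sum(control[p] for p in reference_probes)
    return {
        probe: (sample[probe] / sample_ref) / (control[probe] / control_ref)
        for probe in sample
    }


def interpret(dq: float, low: float = 0.75, high: float = 1.25) -> str:
    """Classify a dosage quotient using hypothetical thresholds."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak areas for two reference probes and two target probes
control = {"ref_1": 1000, "ref_2": 2000, "target_1": 1500, "target_2": 1200}
sample = {"ref_1": 1100, "ref_2": 2200, "target_1": 825, "target_2": 1980}
for probe, dq in dosage_quotients(sample, control, ["ref_1", "ref_2"]).items():
    print(probe, round(dq, 2), interpret(dq))
```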
In contrast to other “gene-centric” microarray approaches, which are typically designed to be complementary to known (annotated) genes or expressed sequence tags (ESTs), tiling arrays interrogate a whole genome or a large genomic region with probe features tiling the sequence of interest at regular spacing. Because of their design, they are a very versatile experimental tool for studying an organism’s genome or transcriptome in a manner that is not biased by the current state of its genome annotation. Besides their main application, the study of genome transcription, tiling arrays have been applied, for example, to discover new genes and to analyze alternative splicing.
Analogous to the microarray approach described above, tiling arrays consist of short probes (25-1000 bp) specifically designed to cover contiguous genomic regions or even the entire genome, immobilized at specific positions and in defined quantities on a solid surface. Depending on probe length and spacing, different degrees of resolution can be achieved.
Commonly, RNA is used as the template; it is converted into cDNA using reverse transcriptase, (fluorescently) labelled and hybridized to the arrays (see above “Whole genome scan by SNP arrays”). After unbound cDNA has been washed away, the fluorescence signal intensity reflects the transcription level of the hybridized sequences.
Alternatively, tiling arrays may be used to screen for macrodeletions. In this case, genomic DNA is hybridized to the array, and the signal intensity indicates the copy number of the respective DNA fragment (comparative genomic hybridization, CGH).
Expression profiling is a logical next step after sequencing a genome: the sequence tells us what the cell could possibly do, while the expression profile tells us what it is actually doing now. Genes contain the instructions for making messenger RNA (mRNA), but at any moment each cell makes mRNA from only a fraction of the genes it carries. Gene expression profiling is the measurement of the activity (the expression) of thousands of genes at once, to create a global picture of cellular function. In contrast to tiling arrays, however, expression profiling measures only the relative activity of previously identified target genes.
Over the past few years, there has been a fundamental shift away from automated Sanger sequencing (considered ‘first-generation’ technology) for genome analysis. The newer techniques, so-called next-generation sequencing (NGS) technologies, produce an enormous volume of data rather cheaply, in some cases more than one billion short reads per instrument run. This capacity expands the realm of experimentation beyond merely determining the order of bases. For example, in gene expression studies microarrays are now being replaced by sequencing-based methods. These can identify and quantify rare transcripts without prior knowledge of a particular gene and can provide information on alternative splicing and sequence variation in the identified genes. The broadest application of NGS, however, may be the resequencing of human genomes to improve our understanding of how genetic differences affect health and disease.
Today's commercially available technologies from Roche/454, Illumina/Solexa, Life/APG and Helicos BioSciences represent different strategies for high-throughput sequencing. Although these platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm: sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. This design is a paradigm shift from that of Sanger sequencing, which is based on the electrophoretic separation of chain-termination products produced in individual sequencing reactions. Through iterative cycles of polymerase-mediated nucleotide extensions or, in one approach, through successive oligonucleotide ligations, sequence outputs in the range of hundreds of megabases to gigabases are obtained routinely.
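Sequence output translates into sequencing depth. A common back-of-the-envelope estimate, following Lander and Waterman, divides the total number of sequenced bases by the size of the region sequenced. Assuming that reads land at random positions, it models per-base coverage as a Poisson distribution, so the expected fraction of bases not covered at all is e^-c for a mean coverage c. Real NGS coverage is less uniform than this idealized model. The run parameters below are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Expected mean sequencing depth: total sequenced bases / size of the target."""
    return num_reads * read_length_bp / target_size_bp


def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


def fraction_at_least(coverage: float, min_depth: int) -> float:
    """Poisson estimate of the fraction of bases sequenced at least `min_depth` times."""
    below = sum(math.exp(-coverage) * coverage ** k / math.factorial(k)
                for k in range(min_depth))
    return 1.0 - below


# Hypothetical run: 1 billion 100-bp reads over a 3.1-Gb genome
c = mean_coverage(1_000_000_000, 100, 3_100_000_000)
print(f"mean coverage {c:.1f}x, uncovered {fraction_uncovered(c):.1e}, "
      f">=10x {fraction_at_least(c, 10):.4f}")
```

The same arithmetic shows why targeting a smaller region, such as the exome, yields much higher depth from the same instrument output.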
Despite the substantial cost reductions associated with NGS technologies in comparison with the automated Sanger method, whole-genome sequencing is still an expensive endeavour. An interim solution to this problem may be to use NGS platforms to target specific regions of interest. This strategy can be used to examine all exons in the genome (Exome sequencing) or specific gene families associated with specific phenotypes or diseases (Exon enrichment).
The concept of targeting specific regions of the genome is well established, with PCR being the most widely used method, albeit on a small scale. Coupling conventional PCR with high-throughput NGS platforms is not practical, however. Sample preparation would require handling tens of thousands of primers, individually or in large multiplex groups, to meet the needs of a single instrument run. Newer PCR applications address this problem by using microfluidics as a preparative technology to perform thousands of individual PCRs at a time. These technologies sequester DNA in tiny reaction vessels for PCR amplification and then collect the amplicons for subsequent analysis. This greatly increases PCR throughput while reducing the reagent cost per reaction through radically reduced reaction volumes. Recently launched platforms using microfluidics technology are commercially available from Fluidigm (Access Array) and RainDance.
Custom-designed oligonucleotide microarrays (solid phase) and solution-based hybridization capture strategies have also been used for targeting regions of interest.
Both techniques involve hybridization of shotgun libraries of genomic DNA to target-specific sequences, either immobilized on a microarray or free in solution; these sequences can be individually designed or commercially available. For exome capture, solution- and array-based approaches perform equivalently.