Sunday, December 28, 2014

Library construction and Illumina sequencing


This is the first in a series of posts detailing the process I have used to analyze a set of RNA-seq data for our lab.

Library Preparation

We’ll start by assuming that RNA has been successfully extracted from some biological source and ribosomal RNA has been removed. 

Step 1A: RNA -> cDNA 
To create a DNA library that can be sequenced, it is first necessary to convert our pool of RNA into complementary DNA (cDNA). One technique is to use a small DNA primer to target the poly-A tail on strands of mRNA and extend the primer to create cDNA.

For species that don’t attach a poly-A tail to mRNA, a strategy called random priming is used. The RNA is combined with synthetic DNA hexamer primers (4 x 4 x 4 x 4 x 4 x 4 = 4^6 = 4096 different sequences) and the cDNA is constructed by extending from wherever a primer anneals to the RNA molecule.
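
To make the 4096 figure concrete, here is a small Python sketch that enumerates every possible hexamer. The function and variable names are my own, purely for illustration; this is not part of any library prep software.

```python
from itertools import product

BASES = "ACGT"

def all_primers(length=6):
    """Return every possible DNA primer of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]

hexamers = all_primers(6)
print(len(hexamers))   # number of distinct hexamers, i.e. 4 ** 6
print(hexamers[:3])    # the first few, in alphabetical order
```

With four choices at each of six positions, the list contains 4^6 = 4096 entries, so a random hexamer pool can in principle anneal almost anywhere along an RNA molecule.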

Step 1B: Strand specificity (optional)
During the creation of the cDNA library, the first cDNA strand is synthesized from a primer annealed to an RNA strand. During synthesis of the second, complementary cDNA strand, it is possible to use dUTP instead of dTTP so that the second strand is marked with “U”s. The result is double-stranded cDNA in which the two strands are distinguishable (one contains T and the other contains U). The U-containing second strand can then be targeted and degraded, leaving a single strand that can be sequenced. The benefit of this method is that each sequenced fragment can be assigned to one specific strand of the reference genome. For instance, if two genes overlap on opposite strands of a genome, this strand-specific sequencing strategy can determine which strand (and therefore which gene) each RNA fragment was derived from.
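
The overlapping-gene case can be sketched in a few lines of Python. Everything here is hypothetical: the two genes, their coordinates, and the `candidate_genes` helper are made up for illustration, and I assume the strand of the original transcript has already been inferred for the read.

```python
# Hypothetical annotation: (name, start, end, strand)
GENES = [
    ("geneA", 100, 500, "+"),
    ("geneB", 400, 800, "-"),
]

def candidate_genes(read_start, read_end, read_strand=None):
    """List genes a read could come from.

    read_strand=None models an unstranded library, where only
    the position of the read is known.
    """
    hits = []
    for name, start, end, strand in GENES:
        overlaps = read_start < end and read_end > start
        if overlaps and (read_strand is None or read_strand == strand):
            hits.append(name)
    return hits

print(candidate_genes(420, 470))        # unstranded: both genes match
print(candidate_genes(420, 470, "-"))   # stranded: only the "-" gene
```

Without strand information the read in the overlap region is ambiguous; with it, only one gene remains.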


Step 2: Size selection

At this point, some protocol is used to break the cDNA into fragments of a similar size. Typically, this is a physical process such as sonication. The ends of these fragments are repaired and additional segments of synthetic DNA (adapters) are ligated onto the ends of the fragments. These include the sequences required to prime sequencing and barcode sequences that identify which fragments belong to which sample (crucial for multiplexed experiments where multiple samples are sequenced at the same time).
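
To show what the barcodes are for, here is a toy demultiplexing sketch in Python. It is a simplification under my own assumptions: the barcode sequences and sample names are invented, the barcode is assumed to sit at the start of each read, and no mismatches are tolerated. Real demultiplexing software is considerably more careful.

```python
from collections import defaultdict

# Hypothetical barcodes; real kits define their own index sequences.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
BARCODE_LEN = 6

def demultiplex(reads):
    """Group reads by sample and trim off the leading barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODES.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTAAA", "NNNNNNGGGGGG"]
groups = demultiplex(reads)
for sample, inserts in groups.items():
    print(sample, inserts)
```

Reads whose barcode is not recognized end up in an "undetermined" bin rather than being assigned to a sample.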


Step 3: Amplification

Finally, the library is PCR amplified and is ready to be sequenced.

Illumina Sequencing

For those out there (including myself) who are visually orientated, this video demonstrates the basic steps:
http://youtu.be/77r5p8IBwJk?t=45s

After the library is prepared, the first step is to wash and mount the fragments on a flow cell. Fragments will be randomly distributed over this surface. Next, a technique called bridge amplification copies each bound fragment in place to create a clonal cluster of identical DNA molecules.

This is where the sequencing begins. First, a primer is annealed to every DNA strand attached to the flow cell. A series of cycles is then performed in which a single fluorescent nucleotide (a reversible terminator) is added to each growing DNA strand. Each cluster of fragments emits a specific wavelength corresponding to one of the four nucleotides. By measuring the wavelength of light emitted from each cluster after each cycle, the sequence of each fragment can be determined.
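
The cycle-by-cycle idea can be sketched as a toy base caller in Python. This is only an illustration under my own assumptions: I pretend each cycle yields four intensity values per cluster in the order A, C, G, T, and simply pick the brightest. The numbers are made up, and real base calling also deals with signal crosstalk and assigns quality scores.

```python
CHANNELS = "ACGT"  # assumed order of the four intensity values

def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: list of (A, C, G, T) tuples, one tuple per cycle.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Made-up signal for one cluster over three cycles.
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.1, 0.8, 0.0),
    (0.0, 0.2, 0.1, 0.7),
]
print(call_bases(cluster))
```

Each cycle contributes exactly one base, so the read length equals the number of cycles run.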

Finally, a technique known as paired-end sequencing can be used to double the sequencing information derived from each fragment. The general idea is that each fragment is sequenced from both ends: each of its two strands is read from 5’ to 3’. This results in two sets of reads, one corresponding to each DNA strand. The two reads of a pair may or may not overlap, depending on the size of the fragment and the length of each read.
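
Whether a pair overlaps is simple arithmetic, which the following Python sketch illustrates. The helper names and the example fragment are my own, and the model is idealized (no adapters, no sequencing errors).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_end_reads(fragment, read_len):
    """Read each strand of the fragment from its own 5' end."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2

def overlap_length(fragment_len, read_len):
    """Number of fragment positions covered by both reads."""
    covered = min(read_len, fragment_len)
    return max(0, 2 * covered - fragment_len)

fragment = "AACCGGTTAC"
print(paired_end_reads(fragment, 4))
print(overlap_length(len(fragment), 4))  # reads too short to meet
print(overlap_length(len(fragment), 6))  # reads meet in the middle
```

When the fragment is longer than twice the read length, the middle of the fragment is never sequenced; when it is shorter, the two reads cover some of the same bases.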

(For further details and figures, please check out this page: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html)

Once all this is done, the Illumina platform analyzes a huge set of images to derive the sequence of all the fragments. Our RNA-seq experiment uses random priming and paired-end sequencing, and is not strand-specific. The blog posts that follow will be written for this type of analysis.

In the next post, I will explain the output of this sequencing and how to analyze its quality.
