12 Dec 2020

Notes on genetics.

This is quite far from my area of expertise and I only have a casual interest in these topics, so my notes are not super valuable.

The first human genome was sequenced in 2003.

In the 1990s, scientists estimated that the Human Genome Project would uncover about 100,000 human genes. The sequencing results showed otherwise: humans have only about 20,000 genes, roughly 10,000 fewer than a water flea.

Sequencing technology has gotten much cheaper and faster. The first human genome sequence cost about three billion dollars, involved thousands of scientists, and took 13 years to complete. Today, sequencing machines can read a human genome in less than a day for a few hundred dollars.

About 500,000 human genomes have been sequenced since 2003.

An approach to finding what regulates each gene

We begin by literally mail-ordering short sequences of DNA (the regions immediately in front of a gene) where transcription factors typically bind. A computer helps us design mutated versions of each DNA sequence, randomly changing the letters until we have thousands of variants for each sequence. A company in San Francisco takes our digital letters, creates physical copies, and ships them in a little plastic tube to our laboratory. We then place these synthetic pieces of DNA inside E. coli cells, and use a modified version of DNA sequencing to determine whether each “letter” change made a gene produce more or fewer RNA copies (the “Xerox copies” of the gene).
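
To make the "randomly changing the letters" step concrete, here is a minimal sketch of how a computer might generate a library of mutated variants. The example sequence, the 10% per-letter mutation rate, and the function names are my own assumptions for illustration; the article does not say how its variants are actually designed.

```python
import random

BASES = "ACGT"


def mutate(sequence, rate, rng):
    """Return a copy of `sequence` in which each letter is replaced by a
    different base with probability `rate`."""
    letters = []
    for base in sequence:
        if rng.random() < rate:
            # Pick one of the three other bases so a "mutation" always changes the letter.
            letters.append(rng.choice([b for b in BASES if b != base]))
        else:
            letters.append(base)
    return "".join(letters)


def make_library(sequence, n_variants, rate=0.1, seed=0):
    """Build `n_variants` randomly mutated copies of `sequence`.
    The rate of 0.1 is an assumed value, not one taken from the article."""
    rng = random.Random(seed)  # fixed seed so the library is reproducible
    return [mutate(sequence, rate, rng) for _ in range(n_variants)]


# Made-up stand-in for the region in front of a gene.
wild_type = "TTGACAATTAATCATCGGCTCGTATAATG"
library = make_library(wild_type, n_variants=1000)
print(library[:3])
```

Each variant keeps the original length, so position i in a variant always lines up with position i in the unmutated sequence. That makes it easy to ask later which letter changes affected the gene.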

If a DNA sequence produces very little RNA inside of the cell, this suggests that the letter change (or mutation) blocked transcription; it choked up the Xerox machine. It also indicates that an activator was probably binding to that DNA sequence, and the mutation prevented it from doing its job. Some mutations, however, increase the amount of RNA produced from a DNA sequence, suggesting that the mutation is preventing a repressor from binding.
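
The inference rule in the previous paragraph can be sketched as a toy calculation. For each position, compare the average RNA level of variants mutated there with the average of variants left intact there. A large drop hints at an activator site, a large rise hints at a repressor site. The data, the 0.5 threshold, and the function names are invented for illustration, and this simple difference of means is only a stand-in for the article's mathematical models, which these notes do not describe.

```python
from statistics import mean


def expression_shift(wild_type, measurements):
    """For each position, return mean expression of variants mutated there
    minus mean expression of variants that kept the original letter.
    `measurements` is a list of (variant_sequence, expression) pairs."""
    shifts = []
    for i, wt_base in enumerate(wild_type):
        mutated = [expr for seq, expr in measurements if seq[i] != wt_base]
        intact = [expr for seq, expr in measurements if seq[i] == wt_base]
        if not mutated or not intact:
            shifts.append(0.0)  # no comparison possible at this position
        else:
            shifts.append(mean(mutated) - mean(intact))
    return shifts


def classify(shift, threshold=0.5):
    """Label a position by the direction of its shift (threshold is an assumed value)."""
    if shift <= -threshold:
        return "activator-like"  # mutation lowered RNA: something helpful was binding here
    if shift >= threshold:
        return "repressor-like"  # mutation raised RNA: something inhibitory was binding here
    return "neutral"


# Toy data: a 4-letter "promoter" and made-up relative RNA levels.
wild_type = "ACGT"
measurements = [
    ("ACGT", 1.0),  # unmutated reference
    ("TCGT", 0.1),  # position 0 mutated, RNA drops
    ("GCGT", 0.2),  # position 0 mutated, RNA drops
    ("AGGT", 1.1),  # position 1 mutated, little change
    ("ACGA", 3.0),  # position 3 mutated, RNA rises
    ("ACGC", 2.8),  # position 3 mutated, RNA rises
]

labels = [classify(s) for s in expression_shift(wild_type, measurements)]
print(labels)  # ['activator-like', 'neutral', 'neutral', 'repressor-like']
```

A drop in RNA does not by itself prove an activator was bound (the mutation could disrupt something else that transcription needs), which is why the text says "probably".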

By analyzing this data and running it through mathematical models, we can determine whether each gene is regulated by an activator or repressor, how many transcription factors regulate each gene, and where those transcription factors actually bind.

Notes from "Decoding the Language of Genomes", Caltech Letters