A gene is a segment of the long thin molecule of DNA. Each gene encodes a protein or part of a protein. The proteins perform all the numerous activities of life. They are directly or indirectly responsible for all the cell's structures, its organisation, its metabolism, its transport processes, and the co-ordination of these: in short, its internal state. Genes themselves do nothing except encode proteins. To make any given protein, the appropriate gene is copied on to a messenger RNA and then this copy is read by ribosomes, which translate the coded instructions. When a cell reproduces, the replicating machinery has to ensure that each of the two daughter cells receives an exactly identical copy of the parent cell's DNA. Both daughter cells must have all and only the same genes as the parent cell so that they are potentially capable of making all (and only) the same proteins as the parent cell.
The phrase "potentially capable" is our point of departure for this chapter. The various different cells in a multicellular organism contain (with very few exceptions) exactly the same DNA, the same genes. Cells in the same organism that differ in function and appearance - i.e. have different internal states - necessarily contain different proteins. Therefore, although they contain the same genes they express different ones. Changes in gene expression alter a cell's complement of proteins and consequently the internal state.
In chapter 2 we compared a gene to a master document in a secure library. The first step in making a protein is to "photocopy the document". The "photocopy" is a messenger RNA molecule [11], which the ribosomes then "read" and translate to make the protein. We now need to examine the "photocopying" process more closely.
The "photocopier" is an enzyme, RNA polymerase II [12] ("polymerase" for short), which copies the relevant part of a DNA strand (the gene) base by base to make a faithful replica. This "copying" process is technically known as transcription (trans = cross; scriptus = written). The message written on the DNA is written out again in the form of RNA. The polymerase starts transcribing at the beginning of the gene and stops at the end.
[11] Strictly speaking, matters are a little more complicated. In prokaryotes, several successive genes are sometimes copied on to a single long messenger. In eukaryotes, the RNA copy of the DNA needs to be processed before it becomes a mature messenger; for example, non-coding regions known as introns interrupt the sequences that code for the protein, and they have to be cut out of the RNA copy. Important though they are in molecular biology, these matters need not concern us here.
[12] Almost all enzyme names end in -ase. In the name of this particular enzyme, the "II" is included because there are other sorts of RNA polymerase, numbered in an arbitrary sequence. The rest of the name indicates that RNA, like DNA, is a polymer: a long molecule made by joining together many short molecules, each of which contains a base. (DNA uses the bases A, G, T and C; RNA uses U in place of T.) RNA polymerase joins these short molecules together to form a polymer, RNA, which is a replica of the gene.
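The base-by-base copying described above can be sketched in a few lines of Python. This is only a toy model, not an account of the enzyme's chemistry. The function name `transcribe` and the example sequence are assumptions made for illustration, and the sketch treats the gene as the strand that carries the message, so that the RNA copy reads the same with the RNA base U in place of T.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand, read base by base."""
    rna = []
    for base in coding_strand.upper():
        if base not in "ACGT":
            raise ValueError(f"not a DNA base: {base!r}")
        # RNA carries the same message, but uses U where DNA uses T.
        rna.append("U" if base == "T" else base)
    return "".join(rna)


# Hypothetical gene fragment, chosen only for illustration.
messenger = transcribe("ATGGCATTC")
print(messenger)  # AUGGCAUUC
```

The loop mirrors the "photocopier" picture: one base in, one base out, in the same order, with no attempt to model where copying starts or stops.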
The polymerase runs along the DNA like a toy engine along its tracks, transcribing as it goes. It starts from where it is placed on the rails and stops when it hits the buffers. How is it placed on the rails at the right place, i.e. the start of the gene; and what are the buffers at the end? The answers are, yet again, provided by specialised proteins, which are designed to bind to particular DNA sequences.
Using the standard four-letter code of DNA representing the four bases (A, G, T and C), suppose a short piece of DNA had the sequence TTGTCCCAGTTGGCAAATCTTTT.
Consider two DNA-binding proteins. Suppose one binds only to the sequence CCAGT and the other only to the sequence CTTT. In the fragment shown here, the former will bind at site 1 (CCAGT, bases 6 to 10 counting from the left) and the latter at site 2 (CTTT, bases 19 to 22), but neither protein will bind anywhere else. The sequences CCAGT and CTTT are the recognition sequences for these two proteins.
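The claim that each protein finds exactly one site in this fragment can be checked mechanically. The sketch below is a simplification: the helper name `find_sites` is an assumption, it searches only the single strand as written, and it requires an exact match to the recognition sequence.

```python
def find_sites(dna: str, recognition_sequence: str) -> list[int]:
    """Return the 1-based start positions of every exact match."""
    positions = []
    start = dna.find(recognition_sequence)
    while start != -1:
        positions.append(start + 1)  # report 1-based positions
        # Step one base forward so overlapping matches are not missed.
        start = dna.find(recognition_sequence, start + 1)
    return positions


fragment = "TTGTCCCAGTTGGCAAATCTTTT"
print(find_sites(fragment, "CCAGT"))  # [6]
print(find_sites(fragment, "CTTT"))   # [19]
```

Each list contains a single position, so each protein has one and only one place to bind on this piece of DNA.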
Proteins that bind specifically to DNA sequences at the ends of genes serve as buffers. They stop the polymerase and knock it off the rails. Because the binding is specific, the "buffer" cannot bind to the wrong piece of DNA and jam the polymerase in mid-gene. As for starting the transcription process, the simplest design would have the same polymerase recognition sequence at the start of every gene. The polymerase would bind to this sequence, so it would always be placed on the DNA rails in the right place.
This is more or less what happens in prokaryotes. In eukaryotes, however, the situation is a little more complicated. There is so much more DNA in a eukaryotic cell that there is a far greater chance that a 4-5 base recognition sequence, to which the polymerase might bind, will turn up in an inappropriate place. If that happened, the polymerase would waste time and energy transcribing chunks of DNA that are not complete genes. In principle, the solution to this difficulty is to use a longer recognition sequence. The longer the sequence, the less chance it has of turning up at random [13], so the more reliably it can be used to mark the beginnings of genes. Unfortunately, a sequence sufficiently long to meet this criterion for eukaryotic DNA would be too long for any protein to recognise and bind specifically. Even RNA polymerase II, a very large enzyme, does not have such a big DNA binding site.
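The arithmetic behind this argument can be sketched as follows. The figures are assumptions for illustration, not values from the text: a round 3.2 billion bases for human DNA, all four bases equally likely at every position, and both strands of the double helix counted as places where a sequence might turn up. The function names are likewise hypothetical.

```python
def expected_occurrences(length: int, genome_bp: int, strands: int = 2) -> float:
    """Expected number of chance matches to one specific sequence."""
    # Each position matches with probability (1/4) ** length.
    return genome_bp * strands / 4 ** length


def shortest_unique_length(genome_bp: int, strands: int = 2) -> int:
    """Smallest length whose expected number of chance matches is below 1."""
    length = 1
    while expected_occurrences(length, genome_bp, strands) >= 1:
        length += 1
    return length


HUMAN_GENOME_BP = 3_200_000_000  # assumed round figure, not from the text

print(4 ** 5)                                   # 1024
print(shortest_unique_length(HUMAN_GENOME_BP))  # 17
```

Under these assumptions a five-base sequence is expected by chance once in every 1024 positions, and a sequence must reach 17 bases before it is expected to occur less than once in the whole genome. Counting only one strand lowers that threshold by one base.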
The practical solution is to have several proteins binding to different parts of a long recognition sequence, and then make the polymerase bind to these proteins. The cluster of proteins that binds to this long recognition sequence (the promoter) is called the initiation complex. Its role is akin to that of a child's hands placing a toy engine on the track at the desired place.
[13] Since there are four bases, the chances of a particular base turning up in a given position are one in four. The chances of a particular two-base sequence are one in sixteen; of a three-base sequence one in sixty-four; and so on. A five-base sequence has a probability of one in 1024; it is likely to turn up about 6000 times more frequently in human DNA than in a prokaryote. The shortest sequence that is statistically likely to be unique in human DNA is about 17 bases long.
Eukaryotic transcription is started and stopped at the beginning and end of a gene by clusters of proteins bound at the promoter and the termination sites. These clusters cause the polymerase to start and to stop in all and only the right places. But this does not tell us how transcription is controlled. Why is a particular gene expressed (transcribed) at some times but not at others - switched on and off? And how can transcription be speeded up or slowed down? If there were no practical answers to these questions, there would be no way, for example, of making one type of human cell differentiate from another or adapt to changing needs.
If a gene is not switched off then it will be expressed (= transcribed) - but only slowly. The initiation complex launches a polymerase molecule along the gene once every so often. Slow transcription is not a problem so long as the cell needs only small quantities of the protein encoded in this particular gene. This is the case, for example, for major metabolic pathway enzymes. However, other proteins are needed quickly and in large amounts, and are needed at some times but not others. To meet such needs, the cell must be able to de-suppress the right genes at the right times. Moreover, it must be able to accelerate the transcription of those genes. De-suppression is simple in principle, so long as the gene is switched off reversibly; all the cell needs to do is to reconstruct the initiation complex or remove the repressor. But how can transcription be accelerated? How is the initiation complex persuaded to launch polymerase molecules along the gene faster than usual?
This is done by proteins known as transcription factors, which bind to regions of the DNA (enhancers) that are often very distant from the gene. This sounds like "action at a distance", or even magic. But remember, to fit the thread representing the DNA into the matchbox model of the cell (chapter 2), you had to tangle it. This tangling might bring two points on the thread a metre or more apart into close contact. Imagine a few grains of salt stuck to one of these two points. Let these grains of salt represent the initiation complex at the promoter (start) of a gene. Now imagine a single grain of salt stuck to the "distant" point. The protein represented by this single grain is the transcription factor attached to the enhancer site. Because of the tangling of the thread, the transcription factor has been brought into immediate contact with the initiation complex. This enables it to speed up the binding and launching of the polymerase. In practice, a gene with a controllable expression rate usually has many enhancers that bind different transcription factors, and their effects are additive. Working together, they accelerate transcription very markedly. When some of them operate and some do not, a more moderate acceleration is achieved [14].
[14] A few transcription factors inhibit transcription rather than accelerate it. They are normally outnumbered by the positive factors but they are useful because they make subtle changes in the transcription rate possible.
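The additive behaviour of enhancers can be pictured with a toy calculation. Everything in this sketch is hypothetical: the factor names, the effect sizes and the arbitrary rate units are invented purely to show how a slow basal rate, several positive factors and an occasional inhibitory factor might combine. It is not a measured or published model.

```python
def transcription_rate(basal_rate: float, factor_effects: dict[str, float],
                       bound_factors: set[str], switched_off: bool = False) -> float:
    """Toy additive model: basal rate plus the effect of each bound factor."""
    if switched_off:
        return 0.0
    rate = basal_rate + sum(factor_effects[name] for name in bound_factors)
    return max(rate, 0.0)  # a rate cannot be negative


# Hypothetical factors and effect sizes, in arbitrary units.
effects = {"TF_A": 5.0, "TF_B": 3.0, "TF_C": 4.0, "inhibitor": -1.5}

print(transcription_rate(1.0, effects, set()))                     # 1.0
print(transcription_rate(1.0, effects, {"TF_A"}))                  # 6.0
print(transcription_rate(1.0, effects, {"TF_A", "TF_B", "TF_C"}))  # 13.0
print(transcription_rate(1.0, effects, {"TF_A", "inhibitor"}))     # 4.5
```

With no factors bound the gene ticks over at its basal rate; each additional positive factor raises the rate, and the inhibitory factor trims it slightly, which is the kind of fine adjustment the footnote describes.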
In short: some genes, such as those for metabolic pathway enzymes, tend to "tick over", transcribing at a constant slow rate. But the outputs of other genes can be varied from zero (when the gene is switched off) to a very high rate (when all the transcription factors on all the enhancers work in concert).