Functional Genomics Applications

In principle, any cell biology technique can be automated and the resultant data streams integrated and interpreted in order to focus precise manual experimentation. However, pathway mapping and target validation ultimately require synthesis of information from many sources (biochemistry, structural biology, mRNA and protein expression, tissue distribution, immunohistochemistry, gene knockout, regulated transgenics, model organism genetics etc.), not all of which can readily be scaled in a meaningful way. This review will concentrate on examples from gene expression analysis, the first functional genomic technology to be robustly scaled, and from recent advances in the proteome-level identification of protein complexes by mass spectrometry.

DNA Microarrays - One of the first convincing uses of genome-wide DNA microarrays was for monitoring downstream signalling during the yeast pheromone response (19). This pathway involves the archetypal G-protein coupled receptor/MAP kinase cascade, which in turn was found by expression analysis to be linked to three other MAPK pathways activated by cell surface stress, high osmolarity and filamentous growth. A series of 46 gene deletion/overexpression experiments were correlated with significant expression changes in 383 transcripts, and the results clustered to reveal functional relationships. As well as pathway-specific expression of sets of genes, higher-order relationships between processes were revealed, for example sequential activation of the pheromone and protein kinase C regulated pathways. Extension of this approach to human cells has clear implications for pathway expansion, cross-talk and mechanistic studies of drug action (20).
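The clustering of expression changes across perturbation experiments can be illustrated with a minimal sketch. It assumes a toy table of log expression ratios (one profile per transcript, one value per experiment), Pearson correlation as the similarity measure and a simple single-linkage grouping at a fixed threshold. The transcript names, values and threshold are hypothetical, and the study cited above (19) may have used a different clustering method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def cluster_profiles(profiles, threshold=0.9):
    """Group transcripts whose profiles correlate at or above the threshold.

    Single linkage: two transcripts end up in the same group if they are
    connected by a chain of pairwise correlations >= threshold.
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (union-find).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical log2 expression ratios across four perturbation experiments.
toy_profiles = {
    "geneA": [2.0, 1.8, 0.1, -0.1],
    "geneB": [1.9, 2.1, 0.0, 0.2],
    "geneC": [-0.1, 0.2, 1.5, 1.7],
    "geneD": [0.0, -0.2, 1.6, 1.4],
}
clusters = cluster_profiles(toy_profiles)
print(clusters)
```

In this toy case the transcripts that respond to the same pair of experiments fall into the same group, which is the kind of co-regulation signal used to infer functional relationships.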

An example of pathway cross-talk elucidated by the use of a 9984-element cDNA microarray is provided by a study of TRAIL-mediated gene expression in breast carcinoma cells (21). TRAIL, an apoptosis-inducing member of the tumour necrosis factor family, was found to induce three sets of genes (early, middle and late) spanning a 24 h period. The early set includes a wide range of proteins not previously known to be associated with TRAIL; the middle set includes known members associated with the TNF pathway as well as novel observations; and the late set, unexpectedly, correlates strongly with induction of the interferon pathway. Subsequent combination of TRAIL and interferon-beta in vitro synergistically induced apoptosis and caspase activation in breast cancer cells. The authors concluded from this study that multiple levels of cross-talk exist between these two diverse cytokine pathways, which has implications for target discovery and combination therapy.

Microarrays have also been used to describe the effects of overexpression of the transcription factor EGR1 on prostate carcinoma cells (22). EGR1 is naturally overexpressed in prostate cancer, and its target genes were found in the cell line to include several growth factors (including insulin-like growth factor and platelet-derived growth factor) as well as neuroendocrine genes, proteases and signalling proteins.

Examples of specific identification of potential drug targets using DNA microarrays that have reached the literature are sparse, but one clear case is that of superoxide dismutase (SOD) in cancer cell killing (23). SOD eliminates superoxide and so protects cells from free-radical mediated damage; its activity is also abnormally low in cancer cells, rendering them particularly sensitive to free-radical induced damage. Oestrogen derivatives were found serendipitously to kill human leukaemia cells, but not normal lymphocytes; 2-methoxyoestradiol was responsible for this effect, and in a search for candidate targets CuZnSOD mRNA was found to be increased 2-fold. Biochemical investigation showed this to be due to a decrease in cellular SOD activity and consequent feedback upregulation of SOD expression. Limited SAR and further functional studies confirmed the view that methoxyoestradiol selectively kills cancer cells through SOD inhibition.

Proteome analysis - Protein mass spectrometry is now able to detect very low levels of protein and, when combined with proteolytic digestion and genome database searching, can unambiguously identify proteins at very high throughput (24). When applied to immunoprecipitates or other affinity purified protein complexes, this technique has led to the assignment of cellular function to hundreds of human proteins over the past few years, including novel drug targets such as caspase-8 and I-kappa kinase 2 in the tumour necrosis factor pathway (25, 26). It has proved possible to scale up this approach by coupling high-throughput cellular expression and affinity purification to gel electrophoresis, liquid chromatography and mass spectrometry.
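The combination of proteolytic digestion and database searching can be sketched as a simple peptide mass fingerprint. This is a minimal illustration under stated assumptions, not the method of the cited studies: it assumes trypsin as the protease (cleavage after K or R, except before P, with no missed cleavages), unmodified peptides, monoisotopic neutral masses and a fixed mass tolerance. The two protein sequences and their names are hypothetical, and the "observed" masses are simulated from one of them.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def rank_proteins(observed_masses, database, tolerance=0.01):
    """Rank database entries by how many observed masses they can explain."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(mass - t) <= tolerance for t in theoretical)
            for mass in observed_masses
        )
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical two-entry sequence database.
toy_database = {
    "protX": "GASKVTLRPEEK",
    "protY": "MFWKDDHR",
}
# Simulated measurement: peptides of protX with a small mass error added.
observed = [peptide_mass(p) + 0.002 for p in tryptic_digest(toy_database["protX"])]
ranking = rank_proteins(observed, toy_database)
print(ranking)
```

Real search engines additionally handle missed cleavages, modifications, charge states and fragment-ion spectra, and they score matches statistically rather than by a simple count.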

Two large-scale analyses in yeast exemplify this technology (27,28). Gavin et al., using affinity tagged genes (1,739 in total) under control of their natural promoters, isolated 232 distinct multi-protein complexes. Ninety-eight of these were already known and present in the Yeast Protein Database; the remaining 134 complexes were novel. Complexes ranged in size between 2 and 82 different protein components, with a typical size of around 5. Because the novel complexes generally contain some proteins of known function, it was possible to propose functional roles for associated members within the complex based on circumstantial evidence; 231 proteins with no previous annotation were assigned a function by this process. For example, the complex that polyadenylates mRNA comprised 21 proteins, 4 of which had no previous annotation of any kind. The protein complexes observed frequently contained common components that point to interconnections between them, indicating that a complex network of functional relationships exists through which a variety of cellular processes are effected. This network is dynamic, in that complexes can assemble and disassemble and any given complex may show variable composition, enabling coupling through multiple pathways. For example, the protein phosphatase PP2A was found to be bound to cell-cycle regulators and, in a separate complex, to proteins involved in cellular morphogenesis. Higher-level (complex-to-complex) mapping suggested grouping of complexes that belong to similar biological processes, for example intermediary metabolism or cell cycling. The study suggested the presence of orthologous complexes (not just orthologous proteins) between yeast and man, consistent with conservation of key functional units defining a "core proteome".
As discussed above for gene expression pathways, there is already evidence for variations in the composition of complexes (modules) in metazoans depending on cellular context, effectively "paralogous" complexes; this trend is likely to be reinforced as studies such as this are extended. Not all proteins tested could be assigned to complexes - around 20% could not, probably in part due to interference by the affinity tags used as part of the purification method.
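The idea that shared components point to interconnections between complexes can be sketched as follows. This is only an illustration: the complex names and all protein identifiers other than PP2A are hypothetical placeholders, and the memberships are invented for the example rather than taken from the study.

```python
from itertools import combinations


def shared_component_links(complexes):
    """Link every pair of complexes that has at least one protein in common.

    Returns a dict mapping (complex_a, complex_b) to the sorted list of
    shared proteins, i.e. the edges of a complex-to-complex map.
    """
    links = {}
    for (name_a, members_a), (name_b, members_b) in combinations(
        sorted(complexes.items()), 2
    ):
        common = set(members_a) & set(members_b)
        if common:
            links[(name_a, name_b)] = sorted(common)
    return links


# Hypothetical complex memberships (only "PP2A" is named in the text).
toy_complexes = {
    "cell_cycle": {"PP2A", "protein1", "protein2"},
    "morphogenesis": {"PP2A", "protein3"},
    "polyadenylation": {"protein4", "protein5"},
}
links = shared_component_links(toy_complexes)
print(links)
```

Here the two complexes containing the phosphatase become connected, while the third remains isolated; applied across hundreds of purifications, the same bookkeeping yields the higher-level map of related processes.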

In a similar study, Ho et al., starting with 725 "bait" proteins, detected 3,617 associated proteins corresponding to 25% of the yeast proteome (28). The average success-rate in identifying known complexes was 3-fold higher than for large-scale yeast two-hybrid experiments (Gavin et al. found around a 5-fold improvement), probably due to greater physiological relevance and the cooperative stability of multiprotein complexes (yeast two-hybrid approaches detect only pairwise interactions, although this technique has the virtues of very high scalability at relatively low cost and detection of weak interactions). Their general findings were similar to those of Gavin et al. They additionally showed that 275 of the detected complexes contained two or more interaction partners within the same biological process as defined by Gene Ontology, reinforcing the concordance of physical and functional networks (29). They also found that the network conformed to the expected power-law distribution referred to above (11). The authors focussed on signalling proteins (kinases, phosphatases and regulatory subunits), which enabled them to identify many novel connections of possible regulatory significance. For example, an extensive network was assembled around the cyclin-dependent kinase Cdc28, with negative and positive regulators and links to other pathways. The DNA-damage response network was described in depth, revealing many known and new pathways that dictate cell cycle progression, transcription, protein degradation and DNA repair; members of a yeast E3 ubiquitin ligase complex not evident from simple bioinformatics were assigned based on comparison with mammalian orthologues and more detailed bioinformatics analysis. Finally, the probable upstream regulators, substrates and downstream effects of the protein kinase Dun-1, a known member of the DNA damage response process but of previously unclear role, were identified as part of this single global analysis.
Extension of this approach to human cells, either by orthology or direct application, is bound to reveal many potential targets for drug intervention; in many cases they will be once or twice removed from the "disease pathway" and so will not be detected by more linear approaches.
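The power-law property mentioned above concerns the degree distribution of the interaction network: most proteins have few partners and a small number of hubs have many. The following minimal sketch shows how such a distribution is tabulated from bait-prey pairs. The interaction list and protein names are hypothetical, and a real analysis would go on to fit the observed counts to P(k) ~ k^-gamma, which is not attempted here.

```python
from collections import Counter


def degree_distribution(interactions):
    """Tabulate how many proteins have each number of distinct partners.

    `interactions` is an iterable of (bait, prey) pairs, treated as
    undirected edges; self-interactions and duplicates are ignored.
    Returns (degree per protein, Counter mapping degree -> protein count).
    """
    neighbours = {}
    for bait, prey in interactions:
        if bait == prey:
            continue
        neighbours.setdefault(bait, set()).add(prey)
        neighbours.setdefault(prey, set()).add(bait)
    degrees = {protein: len(partners) for protein, partners in neighbours.items()}
    return degrees, Counter(degrees.values())


# Hypothetical bait-prey pairs: one hub and a few sparsely connected proteins.
toy_interactions = [
    ("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4"),
    ("p1", "p2"),
    ("p6", "p5"),
    ("p2", "p1"),  # duplicate of an existing edge, ignored
]
degrees, distribution = degree_distribution(toy_interactions)
print(sorted(distribution.items()))
```

Proteins that are "once or twice removed" from a pathway of interest are those reachable in one or two steps through such a graph, which is why a global map can expose candidates that a linear pathway view would miss.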

Even without the use of parallelised expression and purification, substantial progress can be made by focussed use of biochemical isolation combined with mass spectrometry to build up a comprehensive picture of signalling complexes. This approach is readily applicable to mammalian cells. Grant and Husi took this approach to describe the multiprotein complexes that process neural information and encode memory (30). It was postulated that signalling complexes influence learning, and that the simple model of the N-methyl-D-aspartate receptor (NR), a membrane protein sitting at the post-synaptic side of the synapse, injecting Ca2+ into the dendrite to activate a number of cytosolic enzymes, is too simplistic. The first evidence came from isolating a variant form of the protein PSD-95, which binds directly to the NR and produces marked changes in synaptic plasticity and learning in a mouse model. A large-scale isolation of NR-PSD-95 complexes was carried out by biochemical methods, and the proteins were identified by a combination of immunoblotting and mass spectrometry. Over 75 proteins were identified that could be joined into a meaningful picture consistent with genetic observations. This assemblage of proteins provides a firm framework for understanding the molecular basis of learning, and for how it might be regulated naturally or through intervention.
