Bioinformatic analysis of xenobiotic reactive metabolite target proteins and their interacting partners
© Fang et al; licensee BioMed Central Ltd. 2009
Received: 02 February 2009
Accepted: 12 June 2009
Published: 12 June 2009
Protein covalent binding by reactive metabolites of drugs, chemicals and natural products can lead to acute cytotoxicity. Recent rapid progress in reactive metabolite target protein identification has shown that adduction is surprisingly selective and inspired the hope that analysis of target proteins might reveal protein factors that differentiate target- vs. non-target proteins and illuminate mechanisms connecting covalent binding to cytotoxicity.
Sorting 171 known reactive metabolite target proteins revealed a number of GO categories and KEGG pathways to be significantly enriched in targets, but in most cases the classes were too large, and the "percent coverage" too small, to allow meaningful conclusions about mechanisms of toxicity. However, a similar analysis of the directlyinteracting partners of 28 common targets of multiple reactive metabolites revealed highly significant enrichments in terms likely to be highly relevant to cytotoxicity (e.g., MAP kinase pathways, apoptosis, response to unfolded protein). Machine learning was used to rank the contribution of 211 computed protein features to determining protein susceptibility to adduction. Protein lysine (but not cysteine) content and protein instability index (i.e., rate of turnover in vivo) were among the features most important to determining susceptibility.
As yet there is no good explanation for why some low-abundance proteins become heavily adducted while some abundant proteins become only lightly adducted in vivo. Analyzing the directly interacting partners of target proteins appears to yield greater insight into mechanisms of toxicity than analyzing target proteins per se. The insights provided can readily be formulated as hypotheses to test in future experimental studies.
More than half a century has passed since the discovery that reactive electrophilic metabolites derived from xenobiotic agents covalently modify endogenous cellular proteins [1, 2]. Since then such covalent binding by reactive metabolites has been strongly correlated with, and is widely believed to be responsible for, the acute organ-damaging effects of a wide range of xenobiotic agents including drugs and natural products [3–5]. Tissue injury is a complex phenomenon. Most tissues are comprised of more than one cell type, and macroscopic tissue injury often involves various secreted chemical mediators such as cytokines, tumor necrosis factor, nitric oxide and pro-inflammatory cells such as leukocytes and macrophages or Kupffer cells. Thus it is important to note that the same compounds that can cause organ damage in vivo can also cause acute cytotoxicity, correlated to protein covalent binding, in isolated metabolically-competent cells in vitro.
It has generally been presumed that protein adduction impairs protein function, leading to disruption of metabolic or signaling pathways, organelle failure, loss of cellular homeostasis, aberrant cell-cell interactions, macroscopic tissue damage through necrosis and/or apoptosis, and in the extreme, organ failure and death. We know that some compounds that give rise to covalently bound residues do not cause toxicity, i.e., none of their adducts trigger events leading to cytotoxicity. Likewise some, perhaps many, of the adducts of toxic compounds are ineffective at causing toxicity, but some of the adducts of toxic compounds do trigger events leading to cytotoxicity (and eventually tissue damage and organ injury in vivo). The challenge is to identify which adduct structure(s) on which amino acid residue(s) of which protein(s) are important for toxicity, and the mechanism(s) by which their appearance triggers toxicity while others do not.
Despite extensive investigation, the mechanisms by which covalent binding events trigger cytotoxic outcomes remain largely unclear [6, 7]. A major reason for this gap is that only recently has it become technically feasible to identify numbers of individual proteins targeted by xenobiotic reactive metabolites. Early target protein identifications were based on isolating individual adduct-bearing proteins, one at a time, using traditional protein separation methods. By 1997, only 28 proteins targeted by xenobiotic reactive metabolites had been isolated and identified, largely by N-terminal sequencing . In 1998, however, the coupling of 2D gel electrophoresis with mass spectrometric methods of protein identification literally revolutionized the field . Since then both the number of known target proteins and the number of small-molecule adduct-forming xenobiotics studied have increased nearly ten-fold. To help keep track of this information and to facilitate its analysis, we recently built the Reactive Metabolite Target Protein Database (TPDB) [6, 9, 10]. All proteins listed in this database came from studies using live animals or intact living cells.
As the number of known target proteins grew, so did the hope of elucidating mechanistic pathways connecting covalent adduction events to the observed cytotoxic outcomes. For example, we and others [7, 11–14] have attempted to make sense of lists of target proteins by arbitrarily grouping them into categories according to function, but this approach has done little to reveal to a unifying mechanism of toxicity caused by a variety of reactive metabolites [6, 7]. Another way to analyze target proteins is to sort them into Gene Ontology (GO) categories  and determine whether any show an over-abundance of target proteins relative to statistical expectations. Similarly, KEGG biological pathway analysis  can identify metabolic pathways in which target proteins are over-represented. Such analyses could potentially indicate a functional connection between target protein adduction and biological consequence in the context of systems biology .
In living cells, proteins interact extensively with other proteins, forming protein-protein interaction (PPI) networks that sense and respond to the abundance or status of other network proteins. Since endogenous post-translational modifications of proteins are well-known to perturb PPI networks comprising intracellular signaling cascades [17–20], it is plausible to hypothesize that protein modification through adduction by xenobiotic reactive metabolites could constitute an aberrant form of signaling leading to cytotoxic consequences. Thus, inspection of the interacting partners of reactive metabolite target proteins might shed new light on the path from protein adduction to cytotoxicity.
In addition to the question of how protein adduction leads to cytotoxicity, it is also of interest to know what features of a protein, beyond simple abundance, determine whether or not it is likely to be a target for reactive metabolites. Electrophilic xenobiotic metabolites can be classified broadly as having acylating or alkylating activity [6, 10]. The former tend to attack lysine side chains, while the latter tend to attack predominantly cysteine, histidine and lysine side chains. Despite the commonplace occurrence of these side chains in most proteins, it is well established that protein adduction in living cells is remarkably selective, with some abundant proteins experiencing little adduction while some low-abundance proteins experience high levels of adduction [7, 13]. Nevertheless, the protein features that determine susceptibility to adduction are almost completely unknown . Since many protein features can now be calculated or predicted using software programs, the analysis of target proteins using feature selection algorithms could potentially shed light on this important question . Feature selection algorithms operate differently from conventional statistical (correlative) studies of individual features considered independently from each other. Such algorithms can identify which features among many contribute the most to determining a complex behavior such as relative susceptibility to adduction by electrophiles.
In this paper we report our efforts to use bioinformatics approaches to elucidate the interactions between reactive metabolites, their cellular target proteins, and the other proteins that interact directly with the target proteins. In brief, we first analyzed 171 proteins targeted by reactive metabolites from one or more of 18 different protoxins and found that a number of GO categories and KEGG pathways were significantly enriched with some of these target proteins. We then selected 28 proteins known to be adducted by reactive metabolites of at least 3 different protoxins and found that 21 of them had a total of 165 directly interacting partners. GO and KEGG pathway analysis of the combined 186 proteins revealed several categories to be highly significantly enriched by target proteins and/or their directly interacting partner proteins. Finally, we applied machine learning methods to analyze the properties of 62 rat liver proteins targeted by reactive metabolites of thiobenzamide and 45 rat liver proteins targeted by reactive metabolites of bromobenzene, in an effort to identify properties that help to distinguish whether a protein is likely to be a target of reactive metabolites.
The identities of the target proteins used in this study, and the structures and names of the protoxins whose metabolites bind to them, are freely available from the reactive metabolite target protein database [6, 9, 10]. For the Gene Ontology and KEGG Pathway analyses we used all 171 proteins listed in the TPDB as of February 2008. For the protein-protein interaction analysis we used the "rank by number of hits" function within the TPDB to select the 28 proteins most commonly targeted by different reactive metabolites. For the machine learning study we used 37 proteins targeted by thiobenzamide metabolites, 20 proteins targeted by bromobenzene metabolites, and 25 proteins targeted by both (82 proteins total). To create a negative learning dataset, all rat proteins in UniProtKB  were downloaded on May 5, 2008. A software program CD-HIT  was used to filter redundant sequences in the dataset at the level of 80% identity. We also eliminated all sequences with lengths less than 40 residues and subtracted all known target proteins to arrive at a set of 11482 proteins in the negative (non-target) learning dataset. Although the list of target proteins is incomplete, the percentage of all proteins in living cells that actually become detectably adducted by reactive metabolites is relatively small (< 10% based on comparisons of 2D gels vs. their autoradiograms; [7, 13]). Thus the vast majority of proteins in the negative dataset are considered as non-target proteins. In addition, the Random Forest algorithm used to analyze target vs. non-target proteins (see below) is relatively tolerant of small amounts of "noise" in the data.
Software programs and databases used in the study
The Gene Ontology project  is a collaborative effort to develop standard vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. The controlled vocabularies are hierarchically structured (multiple parent levels are permitted) so that they can be queried at different levels. KEGG  is a widely used database of biological pathways. The January 2008 version of the Human Protein Reference Database (HPRD) was downloaded from the HPRD website . The binary protein-protein interactions (8919 proteins in 34364 distinct PPIs) were then imported into Cytoscape, an open source bioinformatics software platform for visualizing and analyzing biological interaction networks (http://www.cytoscape.org/). The Cytoscape plugin BiNGO was used to determine and visualize statistically overpopulated GO categories in a set of genes .
Random forest models and feature extraction
Number of features
Remarks and references
Amino acid residue composition
Numbers and percentages of 20 amino acid residues
Numbers and percentages of positive residues, negative residues, all charged residues, net charges
Predicted isoelectric point
Number of C/H/N/O/S atoms
Predicted based on dipeptide composition 
Proteins with predicted in vivo half life ≥ 40 hrs considered as stable 
Gravy hydropathy index
Predicted percentage of sheet, helix and coil
Predicted percentage of buried/exposed residues
Predicted disordered regions including length of the longest coil, length of all coils, percentage of the longest coil, percentage of all coils; and corresponding features for rem465 and hotloop
(5 * 5 * 5)
The dipeptide or tripeptide composition of proteins has been found to correlate to a number of protein properties. For example, Guruprasad et al. found a correlation between the dipeptide composition and the stability of proteins . More recently, tripeptide composition was applied in a machine learning model to predict protein-protein interactions . In this study, the 20 amino acids were sorted into just five groups according to their physicochemical properties: hydrophobic, GAVLIMPFW; polar, YTSNQ; positive, RKH; negative, DE; the fifth group contained only cysteine because of its unique ability to form disulfide bonds. Therefore the number of possible tripeptide features according to this classification scheme is 5*5*5 or 125.
Ranking feature importance
The random forest approach offers several methods to assess the importance of features based on their contributions to the correctness of the resulting classification. These methods include i) the mean decrease in accuracy for each class, ii) the mean decrease in accuracy over all classes, and iii) the mean decrease in the Gini impurity criterion . In this study we used the mean decease in accuracy for the minor class (target proteins) because in previous work we found it to give more accurate results than other measures for ranking feature importance .
For imbalanced data (i.e., comparisons of small vs. large data sets), the overall classification accuracy is not an appropriate measure of performance because very high accuracy can be achieved simply by predicting every case as the majority class (i.e. the control proteins) as opposed to predicting the minority class (i.e. target proteins). In this study, we used three indicators of performance: sensitivity, specificity, and area under the curve (AUC) for the receiver operating characteristic (ROC) curve. We ranked all cases in the dataset according to the predicted likelihood of positive status and then employed that rank order to identify correctly classified true positives. We then used these results to generate an ROC curve (i.e., a plot of the true positive rate (sensitivity) against the false positive rate (1 – specificity)). The area under the ROC curve represents the trade-off between sensitivity and specificity over the whole range of data. An AUC of 1 represents a perfect prediction model while an AUC ≥ 0.9 is considered excellent, an AUC between 0.8 and 0.9 is considered good, and an AUC in the range of 0.7–0.8 is fair.
Results and discussion
Bioinformatic analysis of known target proteins
Representative Gene Ontology terms significantly enriched in target proteins.a
Number of target proteins selected
Total number of proteins in category
Fraction of population as targets
oxidoreductase activity, acting on peroxide as acceptor
glutathione transferase activity
thioredoxin peroxidase activity
unfolded protein binding
fatty acid binding
carboxylic acid metabolic process
amino acid and derivative metabolic process
amine metabolic process
response to stress
xenobiotic metabolic process
response to xenobiotic stimulus
response to oxidative stress
endoplasmic reticulum lumen
ER-Golgi intermediate compartment
Of the 171 known targets, 115 sort into the broad Molecular Function subcategory of catalytic activity, while only 11 sort into the much more specifically-defined subcategory of antioxidant activity. Nevertheless, both results are highly significant statistically. Likewise, 125 of the 171 target proteins sort into the broad Biological Process subcategory of "metabolic process" while only 8 sort into the subcategory of "xenobiotic metabolic process." Again, both of these results are highly significant statistically. However, the results of sorting target proteins into various Cellular Component subcategories must be viewed with some caution for several reasons. First, some target protein identification studies analyzed whole tissue samples whereas others analyzed only selected subcellular fractions of tissue homogenates. Second, a majority of the enzymes associated with bioactivation of xenobiotics to reactive metabolites are located in the endoplasmic reticulum (microsomal fraction) of the cell. For certain metabolites (e.g. trifluoroacetyl chloride formed by P450-catalyzed oxidation of halothane, CF3CHBrCl), the reactivity of the metabolite may limit its ability to diffuse from the site of formation to other sites within the cell.
KEGG pathways significantly enriched in target proteins.a
Number of target proteins selected
Total number of proteins in pathway
Fraction of population as targets
Urea cycle and metabolism of amino groups
Arginine and proline metabolism
Metabolism of xenobiotics by cytochrome P450
Limonene and pinene degradation
3-Chloroacrylic acid degradation
Fatty acid metabolism
Analyses such as those presented above are based on the principles of statistics; thus the observed enrichments of target proteins in certain GO categories and KEGG pathways are a priori significant, at least in the statistical sense. For example, 115 of the 171 target proteins in the Molecular Function GO category sorted into the catalytic activity subcategory (which contains 5085 genes overall; Table 2). Although this result is highly significant statistically (p = 1.45E-19), these 115 targets represent only 2% of all proteins in this broad subcategory. Thus, regardless of its statistical significance, this finding can at best hint at a biologically significant mechanistic connection to toxicology. In contrast, only 5 of the 171 target proteins sorted into the peroxiredoxin activity subcategory. This result too is highly significant statistically (p = 9.43E-08), but since this entire subcategory contains only 6 proteins, the finding that five of them (83%) are targeted by reactive metabolites suggests that perturbation of this Molecular Function by protein adduction could perhaps be significant biologically. A number of other Molecular Function and Biological Process GO terms that are overpopulated with target proteins have been individually implicated in the context of reactive metabolite toxicity (e.g., peroxidase activity, response to oxidative stress, antioxidant activity, etc.). Likewise 101 of the171 target proteins (59%) were also enriched in certain KEGG pathways (Table 3), some of which (e.g., glycolysis, urea cycle, glutamate metabolism, cysteine metabolism and, not surprisingly, metabolism of xenobiotics by cytochrome P450) have also been linked to potential mechanisms of chemical cytotoxicity.
Collectively, the highlighting of certain GO terms and KEGG pathways via the more independent and more comprehensive bioinformatics approach taken here is intriguing and encouraging. While findings such as this can help to formulate or reinforce hypotheses about mechanisms of reactive metabolite cytotoxicity, each such hypothesis will need more extensive testing before it can be taken seriously as a significant contributor to the overall mechanism of cytotoxicity. In the end, however, it must be conceded that the systematic global analyses of all known target proteins affords only a modest advance toward elucidation of mechanisms beyond that afforded by gazing at lists of arbitrarily-grouped target proteins of single protoxicants [6, 7]. To go beyond this limitation, we searched for and analyzed non-target proteins that interact directly with known target proteins, as described below.
Bioinformatic analysis of common target proteins and their interacting partners
Protein-protein interaction data analysis
Summary of interacting partner analysis for human orthologs of 28 rat or mouse proteins commonly targeted by multiple different reactive metabolites a
Human Protein name
Number of interacting partner proteins found
Aldehyde dehydrogenase 2
Carbonic anhydrase III
Fatty acid binding protein 1
Protein disulfide isomerase A3
Glutathione S-transferase Mu-1
Glutathione S-transferase Mu-2
Heat shock 70 kDa protein 8
Isocitrate dehydrogenase 1
Protein disulfide isomerase
Raf kinase inhibitor protein
Phosphoglycerate kinase 1
Selenium binding protein 1
Protein disulfide isomerase P5
Endoplasmic reticulum protein 29
Aldo keto reductase family 1, member C4
Translational inhibitor protein p14.5
GO and KEGG pathway analysis of common targets and their interacting partners
GO categories with an over-representation of target proteins or their interacting partners
Number of target proteins selected
Total number of proteins in category
Fraction of population as targets
unfolded protein binding
response to unfolded protein
KEGG pathways containing target proteins or their interacting partners
Number of target proteins selected
Total number of proteins in pathway
Fraction of population as targets
MAP kinase signaling pathway
Antigen processing and presentation
Arginine and proline metabolism
Adipocytokine signaling pathway
Apparently, looking beyond just target proteins and considering their first interacting partners may provide a deeper look into potential mechanisms of cytotoxicity. The potential indirectness of analyzing human orthologs of rat and mouse target proteins (necessitated by the paucity of information about rat and mouse vs. human PPIs) was offset by focusing on just 28 proteins known to be targeted by multiple different reactive metabolites. Comparing the results of Table 5 to those of Table 2 shows that the GO Molecular Function subcategory "unfolded protein binding" was again highlighted, but this time with much higher statistical significance (3.3E-09 vs. 9.0E-05) and much higher fractional coverage of the class (0.124 vs. 0.078). The significance and coverage of the Biological Process "protein folding" are greatly increased in Table 5 vs. Table 2. In addition, Table 5 also lists two terms having both high statistical significance and high fractional coverage that are absent from Table 2, namely, "apoptosis" and "response to unfolded protein," both of which have high relevance to cytotoxicity.
In terms of KEGG pathway analysis of targets and their partners (Table 6), the flagging of the MAP kinase pathway is particularly noteworthy. This finding is consistent with independent experimental evidence that also strongly supports a role for this pathway in apoptosis and other pathological cellular responses to toxic chemicals [37–40]. There is also considerable interest in the involvement of the immune system in some forms of reactive metabolite-mediated toxicity in vivo [4, 5]. The flagging of four pathways related to the nervous system was unexpected, and will require further analysis to evaluate. Nevertheless, it appears that our approach of considering not just target proteins but also the proteins with which they interact in the cell may provide clues to downstream events important to cytotoxicity and thereby help in elucidating mechanisms of cytotoxicity. Stated another way, it is perhaps essential PPIs rather than proteins per se that are the important targets of reactive metabolites in living cells.
Machine learning approaches to elucidate factors that differentiate target from non-target proteins
In addition to identifying the target proteins whose modification appears to initiate cytotoxic responses, we also examined them more specifically to elucidate protein features that predict susceptibility to reactive metabolites. Because the reactive metabolites of the 18 protoxins included in the TPDB span a wide range of chemical reactivity and hydrophobicity, and because for many of them only a few target proteins are known, we decided to use the set of 82 proteins that are targeted in vivo by thiobenzamide metabolites (n = 62), bromobenzene metabolites (n = 45) or both (n = 25) for this study. Thiobenzamide forms a reactive acylating metabolite with a very strong preference for reaction with amine side chains on lysine  or phosphatidyl-ethanolamine (PE) lipids . On the other hand, prior analyses of proteins adducted by bromobenzene metabolites indicates a strong preference for adduction on cysteine sulfhydryls [42, 43] as opposed to lysine or histidine  or PE lipids . Thus, one might expect the composition of the proteins targeted by these two metabolites to reflect their differing chemical selectivities to at least some extent.
To investigate this we used the random forest algorithm developed by Breiman . Random forest is an ensemble approach that combines many individual classifiers to formulate a robust composite classifier. It is particularly suitable in classifying high-dimensional and noisy data, and it can handle a mixture of both categorical and continuous predictors such as those in Table 1. Each of the classifiers is built on a bootstrap sample of the data and utilizes a random subset of the available variables (predictors) without pruning to obtain low-bias trees. The random forest algorithm has been applied to a broad range of classification tasks and has demonstrated superior performance compared to other classification algorithms .
We used five-fold cross validation to estimate the performance of the classification models. In brief, 82 target proteins and the 11482 rat proteins taken as a negative control set (see Methods) were randomly split into 5 equal portions. One portion was reserved as a test data set while the other four were pooled and used as a training set. A random forest classifier with 10,000 trees was built using 211 protein features (see Methods section) and the performance of the model was then evaluated using the reserved dataset. This process was repeated four times, each time starting with a different one of the five subsets of proteins as the test set. The results from all five repeats were then combined to afford an overall performance estimation.
Dennehy et al. investigated the reaction of two model cysteine-reactive electrophiles (IAB, a cytotoxic iodoacetamide derivative, and BMCC, a reactive but non-cytotoxic maleimide derivative) with proteins from the nuclear and cytosolic fractions of HEK293 cells . They found that 89% and 85%, respectively, of 539 proteins that became adducted were selectively adducted at only one or two cysteines per protein. The overlap of target proteins hit by IAB vs. BMCC is only about 20%. In an attempt to explain this selectivity they analyzed the frequency of occurrence of lysine, arginine, histidine or a second cysteine within the first five residues on either side of the adducted cysteine. Some differences in the frequencies were noted for target vs. non-target proteins, but unfortunately, the most likely reasons for the selective reactivity of only a few cysteines among many, namely, protection by intramolecular disulfide formation and/or steric occlusion within the protein, could not adequately be evaluated because such structural information is generally not available for a majority of proteins. In our analysis of features of target proteins we used only two large sets of targets, one from a lysine-selective metabolite and one from a cysteine-selective metabolite. The comparative unimportance of cysteine (vs. lysine) in differentiating target proteins from non-target proteins probably indicates that while a given reactive metabolite may require a specific type of target nucleophile, the target site is only one factor, and a possibly not a sufficient factor overall, for differentiating target from non-target proteins.
The second most important feature of target proteins is the percentage of aspartic acid. Aspartic acid residues are generally not considered to be nucleophilic target sites for most reactive electrophilic metabolites. It is possible that the higher content of aspartic acid provides negative charges to balance the positive charges resulting from a higher percentage of lysine in target proteins; however, the equally negatively charged glutamic acid did not play a significant role in distinguishing target proteins (Figure 2). Moreover, we find that the correlation between the content of lysine vs. aspartic or glutamic acid in the non-target set is insignificant (0.183 and 0.363 respectively), in agreement with the wide range of pI values that proteins span. Clearly more work will be required to understand why some abundant proteins receive little adduction while some low-abundance proteins become more heavily adducted by a given reactive metabolite.
The third most important feature of target proteins is their predicted index of instability (with respect to physiological degradation and turnover). If we consider proteins with a calculated instability index ≥ 40 as unstable, and those with an index < 40 as stable , then 57 out of the 82 target proteins (70%) are classified as stable while only 26% of the proteins in the negative learning set (i.e., 2970 out of 11482) are classified as stable (Fisher exact test p = 2.26E-16). The relative instability of a covalent metabolite-protein adduct can arise from 1) chemical instability of the covalent linkage, 2) a high intrinsic rate of turnover of the protein itself, or 3) an enhancement of protein turnover induced by adduction. A priori, the stability of a protein can influence both the detection and analysis of its adducted forms and the cellular consequences of its adduction (e.g. cytotoxicity). Surveys of target proteins are usually conducted at just a single time after dosing, chosen to maximize the total amount of covalent binding. Proteins whose turnover is rapid relative to the endpoint time will thus experience a lower exposure to reactive metabolite and a greater likelihood of removal by degradation, resulting in a lower apparent rate of adduction compared to proteins that turnover slowly. If adduction flags a protein for accelerated destruction, the result will be the same. This may be one reason that adduct densities vary so widely across a target proteome (for example, see ). If adduct density is lowered sufficiently (to the edge of detectability by phosphorinaging, for example), the protein may then be missed altogether as a target. On the other hand, since a relatively high proportion of target proteins turn over less rapidly than non-target proteins, their adduction may result in a more persistent perturbation to endogenous PPIs and signaling mechanisms and may therefore contribute more to cytotoxicity.
Interestingly, Lin et al.  recently found a major difference between IAB and BMCC in terms of adduct stability in vivo (as monitored by Western blotting of cell extracts). Both agents form covalent adducts that do not dissociate chemically or in cell-free fractions, but in living cells, the extent of protein adduction by IAB increases continuously over 6 hours, whereas adducts formed by BMCC peak at around 20 minutes after exposure and diminish rapidly thereafter. Their disappearance is strongly retarded at 4°C, suggesting that the disappearance may be enzyme mediated. Two other acetamide/maleimide pairs showed similar differences in protein adduct stability. Lin et al. noted that the rapid clearance of maleimide adducts coincided with their relative lack of toxicity and suggested that protein adduct stability is a critical requirement for the induction of cellular responses. Their observation of declining adduct levels over time can be explained by reversal of the adduction process, and/or by the degradation of protein molecules bearing adducts. Further work will clearly be required to address these possibilities, but it is encouraging that results of both bioinformatic and laboratory analysis of target proteins are pointing in the same direction with respect to the role(s) and importance of protein turnover and adduct persistence in eliciting cytotoxicity.
The covalent binding of reactive metabolites to cellular proteins has long been associated with the production of acutely cytotoxic effects. The past several years have witnessed much progress toward identifying their reactive metabolites and the specific intracellular proteins that become adducted, often highly selectively, by reactive metabolites from a number of different protoxicants. For the better-studied protoxicants, much is known about the structures and reactivities of their reactive metabolite(s), the structures of the adducts they form on proteins, and the identities of many of their protein targets . For technical reasons it has proven challenging to elucidate which specific residues on a given target protein become modified , but progress is being made in this area as well .
The challenge has been to use the available information to discover factors that govern the selectivity of adduction (both among and within different proteins), and to discover mechanisms that link protein adduction to cytotoxicity. As of this writing, the TPDB lists 32 compounds whose reactive metabolites present widely varying chemical reactivities and hydrophobicities, and a total of 268 proteins that are modified by reactive in animals or living cells in vitro . However, for only 9 of these 32 compounds, one of which is not cytotoxic, are 15 or more target proteins known, while for 19 other compounds fewer than 7 target proteins are known. Thus, considering the breadth of the phenomenon, the descriptive data are still rather sparse. Little commonality of targets among different protoxins is apparent, and analyses of target proteins per se have failed to illuminate mechanisms linking covalent binding to toxic outcomes.
Because proteins in cells interact specifically and extensively with other proteins, we hypothesized that xenobiotic adduction might disrupt endogenous PPIs and signaling pathways vital to cellular homeostasis and survival. In the current work we found that the human orthologs of 21 common rat or mouse target proteins have 165 direct-interacting partners that participate in a total of 529 PPIs. These 186 proteins are significantly concentrated in several GO categories and KEGG pathways that experimental studies by others have shown to be highly relevant to cell signaling and cell survival. This suggests that compared to direct analysis of target proteins, further bioinformatic analysis of proteins that interact with greater numbers of target proteins may be able to point toward more fruitful areas for generating and testing hypotheses about mechanisms of toxicity.
Finaly, given the chemical and structural diversity of cellular proteins, it seems unlikely that simple principles of chemical reactivity will in themselves play an important role in differentiating target from non-target proteins. As shown by studies with bromophenol, m-hydroxyacetanilide, and mycophenolic acid, not all protein adduction has toxic concequences. It seems equally likely that among adducts generated from toxic compounds, only some but not all will have toxic consequences. It will take time and much more detailed information about protein adduction in living cells to sort this out, and the outcome is likely to point to a number of mechanistic paths from protein modification to cellular impairment or death. In the meantime, one aspect of adduct chemistry that experiments and bioinformatic analysis both suggest may be important is the persistence of the adducted protein in the cell. Unstable adducts that dissociate, and adducted proteins that turnover rapidly, may be less effective at perturbing cellular homeostasis and injuring the cell than more durable adducts.
This work was supported by NIH grant GM-21784 (to R. P. H.) and the Kansas-INBRE (NIH grant RR016475).
- Miller EC, Miller JA: The presence and significance of bound aminoazo dyes in the livers of rats fed p-dimethylaminoazobenzene. Cancer Res. 1947, 7: 468-480.
- Miller JA: Carcinogenesis by chemicals – an overview. Cancer Res. 1970, 30: 559-PubMed
- Evans DC, Watt AP, Nicoll-Griffith DA, Baillie TA: Drug-protein covalent adducts: An industry perspective on minimizing the potential for drug bioactivation in drug discovery and development. Chem Res Toxicol. 2004, 17: 3-16. 10.1021/tx034170b.View ArticlePubMed
- Park BK, Kitteringham NR, Maggs JL, Pirmohamed M, Williams DP: The role of metabolic activation in drug-induced hepatotoxicity. Ann Rev Pharmacol Toxicol. 2005, 45: 177-202. 10.1146/annurev.pharmtox.45.120403.100058.View Article
- Uetrecht J: Idiosyncratic drug reactions: Past, present and future. Chem Res Toxicol. 2008, 19: 20-29.
- Hanzlik RP, Fang J, Koen YM: Filling and mining the reactive metabolite target protein database. Chem-Biol Interactions. 2009, 179: 38-44. 10.1016/j.cbi.2008.08.016.View Article
- Ikehata K, Duzhak TG, Galeva NA, Ji T, Koen YM, Hanzlik RP: Protein targets of reactive metabolites of thiobenzamide in rat liver in vivo. Chem Res Toxicol. 2008, 21: 1432-1442. 10.1021/tx800093k.PubMed CentralView ArticlePubMed
- Qiu YC, Benet LZ, Burlingame AL: Identification of the hepatic protein targets of reactive metabolites of acetaminophen in vivo in mice using two-dimensional gel electrophoresis and mass spectrometry. J Biol Chem. 1998, 273: 17940-17953. 10.1074/jbc.273.28.17940.View ArticlePubMed
- The reactive metabolite target protein database. [http://tpdb.medchem.ku.edu/tpdb.html]
- Hanzlik RP, Koen YM, Theertham B, Dong YH, Fang JW: The reactive metabolite target protein database (TPDB) – a web-accessible resource. Bmc Bioinformatics. 2007, 8:
- Asif AR, Armstrong VW, Voland A, Wieland E, Oellerich M, Shipkova M: Proteins identified as targets of the acyl glucuronide metabolite of mycophenolic acid in kidney tissue from mycophenolate mofetil treated rats. Biochemie. 2007, 89: 393-402. 10.1016/j.biochi.2006.09.016.View Article
- Druckova A, Mernaugh RL, Ham AL, Marnett LJ: Identification of the protein targets of the reactive metabolite of teucrin a in vivo in the rat. Chem Res Toxicol. 2007, 20: 1393-1408. 10.1021/tx7001405.View ArticlePubMed
- Koen YM, Gogichaeva NV, Alterman MA, Hanzlik RP: A proteomic analysis of bromobenzene reactive metabolite targets in rat liver cytosol in vivo. Chem Res Toxicol. 2007, 20: 511-519. 10.1021/tx6003166.View ArticlePubMed
- Meier BW, Gomez JD, Zhou A, Thompson JA: Immunochemical and proteomic analysis of covalent adducts formed by quinone methide tumor promoters in mouse lung epithelial cell lines. Chem Res Toxicol. 2005, 18: 1575-1585. 10.1021/tx050108y.View ArticlePubMed
- The gene ontology project. [http://www.geneontology.org/]
- Kyoto encyclopedia of genes and genomes. [http://www.genome.jp/kegg/]
- Boyault C, Sadouls K, Pabion M, Khochbin S: HDAC6, at the crossroads between cytoskeleton and cell signalling by acetylation and ubiquitination. Oncogene. 2007, 26: 5468-5476. 10.1038/sj.onc.1210614.View ArticlePubMed
- Tarassov K, Messier V, Landry CR, Radinovic S, Serna MM, Molina S, Shames I, Malitskaya Y, Vogel J, Bussey H, Michnick SW: An in vivo map of the yeast protein interactome. Science. 2008, 320: 1465-1470. 10.1126/science.1153878.View ArticlePubMed
- Yang X-J, Seto E: Lysine acetylation: Codified crosstalk with other posttrantlational modifications. Molecular Cell. 2008, 31: 449-461. 10.1016/j.molcel.2008.07.002.PubMed CentralView ArticlePubMed
- Yu H, Braun P, Yildirim MA, Lemmens I, Vankatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N: High-quality binary protein interaction map of the yeast interactome network. Science. 2008, 322: 104-110. 10.1126/science.1158684.PubMed CentralView ArticlePubMed
- Dennehy MK, Richards KAM, Wernke GR, Shyr Y, Liebler DC: Cytosolic and nuclear protein targets of thiol-reactive electrophiles. Chem Res Toxicol. 2006, 19: 20-29. 10.1021/tx050312l.View ArticlePubMed
- Fang JW, Dong YH, Williams TD, Lushington GH: Feature selection in validating mass spectrometry database search results. Journal of Bioinformatics and Computational Biology. 2008, 6: 223-240. 10.1142/S0219720008003345.View ArticlePubMed
- Uniprot (universal protein resource). [http://www.ebi.ac.uk/uniprot/]
- Li WZ, Godzik A: Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.View ArticlePubMed
- Human protein reference database. [http://www.hprd.org/]
- Bingo: A biological network gene ontology tool. [http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm]
- Randomforest: Breiman and Cutler's random forests for classification and regression. [http://cran.r-project.org/web/packages/randomForest/index.html]
- The r project for statistical computing. [http://www.r-project.org/]
- Shen JW, Zhang J, Luo XM, Zhu WL, Yu KQ, Chen KX, Li YX, Jiang HL: Predicting protein-protein interactions based only on sequences information. Proc Natl Acad Sci USA. 2007, 104: 4337-4341. 10.1073/pnas.0607879104.PubMed CentralView ArticlePubMed
- Slabinski L, Jaroszewski L, Rodrigues APC, Rychlewski L, Wilson IA, Lesley SA, Godzik A: The challenge of protein structure determination – lessons from structural genomics. Protein Science. 2007, 16: 2472-2482. 10.1110/ps.073037907.PubMed CentralView ArticlePubMed
- Guruprasad K, Reddy BVB, Pandit MW: Correlation between stability of a protein and its dipeptide composition: A novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering. 1990, 4: 155-161. 10.1093/protein/4.2.155.View ArticlePubMed
- Breiman L: Random forests. Machine Learning. 2001, 45: 5-32. 10.1023/A:1010933404324.View Article
- Maere S, Heymans K, Kuiper M: Bingo: A cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551.View ArticlePubMed
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research. 2003, 13: 2498-2504. 10.1101/gr.1239303.PubMed CentralView ArticlePubMed
- Sherman BT, Huang DW, Tan QN, Guo YJ, Bour S, Liu D, Stephens R, Baseler MW, Lane HC, Lempicki RA: David knowledgebase: A gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. Bmc Bioinformatics. 2007, 8:
- Yu HY, Luscombe NM, Lu HX, Zhu XW, Xia Y, Han JDJ, Bertin N, Chung S, Vidal M, Gerstein M: Annotation transfer between genomes: Protein-protein interologs and protein-DNA regulogs. Genome Research. 2004, 14: 1107-1118. 10.1101/gr.1774904.PubMed CentralView ArticlePubMed
- Cuevas BD, Abell AN, Johnson GL: Role of mitogen-activated protein kinase kinase kinases in signal integration. Oncogene. 2007, 26: 3159-3171. 10.1038/sj.onc.1210409.View ArticlePubMed
- Czaja MJ: Cell signaling in oxidative stress-induced liver injury. Seminars in Liver Disease. 2007, 27: 378-389. 10.1055/s-2007-991514.View ArticlePubMed
- Genestra M: Oxyl radicals, redox-sensitive signalling cascades and antioxidants. Cellular Signalling. 2007, 19: 1807-1819. 10.1016/j.cellsig.2007.04.009.View ArticlePubMed
- Henderson NC, Pollock KJ, Frew J, Mackinnon AC, Flavell RA, Davis RJ, Sethi T, Simpson KJ: Critical role of c-jun (NH2) terminal kinase in paracetamol-induced acute liver failure. Gut. 2007, 56: 982-990. 10.1136/gut.2006.104372.PubMed CentralView ArticlePubMed
- Ji T, Ikehata K, Koen YM, Esch SW, Williams TD, Hanzlik RP: Covalent modification of microsomal lipids by thiobenzamide metabolites in vivo. Chem Res Toxicol. 2007, 20: 701-708. 10.1021/tx600362h.PubMed CentralView ArticlePubMed
- Slaughter DE, Hanzlik RP: Identification of epoxide- and quinone-derived bromobenzene adducts to protein sulfur nucleophiles. Chem Res Toxicol. 1991, 4: 349-359. 10.1021/tx00021a015.View ArticlePubMed
- Slaughter DE, Zheng J, Harriman S, Hanzlik RP: Identification of covalent adducts to protein sulfur nucleophiles by alkaline permethylation. Anal Biochem. 1993, 208: 288-295. 10.1006/abio.1993.1048.View ArticlePubMed
- Bambal RB, Hanzlik RP: Bromobenzene-3,4-oxide alkylates histidine and lysine side chains of rat liver proteins in vivo. Chem Res Toxicol. 1995, 8: 729-735. 10.1021/tx00047a013.View ArticlePubMed
- Lin D, Saleh S, Liebler DC: Reversibility of covalent electrophile-protein adducts and chemical toxicity. Chem Res Toxicol. 2008, 21: 2361-2369. 10.1021/tx800248x.PubMed CentralView ArticlePubMed
- Koen YM, Yue W, Galeva NA, Williams TD, Hanzlik RP: Site-specific arylation of rat glutathione s-transferase a1 and a2 by bromobenzene metabolites in vivo. Chem Res Toxicol. 2006, 19: 1426-1434. 10.1021/tx060142s.PubMed CentralView ArticlePubMed
- Cheng J, Randall AZ, Sweredoski MJ, Baldi P: Scratch: A protein structure and structural feature prediction server. Nucl Acids Res. 2005, 33: W72-W76. 10.1093/nar/gki396.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.