
Publications

Listed below, by year, are the publications deposited in the HAL open archive.

2009

  • Switching from an induced-fit to a lock-and-key mechanism in an aminoacyl-tRNA synthetase with modified specificity.
    • Schmitt Emmanuelle
    • Tanrikulu I Caglar
    • Yoo Tae Hyeon
    • Panvert Michel
    • Tirrell David A
    • Mechulam Yves
    Journal of Molecular Biology, Elsevier, 2009, 394 (5), pp.843-51. Methionyl-tRNA synthetase (MetRS) specifically binds its methionine substrate in an induced-fit mechanism, with methionine binding causing large rearrangements. Mutated MetRS variants able to efficiently aminoacylate the methionine (Met) analog azidonorleucine (Anl) have been identified by saturation mutagenesis combined with in vivo screening procedures. Here, the crystal structure of such a mutated MetRS was determined in the apo form as well as complexed with Met or Anl (1.4 to 1.7 Å resolution) to reveal the structural basis for the altered specificity. The mutations result in both the loss of important contacts with Met and the creation of new contacts with Anl, thereby explaining the specificity shift. Surprisingly, the conformation induced by Met binding in wild-type MetRS already occurs in the apo form of the mutant enzyme. Therefore, the mutations cause the enzyme to switch from an induced-fit mechanism to a lock-and-key one, thereby enhancing its catalytic efficiency. (10.1016/j.jmb.2009.10.016)
    DOI : 10.1016/j.jmb.2009.10.016
  • Tetracycline-tet repressor binding specificity: insights from experiments and simulations.
    • Aleksandrov Alexey
    • Schuldt Linda
    • Hinrichs Winfried
    • Simonson Thomas
    Biophysical Journal, Biophysical Society, 2009, 97 (10), pp.2829-38. Tetracycline (Tc) antibiotics have been put to new uses in the construction of artificial gene regulation systems, where they bind to the Tet repressor protein (TetR) and modulate its affinity for DNA. Many Tc variants have been produced, both to overcome bacterial resistance and to achieve a broad range of binding strengths. To better understand TetR-Tc binding, we investigate a library of 16 tetracyclines, using fluorescence experiments and molecular dynamics free energy simulations (MDFE). The relative TetR binding free energies are computed by reversibly transforming one Tc variant into another during the simulation, with no adjustable parameters. The chemical variations involve polar and nonpolar substitutions along one entire edge of the elongated Tc structure, which provides many of the protein-ligand contacts. The binding constants span five orders of magnitude. The simulations reproduce the experimental binding free energies, when available, within the uncertainty of either method (±0.5 kcal/mol), and reveal many additional details. Contributions of individual Tc substituents are evaluated, along with their additivity and transferability among different positions on the Tc scaffold; differences between D- and B-class repressors are quantified. With increasing computer power, the MDFE approach provides an attractive complement to experiment and should play an increasing role in the understanding and engineering of protein-ligand recognition. (10.1016/j.bpj.2009.08.050)
    DOI : 10.1016/j.bpj.2009.08.050
  • Computational protein design as a tool for fold recognition.
    • Schmidt Am Busch Marcel
    • Mignon David
    • Simonson Thomas
    Proteins - Structure, Function and Bioinformatics, Wiley, 2009, 77 (1), pp.139-58. Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are performed with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY hidden Markov model library recognizes 81% (84%). Conversely, position-specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, for example, a set of natural SH3 PSSMs detects 772 SH3 domains; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed. (10.1002/prot.22426)
    DOI : 10.1002/prot.22426
  • Discovery of Escherichia coli methionyl-tRNA synthetase mutants for efficient labeling of proteins with azidonorleucine in vivo.
    • Tanrikulu I Caglar
    • Schmitt Emmanuelle
    • Mechulam Yves
    • Goddard William A
    • Tirrell David A
    Proceedings of the National Academy of Sciences of the United States of America, National Academy of Sciences, 2009, 106 (36), pp.15285-90. Incorporation of noncanonical amino acids into cellular proteins often requires engineering new aminoacyl-tRNA synthetase activity into the cell. A screening strategy that relies on cell-surface display of reactive amino acid side-chains was used to identify a diverse set of methionyl-tRNA synthetase (MetRS) mutants that allow efficient incorporation of the methionine (Met) analog azidonorleucine (Anl). We demonstrate that the extent of cell-surface labeling in vivo is a good indicator of the rate of Anl activation by the MetRS variant harbored by the cell. By screening at low Anl concentrations in Met-supplemented media, MetRS variants with improved activities toward Anl and better discrimination against Met were identified. (10.1073/pnas.0905735106)
    DOI : 10.1073/pnas.0905735106
  • Challenges in the computational design of proteins.
    • Suarez Maria
    • Jaramillo A.
    Journal of the Royal Society Interface, the Royal Society, 2009, 6 (Suppl. 4), pp.S477-91. Protein design has many applications not only in biotechnology but also in basic science. It uses our current knowledge in structural biology to predict, by computer simulations, an amino acid sequence that would produce a protein with targeted properties. As in other examples of synthetic biology, this approach allows the testing of many hypotheses in biology. The recent development of automated computational methods to design proteins has enabled proteins to be designed that are very different from any known ones. Moreover, some of those methods rely mostly on a physical description of atomic interactions, which keeps the designed sequences from being biased towards known proteins. In this paper, we will describe the use of energy functions in computational protein design, the use of atomic models to evaluate the free energy in the unfolded and folded states, the exploration and optimization of amino acid sequences, the problem of negative design and the design of biomolecular function. We will also consider its use together with experimental techniques such as directed evolution. We will end by discussing the challenges ahead in computational protein design and some of their future applications. (10.1098/rsif.2008.0508.focus)
    DOI : 10.1098/rsif.2008.0508.focus
  • Towards the automated engineering of a synthetic genome.
    • Carrera J.
    • Rodrigo G.
    • Jaramillo A.
    Molecular BioSystems, Royal Society of Chemistry, 2009, 5 (7), pp.733-43. The development of the technology to synthesize new genomes and to introduce them into hosts with an inactivated wild-type chromosome opens the door to new horizons in synthetic biology. Here it is of utmost importance to be able to use computational design to predict and optimize a synthetic genome before attempting its synthesis. The methodology to computationally design a genome is based on an optimization that computationally mimics genome evolution. The biggest bottleneck lies in the choice of an appropriate fitness function. This fitness function, usually cell growth, relies on the ability to quantitatively model the biochemical networks of the cell at the genome scale using parameters inferred from high-throughput data. Computational methods integrating such models in a common multilayer design platform can be used to automatically engineer synthetic genomes under physiological specifications. We describe the current state of the art in automated methods for engineering or re-engineering synthetic genomes. We restrict ourselves to global models of metabolism, transcription and DNA structure. Although we are still far from de novo computational genome design, it is important to collect all relevant work towards this goal. Finally, we discuss future perspectives on the practicability of an automated methodology for such computational design of synthetic genomes. (10.1039/b904400k)
    DOI : 10.1039/b904400k
  • Modular model-based design for heterologous bioproduction in bacteria.
    • Landrain T.E.
    • Carrera J.
    • Kirov B.
    • Rodrigo G.
    • Jaramillo A.
    Current Opinion in Biotechnology, Elsevier, 2009, 20 (3), pp.272-9. We review the current status of expression of heterologous systems for bioenergy and bioproduction in bacteria using a model-based approach. As a goal of synthetic biology, this requires mathematical models of genetic modules that can be characterized independently of their context. Such models speed up the design of metabolic circuits using a combinatorial design approach, in which given pathways can be optimized for maximal bioproduction while remaining nontoxic to the chassis. We show how recent characterization of genetic parts, such as promoters, RBSs or sRNAs, can be used to fine-tune the expression of individual genes to achieve that goal. We also present lists of enzymes used for bioproduction, enlarging this set of biological parts. (10.1016/j.copbio.2009.06.003)
    DOI : 10.1016/j.copbio.2009.06.003
  • Contrast Enhancement of UV Absorption and Improved Biochip Imaging
    • Robin Kristelle
    • Reverchon J.L.
    • Brignon Arnaud
    • Mugherli Laurent
    • Fromant Michel
    • Plateau Pierre
    • Benisty Henri
    2009. Biochips using UV absorption for selective imaging of DNA or proteins may benefit from sensitivity enhancement provided by either multilayer structures or grating structures. We discuss the benefits of coupled angular and spectral illumination. (10.1364/CLEO.2009.CMG5)
    DOI : 10.1364/CLEO.2009.CMG5
  • Widespread distribution of cell defense against D-aminoacyl-tRNAs.
    • Wydau Sandra
    • van der Rest G.
    • Aubard Caroline
    • Plateau Pierre
    • Blanquet Sylvain
    Journal of Biological Chemistry, American Society for Biochemistry and Molecular Biology, 2009, 284 (21), pp.14096-104. Several L-aminoacyl-tRNA synthetases can transfer a D-amino acid onto their cognate tRNA(s). This harmful reaction is counteracted by the enzyme D-aminoacyl-tRNA deacylase. Two distinct deacylases were already identified in bacteria (DTD1) and in archaea (DTD2), respectively. Evidence was given that DTD1 homologs also exist in nearly all eukaryotes, whereas DTD2 homologs occur in plants. On the other hand, several bacteria, including most cyanobacteria, lack genes encoding a DTD1 homolog. Here we show that Synechocystis sp. PCC6803 produces a third type of deacylase (DTD3). Inactivation of the corresponding gene (dtd3) renders the growth of Synechocystis sp. hypersensitive to the presence of D-tyrosine. Based on the available genomes, DTD3-like proteins are predicted to occur in all cyanobacteria. Moreover, one or several dtd3-like genes can be recognized in all cellular types, arguing in favor of the near-ubiquity of an enzymatic function involved in the defense of translational systems against invasion by D-amino acids. (10.1074/jbc.M808173200)
    DOI : 10.1074/jbc.M808173200
  • Structural bases for 16 S rRNA methylation catalyzed by ArmA and RmtB methyltransferases.
    • Schmitt Emmanuelle
    • Galimand Marc
    • Panvert Michel
    • Courvalin Patrice
    • Mechulam Yves
    Journal of Molecular Biology, Elsevier, 2009, 388 (3), pp.570-82. Aminoglycosides are used extensively for the treatment of severe infections due to Gram-negative bacteria. However, certain species have become highly resistant after acquisition of genes for methyltransferases which catalyze post-transcriptional methylation of N7-G1405 in 16 S rRNA of 30 S ribosomal subunits. Inactivation of this enzymatic activity is therefore an important challenge for development of an effective therapy. The present work describes the crystallographic structures of methyltransferases RmtB and ArmA from clinical isolates. Together with biochemical experiments, the 3D structures indicate that the N-terminal domain specific for this family of methyltransferases is required for enzymatic activity. Site-directed mutagenesis has enabled important residues for catalysis and RNA binding to be identified. These high-resolution structures should underpin the design of potential inhibitors of these enzymes, which could be used to restore the activity of aminoglycosides against resistant pathogens. (10.1016/j.jmb.2009.03.034)
    DOI : 10.1016/j.jmb.2009.03.034
  • Protein design based on parallel dimensional reduction.
    • Moltó G.
    • Suarez Maria
    • Tortosa Pablo
    • Alonso J.M.
    • Hernández V.
    • Jaramillo A.
    Journal of Chemical Information and Modeling, American Chemical Society, 2009, 49 (5), pp.1261-71. The design of proteins with targeted properties is a computationally intensive task with large memory requirements. We have developed a novel approach that combines a dimensional reduction of the problem with a High Performance Computing platform to efficiently design large proteins. This tool overcomes the memory limits of the process, allowing the design of proteins whose requirements prevent them from being designed on traditional sequential platforms. We have applied our algorithm to the design of functional proteins, optimizing for both catalysis and stability. We have also studied the redesign of dimerization interfaces, simultaneously taking into account the stability of the subunits of the dimer. More generally, our methodology can be applied to any computational chemistry application requiring combinatorial optimization techniques. (10.1021/ci8004594)
    DOI : 10.1021/ci8004594
  • Model-based redesign of global transcription regulation.
    • Carrera J.
    • Rodrigo G.
    • Jaramillo A.
    Nucleic Acids Research, Oxford University Press, 2009, 37 (5), pp.e38. Synthetic biology aims at the design or redesign of biological systems. In particular, one possible goal could be the rewiring of the transcription regulation network by exchanging the endogenous promoters. To achieve this objective, we have adapted current methods to the inference of a model based on ordinary differential equations that is able to predict the network response after a major change in its topology. Our procedure uses microarray data for training. We have experimentally validated our inferred global regulatory model in Escherichia coli by predicting transcriptomic profiles under new perturbations. We have also tested our methodology in silico by providing accurate predictions of the underlying networks from expression data generated with artificial genomes. In addition, we have shown the predictive power of our methodology by obtaining the gene profile in experimental redesigns of the E. coli genome, in which the transcriptional network was rewired by knocking out master regulators or by upregulating transcription factors controlled by different promoters. Our approach is compatible with most network inference methods, allowing future genome-wide redesign experiments in synthetic biology to be explored computationally. (10.1093/nar/gkp022)
    DOI : 10.1093/nar/gkp022
  • Molecular mechanics models for tetracycline analogs.
    • Aleksandrov Alexey
    • Simonson Thomas
    Journal of Computational Chemistry, Wiley, 2009, 30 (2), pp.243-55. Tetracyclines (Tcs) are an important family of antibiotics that bind to the ribosome and several proteins. To model Tc interactions with protein and RNA, we have developed a molecular mechanics force field for 12 tetracyclines, consistent with the CHARMM force field. We considered each Tc variant in its zwitterionic tautomer, with and without a bound Mg²⁺. We used structures from the Cambridge Crystallographic Data Base to identify the conformations likely to be present in solution and in biomolecular complexes. A conformational search by simulated annealing was undertaken, using the MM3 force field, for tetracycline, anhydrotetracycline, doxycycline, and tigecycline. The resulting low-energy structures were optimized with an ab initio method. We found that Tc and its analogs all adopt an extended conformation in the zwitterionic tautomer and a twisted one in the neutral tautomer; the zwitterionic, extended state is the most stable in solution. Intermolecular force field parameters were derived from a standard supermolecule approach: we considered the ab initio energies and geometries of a water molecule interacting with each Tc analog at several different positions. The final rms deviation between the ab initio and force field energies, averaged over all forms, was 0.35 kcal/mol. Intramolecular parameters were adopted from either the standard CHARMM force field, the ab initio structure, or the earlier force field for plain Tc. The model reproduces the ab initio geometry and flexibility of each Tc. As tests, we describe MD and free energy simulations of solvated complexes between three Tcs and the Tet repressor protein. (10.1002/jcc.21040)
    DOI : 10.1002/jcc.21040
  • Biodetection of DNA and proteins using enhanced UV absorption by structuration of the chip surface
    • Robin Kristelle
    • Reverchon Jean-Luc
    • Mugherli Laurent
    • Fromant Michel
    • Benisty Henri
    2009, 7188, pp.718804. DNA and protein absorption at 260 and 280 nm, respectively, can be used to reveal these species on a biochip UV image. A first study, including the design and fabrication of UV-reflective multilayer biochips designed for UV contrast enhancement (factor of 4.0) together with spectrally selective AlGaN detectors, demonstrated the control of chip biological coating, or antigen/antibody complexation, with fairly good signals for a typical probe density of 4×10¹² molecules/cm². Detection of fractional-monolayer molecular binding requires a higher contrast enhancement, which can be obtained with structured chips. Grating structures enable, at resonance, a confinement of light at the biochip surface, and thus a large interaction between the biological molecule and the lightwave field. The highest sensitivity obtained with grating-based biochips usually concerns a resonance shift, in wavelength or diffraction angle. Diffraction efficiency is also affected by UV absorption, due to enhanced light-matter interaction, and this mechanism is equally able to produce biochip images in parallel. We show how, by adjusting grating parameters, a biochip that is highly sensitive to UV absorption at its surface can be obtained. Based on the Ewald construction and the diffraction diagram, instrumental resolution and smarter experimental configurations are considered. Notably, in conjunction with the 2D UV-sensitive detectors recently developed in-house, we discuss how to obtain large contrast and good signals in a diffraction order emerging around the sample normal. (10.1117/12.808124)
    DOI : 10.1117/12.808124
  • The universal Kae1 protein and the associated Bud32 kinase (PRPK), a mysterious protein couple probably essential for genome maintenance in Archaea and Eukarya
    • Hecker Arnaud
    • Graille Marc
    • Madec Edwige
    • Gadelle Danièle
    • Le Cam Eric
    • Van Tilbeurgh Herman
    • Forterre Patrick
    Biochemical Society Transactions, Portland Press, 2009, 37 (1), pp.29-35. The similarities between essential molecular mechanisms in Archaea and Eukarya make it possible to discover, using comparative genomics, new fundamental mechanisms conserved between these two domains. We are studying a complex of two proteins conserved in Archaea and Eukarya whose precise biological role and biochemical function remain unknown. One of them is a universal protein known as Kae1 (kinase-associated endopeptidase 1). The second protein is a serine/threonine kinase corresponding to the proteins Bud32 in Saccharomyces cerevisiae and PRPK (p53-related protein kinase) in humans. The genes encoding the archaeal orthologues of Kae1 and PRPK are either contiguous or even fused in many archaeal genomes. In S. cerevisiae, Kae1 and Bud32 (PRPK) belong to a chromatin-associated complex [KEOPS (kinase, endopeptidase and other proteins of small size)/EKC (endopeptidase-like kinase chromatin-associated)] that is essential for telomere elongation and transcription of essential genes. Although Kae1 is annotated as O-sialoglycoprotein endopeptidase in most genomes, we found that the Kae1 protein from Pyrococcus abyssi has no protease activity, but is an atypical DNA-binding protein with an AP (apurinic) lyase activity. The structure of the fusion protein from Methanocaldococcus jannaschii revealed that Kae1 maintains the ATP-binding site of Bud32 in an inactive configuration. We have in fact found that Kae1 inhibits the kinase activity of Bud32 (PRPK) in vitro. Understanding the precise biochemical function and biological role of these two proteins (which are probably essential for genome maintenance) remains a major challenge. (10.1042/bst0370029)
    DOI : 10.1042/bst0370029
  • Primary Structure Revision and Active Site Mapping of E. coli Isoleucyl-tRNA Synthetase by Means of MALDI Mass Spectrometry.
    • Baouz S.
    • Schnitter J.-M.
    • Chenoune L.
    • Beauvallet C.
    • Blanquet Sylvain
    • Woisard A.
    • Hountondji C.
    The Open Biochemistry Journal, 2009, 3, pp.26-38. The correct amino acid sequence of E. coli isoleucyl-tRNA synthetase (IleRS) was established by means of peptide mapping by MALDI mass spectrometry, using a set of four endoproteases (trypsin, LysC, AspN and GluC). Thereafter, the active site of IleRS was mapped by affinity labeling with reactive analogs of the substrates. For the ATP binding site, the affinity labeling reagent was pyridoxal 5'-diphospho-5'-adenosine (ADP-PL), whereas periodate-oxidized tRNA(Ile), the 2',3'-dialdehyde derivative of tRNA(Ile), was used to label the binding site for the 3'-end of tRNA on the synthetase. Incubation of either reagent with IleRS resulted in a rapid loss of both the tRNA(Ile) aminoacylation and the isoleucine-dependent isotopic ATP-PPi exchange activities. The stoichiometries of IleRS labeling by ADP-PL or tRNA(Ile)ox corresponded to 1 mol of reagent incorporated per mol of enzyme. Altogether, the oxidized 3'-end of tRNA(Ile) and the pyridoxal moiety of the ATP analog ADP-PL react with the lysyl residues 601 and 604 of the consensus sequence (601)KMSKS(605). Identification of the binding site for L-isoleucine or for non-cognate amino acids on E. coli IleRS was achieved by qualitative comparative labeling of the synthetase with bromomethyl ketone derivatives of L-isoleucine (IBMK) or of the non-cognate amino acids valine (VBMK), phenylalanine (FBMK) and norleucine (NleBMK). Labeling of the enzyme with IBMK resulted in a complete loss of isoleucine-dependent isotopic [(32)P]PPi-ATP exchange activity. VBMK, NleBMK and FBMK were also capable of abolishing the activity of IleRS, FBMK being the least efficient in inactivating the synthetase. Analysis by MALDI mass spectrometry designated cysteines-462 and -718 as the target residues of the substrate analog IBMK on E. coli IleRS, whereas VBMK, NleBMK and FBMK all labeled His-394, His-478 and Cys-718. In addition, VBMK and NleBMK, which are chemically similar to IBMK, were found covalently bound to Cys-462, and VBMK was specifically attached to His-332 (or His-337) of the synthetase. The amino acid residues labeled by the substrate analogs are mainly distributed between three regions in the primary structure of E. coli IleRS: segments [325-394], [451-479] and [591-604]. In the 3-D structures of IleRS from T. thermophilus and S. aureus, the [325-394] stretch is part of the editing domain, while fragments [451-479] and [591-604], representing the isoleucine binding domain and the dinucleotide (or Rossmann) fold domain, respectively, are located in the catalytic core. His-332 of E. coli IleRS, which is strictly conserved among all the available IleRS sequences, is located in the editing active site of the synthetase. It is proposed that His-332 of E. coli IleRS participates directly in hydrolysis, or helps to deprotonate the hydroxyl group of threonine at the hydrolytic site. (10.2174/1874091X00903010026)
    DOI : 10.2174/1874091X00903010026
  • Reverse-engineering the Arabidopsis thaliana transcriptional network under changing environmental conditions
    • Carrera J.
    • Rodrigo G.
    • Jaramillo A.
    • Elena S.F.
    Genome Biology, BioMed Central, 2009, 10 (9), pp.R96. BACKGROUND: Understanding the molecular mechanisms plants have evolved to adapt their biological activities to a constantly changing environment is an intriguing question and one that requires a systems biology approach. Here we present a network analysis of genome-wide expression data combined with reverse-engineering network modeling to dissect the transcriptional control of Arabidopsis thaliana. The regulatory network is inferred by using an assembly of microarray data containing steady-state RNA expression levels from several growth conditions, developmental stages, biotic and abiotic stresses, and a variety of mutant genotypes. RESULTS: We show that the A. thaliana regulatory network has the characteristic properties of hierarchical networks. We successfully applied our quantitative network model to predict the full transcriptome of the plant for a set of microarray experiments not included in the training dataset. We also used our model to analyze the robustness in expression levels conferred by network motifs such as the coherent feed-forward loop. In addition, the meta-analysis presented here has allowed us to identify regulatory and robust genetic structures. CONCLUSIONS: These data suggest that A. thaliana has evolved high connectivity in terms of transcriptional regulation among cellular functions involved in response and adaptation to changing environments, while gene networks constitutively expressed or less related to stress response are characterized by a lower connectivity. Taken together, these findings suggest conserved regulatory strategies that have been selected during the evolutionary history of this eukaryote. (10.1186/gb-2009-10-9-r96)
    DOI : 10.1186/gb-2009-10-9-r96
  • Computational protein design with side-chain conformational entropy.
    • Sciretti D.
    • Bruscolini P.
    • Pelizzola A.
    • Pretti M.
    • Jaramillo A.
    Proteins - Structure, Function and Bioinformatics, Wiley, 2009, 74 (1), pp.176-91. Recent advances in modeling protein structures at the atomic level have made it possible to tackle "de novo" computational protein design. Most procedures are based on combinatorial optimization using a scoring function that estimates the folding free energy of a protein sequence on a given main-chain structure. However, the computation of the conformational entropy in the folded state is generally an intractable problem, and its contribution to the free energy is not properly evaluated. In this article, we propose a new automated protein design methodology that incorporates such conformational entropy based on statistical mechanics principles. We define the free energy of a protein sequence by the corresponding partition function over rotamer states. The free energy is written in variational form in a pairwise approximation and minimized using the Belief Propagation algorithm. In this way, a free energy is associated with each amino acid sequence: we use this insight to rescore the results obtained with a standard minimization method, with the energy as the cost function. Then, we set up a design method that directly uses the free energy as a cost function in combination with a stochastic search in the sequence space. We validate the methods on the design of three surface sites of a small SH3 domain, and then apply them to the complete redesign of 27 proteins. Our results indicate that accounting for the entropic contribution in the score function affects the outcome in a highly nontrivial way, and might improve current computational design techniques based on protein stability. (10.1002/prot.22145)
    DOI : 10.1002/prot.22145