[This article is reprinted, not original. Source: Sequencing China. If there is any infringement, please contact us promptly.]
Proteins are the main carriers of biological function. Many diseases that cannot be explained at the genetic level can be explained at the protein level, and this is why proteomics came into being. Scientists predicted that, with the completion of the human genome sequence, the focus of life science research in the 21st century would shift from genomics to proteomics, which is the core of life science research in the post-genome era. To understand proteins, and through them the molecular mechanisms of life processes and disease, we first need suitable protein sequencing technology. Many approaches have been tried, such as Edman degradation, fluorescence staining, and mass spectrometry sequencing. However, existing methods have various technical deficiencies and application limitations, which hinder the use of proteomics in life science and biomedical research.
Recently, Dr. Ben C. Collins and Dr. Ruedi Aebersold of the Institute of Molecular Biology at the Swiss Federal Institute of Technology in Zurich published a review article in Nature Biotechnology, “Proteomics goes parallel”, on the parallel sequencing of proteomes. Our editor has compiled the article and shares it here.
Introduction: Large-scale parallel sequencing of peptides may herald a new era of high-throughput proteomics.
At present, proteomic sequencing technology lags behind that of genomics and transcriptomics. Nucleic acid sequencing performs so impressively because it uses fluorescence as the readout for massively parallel sequencing of short oligonucleotides. Swaminathan et al. have now demonstrated that peptides can also be sequenced in parallel by fluorescence. Their innovative approach integrates classical protein sequencing chemistry with the optical systems used for nucleic acid sequencing. Although the method still needs further optimization, it points to the prospect of a reliable and truly universal proteomic sequencing technology.
Proteins are essential to living systems. They act as catalysts of chemical reactions, as structural components, and as participants in physiological processes. Techniques that accurately identify and quantify proteins can therefore greatly advance our understanding of biology. Today, a proteome can already be predicted or inferred from the transcriptome. However, there is ample evidence that the relationship between protein and mRNA levels is complex, and that predicting one from the other is imprecise and unreliable. Why, then, do people in many cases prefer to predict proteins from mRNA rather than sequence them directly? The answer lies in the state of the two omics technologies and in the detectability of the molecules themselves. Biologists can now obtain essentially complete transcriptome data and analyses through core facilities and commercial providers, whereas proteomic analysis is still largely confined to specialist laboratories and has not yet reached the level of transcriptome analysis in throughput, stability, or reproducibility.
The first generation of DNA sequencers produced groundbreaking genome maps by sequencing isolated DNA fragments one after another. Although the instruments were automated, the process was slow and expensive. Extensive genome analysis became possible only with methods that sequence millions of nucleic acid fragments in parallel, enabling high-throughput, high-coverage, and low-cost generation of complete genome maps. These commercially valuable sequencing technologies have transformed biomedical research and become a mainstay of experimental biology.
Although “top-down” proteomics methods are gradually evolving, protein quantification and sequencing are still conventionally carried out with a “bottom-up” approach. As in nucleic acid sequencing, these methods infer protein composition by analyzing fragments: here, the peptides produced when proteins are cleaved enzymatically. In the 1950s, Pehr Edman invented a method for determining the amino acid sequence of a peptide chain by a cyclic chemical reaction, known as Edman degradation. In this method, phenyl isothiocyanate is coupled to the accessible N-terminal amino group, the derivatized amino acid is then released from the N-terminus of the peptide chain to expose a new N-terminus, and the process is repeated. Identifying the amino acid released at each cycle yields the amino acid sequence of the peptide. Edman degradation is slow and requires a large amount of highly purified peptide. Nevertheless, until the early 1990s, all known protein sequences were determined using this method.
In the 1990s, mass spectrometry (MS) became the preferred method for protein sequencing, and Edman degradation was relegated to a secondary role. MS infers the identity and quantity of proteins from the mass-to-charge ratio and the fragmentation pattern of their peptides. Because it is powerful and versatile, MS has been widely adopted. Paralleling the development of genomics technology, MS has evolved from manual sequencing of individual peptides, to automated high-throughput sequencing of peptides, and on to parallel sequencing of peptides by data-independent acquisition, such as SWATH-MS. Although the throughput, accuracy, and reproducibility of these methods are excellent, it remains difficult to routinely achieve complete proteome quantification across large sample cohorts, as is possible in genomic analysis.
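To make the mass-to-charge principle concrete, the following minimal Python sketch computes the theoretical m/z of an unmodified peptide from standard monoisotopic residue masses. The function names and the example peptide are illustrative assumptions, not part of the reviewed article, and real MS software also handles modifications, isotopes, and fragment ions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.007276  # mass of one charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def peptide_mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical example peptide, shown at charge states 1+ to 3+.
    for z in (1, 2, 3):
        print(f"PEPTIDE {z}+: m/z = {peptide_mz('PEPTIDE', z):.4f}")
```

The same peptide appears at different m/z values depending on its charge state, which is one reason peptide identification by MS relies on fragmentation patterns as well as precursor mass.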
With the continued development of data-independent acquisition MS, it may become possible to achieve protein sequencing with performance similar to that of genomic techniques. Even so, understanding the complexity of the proteome also calls for disruptive new technologies. Although nanopore sequencing of proteins shows promise, the peptide fluorescence sequencing method developed by Swaminathan et al. has a clear path to routine application and can be regarded as one of the most advanced examples of such disruptive technology. The method bridges eras by combining the almost forgotten Edman degradation with the massively parallel fluorescence imaging developed for next-generation DNA sequencing (Figure 1).
Figure 1. Peptide fluorescence sequencing as described by Swaminathan et al.
A complex peptide mixture, most likely derived from an enzymatically or chemically cleaved protein extract, is labeled so that selected amino acid types each carry a different fluorescent label (left). The example shown is a two-color scheme in which lysine and cysteine residues are labeled with different fluorescent colors. The C-terminus of each labeled peptide is immobilized on a glass plate through an amide bond to an aminosilane. The peptides are then subjected to iterative cycles of N-terminal residue cleavage by Edman degradation and fluorescence imaging (middle). The fluorescence intensity at each position (i.e., each peptide) is tracked as a function of the Edman cycle. The pattern of fluorescence decreases provides a partial sequence annotation of the peptide, and the resulting fluorescence signature can be matched and scored against a protein sequence database to infer the most likely set of proteins in the sample (right).
The first step in peptide fluorescence sequencing is to fluorescently label specific amino acid side chains and to fix the C-terminus of each peptide in the flow channel of the sequencing system, generating an array of sequencing substrates. The immobilized peptides are then subjected to Edman degradation in parallel, and the array of immobilized substrates is imaged after each degradation step. Classical Edman degradation identifies the phenylthiohydantoin-amino acid derivative released at each step. In this method, by contrast, the degradation step is used only to register the decrease in fluorescence intensity caused by removal of a labeled amino acid. A software tool built on this principle compares the observed fluorescence signals with a protein sequence database to infer the identity of each immobilized substrate, that is, the sequence of the peptide chain.
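The matching idea described above can be sketched in a few lines of Python. This is an idealized illustration under stated assumptions: every Edman cycle succeeds, every lysine (K) and cysteine (C) is labeled, and the database, peptide sequences, color names, and function names are all hypothetical. It is not the software tool developed by Swaminathan et al.

```python
from collections import defaultdict

# Hypothetical two-color labeling scheme: residue type -> fluorophore color.
LABELS = {"K": "green", "C": "red"}


def fluorosequence(peptide, labels):
    """Ideal signature: (Edman cycle, color) for every drop in fluorescence.

    Cycle n removes the n-th residue from the N-terminus, so a labeled
    residue at position n causes its color to dim at cycle n.
    """
    return tuple(
        (cycle, labels[residue])
        for cycle, residue in enumerate(peptide, start=1)
        if residue in labels
    )


def build_index(database, labels):
    """Map each ideal signature to the (protein, peptide) pairs producing it."""
    index = defaultdict(list)
    for protein, peptides in database.items():
        for peptide in peptides:
            index[fluorosequence(peptide, labels)].append((protein, peptide))
    return index


# Toy peptide database (made-up sequences, for illustration only).
DATABASE = {
    "protein_A": ["GKAGCAK", "ACDEFGK"],
    "protein_B": ["KGGCAAA", "GGKGGCK"],
}

index = build_index(DATABASE, LABELS)

# Observed at one array position: green drops at cycles 2 and 7, red at cycle 5.
observed = ((2, "green"), (5, "red"), (7, "green"))
print(index.get(observed, []))
```

In this toy database the observed signature is unique to one peptide. In a real proteome many peptides share a signature and cycles fail at random, which is why the observed signals have to be scored probabilistically against the database rather than looked up exactly.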
This study demonstrates the feasibility of peptide fluorescence sequencing. Specifically, the authors (i) describe an imaging system compatible with the stringent conditions of Edman degradation; (ii) determine the precise location of fluorescently labeled lysine or cysteine residues in model peptides; (iii) describe the sources of error and inefficiency in the system; (iv) study the potential to identify proteins from more complex proteomes and provide a computational framework for inferring peptide sequences from the observed fluorescence signals; and (v) localize a specific phosphorylated serine residue in a peptide containing several serine residues.
The peptide fluorescence sequencing method developed by Swaminathan et al. is exciting because it opens a new route to peptide analysis and could enable high-throughput, highly reproducible, and potentially low-cost proteome sequencing. A significant advantage of the approach is that it builds on the strengths of established methods: Edman degradation, massively parallel DNA sequencing, and the computational frameworks used to search protein sequence databases in MS. This strategy may help speed the transition from proof of concept to routine application. In addition, the data generated by this method resemble the massively parallel read data of genomics and transcriptomics. MS-based proteomics still has a high technical and computational barrier to entry, and its adoption has been slow. By comparison, the familiarity of this data format may help accelerate wider adoption of peptide fluorescence sequencing.
As Swaminathan et al. point out, several technical and conceptual challenges must be overcome before the new approach reaches its full potential. They stem mainly from the nature of Edman degradation and the complexity of the human proteome: (i) although the yield of each degradation step in the study is 91-97%, losses accumulate over successive cycles, so the peptide length that can be read is limited; (ii) because sequencing yield depends on the sequence itself, challenging sequences, such as proline-rich ones, may reduce the clarity of the fluorescence signal; (iii) the functional groups that can be fluorescently labeled are limited to chemically reactive groups in the peptide chain, mainly amino, carboxyl, and sulfhydryl groups, so the information carried by the fluorescence signal is also limited; (iv) modified residues usually go unrecognized unless they are specifically fluorescently labeled, and such specific labeling is available for only a few modified amino acids; (v) the dynamic range of the human cell proteome is large (~10^7), each protein yields many peptides (~10^2) on enzymatic digestion, and each cell expresses a large number of open reading frames (~10^4), which poses a huge analytical challenge even before protein diversity is considered. For peptide fluorescence sequencing, meeting these challenges requires a higher level of substrate multiplexing than has yet been achieved.
Although the system developed by the authors is currently limited to the analysis of relatively simple mixtures, its prospects are very good and it is a protein sequencing method worth watching.
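The effect of the per-step yield in point (i) can be illustrated with a short Python calculation. It rests on a simplifying assumption that is not taken from the study: every cycle has the same, independent yield, so the fraction of molecules still in phase after n cycles is the yield raised to the power n. The 50% threshold and the function names are likewise arbitrary choices for illustration.

```python
import math


def in_phase_fraction(step_yield, cycles):
    """Fraction of peptide copies that succeeded in every one of `cycles` steps."""
    return step_yield ** cycles


def max_cycles(step_yield, threshold):
    """Largest number of cycles for which the in-phase fraction stays >= threshold."""
    return math.floor(math.log(threshold) / math.log(step_yield))


if __name__ == "__main__":
    for step_yield in (0.91, 0.97):  # the range of per-step yields quoted above
        for cycles in (5, 10, 20, 30):
            fraction = in_phase_fraction(step_yield, cycles)
            print(f"yield {step_yield:.2f}, {cycles:2d} cycles: {fraction:.3f} in phase")
        print(f"  cycles before dropping below 50%: {max_cycles(step_yield, 0.5)}")
```

Under this simple model a difference of a few percentage points in per-step yield changes the usable read length several-fold, which is why step efficiency matters so much for the method.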
References:
1. Swaminathan, J. et al. Nat. Biotechnol. 36, 1076–1082 (2018).
2. Liu, Y., Beyer, A. & Aebersold, R. Cell 165, 535–550 (2016).
3. Goodwin, S., McPherson, J.D. & McCombie, W.R. Nat. Rev. Genet. 17, 333–351 (2016).
4. Toby, T.K., Fornelli, L. & Kelleher, N.L. Annu. Rev. Anal. Chem. 9, 499–519 (2016).
5. Edman, P. Acta Chem. Scand. 4, 283–293 (1950).
6. Aebersold, R. & Mann, M. Nature 537, 347–355 (2016).
7. Venable, J.D., Dong, M.Q., Wohlschlegel, J., Dillin, A. & Yates, J.R. Nat. Methods 1, 39–45 (2004).
8. Purvine, S., Eppel, J.T., Yi, E.C. & Goodlett, D.R. Proteomics 3, 847–850 (2003).
9. Gillet, L.C. et al. Mol. Cell. Proteomics 11, O111.016717 (2012).
10. Robertson, J.W.F. & Reiner, J.E. Proteomics 18, e1800026 (2018).