In the Laboratory of Functional and Structural Genomics we perform theoretical studies whose main objective is to analyze and predict the three-dimensional structure of the human genome and its relation to the genomic diversity of human populations, both natural and pathological. In particular, we investigate structural variants and copy number variants observed in various sub-populations and groups of patients, and their three-dimensional localization in the structure of the nucleus.
We also examine the relationship between the expression levels of selected genes and their location in three-dimensional space. In addition, we use structural information to enrich sequence-based genomic analysis in order to better define the function of selected genomic regions that are important in the context of personalized medicine.
For this purpose, we first develop a variety of large-scale computational tools for analyzing whole genome sequences, identifying structural variants, and determining the statistical significance of the observed copy number of genomic regions in selected cohorts of patients. Secondly, we evaluate the uniqueness of the observed changes by comparing them with the typical, natural genomic diversity that has been cataloged, for example, by the 1000 Genomes Project Consortium. Thirdly, we infer the biological function of these genomic regions using publicly available databases. Fourthly, we identify the unique local three-dimensional environment of selected sites, e.g. regulatory ones. In the fifth step, we analyze the impact of structural rearrangements of those local neighborhoods on gene expression profiles, which is related to the presence of transcription factories.
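As a rough illustration of the first two steps (testing whether a copy number variant is over-represented in a patient cohort relative to a reference population), the sketch below applies a one-sided Fisher's exact test. The choice of test, the function name `fisher_exact_greater` and the counts are assumptions made for this example; the text does not say which statistical test the laboratory's tools use.

```python
from math import comb


def fisher_exact_greater(carriers_cohort, n_cohort, carriers_ref, n_ref):
    """One-sided Fisher's exact test for enrichment of carriers in a cohort.

    Returns the probability of observing at least `carriers_cohort` carriers
    among `n_cohort` individuals if carrier status were independent of group
    membership (hypergeometric null model).
    """
    total = n_cohort + n_ref
    total_carriers = carriers_cohort + carriers_ref
    # Number of ways to choose the cohort from all individuals.
    denominator = comb(total, n_cohort)
    upper = min(n_cohort, total_carriers)
    numerator = 0
    for k in range(carriers_cohort, upper + 1):
        # k carriers and (n_cohort - k) non-carriers end up in the cohort.
        numerator += comb(total_carriers, k) * comb(total - total_carriers, n_cohort - k)
    return numerator / denominator


# Hypothetical counts, invented purely for illustration.
p_value = fisher_exact_greater(carriers_cohort=12, n_cohort=200,
                               carriers_ref=25, n_ref=2504)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value would suggest that the variant is more frequent in the cohort than expected from the reference panel. In a genome-wide setting the result would still need correction for multiple testing.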
Laboratory website: https://4dnucleome.cent.uw.
phone: +48 22 55 43654
Degrees: MSc (Physics), PhD (chemistry), DSc (Habilitation, Bioinformatics)
Titles: Professor at University of Warsaw, PhD, DSc
Tel: (+48) 504 726 203 & (+4822) 554 36 54 Fax: (+4822) 554 08 01
Email: email@example.com & firstname.lastname@example.org
Affiliation: Center of New Technologies, University of Warsaw, Poland
Dariusz Plewczynski's interests are focused on functional and structural genomics. Functional genomics attempts to make use of the vast wealth of data produced by high-throughput genomics projects, such as the structural genomics consortia, the Human Genome Project, the 1000 Genomes Project, ENCODE, and many others. The major tools used in this interdisciplinary research endeavor include statistical data analysis (GWAS, clustering, machine learning), genomic variation analysis using diverse data sources (karyotyping, confocal microscopy, aCGH microarrays, next generation sequencing: both whole genome and whole exome), bioinformatics (protein sequence analysis, protein structure prediction), and finally biophysics (polymer theory and simulations) and genomics (epigenetics, genome domains, three-dimensional structure analysis of chromatin). He is presently involved in several Big Data projects at three institutes: the Centre of New Technologies at the University of Warsaw (his main affiliation), the Jackson Laboratory for Genomic Medicine (an international partner of the TEAM project), and the Centre for Innovative Research (within the Leading National Research Centre KNOW 2012-2017) at the Medical University of Bialystok (UMB). He participates in two large projects: the 1000 Genomes Project (NIH), through bioinformatics analysis of genomic data from aCGH arrays and deep-coverage next generation sequencing (NGS) experiments for structural variant (SV) identification; and the 4D Nucleome project funded by the NIH in the USA, through biophysical modeling of the three-dimensional conformation of chromatin inside human cells using Hi-C and ChIA-PET techniques.
His goal is to combine the SV data with three-dimensional cell nucleus structure for better understanding of normal genomic variation among human populations, the natural selection process during the human evolution, mammalian cell differentiation, and finally the origin, pathways, progression and development of cancer and autoimmune diseases.
Research Summary (Past & Current):
Professor at the University of Warsaw in the Center of New Technologies CeNT, Warsaw, Poland, the head of the Laboratory of Functional and Structural Genomics.
His main expertise covers computational genomics, biostatistics and bioinformatics. He is actively developing computational intelligence algorithms, performing biophysical simulations and applying computational modeling to various interdisciplinary problems in human genomics. His recent achievements cover qualitative and quantitative biological data analysis; general systems theory and interdisciplinary problems in the context of bioinformatics, genomics, drug design, and systems biology; and ensemble learning systems and meta-clustering techniques.
He received an MS degree in Theoretical Physics (Department of Physics, UW) under the supervision of Prof. Marek Cieplak in 1995. In 2001, he defended his PhD degree in Physical Chemistry (Institute of Physical Chemistry, Polish Academy of Sciences) under the supervision of Prof. Robert Hołyst. Later, he was a postdoctoral researcher at the International Institute for Cell and Molecular Biology in 2001, collaborating closely with Prof. Adam Godzik. He worked at the Sanford-Burnham Institute in San Diego, CA, USA in 2002. He was a postdoc in the bioinformatics laboratory at Helsinki University in 2003. In 2004 he was a visiting researcher at Merck Research Laboratories (IRBM) in Rome, Italy. From 2002 till 2011, he was an assistant professor at the University of Warsaw, Warsaw, Poland. In 2011, he visited Stanford University within the Top500 programme of the Polish Ministry of Science. He received a DSc degree (habilitation) in Computer Science and bioinformatics in 2012 at the Institute of Computer Science, Polish Academy of Sciences.
From 2011, he was the head of the bioinformatics team at the University of Warsaw (first at ICM, and from 2015 at the Centre of New Technologies). He has been involved in bioinformatics projects in the Leading National Research Centre of the Medical University of Bialystok since 2012. He was a visiting professor at The Jackson Laboratory for Genomic Medicine and Yale University within the senior Fulbright fellowship (2013-2014).
- MA: 1995, Faculty of Physics, Warsaw University, Poland. Major: theoretical physics. “Statistical physics of phase transitions in thin magnetic layers”;
- PhD: 2001, Institute of Physical Chemistry, Poland. Major: physical chemistry. Dissertation title: “Diffusion of curved surfaces”;
- PostDoc: 2001 – Warsaw, Poland, The International Molecular and Cell Biology Institute. Major: bioinformatics. Research project: “Structural Comparison of proteins”;
- 2002 – San Diego, CA, The Sanford-Burnham Institute. Major: bioinformatics. Research project: “Improving the sequence alignment quality using predicted local 3D structure of a protein chain”;
- 2003 – Helsinki, Finland, Helsinki University. Major: bioinformatics. Research project: “Structural alignment of proteins using DALI”
- Habilitation: 2012, Institute of Computer Science Polish Academy of Sciences, Poland. Major: Computer Science. Dissertation title: “Applications of machine learning and data analysis techniques to biological function prediction of biomolecules”
- Starting from January 2015 – Assistant Professor, DSc, PhD, the head of Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw
- 2013 till 2016 – Research consultant, Centre for Innovative Research, Faculty of Medicine, Medical University of Bialystok, Poland.
- 2002 till 2015 – Assistant Professor, Bioinformatics and Systems Biology Laboratory, Statistical Data Analysis & Systems Theory Unit, Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Poland.
- 2011 till 2013 – Assistant Professor, Department of Physical Chemistry, Bioinformatics and Applied Mathematics Unit, Faculty of Pharmacy, Medical University of Warsaw, Poland.
- 2004 – Rome, Italy, Merck Laboratories. Major: bioinformatics, chemoinformatics. Research project: “Applications of machine learning algorithms in virtual High-throughput screening”
- 2005 – Helsinki, Finland, Helsinki University. Major: bioinformatics. Research project: “Prediction of
- 2003-2008 – Poznan, Poland, BioInfoBank Institute. Major: bioinformatics. Research project: “Prediction of protein function using sequence and structural information”.
- 2011 – Stanford University, Centre of Professional Development, Top500 Innovators
- 2013/2014 – The Jackson Laboratory & Yale University, Farmington, CT, USA
Other professional activities and memberships
- Member of the Polish Bioinformatics Society
- Member of the Polish Physics Society
- Member of the International Society for Computational Biology
Editorships & Reviewing boards
- Member of the Editorial Board: “BMC Bioinformatics”, BioMedCentral, UK
- Reviewer for: Genome Research, BMC Genome Biology, Nature Methods, Bioinformatics, J Chem Inf Modeling, BMC Bioinformatics, Chemical Biology and Drug Design and many other journals in the fields of “omics”, computational genomics, drug design, bioinformatics and systems biology.
Recent major publications
- “An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization” Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. Genome Res. 2016 Oct 27.
- “3DFlu: database of sequence and structural variability of the influenza hemagglutinin at population scale” Mazzocco G, Lazniewski M, Migdał P, Szczepińska T, Radomski JP, Plewczynski D. Database (Oxford). 2016 Oct 2;2016.
- “3D-GNOME: an integrated web service for structural modeling of the 3D genome” Szalaj P, Michalski PJ, Wróblewski P, Tang Z, Kadlof M, Mazzocco G, Ruan Y, Plewczynski D. Nucleic Acids Res. 2016 Jul 8;44(W1):W288-93.
- “2dSpAn: semiautomated 2-d segmentation, classification and analysis of hippocampal dendritic spine plasticity” Basu S, Plewczynski D, Saha S, Roszkowska M, Magnowska M, Baczynska E, Wlodarczyk J. Bioinformatics. 2016 Aug 15;32(16):2490-8.
- “CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription” Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, Trzaskoma P, Magalska A, Wlodarczyk J, Ruszczycki B, Michalski P, Piecuch E, Wang P, Wang D, Tian SZ, Penrad-Mobayed M, Sachs LM, Ruan X, Wei CL, Liu ET, Wilczynski GM, Plewczynski D, Li G, Ruan Y. Cell 2015, Dec 17;163(7):1611-27. Epub 2015 Dec 10.
- “An integrated map of structural variation in 2,504 human genomes” by Sudmant PH, …, 1000 Genomes Project Consortium, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO. Nature. 2015 Oct 1;526(7571):75-81.
- “A global reference for human genetic variation” by 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. Nature. 2015 Oct 1;526(7571):68-74.
- “Analysis of Structural Chromosome Variants by Next Generation Sequencing Methods” Plewczynski D, Gruca S, Szałaj P, Gulik K, de Oliveira SF and Malhotra A. book chapter in “Clinical Applications for Next-Generation Sequencing” book, Elsevier, 2015
- “A combined systems and structural modeling approach repositions antibiotics for Mycoplasma genitalium” by Kazakiewicz D, Karr JR, Langner KM, Plewczynski D. Comput Biol Chem. S1476-9271 (2015);
- “Binding Activity Prediction of Cyclin-Dependent Inhibitors” Saha I, Rak B, Bhowmick SS, Maulik U, Bhattacharjee D, Koch U, Lazniewski M, Plewczynski D. J Chem Inf Model. 55(7):1469-82. (2015);
- “Summary of the DREAM8 Parameter Estimation Challenge: Toward Parameter Identification for Whole-Cell Models” by Karr JR, Williams AH, Zucker JD, Raue A, Steiert B, Timmer J, Kreutz C; DREAM8 Parameter Estimation Challenge Consortium, Wilkinson S, Allgood BA, Bot BM, Hoff BR, Kellen MR, Covert MW, Stolovitzky GA, Meyer P. PLoS Comput Biol. 28;11(5) (2015);
- 2017-2020 FNP TEAM grant “Three-dimensional Human Genome structure at the population scale: computational algorithm and experimental validation for lymphoblastoid cell lines of selected families from 1000 Genomes Project” to D. Plewczynski, PI;
- 2016-2017 NCN Grant ETIUDA “Modelling and analysis of three dimensional structure and its dynamics in cell nucleus” to Przemyslaw Szalaj (PhD student), D. Plewczynski – scientific advisor;
- 2015-2018 EU COST action BM1405 “Non-globular proteins: from sequence to structure, function and application in molecular physiopathology” to D. Plewczynski, Polish PI;
- 2016-2017 NCN Grant PRELUDIUM “Analysis of mechanisms of drug resistance to trastuzumab in HER2 overexpressing breast cancer based on gene expression changes in selected cell lines” to Anna Rusek (PhD student), D. Plewczynski – scientific advisor;
- 2015-2018 NCN Grant OPUS “iCell: information processing in living organisms. The role of three-dimensional structure and multi-scale properties in controlling the biological processes in a cell” to D. Plewczynski, PI;
- 2015-2016 NCN Grant ETIUDA “Integration of information in biological and synthetic systems” to J. Zubek (PhD student), D. Plewczynski – scientific advisor;
- 2014-2017 NCN Grant OPUS “Virtual High Throughput Screening (vHTS) derivation of a cross-immunity model for the Influenza-A Virus Infections” to D. Plewczynski, PI;
- 2008-2011 KBN Grant “Application of machine learning methods to prediction of protein-protein interactions” – Polish Ministry of Science grant to dr D. Plewczynski, PI;
- Grant LSHG-CT-2003-503265 BIOSAPIENS, a large-scale effort to annotate human genome using both informatics tools and input from experimentalists. 6th Framework EC Project, participant;
- Grant SP22-CT-2004-003831 SEPSDA, Combatting and eventually eradicating the new coronavirus causing Severe Acute Respiratory Syndrome (SARS) requires specific and efficient antiviral drugs and improved diagnostics. 6th Framework EC Project, participant;
- Grant QLRT-CT2000-00127 ELM, The four principal objectives of the ELM consortium are to (1) design, (2) develop, (3) maintain and (4) apply, a novel infrastructure resource devoted to the prediction of functional motifs in protein sequences. 6th Framework EC Project, participant;
- Lectures: Genome Biology (UW); Genomes Biophysics (UW); Bioinformatics (UW, WUT); Drug Design (WUM); Machine Learning (UW); Statistical Data Analysis (UW).
- Seminars and laboratories: Bioinformatics; Systems Biology; Machine Learning & Statistics.
- 2013 – Senior Fulbright Fellowship to visit Harvard University, and Yale Univ., USA;
- 2011 – Top500 Innovators: Science, Management and Commercialization Award; Polish Ministry of Science and Higher Education
- 1994 – Sosnowski Award (for outstanding physicists); Polish Physical Society
- 1993 – 1994 Polish Ministry of Education and Science Award
- 1986 – 1987 Polish Ministry of Education
Polish (native), English (fluent), German (basic), Russian (basic)
Photography, History, Traveling, Robotics and Artificial Intelligence.
Prof. Dariusz Plewczyński
Prof. Vahid Rezaei Tabar
Karolina Jodkowska, PhD
Michał Łaźniewski, PhD
Ayatullah Faruk Mollah, PhD
Michał Piętal, PhD
Teresa Szczepińska, PhD
Grzegorz Bokota, MSc
Anna Maria Bugaj, MA
Michał Denkiewicz, MSc
Anup Kumar Halder, MSc
Michał Kadlof, MSc Eng.
Somnath Rakshit, B. Tech
Anna Maria Rusek, MSc
Przemyslaw Szałaj, MSc
Paulina Urban, MSc
Michał Własnowolski, MSc
Anas Allaoui, BSc.
Agnieszka Kraft, BSc
Zofia Parteka, BSc
Michał Sadowski, BSc
Piotr Skłodkowski, BSc
Andrzej Szczepanczyk, BSc
Natalia Zawrotna, Eng.
Tabar, V. R., Zareifard, H., Salimi, S., & Plewczynski, D. (2019).
Journal of Statistical Computation and Simulation, 1-14.
Tabar, V. R., Fathipour, H., Pérez-Sánchez, H., Eskandari, F., and Plewczynski, D. (2019).
Journal of The Iranian Statistical Society
Szczepińska, T., Rusek, A. M., & Plewczynski, D.
Genes, Chromosomes and Cancer, 58(7), 500-506
Jezela‐Stanek, A., Walczak, A., Łaźniewski, M., Kosińska, J., Stawiński, P., Pienkowski, V. M., . . . Płoski, R.
Clinical Genetics, 95(6), 736-738
Saha, S., Chatterjee, P., Basu, S., Nasipuri, M., & Plewczynski, D.
Tabar, V. R., Zareifard, H., Salimi, S., & Plewczynski, D.
Journal of Statistical Computation and Simulation, 89(10), 1957-1970
Marusiak, A. A., Prelowska, M. K., Mehlich, D., Lazniewski, M., Kaminska, K., Gorczynski, A., . . . Nowis, D.
Oncogene, 38(15), 2860-2875
Urban, P., Rezaei, V., Bokota, G., Denkiewicz, M., Basu, S., & Plewczyński, D.
Journal of Computational Biology, 26(4), 322-335
Ciara, E., Rokicki, D., Lazniewski, M., Mierzewska, H., Jurkiewicz, E., Bekiesińska-Figatowska, M., ... & Kosińska, J. (2018).
Journal of human genetics, 63(4), 473.
Al Bkhetan, Z., & Plewczynski, D. (2018).
Scientific reports, 8(1), 5217
Basu, S., Saha, P. K., Roszkowska, M., Magnowska, M., Baczynska, E., Das, N., ... & Wlodarczyk, J. (2018).
Scientific reports, 8(1), 3545
Maszczyk, P., Babkiewicz, E., Czarnocka-Cieciura, M., Gliwicz, M. Z., Uchmanski, J., Urban, P. (2018)
Journal of Plankton Research, 40(4), 471-485
Basu, S., & Plewczynski, D.
Briefings in Functional Genomics, 17(6)
Tabar, V. R., Plewczynski, D., & Fathipor, H.
Journal of the Iranian Statistical Society, 17(2)
Wang, X., Li, X., Zhang, L., Wong, S. H., Wang, M. H., Tse, G., Plewczyński, D., . . . & Wu, W. K.
Annals of Oncology, 29(11), 2254-2260.
Malkowska, M., Zubek, J., Plewczynski, D., & Wyrwicz, L. S.
Szalaj, P., & Plewczynski, D.
Cell Biology and Toxicology, 34(5), 381-404.
Tatjewski, M., Kierczak, M., & Plewczynski, D. (2017).
Methods in Molecular Biology 2017;1484:275-300. PMID: 27787833
Lazniewski, M., Dawson, W. K., Szczepińska, T., & Plewczynski, D. (2017).
Briefings in functional genomics.
Ołdak, M., Oziębło, D., Pollak, A., Stępniak, I., Lazniewski, M., Lechowicz, U., Kochanek, K., Furmanek, M., Tacikowska, G., Plewczynski, D. and Wolak, T. (2017).
Journal of translational medicine, 15(1), p.25.
Zubek, J., Denkiewicz, M., Barański, J., Wróblewski, P., Rączaszek-Leonardi, J., & Plewczynski, D. (2017).
PloS one, 12(8), e0182490.
Dawson, W. K., Lazniewski, M., & Plewczynski, D. (2017).
Briefings in functional genomics. 2017, 1-13
Dekker, J., Belmont, A. S., Guttman, M., Leshyk, V., Lis, J. T., Lomvardas, S., Mirny, L. A., O’Shea C. C., Park, P. J., Ren, B., Ritland Politz, C. J., Shendure, J., Zhong, S. & the 4D Nucleome Network
Nature, 549, 219–226
Zubek, Julian, Michael L. Stitzel, Duygu Ucar, and Dariusz M. Plewczynski. (2016)
PeerJ 4: e1750
Chatterjee, P., Basu, S., Zubek, J., Kundu, M., Nasipuri, M., & Plewczynski, D. (2016).
Journal of molecular modeling, 22(4), 72.
Tatjewski, M., Gruca, A., Plewczynski, D., & Grynberg, M. (2016)
Molecular reproduction and development, 83(2), 144-148.
Szałaj, P., Tang, Z., Michalski, P., Pietal, M. J., Luo, O. J., Sadowski, M., ... & Plewczynski, D. (2016).
Genome research, 26(12), 1697-1709.
Szalaj, P., Michalski, P. J., Wróblewski, P., Tang, Z., Kadlof, M., Mazzocco, G., ... & Plewczynski, D. (2016).
Nucleic acids research, 44(W1), W288-W293.
Mazzocco, G., Bhowmick, S. S., Saha, I., Maulik, U., Bhattacharjee, D., & Plewczynski, D., (2016)
International Conference on Pattern Recognition and Machine Intelligence (pp. 462-471). Springer, Cham
Basu, S., Plewczynski, D., Saha, S., Roszkowska, M., Magnowska, M., Baczynska, E., & Wlodarczyk, J. (2016).
Bioinformatics, 32(16), 2490-2498.
Mazzocco, G., Lazniewski, M., Migdał, P., Szczepińska, T., Radomski, J. P., & Plewczynski, D.
Database, 2016, baw130
Bokota, G., Magnowska, M., Kuśmierczyk, T., Łukasik, M., Roszkowska, M., & Plewczynski, D. (2016)
Frontiers in computational neuroscience, 10, 140
M. Tatjewski, D. Plewczyński
Artificial Intelligence 2016, 9999: 1–12
Plewczynski, D., Gruca, S., Szałaj, P., Gulik, K., Oliveira, S. F., & Malhotra, A.
Clinical Applications for Next-Generation Sequencing, 39-61.
Lin, H., Chen, W., Anandakrishnan, R., & Plewczynski, D. (2015)
The Scientific World Journal
Tang, Z., Luo, O. J., Li, X., Zheng, M., Zhu, J. J., Szalaj, P., ... & Michalski, P. (2015).
Cell, 163(7), 1611-1627.
1000 Genomes Project Consortium. (2015).
Nature, 526(7571), 68.
Kazakiewicz, D., Karr, J. R., Langner, K. M., & Plewczynski, D. (2015).
Computational biology and chemistry, 59, 91-97.
Saha, I., Rak, B., Bhowmick, S. S., Maulik, U., Bhattacharjee, D., Koch, U., ... & Plewczynski, D. (2015).
Journal of chemical information and modeling, 55(7), 1469-1482.
Saha, I., Bhowmick, S. S., Geraci, F., Pellegrini, M., Bhattacharjee, D., Maulik, U., & Plewczynski, D.
International Conference on Swarm, Evolutionary, and Memetic Computing (pp. 116-127). Springer, Cham.
Plewczynski, D., Gruca, S., Szałaj, P., Gulik, K., de Oliveira, S. F., & Malhotra, A. (2016)
Clinical Applications for Next-Generation Sequencing (pp. 39-61).
Plewczynski, D., Philips, A., Grotthuss, M. V., Rychlewski, L., & Ginalski, K. (2014).
Journal of Computational Biology, 21(3), 247-256.
Saha, I., Zubek, J., Klingström, T., Forsberg, S., Wikander, J., Kierczak, M., ... & Plewczynski, D. (2014).
Molecular BioSystems, 10(4), 820-830.
| Title | Deadline for applications |
|---|---|
| Chromatin conformation capture postdoc – experimental work in NGS-based 3D genomics techniques: Hi-C, ChIA-PET and HiChIP | 12/10/2019 |
| 2 Computational modelling postdocs | 12/10/2019 |
| Computational modelling MSc Student | 12/10/2019 |
| MSc student in Laboratory of Functional and Structural Genomics | 29/04/2019 |
| Computational modelling PhD students | 01/09/2018 |
| Computational modelling MSc student | 01/09/2018 |
| Postdoctoral researchers to work in the Laboratory of Functional and Structural Genomics | 01/09/2018 |
Selected Scientific Discoveries in Laboratory of Functional and Structural Genomics headed by Dariusz Plewczynski, PhD at Centre of New Technologies, University of Warsaw, Poland
Areas For Scientific Synergies:
Three-dimensional genomics, higher-order chromatin organization, nucleus, GWAS & SV (deletions, duplications, insertions, inversions, and translocations), Big Data, statistical learning, massive dataset analysis; computational genomics and bioinformatics in population-level genomic data for medical applications and fundamental research in Life Sciences, structural and functional analysis of “omics” data; SV identification in the human genome using NGS methods; biophysical methods, protein structure prediction, protein-protein interactions: prediction and analysis, and analysis of biomolecule interaction networks; whole-cell modeling.
Laboratory Research Methodology and Achievements
Our research methodology has so far been tested and validated experimentally only for the GM12878 cell line, for which high-resolution interaction datasets were available. The underlying biological model with experimental results was published in Cell 2015: http://linkinghub.elsevier.com/retrieve/pii/S0092-8674(15)01504-4 , followed by a detailed description of the computational model published in Genome Research 2016: http://genome.cshlp.org/content/early/2016/10/27/gr.205062.116.abstract. The web server giving the research community free access to both the experimental data and the computational simulation engine was published in Nucleic Acids Research 2016: http://nar.oxfordjournals.org/content/early/2016/05/16/nar.gkw437.full . In short: ChIA-PET is a high-throughput mapping technology that reveals long-range chromatin interactions and provides insights into the basic principles of spatial genome organization and gene regulation mediated by specific protein factors. Recently, we showed that a single ChIA-PET experiment provides information at all genomic scales of interest, from the high-resolution locations of binding sites and enriched chromatin interactions mediated by specific protein factors, to the low-resolution non-enriched interactions that reflect topological neighborhoods of higher-order chromosome folding. This multilevel nature of ChIA-PET data offers an opportunity to use multiscale 3D models to study structural-functional relationships at multiple length scales, but doing so requires a structural modeling platform. We developed 3D-GNOME (3-Dimensional GeNOme Modeling Engine), a complete computational pipeline for 3D simulation using ChIA-PET data. 3D-GNOME consists of three integrated components: a graph-distance-based heatmap normalization tool, a 3D modeling platform, and an interactive 3D visualization tool.
Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional reliability of potential biologically plausible topological structures.
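To make the notion of interaction heatmaps at megabase and kilobase resolution more concrete, the sketch below bins pairwise chromatin contacts into a symmetric contact matrix at a chosen bin size. It is a generic illustration only: the function name `contact_matrix`, the `(pos_a, pos_b, count)` input format and the toy coordinates are assumptions, and the code does not reproduce the graph-distance-based normalization or the 3D modeling performed by 3D-GNOME.

```python
def contact_matrix(pairs, region_start, region_end, bin_size):
    """Bin pairwise contacts into a symmetric matrix of contact counts.

    pairs: iterable of (pos_a, pos_b, count) tuples with genomic coordinates
           on a single chromosome (hypothetical input format).
    """
    # Ceiling division so that a partial last bin is still represented.
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b, count in pairs:
        inside_a = region_start <= pos_a < region_end
        inside_b = region_start <= pos_b < region_end
        if not (inside_a and inside_b):
            continue  # skip contacts that leave the region of interest
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        matrix[i][j] += count
        if i != j:
            matrix[j][i] += count  # keep the matrix symmetric
    return matrix


# Toy contacts, invented for illustration only.
toy_pairs = [(100, 250, 2), (120, 130, 1), (900, 950, 5)]
coarse = contact_matrix(toy_pairs, region_start=0, region_end=300, bin_size=100)
fine = contact_matrix(toy_pairs, region_start=0, region_end=300, bin_size=50)
print(coarse)
print(len(fine), "bins at the finer resolution")
```

Running the same contacts through two bin sizes shows how one dataset can be viewed at several resolutions, which is the property the multiscale modeling described above relies on.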
Structural Genomics – to be further developed within proposed TEAM project
1. “An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization” Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. Genome Res. 2016 Oct 27.
2. “3D-GNOME: an integrated web service for structural modeling of the 3D genome” by Szalaj P, Michalski PJ, Wróblewski P, Tang Z, Kadlof M, Mazzocco G, Ruan Y, Plewczynski D. Nucleic Acids Res. 2016 May 16. pii: gkw437. pmid:27185892
3D-GNOME (3-Dimensional Genome Modeling Engine) is a complete computational pipeline for 3D simulation using ChIA-PET or Hi-C data. 3D-GNOME consists of four integrated components: i) a graph-distance-based heat map normalization tool, ii) a 3D modeling platform, iii) an interactive 3D visualization tool, and finally iv) a web service which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps which characterize the selected genomic region. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of the CTCF-motif orientation and high-resolution looping patterns in the 3D simulation provided additional reliability of potential biologically plausible topological structures. The 3D-GNOME web server is freely available at http://3dgnome.cent.uw.edu.pl/.
3. “CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription” Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, …, Wilczynski GM, Plewczynski D, Li G, Ruan Y. Cell 2015, Dec 17;163(7):1611-27. Epub 2015 Dec 10.
Spatial genome organization and its effect on transcription remain a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map the higher-order chromosome folding and specific chromatin interactions mediated by the CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that the CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with the CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that the haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. The 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCFs are involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights into the topological mechanism of human variations and diseases.
4. “A global reference for human genetic variation” by 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. Nature. 2015 Oct 1;526(7571):68-74.
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. This publication reports the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. The consortium characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes ~99% of the SNP variants with a frequency of >1% for a variety of ancestries. The publication describes the distribution of genetic variation across the global sample and discusses the implications for common disease studies. Our (D. Plewczynski) contribution was structural variant (SV) identification using combined aCGH arrays and NGS sequencing, tightly coupled with a computational meta-approach. The set of tools (including a meta-caller for SV identification that combines several independent methods with machine learning) was developed at the Charles Lee laboratory at JAX GM. We participated in the early phase of this project, providing initial results, and have continued to work independently in this area, focusing on the application of deep learning algorithms to SV calling and GWAS.
Functional Genomics – tools and algorithms developed within iCell (2015-2018) and Influenza (2013-2016) National Research Centre OPUS grants
5. “Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein–protein interactions using machine learning methods” by Srivastava A, Mazzocco G, Kel A, Wyrwicz LS and D. Plewczynski. Mol BioSyst. 12(3):778-85 (2016).
6. “Ensemble learning prediction of protein-protein interactions using proteins functional annotations” by Saha, Zubek, Klingström, Forsberg, Wikander, Kierczak, Maulik and D. Plewczynski. Mol BioSyst. 10(4):820-30 (2014).
Protein-protein interactions (PPI) are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins drawn from databases such as DIP, MINT, BioGRID and IntAct. We then constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were used on these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, the sensitivity exceeded 80% and the classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a larger and more realistic dataset while maintaining sensitivity over 70%. Finally, we further improved those results by constructing a unique validation approach aimed at collecting reliable non-interacting proteins (NIPs). The curated negative and positive subsets were used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96% and 98%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performance could be considerably improved by focusing on data preparation. The predicted PPI networks were analyzed in terms of their topological and geometrical properties.
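The majority-voting step and the sensitivity/specificity figures quoted above can be illustrated with a minimal sketch. It is not the published pipeline: the function names, the binary 1/0 encoding (interacting / non-interacting) and the tie-breaking rule (a 2-2 split among four classifiers counts as non-interacting) are assumptions made only for illustration.

```python
def majority_vote(votes_per_classifier):
    """Combine binary predictions (1 = interacting, 0 = non-interacting).

    votes_per_classifier: one equal-length list of 0/1 labels per classifier.
    Assumption: a pair is called interacting only if strictly more than half
    of the classifiers vote 1, so ties fall to the negative class.
    """
    n_classifiers = len(votes_per_classifier)
    combined = []
    for votes in zip(*votes_per_classifier):
        positives = sum(votes)
        combined.append(1 if 2 * positives > n_classifiers else 0)
    return combined


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    A rate whose denominator is zero is returned as NaN.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels for six protein pairs; values are invented for illustration.
    y_true = [1, 1, 1, 0, 0, 0]
    svm = [1, 1, 0, 0, 0, 1]
    random_forest = [1, 1, 1, 0, 1, 0]
    decision_tree = [1, 0, 1, 0, 0, 1]
    naive_bayes = [0, 1, 1, 1, 0, 1]

    ensemble = majority_vote([svm, random_forest, decision_tree, naive_bayes])
    print(ensemble)
    print(sensitivity_specificity(y_true, ensemble))
```

With an even number of classifiers the tie-breaking rule matters: sending ties to the negative class favours specificity over sensitivity, and the opposite choice would do the reverse.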
7. “Consensus classification of Human Leukocyte Antigens class II proteins” by I. Saha, G. Mazzocco and D. Plewczynski. Immunogenetics 65(2):97-105 (2013).
Class II human leukocyte antigens (HLA II) are proteins involved in the human adaptive immune response: they bind pre-processed, non-self peptides and expose them in the extracellular domain so that they can be recognized by CD4+ T lymphocytes. Understanding the HLA-peptide binding interaction is a crucial step in designing a peptide-based vaccine, but the high rate of polymorphism in HLA class II molecules makes this a major challenge, even though the HLA II proteins can be grouped into supertypes, where members of different class bind a similar pool of peptides. We performed the supertype classification of 27 HLA II proteins using their binding affinities and structure-based linear motifs to create a stable group of supertypes. For this purpose, a well-known clustering method was used, and a consensus was then built to find the stable groups and to show the functional and structural correlation of HLA II proteins. The overlap of the binding events was then measured, confirming a large promiscuity within the HLA II-peptide interactions. Moreover, a very low rate of locus-specific binding events was observed for the HLA-DP genetic locus, suggesting a different binding selectivity of these proteins with respect to HLA-DR and HLA-DQ proteins. In addition, a predictor based on a support vector machine (SVM) classifier was designed to recognize HLA II-binding peptides.
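One common way to build a consensus from several clustering results is a co-association analysis: count how often each pair of items lands in the same cluster, then keep only the groups whose members are co-clustered often enough. The sketch below illustrates that general idea and is not necessarily the procedure used in the paper; the function names, the `threshold` parameter and the toy labels are all hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of clustering runs in which each pair of items shares a cluster.

    clusterings: list of label lists, one per run, all over the same items.
    Cluster labels only need to be consistent within a single run.
    """
    n_items = len(clusterings[0])
    n_runs = len(clusterings)
    fractions = {}
    for i, j in combinations(range(n_items), 2):
        together = sum(1 for labels in clusterings if labels[i] == labels[j])
        fractions[(i, j)] = together / n_runs
    return fractions


def stable_groups(clusterings, threshold=1.0):
    """Group items linked by pairs co-clustered in at least `threshold` of runs.

    Assumption: groups are the connected components of the thresholded
    co-association graph, found with a small union-find structure.
    """
    n_items = len(clusterings[0])
    parent = list(range(n_items))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), fraction in co_association(clusterings).items():
        if fraction >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in range(n_items):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three hypothetical clustering runs over five items (e.g. five alleles).
    runs = [
        [0, 0, 1, 1, 2],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1],
    ]
    print(stable_groups(runs))                 # pairs together in every run
    print(stable_groups(runs, threshold=0.6))  # looser consensus
```

A threshold of 1.0 keeps only groupings that every run agrees on; lowering it merges items that are co-clustered in most, but not all, runs.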
Public Health Relevance of proposed research project
It is becoming increasingly recognized that higher-order genome organization crucially influences gene regulation, cell function and ultimately human health. However, the ability to investigate these relationships will depend on technological advances that enable a more precise, integrated view of genome structure and function. This proposal seeks to address this challenge through the development of a 3D epigenomic platform that will provide a powerful tool for studying genomic structure in space (3D) and time (4D).