Laboratory of Functional and Structural Genomics

The Laboratory of Functional and Structural Genomics conducts theoretical research whose main goal is to analyze and predict the three-dimensional structure of the human genome and its relationship to the genomic diversity of the human population, both natural and pathological.

In particular, we study how structural variants and gene copy number changes observed in different populations and patient groups depend on their location within the structure of the cell nucleus. We also study how the expression of selected genes depends on their position in three-dimensional space. In addition, we use structural information to enrich sequence analyses in order to better determine the function of selected genomic regions of significant importance for personalized medicine.
To this end, we first develop a range of large-scale computational tools for analyzing whole-genome sequences, identifying the structural variants they contain, and determining the statistical significance of the observed copy number of genomic regions in selected patient cohorts. Second, to determine their uniqueness, we compare the observed changes with the typical, natural genomic diversity that has been cataloged, for example, by the 1000 Genomes Project consortium. Third, we determine the biological function of these genomic regions. Fourth, we identify the unique genomic environment in three-dimensional space for the sites selected in this way, e.g. regulatory ones. In the fifth step, we predict the effect on gene expression of rearrangements of the three-dimensional structure of these local neighborhoods in the cell nucleus, related for example to the presence of transcription factories.
dr hab. Dariusz Plewczyński, prof. UW
e-mail: d.plewczynski@cent.uw.edu.pl
phone: +48 22 55 43654
room: 03.63
Website: http://nucleus3d.cent.uw.edu.pl

 

Degrees: MSc (Physics), PhD (chemistry), DSc (Habilitation, Bioinformatics)
Titles: Professor at University of Warsaw, PhD, DSc
Tel: (+48) 504 726 203; (+4822) 554 36 54; Fax: (+4822) 554 08 01
Email: d.plewczynski@cent.uw.edu.pl & dariuszplewczynski@gmail.com

Affiliation: Center of New Technologies, University of Warsaw, Poland

Research Interests:
Dariusz Plewczynski's interests are focused on functional and structural genomics. Functional genomics attempts to make use of the vast wealth of data produced by high-throughput genomics projects, such as the structural genomics consortia, the Human Genome Project, the 1000 Genomes Project, ENCODE, and many others. The major tools used in this interdisciplinary research endeavor include statistical data analysis (GWAS studies, clustering, machine learning), genomic variation analysis using diverse data sources (karyotyping, confocal microscopy, aCGH microarrays, next generation sequencing: both whole genome and whole exome), bioinformatics (protein sequence analysis, protein structure prediction), and finally biophysics (polymer theory and simulations) and genomics (epigenetics, genome domains, three-dimensional structure analysis of chromatin). He is presently involved in several Big Data projects at three institutes: the Centre of New Technologies at the University of Warsaw (his main affiliation), the Jackson Laboratory for Genomic Medicine (an international partner of the TEAM project), and The Centre for Innovative Research (within the Leading National Research Centre KNOW 2012-2017) at the Medical University of Bialystok (UMB). He is participating in two large projects: the 1000 Genomes Project (NIH), where he contributes bioinformatics analysis of genomic data from aCGH arrays and NGS (next generation sequencing, deep coverage) experiments for structural variant (SV) identification; and the 4D Nucleome project, funded by the NIH in the USA, where he works on biophysical modeling of the three-dimensional conformation of chromatin inside human cells using Hi-C and ChIA-PET techniques.
His goal is to combine the SV data with three-dimensional cell nucleus structure for better understanding of normal genomic variation among human populations, the natural selection process during the human evolution, mammalian cell differentiation, and finally the origin, pathways, progression and development of cancer and autoimmune diseases.

Research Summary (Past & Current):

Professor at the University of Warsaw in the Center of New Technologies CeNT, Warsaw, Poland, the head of the Laboratory of Functional and Structural Genomics.
His main expertise covers computational genomics, biostatistics and bioinformatics. He is actively developing computational intelligence algorithms, performing biophysical simulations and applying computational modeling to various interdisciplinary problems in human genomics. His recent achievements cover qualitative and quantitative biological data analysis, general systems theory and interdisciplinary problems in the context of bioinformatics, genomics, drug design, and systems biology, as well as ensemble learning systems and meta-clustering techniques.

He received an MS degree in Theoretical Physics (Department of Physics, UW) under the supervision of Prof. Marek Cieplak in 1995. In 2001, he defended his PhD degree in Physical Chemistry (Institute of Physical Chemistry, Polish Academy of Sciences) under the supervision of Prof. Robert Hołyst. Later, he was a postdoc researcher at the International Institute for Cell and Molecular Biology in 2001, collaborating closely with Prof. Adam Godzik. Dr. Dariusz Plewczynski worked at The Burnham-Sanford Institute in San Diego, CA, USA in 2002. He was a postdoc at Helsinki University, bioinformatics laboratory, in 2003. In 2004 he was a visiting researcher at Merck Research Laboratories (IRBM) in Rome, Italy. From 2002 till 2011, he was an assistant professor at the University of Warsaw, Warsaw, Poland. In 2011, he visited Stanford University within the Top500 Polish Ministry of Science programme. He received a DSc degree (habilitation) in Computer Science and bioinformatics in 2012 at the Institute of Computer Science, Polish Academy of Sciences.

From 2011, he was the head of the bioinformatics team at the University of Warsaw (first at ICM, and later, from 2015, at the Centre of New Technologies). He has been involved in bioinformatics projects in the Leading National Research Centre of the Medical University of Bialystok since 2012. He was a visiting professor at The Jackson Laboratory for Genomic Medicine and Yale University within the senior Fulbright fellowship (2013-2014).

Education

  • MA: 1995, Faculty of Physics, Warsaw University, Poland. Major: theoretical physics.
    Thesis title:
    “Statistical physics of phase transitions in thin magnetic layers”;
  • PhD: 2001, Institute of Physical Chemistry, Poland. Major: physical chemistry. Dissertation title:
    “Diffusion of curved surfaces”;
  • PostDoc: 2001 – Warsaw, Poland, The International Molecular and Cell Biology Institute. Major:
    bioinformatics. Research project: “Structural Comparison of proteins”;
  • 2002 – San Diego, CA, The Sanford-Burnham Institute. Major: bioinformatics. Research project:
    “Improving the sequence alignment quality using predicted local 3D structure of a protein chain”;
  • 2003 – Helsinki, Finland, Helsinki University. Major: bioinformatics. Research project:
    “Structural alignment of proteins using DALI”;
  • Habilitation: 2012, Institute of Computer Science Polish Academy of Sciences, Poland. Major: Computer
    Science. Dissertation title: “Applications of machine learning and data analysis techniques to biological
    function prediction of biomolecules”.

Positions held

  • Starting from January 2015 – Assistant Professor, DSc, PhD, the head of Laboratory of Functional and Structural Genomics, Center of New Technologies, University of Warsaw
  • 2013 till 2016 – Research consultant, Centre for Innovative Research, Faculty of Medicine, Medical University of Bialystok, Poland.
  • 2002 till 2015 – Assistant Professor, Bioinformatics and Systems Biology Laboratory, Statistical Data Analysis & Systems Theory Unit, Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Poland.
  • 2011 till 2013 – Assistant Professor, Department of Physical Chemistry, Bioinformatics and Applied Mathematics Unit, Faculty of Pharmacy, Medical University of Warsaw, Poland.

Visiting Researcher

  • 2004 – Rome, Italy, Merck Laboratories. Major: bioinformatics, chemoinformatics. Research project:
    “Applications of machine learning algorithms in virtual High-throughput screening”
  • 2005 – Helsinki, Finland, Helsinki University. Major: bioinformatics. Research project: “Prediction of
    protein-protein interactions”
  • 2003-2008 – Poznan, Poland, BioInfoBank Institute. Major: bioinformatics. Research project: “Prediction of
    protein function using sequence and structural information”.
  • 2011 – Stanford University, Centre of Professional Development, Top500 Innovators
  • 2013/2014 – The Jackson Laboratory & Yale University, Farmington, CT, USA

Other professional activities and memberships

  • Member of the Polish Bioinformatics Society
  • Member of the Polish Physics Society
  • Member of the International Society for Computational Biology

Editorships & Reviewing boards

  • Member of the Editorial Board: “BMC Bioinformatics”, BioMedCentral, UK
  • Reviewer for: Genome Research, BMC Genome Biology, Nature Methods, Bioinformatics, J Chem Inf Modeling, BMC Bioinformatics, Chemical Biology and Drug Design and many other journals in the fields of “omics”, computational genomics, drug design, bioinformatics and systems biology.

Recent major publications

  1. “An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization” Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. Genome Res. 2016 Oct 27.
  2. “3DFlu: database of sequence and structural variability of the influenza hemagglutinin at population scale” Mazzocco G, Lazniewski M, Migdał P, Szczepińska T, Radomski JP, Plewczynski D. Database (Oxford). 2016 Oct 2;2016.
  3. “3D-GNOME: an integrated web service for structural modeling of the 3D genome” Szalaj P, Michalski PJ, Wróblewski P, Tang Z, Kadlof M, Mazzocco G, Ruan Y, Plewczynski D. Nucleic Acids Res. 2016 Jul 8;44(W1):W288-93.
  4. “2dSpAn: semiautomated 2-d segmentation, classification and analysis of hippocampal dendritic spine plasticity” Basu S, Plewczynski D, Saha S, Roszkowska M, Magnowska M, Baczynska E, Wlodarczyk J. Bioinformatics. 2016 Aug 15;32(16):2490-8.
  5. “CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription” Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, Trzaskoma P, Magalska A, Wlodarczyk J, Ruszczycki B, Michalski P, Piecuch E, Wang P, Wang D, Tian SZ, Penrad-Mobayed M, Sachs LM, Ruan X, Wei CL, Liu ET, Wilczynski GM, Plewczynski D, Li G, Ruan Y. Cell 2015, Dec 17;163(7):1611-27. Epub 2015 Dec 10.
  6. “An integrated map of structural variation in 2,504 human genomes” by Sudmant PH, …, 1000 Genomes Project Consortium, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO. Nature. 2015 Oct 1;526(7571):75-81.
  7. “A global reference for human genetic variation” by 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. Nature. 2015 Oct 1;526(7571):68-74.
  8. “Analysis of Structural Chromosome Variants by Next Generation Sequencing Methods” Plewczynski D, Gruca S, Szałaj P, Gulik K, de Oliveira SF and Malhotra A. book chapter in “Clinical Applications for Next-Generation Sequencing” book, Elsevier, 2015
  9. “A combined systems and structural modeling approach repositions antibiotics for Mycoplasma genitalium” by Kazakiewicz D, Karr JR, Langner KM, Plewczynski D. Comput Biol Chem. S1476-9271 (2015);
  10. “Binding Activity Prediction of Cyclin-Dependent Inhibitors” Saha I, Rak B, Bhowmick SS, Maulik U, Bhattacharjee D, Koch U, Lazniewski M, Plewczynski D. J Chem Inf Model. 55(7):1469-82. (2015);
  11. “Summary of the DREAM8 Parameter Estimation Challenge: Toward Parameter Identification for Whole-Cell Models” by Karr JR, Williams AH, Zucker JD, Raue A, Steiert B, Timmer J, Kreutz C; DREAM8 Parameter Estimation Challenge Consortium, Wilkinson S, Allgood BA, Bot BM, Hoff BR, Kellen MR, Covert MW, Stolovitzky GA, Meyer P. PLoS Comput Biol. 28;11(5) (2015);

Research projects

  • 2017-2020 FNP TEAM grant “Three-dimensional Human Genome structure at the population scale:
    computational algorithm and experimental validation for lymphoblastoid cell lines of selected families from
    1000 Genomes Project” to D. Plewczynski, PI;
  • 2016-2017 NCN Grant ETIUDA “Modelling and analysis of three dimensional structure and its dynamics in cell
    nucleus” to Przemyslaw Szalaj (PhD student), D. Plewczynski – scientific advisor;
  • 2015-2018 EU COST action BM1405 “Non-globular proteins: from sequence to structure, function and
    application in molecular physiopathology” to D. Plewczynski, Polish PI;
  • 2016-2017 NCN Grant PRELUDIUM “Analysis of mechanisms of drug resistance to trastuzumab in HER2
    overexpressing breast cancer based on gene expression changes in selected cell lines” to Anna Rusek (PhD
    student), D. Plewczynski – scientific advisor;
  • 2015-2018 NCN Grant OPUS “iCell: information processing in living organisms. The role of three-dimensional
    structure and multi-scale properties in controlling the biological processes in a cell” to D. Plewczynski, PI;
  • 2015-2016 NCN Grant ETIUDA “Integration of information in biological and synthetic systems” to J. Zubek
    (PhD student), D. Plewczynski – scientific advisor;
  • 2014-2017 NCN Grant OPUS “Virtual High Throughput Screening (vHTS) derivation of a cross-immunity model
    for the Influenza-A Virus Infections” to D. Plewczynski, PI;
  • 2008-2011 KBN Grant “Application of machine learning methods to prediction of protein-protein interactions” –
    Polish Ministry of Science grant to dr D. Plewczynski, PI;
  • Grant LSHG-CT-2003-503265 BIOSAPIENS, a large-scale effort to annotate human genome using both
    informatics tools and input from experimentalists. 6th Framework EC Project, participant;
  • Grant SP22-CT-2004-003831 SEPSDA, Combatting and eventually eradicating the new coronavirus causing
    Severe Acute Respiratory Syndrome (SARS) requires specific and efficient antiviral drugs and improved
    diagnostics. 6th Framework EC Project, participant;
  • Grant QLRT-CT2000-00127 ELM, The four principal objectives of the ELM consortium are to (1) design, (2)
    develop, (3) maintain and (4) apply, a novel infrastructure resource devoted to the prediction of functional
    motifs in protein sequences. 6th Framework EC Project, participant;

Teaching experience

  • Lectures: Genome Biology (UW); Genomes Biophysics (UW); Bioinformatics (UW,WUT); Drug
    Design (WUM); Machine Learning (UW); statistical data analysis (UW).
  • Seminars and laboratories: Bioinformatics; Systems Biology; Machine Learning & Statistics.

Awards

  • 2013 – Senior Fulbright Fellowship to visit Harvard University, and Yale Univ., USA;
  • 2011 – Top500 Innovators: Science, Management and Commercialization Award; Polish Ministry of Science and Higher Education
  • 1994 – Sosnowski Award (for outstanding physicists); Polish Physical Society
  • 1993 – 1994 Polish Ministry of Education and Science Award
  • 1986 – 1987 Polish Ministry of Education

Languages:
Polish (native), English (fluent), German (basic), Russian (basic)

Hobby:
Photography, History, Traveling, Robotics and Artificial Intelligence.


Clinical and molecular characteristics of newly reported mitochondrial disease entity caused by biallelic PARS2 mutations
Ciara, E., Rokicki, D., Lazniewski, M., Mierzewska, H., Jurkiewicz, E., Bekiesińska-Figatowska, M., ... & Kosińska, J. (2018).
Journal of human genetics, 63(4), 473.
Three-dimensional Epigenome Statistical Model: Genome-wide Chromatin Looping Prediction
Al Bkhetan, Z., & Plewczynski, D. (2018).
Scientific reports, 8(1), 5217
Quantitative 3-D morphometric analysis of individual dendritic spines
Basu, S., Saha, P. K., Roszkowska, M., Magnowska, M., Baczynska, E., Das, N., ... & Wlodarczyk, J. (2018).
Scientific reports, 8(1), 3545
Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices.
Tatjewski, M., Kierczak, M., & Plewczynski, D. (2017).
Prediction of Protein Secondary Structure (2017): 275-300.
The structural variability of the influenza A hemagglutinin receptor-binding site.
Lazniewski, M., Dawson, W. K., Szczepińska, T., & Plewczynski, D. (2017).
Briefings in functional genomics.
The 4D nucleome project.
Dekker, J., Belmont, A. S., Guttman, M., Leshyk, V. O., Lis, J. T., Lomvardas, S., Mirny, L. A., O’Shea, C. C., Park, P. J., Ren, B., Politz, J. C. R., Shendure, J., Zhong, S., & the 4D Nucleome Network (2017).
Nature, 549(7671), 219–226.
Social adaptation in multi-agent model of linguistic categorization is affected by network information flow.
Zubek, J., Denkiewicz, M., Barański, J., Wróblewski, P., Rączaszek-Leonardi, J., & Plewczynski, D. (2017).
PloS one, 12(8), e0182490.
Novel neuro-audiological findings and further evidence for TWNK involvement in Perrault syndrome.
Ołdak, M., Oziębło, D., Pollak, A., Stępniak, I., Lazniewski, M., Lechowicz, U., Kochanek, K., Furmanek, M., Tacikowska, G., Plewczynski, D. and Wolak, T. (2017).
Journal of translational medicine, 15(1), p.25.
RNA structure interactions and ribonucleoprotein processes of the influenza A virus.
Dawson, W. K., Lazniewski, M., & Plewczynski, D. (2017).
Briefings in functional genomics. 2017, 1-13
MaER: A New Ensemble Based Multiclass Classifier for Binding Activity Prediction of HLA Class II Proteins
Mazzocco, G., Bhowmick, S. S., Saha, I., Maulik, U., Bhattacharjee, D., & Plewczynski, D., (2016)
International Conference on Pattern Recognition and Machine Intelligence (pp. 462-471). Springer, Cham
Computational inference of H3K4me3 and H3K27ac domain length.
Zubek, Julian, Michael L. Stitzel, Duygu Ucar, and Dariusz M. Plewczynski., (2016)
PeerJ 4: e1750
PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.
Chatterjee, P., Basu, S., Zubek, J., Kundu, M., Nasipuri, M., & Plewczynski, D. (2016).
Journal of molecular modeling, 22(4), 72.
The proline-rich region of glyceraldehyde-3-phosphate dehydrogenase from human sperm may bind SH3 domains, as revealed by a bioinformatic study of low-complexity protein segments
Tatjewski, M., Gruca, A., Plewczynski, D., & Grynberg, M. (2016)
Molecular reproduction and development, 83(2), 144-148
An integrated 3-dimensional genome modeling engine for data-driven simulation of spatial genome organization.
Szałaj, P., Tang, Z., Michalski, P., Pietal, M. J., Luo, O. J., Sadowski, M., ... & Plewczynski, D. (2016).
Genome research, 26(12), 1697-1709.
3D-GNOME: an integrated web service for structural modeling of the 3D genome.
Szalaj, P., Michalski, P. J., Wróblewski, P., Tang, Z., Kadlof, M., Mazzocco, G., ... & Plewczynski, D. (2016).
Nucleic acids research, 44(W1), W288-W293.
2dSpAn: Semiautomated 2-d segmentation, classification and analysis of hippocampal dendritic spine plasticity
Basu, S., Plewczynski, D., Saha, S., Roszkowska, M., Magnowska, M., Baczynska, E., & Wlodarczyk, J. (2016).
Bioinformatics, 32(16), 2490-2498.
3DFlu: database of sequence and structural variability of the influenza hemagglutinin at population scale
Mazzocco, G., Lazniewski, M., Migdał, P., Szczepińska, T., Radomski, J. P., & Plewczynski, D.
Database, 2016, baw130
Computational Approach to Dendritic Spine Taxonomy and Shape Transition Analysis
Bokota, G., Magnowska, M., Kuśmierczyk, T., Łukasik, M., Roszkowska, M., & Plewczynski, D. (2016)
Frontiers in computational neuroscience, 10, 140
Application of machine learning method in genomics and proteomics
Lin, H., Chen, W., Anandakrishnan, R., & Plewczynski, D., (2015)
The Scientific World Journal
CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription.
Tang, Z., Luo, O. J., Li, X., Zheng, M., Zhu, J. J., Szalaj, P., ... & Michalski, P. (2015).
Cell, 163(7), 1611-1627.
A global reference for human genetic variation.
1000 Genomes Project Consortium. (2015).
Nature, 526(7571), 68.
A combined systems and structural modeling approach repositions antibiotics for Mycoplasma genitalium.
Kazakiewicz, D., Karr, J. R., Langner, K. M., & Plewczynski, D. (2015).
Computational biology and chemistry, 59, 91-97.
Binding activity prediction of cyclin-dependent inhibitors.
Saha, I., Rak, B., Bhowmick, S. S., Maulik, U., Bhattacharjee, D., Koch, U., ... & Plewczynski, D. (2015).
Journal of chemical information and modeling, 55(7), 1469-1482.
Analysis of Next-Generation Sequencing Data of miRNA for the Prediction of Breast Cancer
Saha, I., Bhowmick, S. S., Geraci, F., Pellegrini, M., Bhattacharjee, D., Maulik, U., & Plewczynski, D.
International Conference on Swarm, Evolutionary, and Memetic Computing (pp. 116-127). Springer, Cham.
Analysis of Structural Chromosome Variants by Next Generation Sequencing Methods
Plewczynski, D., Gruca, S., Szałaj, P., Gulik, K., de Oliveira, S. F., & Malhotra, A. (2016)
In Clinical Applications for Next-Generation Sequencing (pp. 39-61).
HarmonyDOCK: the structural analysis of poses in protein-ligand docking.
Plewczynski, D., Philips, A., Grotthuss, M. V., Rychlewski, L., & Ginalski, K. (2014).
Journal of Computational Biology, 21(3), 247-256.
Ensemble learning prediction of protein–protein interactions using proteins functional annotations.
Saha, I., Zubek, J., Klingström, T., Forsberg, S., Wikander, J., Kierczak, M., ... & Plewczynski, D. (2014).
Molecular BioSystems, 10(4), 820-830.

Selected Scientific Discoveries in Laboratory of Functional and Structural Genomics headed by Dariusz Plewczynski, PhD at Centre of New Technologies, University of Warsaw, Poland
In the Laboratory of Functional and Structural Genomics we perform theoretical studies whose main objective is to analyze and predict the three-dimensional structure of the human genome and its relation to the genomic diversity of human populations, both natural and pathological. In particular, we investigate structural variants and copy number variants observed in various sub-populations and groups of patients, and their three-dimensional localization in the structure of the nucleus. We also examine how the expression levels of selected genes depend on their location in three-dimensional space. In addition, we use structural information to enrich sequence-based genomic analysis in order to better define the function of selected genomic regions that are important in the context of personalized medicine. For this purpose, we first develop a variety of large-scale computational tools for the analysis of whole genome sequences, the identification of structural variants, and the determination of the statistical significance of the observed number of copies of genomic regions in selected cohorts of patients. Secondly, we evaluate their uniqueness by comparing the observed changes with the typical and natural genomic diversity that has been cataloged, for example, by the 1000 Genomes Project Consortium. Thirdly, we infer the biological function of these genomic regions using publicly available databases. Fourthly, we identify the unique local three-dimensional environment of selected sites, e.g. regulatory ones. In the fifth step, we analyze the impact of structural rearrangements of those local neighborhoods, related for example to the presence of transcription factories, on gene expression profiles.
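
As an illustration of the first two steps, the statistical significance of a copy number variant observed in a patient cohort can be estimated by comparing its carrier frequency with a reference population. The following is a minimal sketch that assumes a one-sided Fisher exact test is an acceptable choice; the function name and all counts are hypothetical and do not describe the laboratory's actual tools.

```python
from math import comb


def enrichment_p_value(cohort_carriers, cohort_size, ref_carriers, ref_size):
    """One-sided Fisher exact test for enrichment of a variant in a cohort.

    Returns the probability of seeing at least `cohort_carriers` carriers in
    the cohort if all carriers were distributed at random over the pooled
    cohort + reference samples (hypergeometric null model).
    """
    total = cohort_size + ref_size
    carriers = cohort_carriers + ref_carriers
    upper = min(cohort_size, carriers)
    # Sum the probabilities of every table at least as extreme as the
    # observed one (upper tail of the hypergeometric distribution).
    tail = sum(
        comb(carriers, k) * comb(total - carriers, cohort_size - k)
        for k in range(cohort_carriers, upper + 1)
    )
    return tail / comb(total, cohort_size)


# Hypothetical counts, for illustration only: 12 carriers among 200 patients
# versus 25 carriers among 2,504 reference individuals.
p = enrichment_p_value(cohort_carriers=12, cohort_size=200,
                       ref_carriers=25, ref_size=2504)
print(f"one-sided p-value: {p:.3g}")
```

In a genome-wide scan this test would be repeated for many regions, so the resulting p-values would additionally need a multiple-testing correction.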

Areas For Scientific Synergies:
Three-dimensional genomics, higher-order chromatin organization, nucleus, GWAS & SV (deletions, duplications, insertions, inversions, and translocations), Big Data, statistical learning, massive dataset analysis; computational genomics and bioinformatics in population-level genomic data for medical applications and fundamental research in Life Sciences, structural and functional analysis of “omics” data; SV identification in the human genome using NGS methods; biophysical methods, protein structure prediction, protein-protein interactions: prediction and analysis; analysis of biomolecular interaction networks; whole-cell modeling.

Laboratory Research Methodology and Achievements
Our research methodology has so far been tested and validated experimentally only for the GM12878 cell line, for which high-resolution interaction datasets were available. The underlying biological model, together with the experimental results, was published in Cell 2015:
http://linkinghub.elsevier.com/retrieve/pii/S0092-8674(15)01504-4, followed by a detailed description of the computational model, published in Genome Research 2016: http://genome.cshlp.org/content/early/2016/10/27/gr.205062.116.abstract. The web server that gives the research community free access to both the experimental data and the computational simulation engine was published in Nucleic Acids Research 2016: http://nar.oxfordjournals.org/content/early/2016/05/16/nar.gkw437.full . In short: ChIA-PET is a high-throughput mapping technology that reveals long-range chromatin interactions and provides insights into the basic principles of spatial genome organization and gene regulation mediated by specific protein factors. Recently, we showed that a single ChIA-PET experiment provides information at all genomic scales of interest, from the high-resolution locations of binding sites and enriched chromatin interactions mediated by specific protein factors, to the low resolution of non-enriched interactions that reflect topological neighborhoods of higher-order chromosome folding. This multilevel nature of ChIA-PET data offers an opportunity to use multiscale 3D models to study structural-functional relationships at multiple length scales, but doing so requires a structural modeling platform. We developed 3D-GNOME (3-Dimensional GeNOme Modeling Engine), a complete computational pipeline for 3D simulation using ChIA-PET data. 3D-GNOME consists of three integrated components: a graph-distance-based heatmap normalization tool, a 3D modeling platform, and an interactive 3D visualization tool. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional reliability of potential biologically plausible topological structures.

Structural Genomics – to be further developed within proposed TEAM project

  1. “An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization” Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. Genome Res. 2016 Oct 27.
  2. “3D-GNOME: an integrated web service for structural modeling of the 3D genome” by Szalaj P, Michalski PJ, Wróblewski P, Tang Z, Kadlof M, Mazzocco G, Ruan Y, Plewczynski D. Nucleic Acids Res. 2016 May 16. pii: gkw437. pmid:27185892

3D-GNOME (3-Dimensional Genome Modeling Engine) is a complete computational pipeline for 3D simulation using ChIA-PET or Hi-C data. 3D-GNOME consists of four integrated components: i) a graph-distance-based heat map normalization tool, ii) a 3D modeling platform, iii) an interactive 3D visualization tool, and finally iv) a web service which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps which characterize the selected genomic region. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of the CTCF-motif orientation and high-resolution looping patterns in the 3D simulation provided additional reliability of potential biologically plausible topological structures. The 3D-GNOME web server is freely available at http://3dgnome.cent.uw.edu.pl/.
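
To give a feeling for what a graph-distance view of an interaction heat map can look like, the sketch below treats genomic bins as graph nodes, turns each observed interaction count into an edge length, and computes shortest-path distances between all bins. This is only an illustration under stated assumptions: the inverse-frequency edge length and the function name are hypothetical choices, not the normalization implemented in 3D-GNOME.

```python
def graph_distances(contacts):
    """All-pairs shortest-path distances derived from a contact matrix.

    contacts: symmetric square matrix (list of lists) of interaction counts
    between genomic bins. Assumption: an edge between two bins has length
    1 / count, so frequent contacts mean short distances; bins with no
    observed contact are not directly connected.
    """
    n = len(contacts)
    inf = float("inf")
    dist = [
        [0.0 if i == j else (1.0 / contacts[i][j] if contacts[i][j] > 0 else inf)
         for j in range(n)]
        for i in range(n)
    ]
    # Floyd-Warshall: allow paths through each intermediate bin k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Toy three-bin example: bins 0 and 2 never interact directly,
# so their distance is obtained through bin 1.
toy = [
    [0, 4, 0],
    [4, 0, 2],
    [0, 2, 0],
]
for row in graph_distances(toy):
    print(row)
```

Floyd-Warshall scales cubically with the number of bins, so a sketch like this is only practical for a single region or a coarse resolution.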

3. “CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription” Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, …, Wilczynski GM, Plewczynski D, Li G, Ruan Y. Cell 2015, Dec 17;163(7):1611-27. Epub 2015 Dec 10.

Spatial genome organization and its effect on transcription remains a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map the higher-order chromosome folding and specific chromatin interactions mediated by the CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that the CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with the CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that the haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. The 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCFs are involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights into the topological mechanism of human variations and diseases.
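
One common way to summarize CTCF-motif orientation at loop anchors is to label each loop as convergent, tandem, or divergent according to the strands of the motifs at its two anchors. The sketch below assumes a hypothetical input format (a pair of strand symbols per loop, with the left anchor being the one with the lower genomic coordinate); it is not the analysis code used in the publication.

```python
from collections import Counter


def loop_orientation(left_strand, right_strand):
    """Classify a chromatin loop by the CTCF motif strands at its anchors.

    left_strand / right_strand: "+" or "-" for the motif at the upstream and
    downstream anchor. Motifs pointing toward each other ("+" then "-") are
    convergent, motifs pointing away are divergent, otherwise tandem.
    """
    for strand in (left_strand, right_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    if left_strand == "+" and right_strand == "-":
        return "convergent"
    if left_strand == "-" and right_strand == "+":
        return "divergent"
    return "tandem"


def orientation_counts(loops):
    """Count orientation classes over an iterable of (left, right) strands."""
    return Counter(loop_orientation(left, right) for left, right in loops)


# Hypothetical loops, for illustration only.
example_loops = [("+", "-"), ("+", "-"), ("+", "+"), ("-", "+")]
print(orientation_counts(example_loops))
```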

4. “A global reference for human genetic variation” by 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. Nature. 2015 Oct 1;526(7571):68-74.

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. This publication reports the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. The authors characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes ~99% of the SNP variants with a frequency of >1% for a variety of ancestries. The publication describes the distribution of genetic variation across the global sample and discusses the implications for common disease studies. Our (D. Plewczynski) contribution was structural variant (SV) identification using combined aCGH arrays and NGS sequencing, tightly coupled with a computational meta-approach. The set of tools (including a meta-caller for SV identification that combines several independent methods with machine learning) was developed at the Charles Lee laboratory at JAX GM. We participated in the early phase of this project, providing initial results, and have since worked independently in this area, focusing on the application of deep learning algorithms to SV calling and GWAS studies.
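
The idea of combining several independent SV callers can be illustrated with a simple consensus rule: a call is kept when at least two callers report a variant of the same type whose coordinates overlap reciprocally by a given fraction. This is a minimal sketch under that assumption; the caller names, thresholds, and function names are hypothetical, and the actual meta-caller additionally used machine learning, which is not shown here.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions of intervals a and b (start, end)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def matches(call, other, min_ro):
    """Same chromosome, same SV type, and sufficient reciprocal overlap."""
    return (call[0] == other[0] and call[3] == other[3]
            and reciprocal_overlap(call[1:3], other[1:3]) >= min_ro)


def consensus_calls(callsets, min_ro=0.5, min_support=2):
    """Keep calls supported by at least `min_support` callers.

    callsets: dict mapping caller name -> list of (chrom, start, end, svtype).
    Returns a list of (call, sorted supporting caller names); the first
    caller's coordinates represent each consensus event.
    """
    kept = []
    for caller, calls in callsets.items():
        for call in calls:
            supporters = {caller}
            for other, other_calls in callsets.items():
                if other != caller and any(matches(call, oc, min_ro) for oc in other_calls):
                    supporters.add(other)
            already_kept = any(matches(call, k, min_ro) for k, _ in kept)
            if len(supporters) >= min_support and not already_kept:
                kept.append((call, sorted(supporters)))
    return kept


# Hypothetical call sets, for illustration only.
example = {
    "acgh": [("chr1", 1000, 2000, "DEL")],
    "ngs_a": [("chr1", 1100, 2100, "DEL"), ("chr2", 500, 900, "DUP")],
    "ngs_b": [("chr1", 5000, 6000, "DEL")],
}
print(consensus_calls(example))
```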

Functional Genomics – tools and algorithms developed within iCell (2015-2018) and Influenza (2013-2016) National Research Centre OPUS grants

5. “Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein–protein interactions using machine learning methods” by Srivastava A, Mazzocco G, Kel A, Wyrwicz LS and D. Plewczynski. Mol BioSyst. 12(3):778-85 (2016).

6. “Ensemble learning prediction of protein-protein interactions using proteins functional annotations” by Saha, Zubek, Klingström, Forsberg, Wikander, Kierczak, Maulik and D. Plewczynski. Mol BioSyst. 10(4):820-30 (2014).

Protein-protein interactions (PPI) are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins, drawing on databases such as DIP, MINT, BioGRID and IntAct. We then constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were used on these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, the sensitivity exceeded 80% and the classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a larger and more realistic dataset, maintaining sensitivity over 70%. Finally, we further improved those results by constructing a unique validation approach aimed at collecting reliable non-interacting proteins (NIPs). The curated negative and positive subsets were used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96% and 98%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that PPI/NIP predictive performance could be considerably improved by focusing on data preparation. The predicted PPI networks were analyzed in terms of their topological and geometrical properties.
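The majority-voting step and the two reported metrics can be illustrated with a minimal sketch. It is not the published pipeline: the four label vectors below are invented toy predictions standing in for the outputs of the SVM, Random Forest, Decision Tree and Naïve Bayes classifiers, and the tie-breaking rule (ties count as non-interacting) is an assumption made here for illustration.

```python
def majority_vote(votes_per_classifier):
    """Combine binary labels (1 = interacting, 0 = non-interacting).

    votes_per_classifier: one equal-length list of 0/1 labels per classifier.
    A sample is called positive only if more than half of the classifiers
    vote 1; ties therefore fall to 0 (an arbitrary choice for this sketch).
    """
    combined = []
    for sample_votes in zip(*votes_per_classifier):
        positives = sum(sample_votes)
        combined.append(1 if 2 * positives > len(sample_votes) else 0)
    return combined


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: six protein pairs, the first three truly interacting.
y_true = [1, 1, 1, 0, 0, 0]
svm_pred = [1, 1, 0, 0, 0, 1]
rf_pred = [1, 1, 1, 0, 0, 0]
dt_pred = [1, 0, 1, 0, 1, 0]
nb_pred = [1, 1, 0, 0, 0, 0]

ensemble = majority_vote([svm_pred, rf_pred, dt_pred, nb_pred])
sens, spec = sensitivity_specificity(y_true, ensemble)
print(ensemble, sens, spec)
```

With an even number of voters, as here, ties are possible, so the tie-breaking rule affects sensitivity directly; a weighted vote or an odd number of classifiers would avoid the issue.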

7. “Consensus classification of Human Leukocyte Antigens class II proteins” by I. Saha, G. Mazzocco and D. Plewczynski. Immunogenetics 65(2):97-105 (2013).

Class II human leukocyte antigens (HLA II) are proteins involved in the human adaptive immune response: they bind pre-processed, non-self peptides and expose them in the extracellular domain, making them recognizable by CD4+ T lymphocytes. Understanding the HLA-peptide binding interaction is a crucial step in designing a peptide-based vaccine, but the high rate of polymorphism in HLA class II molecules poses a major challenge. HLA II proteins can nevertheless be grouped into supertypes, where members of different classes bind a similar pool of peptides. We performed the supertype classification of 27 HLA II proteins using their binding affinities and structure-based linear motifs to create a stable group of supertypes. For this purpose, a well-known clustering method was used, and a consensus was then built to find the stable groups and to show the functional and structural correlation of HLA II proteins. The overlap of the binding events was then measured, confirming a large promiscuity within the HLA II-peptide interactions. Moreover, a very low rate of locus-specific binding events was observed for the HLA-DP genetic locus, suggesting a different binding selectivity of these proteins with respect to HLA-DR and HLA-DQ proteins. In addition, a predictor based on a support vector machine (SVM) classifier was designed to recognize HLA II-binding peptides.
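One common way to build a consensus from several clustering runs is a co-association count: items that land in the same cluster in every run form a stable group. The sketch below illustrates that general idea only. The abstract does not specify which clustering method or consensus rule was used, so the function names, the threshold parameter and the five placeholder items (A to E, not real HLA alleles) are all assumptions.

```python
from itertools import combinations


def co_association(partitions, items):
    """Fraction of partitions in which each pair of items shares a cluster.

    partitions: list of dicts mapping item -> cluster label for one run.
    """
    counts = {pair: 0 for pair in combinations(items, 2)}
    for labels in partitions:
        for a, b in counts:
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: n / len(partitions) for pair, n in counts.items()}


def stable_groups(partitions, items, threshold=1.0):
    """Link items whose co-association reaches the threshold (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), frac in co_association(partitions, items).items():
        if frac >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical items and three hypothetical clustering runs.
items = ["A", "B", "C", "D", "E"]
runs = [
    {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2},
    {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0},
    {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},
]

print(stable_groups(runs, items))                 # strict consensus
print(stable_groups(runs, items, threshold=0.6))  # looser consensus
```

Cluster labels are compared only within a run, so the arbitrary numbering of clusters across runs does not matter. Lowering the threshold merges groups transitively, which can chain loosely related items together.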

Public Health Relevance of proposed research project 
It is increasingly recognized that higher-order genome organization crucially influences gene regulation, cell function and ultimately human health. However, the ability to investigate these relationships will depend on technological advances that enable a more precise, integrated view of genome structure and function. This proposal seeks to address this challenge through the development of a 3D epigenomic platform that will provide a powerful tool for studying genomic structure in space (3D) and time (4D).