Teaching - Università Roma Tre

IN470- COMPUTATIONAL METHODS IN SYSTEMS BIOLOGY (objectives)

Code

20410147

Language

ITA

Type of certificate

Profit certificate

Credits

7

Scientific Disciplinary Sector Code

INF/01

Contact Hours

60

Type of Activity

Related or supplementary learning activities

Derived from	20410147 IN470- COMPUTATIONAL METHODS IN SYSTEMS BIOLOGY in Computational Sciences LM-40 CASTIGLIONE Filippo

(syllabus)

Outline of the course; introduction and general overview; bioinformatics and algorithms; computational biology in the clinic and in the pharmaceutical industry; pharmacokinetics and pharmacodynamics.

Introduction to Systems Biology: what computational biology is; the roles of mathematical modeling and bioinformatics; what it aims for; what the problems are; theoretical tools used in bio-mathematics and bioinformatics.

Introduction to molecular and cellular biology (first part): basic knowledge of genetics, proteomics and cellular processes; ecology and evolution; the basic molecule; molecular bonds; chromosomes; DNA and its replication.

Introduction to molecular and cellular biology (second part): genomics; the central dogma of biology; the genome project; the structure of the human genome; analysis of genes; transcription of DNA; viruses.

Laboratory: generation of random numbers; the functions srand48 and drand48; random generation of nucleotide strings of arbitrary length (program1.c); random generation of amino acid strings of arbitrary length (program2.c).

Introduction to information theory: Shannon entropy; conditional entropy; mutual information; indices of biological diversity; Shannon index; true diversity; Rényi index.

Laboratory: the genetic code; C program that transcribes a DNA sequence and translates it into proteins.

Introduction to stochastic processes: basic definitions; examples; queueing models; Bernoulli and Poisson processes; Markov processes; stochastic processes in bioinformatics and bio-mathematics; autocorrelation; outline of random walks and of the BLAST sequence-alignment algorithm, viewed as a stochastic process and as the principal algorithm for querying biological sequence databases.

Laboratory: development of a C algorithm that calculates the Shannon entropy of any text in English (or Italian) (e.g., http://www.textfiles.com/etext/).

Random walks; the BLAST sequence-alignment algorithm as a random walk.

Laboratory: C implementation of different algorithms for generating a random walk in 1D and 2D, on the lattice and in R or R^2, and calculation of the mean square displacement.

Comparing sequences: similarity and homology; pairwise alignment; edit distance; the PAM and BLOSUM scoring matrices; the Needleman-Wunsch algorithm; local alignment; the Smith-Waterman algorithm; the BLAST algorithm.

Laboratory: C implementation of an algorithm that generates a signal with noise and calculates the correlogram in the presence or absence of a true signal.

Multiple sequence alignment: consensus sequence; star alignment algorithms; ClustalW; entropy and circular sum scoring functions.

Biological databases: motivation; data formats; taxonomy; primary DBs; secondary DBs; NCBI, EMBL, DDBJ; NCBI EBI-Entrez.

Exact matching / string searching: general concepts; the Knuth-Morris-Pratt algorithm; the Boyer-Moore algorithm.

Exercise on an implementation of the Knuth-Morris-Pratt exact matching algorithm.
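As an illustration of the Knuth-Morris-Pratt exercise mentioned above, the following is a minimal sketch in C (C99 assumed). It is not the course's reference solution: the function names `kmp_failure` and `kmp_search` are hypothetical, and the choice to return only the first match is an assumption made to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Build the failure table (hypothetical helper name):
   fail[i] is the length of the longest proper prefix of pat[0..i]
   that is also a suffix of pat[0..i]. */
static void kmp_failure(const char *pat, size_t m, size_t *fail)
{
    size_t k = 0;              /* length of the current matched prefix */
    fail[0] = 0;
    for (size_t i = 1; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];   /* fall back to a shorter border */
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }
}

/* Return the index of the first occurrence of pat in text, or -1 if
   there is none (or if memory allocation fails). An empty pattern
   matches at index 0. */
long kmp_search(const char *text, const char *pat)
{
    size_t n = strlen(text);
    size_t m = strlen(pat);
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;
    kmp_failure(pat, m, fail);

    long found = -1;
    size_t k = 0;              /* number of pattern characters matched */
    for (size_t i = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];   /* reuse the partial match, never re-read text */
        if (text[i] == pat[k])
            k++;
        if (k == m) {          /* full match ending at position i */
            found = (long)(i + 1 - m);
            break;
        }
    }
    free(fail);
    return found;
}
```

For example, `kmp_search("GATTACA", "TAC")` returns 3, while `kmp_search("ACGT", "GG")` returns -1. The text index `i` never moves backwards, which is what gives the algorithm its linear running time in the combined length of text and pattern.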
Exercise on biological databases: primary databases; secondary databases; NCBI, EMBL, DDBJ; NCBI EBI-Entrez; use of the BLAST algorithm.

Phylogenetic analysis: phylogenetic trees; size of the search space of phylogenetic algorithms; methods for constructing phylogenetic trees; data used for phylogenetic analysis; the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm; the Neighbor-Joining algorithm.

Hidden Markov Models: decoding; the Viterbi algorithm; evaluation.

Laboratory: completion of the exercise on mutation, selection and evolution of nucleotide strings (genotype) translated into amino acid strings (phenotype); selection is based on the presence of certain substrings in the phenotype, which determines the fitness value; implementation details, display of the convergence criterion and results, discussion, etc.

Machine learning: general overview; supervised and unsupervised learning; model selection; underfitting; overfitting; polynomial curve fitting; machine learning as parameter estimation and the problem of overfitting; splitting the data into training and test sets; the bias-variance trade-off.

Artificial Neural Networks: definition; Rosenblatt's perceptron; the perceptron learning algorithm; the multi-layer perceptron.

Laboratory: completion of the implementation in ANSI C of the evolutionary algorithm for nucleotide strings (genotype) translated, through the genetic code, into amino acid strings (phenotype).

Hidden Markov Models: the forward algorithm; the backward algorithm; posterior decoding; learning; the Baum-Welch algorithm; use of Hidden Markov Models for the analysis of bio-sequences; gene finding.

Artificial Neural Networks: the error back-propagation algorithm for training MLPs; types of neural networks; convolutional networks; reinforcement networks; unsupervised learning and self-organizing maps.

Introduction to graph theory: representation, terminology, concepts; paths; cycles; connectivity; distance; connected components.

Introduction to graph theory: graph traversal; breadth-first search; depth-first search; Dijkstra's algorithm; six degrees of separation; small-world networks; centrality measures; degree centrality; eigenvector centrality; betweenness centrality; closeness centrality.

Network biology: general overview; concepts; types of biological data used to build networks; network biology and network medicine; problems and algorithms used; centrality measures; random networks; scale-free networks; preferential attachment; scale-free networks in biology.

Laboratory: completion of the exercise on the evolutionary algorithm; implementation details, display of the convergence criterion and results, discussion, etc.

Bio-mathematical models: prediction using theoretical models; the iterative paradigm of mathematical modeling; data-driven models; limited and unlimited population growth models; analytical derivation and examples; logistic growth; density-limited ecological models; the Lotka-Volterra model; the experiment by Huffaker and Kennett; the SIR epidemic model and some of its variants; Perelson's model for HAART; the Java application Populus for solving continuous models of population dynamics; notes on numerical methods for solving systems of differential equations.

Discrete models: spin models (Ising models); cellular automata; Boolean networks; agent-based models; data fitting and parameter estimation; available software tools.

Cellular automata: introduction and history; definition; the 1-dimensional automaton; Wolfram classification; the 2-dimensional automaton; Conway's Game of Life; software available for CA simulation; dedicated hardware (CA-Machine); the prey-predator model as a two-dimensional cellular automaton; relationship with the system of ordinary differential equations; stochastic models; stochastic CAs as discrete stochastic dynamical systems and stochastic processes; example of CA: Belousov-Zhabotinsky reactions.

(reference books)

[-] E.S. Allman, J.A. Rhodes. Mathematical Models in Biology: An Introduction (2004) Cambridge University Press.
[-] W.J. Ewens, G.R. Grant. Statistical Methods in Bioinformatics: An Introduction (2005) Springer Verlag.
[-] R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998) Cambridge University Press.
Dates of beginning and end of teaching activities	From to
Delivery mode	Traditional
Attendance	not mandatory
Evaluation methods	Written test; Oral exam