Derived from
20410147 IN470- COMPUTATIONAL METHODS IN SYSTEMS BIOLOGY in Computational Sciences LM-40 CASTIGLIONE Filippo
(syllabus)
Outline of the course; Introduction and general concepts; Bioinformatics and algorithms; Computational biology in the clinic and in the pharmaceutical industry; Pharmacokinetics and pharmacodynamics;
Introduction to Systems Biology: what computational biology is; the roles of mathematical modeling and bioinformatics; its goals; the problems it addresses; theoretical tools used in bio-mathematics and bioinformatics.
Introduction to molecular and cellular biology (first part): basic notions of genetics, proteomics and cellular processes; ecology and evolution; the basic molecules; molecular bonds; chromosomes; DNA and its replication;
Introduction to molecular and cellular biology (second part): genomics; the central dogma of biology; the genome project; the structure of the human genome; analysis of genes; DNA transcription; viruses;
Laboratory: generation of random numbers; the functions srand48 and drand48; random generation of nucleotide strings of arbitrary length (program1.c); random generation of amino acid strings of arbitrary length (program2.c);
Introduction to information theory; Shannon entropy; conditional entropy; mutual information; indices of biological diversity; Shannon index; true diversity; Rényi index;
Laboratory: the genetic code; a C program for the transcription of a DNA sequence and its translation into a protein;
Introduction to stochastic processes: basic definitions; examples; queueing models; Bernoulli and Poisson processes; Markov processes; stochastic processes in bioinformatics and bio-mathematics; autocorrelation; outline of random walks and of the BLAST sequence-alignment algorithm, viewed both as a stochastic process and as the principal algorithm for querying biological sequence databases;
Laboratory: development of a C algorithm for computing the Shannon entropy of an arbitrary English (or Italian) text (e.g., http://www.textfiles.com/etext/)
Random walks; the BLAST sequence-alignment algorithm as a random path; Laboratory: C implementations of different algorithms for generating random walks in 1D and 2D, on the lattice and in ℝ or ℝ², and computation of the mean square displacement;
Sequence comparison: similarity and homology; pairwise alignment; edit distance; PAM and BLOSUM scoring matrices; the Needleman-Wunsch algorithm; local alignment; the Smith-Waterman algorithm; the BLAST algorithm;
Laboratory: C implementation of an algorithm that generates a noisy signal and computes its correlogram, with and without an underlying true signal;
Multiple sequence alignment; consensus sequence; star alignment algorithms; ClustalW; entropy and circular-sum scoring functions;
Biological databanks: motivation; data formats; taxonomy; primary databases; secondary databases; NCBI, EMBL, DDBJ; NCBI Entrez and EBI; exact matching / string searching: general concepts; the Knuth-Morris-Pratt algorithm;
Exact matching / string searching: the Boyer-Moore algorithm;
Exercise on an implementation of the Knuth-Morris-Pratt exact matching algorithm. Exercise on biological databases: primary databases; secondary databases; NCBI, EMBL, DDBJ; NCBI Entrez and EBI; use of BLAST;
Phylogenetic analysis; phylogenetic trees; size of the search space of phylogenetic algorithms; methods for constructing phylogenetic trees; data used for phylogenetic analysis; the Unweighted Pair Group Method with Arithmetic mean (UPGMA) algorithm; the Neighbor-Joining algorithm; Hidden Markov Models; decoding; the Viterbi algorithm; evaluation;
Laboratory: completion of the exercise on mutation, selection and evolution of nucleotide strings (genotype) translated into amino acid strings (phenotype); selection is based on the presence of given substrings in the phenotype, which determines the fitness value; implementation details, convergence criterion, display of results, discussion, etc.;
Machine learning; general concepts; supervised and unsupervised learning; model selection; underfitting; overfitting; polynomial curve fitting; machine learning as parameter estimation and the problem of overfitting; splitting the data into training and test sets; the bias-variance trade-off; artificial neural networks; definition; Rosenblatt's perceptron; the perceptron learning algorithm; the multi-layer perceptron;
Laboratory: completion of the ANSI C implementation of the evolutionary algorithm on nucleotide strings (genotype), which are translated through the genetic code into amino acid strings (phenotype);
Hidden Markov Models; The Forward Algorithm; The Backward Algorithm; Posterior Decoding; Learning; Baum-Welch Algorithm; Use of Hidden Markov Models for the analysis of bio-sequences; gene finding;
Artificial neural networks; the error back-propagation algorithm for training the MLP; types of neural networks; convolutional networks; reinforcement networks; unsupervised learning and self-organizing maps; introduction to graph theory: representation, terminology, concepts; paths; cycles; connectivity; distance; connected components;
Introduction to graph theory: graph traversal; breadth-first search; depth-first search; Dijkstra's algorithm; six degrees of separation; small-world networks; centrality measures: degree, eigenvector, betweenness and closeness centrality; network biology: general concepts; types of biological data used to build networks; network biology and network medicine; problems and algorithms; centrality measures; random networks; scale-free networks; preferential attachment; scale-free networks in biology;
Laboratory: completion of the exercise on the evolutionary algorithm; implementation details, convergence criterion, display of results, discussion, etc.;
Bio-mathematical models; prediction using theoretical models; the iterative paradigm of mathematical modeling; data-driven models; limited and unlimited population growth models; analytical derivation and examples; logistic growth; density-dependent ecological models; the Lotka-Volterra model; the experiment by Huffaker and Kennett; the SIR epidemic model and some of its variants; Perelson's model for HAART; the Java application Populus for solving continuous models of population dynamics; an introduction to numerical methods for solving systems of differential equations;
Discrete models; spin models (Ising models); cellular automata; Boolean networks; agent-based models; data fitting and parameter estimation; available software tools; Cellular automata: introduction and history; definition; the 1-dimensional automaton; Wolfram classification; the 2-dimensional automaton; Conway's Game of Life; software available for CA simulation; dedicated hardware (CA machines); the prey-predator model as a two-dimensional cellular automaton; relationship with systems of ordinary differential equations; stochastic models; stochastic CAs as discrete stochastic dynamical systems and stochastic processes; an example of CA: the Belousov-Zhabotinsky reaction;
(reference books)
[-] E.S. Allman, J.A. Rhodes. Mathematical Models in Biology: An Introduction (2004). Cambridge University Press.
[-] W.J. Ewens, G.R. Grant. Statistical Methods in Bioinformatics: An Introduction (2005). Springer-Verlag.
[-] R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998). Cambridge University Press.