The network infrastructure. Transmission of information. Circuit-switched and packet-switched networks. Local area networks (LANs). LAN topologies. LAN interconnection. Network software architecture. Principles of operation of TCP/IP. IP addressing, dotted decimal notation, netmask, gateway, DNS. Name authorities. (Chapter 1 of [1])
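As an illustration of dotted decimal notation, netmasks and gateways, the following minimal sketch uses Python's standard `ipaddress` module. The address `192.168.1.37`, the netmask `255.255.255.0` and the helper `needs_gateway` are hypothetical examples, not taken from the course material.

```python
import ipaddress

# Hypothetical host address and netmask, both in dotted decimal notation.
iface = ipaddress.ip_interface("192.168.1.37/255.255.255.0")

# Applying the netmask to the address yields the network the host belongs to.
network = iface.network
print(network)                    # network address with prefix length
print(network.netmask)            # the netmask in dotted decimal notation
print(network.broadcast_address)  # last address of the network


def needs_gateway(destination: str) -> bool:
    """Return True if the destination lies outside the local network.

    Packets for such destinations cannot be delivered directly and are
    handed to the gateway instead.
    """
    return ipaddress.ip_address(destination) not in network


print(needs_gateway("192.168.1.200"))  # same network: delivered directly
print(needs_gateway("8.8.8.8"))        # other network: sent via the gateway
```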
Network applications. Client/server architectures. The relational model for DBMSs. Differences between databases and data warehouses. The HTTP protocol: visualisation, HTTP queries, HTTP document specification, URI specification, the HTML language, the browser as a language interpreter. Definition of a proxy service. Structuring information for automatic processing. (Chapter 2 of [1])
Information theory. Shannon's communication model. Quantity of information. Information entropy. Mutual information. Application of information theory to automatic text processing. Structuring information for automatic analysis. (Chapter 2 of [2])
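To make the notion of entropy concrete: for a source emitting symbol x with probability p(x), the Shannon entropy is H = sum over x of p(x) * log2(1 / p(x)), measured in bits per symbol. The sketch below estimates it from the character frequencies of a text. The function name `entropy` and the sample strings are illustrative assumptions, and treating observed frequencies as probabilities is a simplification.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Estimate the Shannon entropy of a text, in bits per character.

    Character probabilities are approximated by relative frequencies.
    """
    counts = Counter(text)
    total = len(text)
    # H = sum of p * log2(1 / p), with p = n / total for each distinct character.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A text made of a single repeated symbol carries no information per symbol,
# while a text whose symbols are all equally frequent maximises entropy.
print(entropy("aaaa"))
print(entropy("abab"))
print(entropy("abcd"))
```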
Search engines. Ranking algorithms: the mathematics underlying search engines. Browsing the Web and probability. Transition matrices and their eigenvalues. Interpretation of the principal eigenvector as a ranking measure (Chapters 6 and 7 of [4]). Ranking algorithms, search engines and interfaces. Advanced Google queries [5].
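The link between browsing, transition matrices and ranking can be sketched with power iteration: repeatedly applying the transition matrix of a random surfer to a probability vector converges to the principal eigenvector (eigenvalue 1), whose entries serve as ranking scores. The code below is a minimal sketch, not the formulation used in [4]. The three-page graph `web`, the function name `pagerank` and the damping factor of 0.85 are assumptions introduced for illustration, and every link target is assumed to appear as a key of the graph.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the principal eigenvector of the random-surfer matrix.

    links maps each page to the list of pages it links to.
    damping is the (assumed) probability of following a link rather than
    jumping to a page chosen uniformly at random.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(iterations):
        # Probability mass received from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C receives links from both A and B, so it should obtain the highest score, while B, linked only from A, obtains the lowest.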
Structuring information. Introduction to the XML markup language. XML syntax. Definition of a grammar using regular expressions. Definition of the DTD (Document Type Definition). [6]
Teaching materials (reference books)
[1] James Kurose, Keith Ross. 2013. Reti di Calcolatori e Internet. ISBN: 9788871929385.
[2] Manning, C. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
[3] Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Comput. Surv. 34, 1 (March 2002), 1-47. DOI=http://dx.doi.org/10.1145/505282.505283
[4] Michael W. Berry, Murray Browne. 2005. Understanding search engines: mathematical modeling and text retrieval. SIAM. ISBN: 0-89871-581-4.
[5] Stephan Spencer. 2011. Google Power Search. O’Reilly. ISBN: 978-1-449-31156-8.
[6] Luca Roversi. Gestione Strutturata dell'Informazione. Online notes.