Algorithms for big data
(objectives)
In many application contexts huge volumes of data are produced which are used in the economic-financial, political, social and even institutional fields. Often the data is stored in huge distributed clouds and is sometimes generated according to a continuous flow, so large as to make complete storage unfeasible. In many cases the data pertains to entities in close relationship with each other and gives rise to massive networks of connections. Familiar examples for such networks are biological and social networks, distribution networks, and the Web graph. Furthermore, the fact that the data is stored in systems managed by third parties poses integrity problems, which have not been considered in the classical IT literature in terms of both their type and scale.
This scenario poses unprecedented algorithmic challenges, which are being considered by a vast audience of researchers. In the last decade, this effort has produced many innovations on both the methodological and technological level. This course aims at transferring to the students some of the most important methodological tools originated from the research on Big Data algorithms. These methodological tools are presented within challenging application contexts.
|
Code
|
20810211 |
Language
|
ITA |
Type of certificate
|
Profit certificate
|
Credits
|
6
|
Scientific Disciplinary Sector Code
|
ING-INF/05
|
Contact Hours
|
54
|
Type of Activity
|
Core compulsory activities
|
Teacher
|
DI BATTISTA GIUSEPPE
(syllabus)
1) Algorithms for data streams - Approximate counting - Majority problems - Sampling and reservoir sampling - Bloom filters - Frequent itemsets 2) Sub-linear algorithms - Diameter approximation - Property testing 3) Clustering 4) Algorithms and data structures for quantitative features analysis - 1d-,2d-,3d-range queries - Skyline (pareto frontier), near-neighbor search, voronoi diagrams 5) Dimensionality reduction 6) Algorithms for the decomposition of complex networks - Decomposition into k-connected components - Decomposition into k-cores, maximal cliques, maximal k-plexes 7) Distributed Hash Tables, Consistent Hashing 8) Integrity of large data sets, consistency in distributed systems, CAP/PACELC theorems and their impact on NoSQL databases
(reference books)
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Cambridge University Press http://www.mmds.org/
|
Dates of beginning and end of teaching activities
|
From 01/03/2021 to 11/06/2021 |
Delivery mode
|
Traditional
|
Attendance
|
not mandatory
|
Evaluation methods
|
Oral exam
A project evaluation
|
Teacher
|
PATRIGNANI MAURIZIO
(syllabus)
1) Algorithms for data streams - Approximate counting - Majority problems - Sampling and reservoir sampling - Bloom filters - Frequent itemsets 2) Sub-linear algorithms - Diameter approximation - Property testing 3) Clustering 4) Algorithms and data structures for quantitative features analysis - 1d-,2d-,3d-range queries - Skyline (pareto frontier), near-neighbor search, voronoi diagrams 5) Dimensionality reduction 6) Algorithms for the decomposition of complex networks - Decomposition into k-connected components - Decomposition into k-cores, maximal cliques, maximal k-plexes 7) Distributed Hash Tables, Consistent Hashing 8) Integrity of large data sets, consistency in distributed systems, CAP/PACELC theorems and their impact on NoSQL databases
(reference books)
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Cambridge University Press http://www.mmds.org/
|
Dates of beginning and end of teaching activities
|
From 01/03/2021 to 11/06/2021 |
Delivery mode
|
Traditional
|
Attendance
|
not mandatory
|
Evaluation methods
|
Oral exam
A project evaluation
|
Teacher
|
PIZZONIA MAURIZIO
(syllabus)
1) Algorithms for data streams - Approximate counting - Majority problems - Sampling and reservoir sampling - Bloom filters - Frequent itemsets 2) Sub-linear algorithms - Diameter approximation - Property testing 3) Clustering 4) Algorithms and data structures for quantitative features analysis - 1d-,2d-,3d-range queries - Skyline (pareto frontier), near-neighbor search, voronoi diagrams 5) Dimensionality reduction 6) Algorithms for the decomposition of complex networks - Decomposition into k-connected components - Decomposition into k-cores, maximal cliques, maximal k-plexes 7) Distributed Hash Tables, Consistent Hashing 8) Integrity of large data sets, consistency in distributed systems, CAP/PACELC theorems and their impact on NoSQL databases
(reference books)
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Cambridge University Press http://www.mmds.org/
|
Dates of beginning and end of teaching activities
|
From 01/03/2021 to 11/06/2021 |
Delivery mode
|
Traditional
|
Attendance
|
not mandatory
|
Evaluation methods
|
Oral exam
A project evaluation
|
Teacher
|
FRATI FABRIZIO
(syllabus)
1) Algorithms for data streams - Approximate counting - Majority problems - Sampling and reservoir sampling - Bloom filters - Frequent itemsets 2) Sub-linear algorithms - Diameter approximation - Property testing 3) Clustering 4) Algorithms and data structures for quantitative features analysis - 1d-,2d-,3d-range queries - Skyline (pareto frontier), near-neighbor search, voronoi diagrams 5) Dimensionality reduction 6) Algorithms for the decomposition of complex networks - Decomposition into k-connected components - Decomposition into k-cores, maximal cliques, maximal k-plexes 7) Distributed Hash Tables, Consistent Hashing 8) Integrity of large data sets, consistency in distributed systems, CAP/PACELC theorems and their impact on NoSQL databases
(reference books)
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Cambridge University Press http://www.mmds.org/
|
Dates of beginning and end of teaching activities
|
From 01/03/2021 to 11/06/2021 |
Delivery mode
|
Traditional
|
Attendance
|
not mandatory
|
Evaluation methods
|
Oral exam
A project evaluation
|
Teacher
|
DA LOZZO GIORDANO
(syllabus)
1) Algorithms for data streams - Approximate counting - Majority problems - Sampling and reservoir sampling - Bloom filters - Frequent itemsets - Number of distinct elements 2) Dimensionality reduction -Johnson–Lindenstrauss lemma Embedding metric spaces with low distortion 3) Algorithms and data structures for quantitative features analysis - orthogonal range searching (kd-trees and range trees) - nearest neighbour search, k-nearest neighbour search - fractional cascading and simplex range search 4) Algorithms for the decomposition of complex networks - Decomposition into k-connected components - Decomposition into k-cores, maximal cliques, maximal k-plexes 5) NoSQL internals: Distributed Hash Tables, chord, consistent hashing 6) Scalable security: integrity of big data sets in the cloud, consistency and scalability issues with authenticated data structures, pipelining, blockchain scalability trilemma.
(reference books)
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Cambridge University Press http://www.mmds.org/
|
Dates of beginning and end of teaching activities
|
From 01/03/2021 to 11/06/2021 |
Delivery mode
|
Traditional
|
Attendance
|
not mandatory
|
Evaluation methods
|
Written test
A project evaluation
|
|
|