The vast amount of data that modern societies accumulate has led to the development of specific techniques for sorting, ordering, analysing and displaying these data. Data science is the discipline that encompasses these techniques; it builds on mathematical, statistical and computational tools, including machine learning. This course introduces students to the basic notions and techniques of data science, including machine learning, with practical applications in biomedical or environmental engineering. The course includes a significant practical component, with programming in Python.
Summary:
Lecture 1: Data: The data in data science
Lab 1: Python basics, adapted from the “Machine learning preparatory week @PSL” (Python & NumPy)
Lecture 2: Python and pandas: Tabular data in Python
Lab 2: Notebook on European past floods
Lecture 3: Machine learning: history; applications; recent successes from the “Machine learning preparatory week @PSL”
Lab 3: Loading big datasets; Analyzing big datasets
Lecture 4: Introduction to machine learning from the “Machine learning preparatory week @PSL”
Lab 4: Dimensionality; Dimensionality reduction: Principal Component Analysis; Classification
Lecture 5: Supervised machine learning models from the “Machine learning preparatory week @PSL”
Lecture 6: Scikit-learn: estimation and pipelines from the “Machine learning preparatory week @PSL”
Lab 5: Finish the previous notebooks
Project: Presentation, description and modeling
References
Python for data science
Programming in Python for Data Science
Scientific Computing in Python: Introduction to NumPy and Matplotlib
Python for geo data science