Course Name: Multivariate Data Analysis
Participants: BSc Mathematics and Data Science students of Sorbonne University
Faculty Name : Dr. Tanujit Chakraborty
Timeline : January 17, 2022 to April 28, 2022 | Total Teaching Hours : 90 Hours (45 Sessions)
Email: tanujit.chakraborty@sorbonne.ae
Course Introduction:
Data-driven decision making is the state of the art in decision making today. Because the data collected and stored are multidimensional, extracting knowledge from them requires statistical analysis in the multivariate domain. The aim of this course is therefore to build students' confidence in analyzing and interpreting multivariate data.
Course Objectives:
The course will help the students by:
(i) Providing guidelines to identify and describe real-life problems so that relevant data can be collected,
(ii) Linking the data-generation process with statistical distributions, especially in the multivariate domain,
(iii) Linking the relationships among the variables (of a process or system) with multivariate statistical models,
(iv) Providing a step-by-step procedure for estimating the parameters of the developed model,
(v) Analyzing errors and computing the overall fit of the models,
(vi) Interpreting model results for real-life problem solving and providing procedures for model validation,
(vii) Providing hands-on experience with open-source software such as R and Python.
Evaluation Components:
The evaluation components for the Multivariate Data Analysis (MDA) course are as follows:
1) Homework Assignments - 10%; 2) Mid Term Test - 20%; 3) Project Work - 20%; 4) End Term Test - 50%.
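As an illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is marked on a 0 to 100 scale, which the course page does not state; the function name `final_score` and the example marks are hypothetical.

```python
# Weights taken from the evaluation scheme above (fractions of the final score).
WEIGHTS = {
    "homework": 0.10,
    "mid_term": 0.20,
    "project": 0.20,
    "end_term": 0.50,
}


def final_score(marks):
    """Return the weighted final score.

    Assumption: every component mark is on a 0 to 100 scale.
    """
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must contain exactly the four components")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks for one student, for illustration only.
example_marks = {"homework": 80, "mid_term": 70, "project": 90, "end_term": 60}
print(final_score(example_marks))
```

Because the End Term Test carries half of the total weight, a change of 10 marks there moves the final score by 5 points, compared with 1 point for the same change in homework.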
Textbooks:
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd Edition. Springer Series in Statistics. Springer, New York. Free online copy: https://web.stanford.edu/~hastie/ElemStatLearn/
• Rencher, A.C. and Christensen, W.F. (2012). Methods of Multivariate Analysis. 3rd Edition. Wiley.
• Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Information Science and Statistics Series. Springer, New York.
Some Very Interesting Papers For Reading :
I recommend that all participants read the following research articles (mostly non-mathematical) alongside the course:
1. Statistics - What are the most important statistical ideas of the past 50 years? (2021)
2. Data Science - 50 Years of Data Science (2017)
3. Statistics Vs Data Science - The science of statistics versus data science: What is the future? (2021)
4. Statistics Vs Machine Learning - Prediction, Estimation, and Attribution (2020)
5. Future - The future of statistics and data science (2018)
Lecture Notes:
This is a 13-week course offered at SUAD. All data analyses were done in R. Class notes, slides, data, and code are available below.
Week 1 (Recap): Introduction to Multivariate Data Analysis
Week 1 (Recap): Descriptive Statistics
Week 2 (Recap): Probability Distributions and Sampling Distributions
Week 2 (Recap): Basics of R and RStudio
Week 3 (Recap): Statistical Inference
Week 3 (Recap): Analysis of Variance (ANOVA)
Week 4: Correlation Analysis and Simple Linear Regression
Weeks 5, 6 and 7: Multiple Linear Regression
Weeks 8 and 9: Nonlinear Regression, Logistic Regression, and GLM
Week 9: Time Series Forecasting
Weeks 10 and 11: Dimension Reduction Techniques
Week 12: Tutorial Problems and Solutions
Week 13: Special Talk Sessions