Course Name: Data Analytics (3-Credit Course) 2021
Participants: MBA (IB) 2020-22 Batch, Indian Institute of Foreign Trade (IIFT)
Faculty Name: Dr. Tanujit Chakraborty
Total Number of Students Registered: 191
Timeline: July 1, 2021 to September 30, 2021
Total Teaching Hours: 30 Hours
Email: [email protected]
Course Introduction:
This course is designed to equip business students to extract implicit, previously unknown, and potentially useful knowledge from real-world data sets. It provides practical training that enables immediate and effective participation in data analytics projects. The course introduces Data Science as a way to address business challenges using business data. It also provides grounding in basic and advanced analytic methods (both Statistical and Machine Learning techniques) and an introduction to big data analytics technology and tools.
Course Objectives:
Participants will acquire the knowledge required for:
1. Extracting insights through data summarization, aggregation, and visualization methods.
2. Pre-processing data for analytics and decision-making using statistical methodologies.
3. Applying statistical and data analytics tools to case studies and business-domain-specific problems.
4. Developing models using statistical and machine learning techniques.
5. Generating actionable insights using unsupervised learning techniques.
6. Working hands-on with open-source software such as R and Python.
Evaluation Components:
The evaluation components for the Data Analytics (DA) course will be as follows:
1) Quiz - 20%; 2) Mid Term - 30%; 3) Class Participation - 10%; 4) End Term Test - 40%.
Textbooks:
Foundation books for this course are given below:
1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning. Springer. (Free online copy, Second Edition: https://www.statlearning.com/)
2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer. (Free online copy, Second Edition: https://web.stanford.edu/~hastie/ElemStatLearn/)
3. Burkov, A. (2019). The Hundred-Page Machine Learning Book. (Read here: http://themlbook.com/wiki/doku.php)
Other Reference Books:
The class notes are available below. References are given in the lecture notes. Students may also refer to:
1. Chambers, J. (2008). Software for Data Analysis: Programming with R. Springer Science & Business Media.
2. Provost, F., & Fawcett, T. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly.
3. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice. OTexts. (Read online: https://otexts.com/fpp3/)
4. Heumann, C., & Schomaker, M. (2016). Introduction to Statistics and Data Analysis. Springer.
5. Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons.
Interesting Data Science Papers for Reading (Surveys / Critical Analysis):
I recommend that all participants read these (mostly non-mathematical) research papers alongside the course. Please click on a paper's name to view it:
1. Statistics - What are the most important statistical ideas of the past 50 years?
2. Data Science - 50 Years of Data Science
3. Time Series Forecasting - Statistical and Machine Learning forecasting methods: Concerns and ways forward
4. Machine Learning - How to avoid machine learning pitfalls: a guide for academic researchers
5. Deep Learning - Tabular Data: Deep Learning is Not All You Need
Resources for Data Science Interview Preparation:
2. Data Science Interview Preparation Guide
Also see: Preparation Tips (Link)
3. Data Science Interview Questions and Answers
Also see: Topic-wise Questions and Answers (Link)
4. Big Data Interview Questions and Answers
Also see: Reading Material on Deep Learning (Link)
5. Tutorial Paper on Time Series Forecasting
Also see: Data and Code Link (Link)
Lecture Notes:
The class notes (slides prepared in both PPT and LaTeX) are from the sessions taught online to IIFT MBA (IB) 2020-2022 students (3-Credit Course).
Session 1:
Topic: Introduction to Data Analytics
Session 2:
Topic: Descriptive Statistics & Probability Distributions
Session 3:
Topic: Sampling Distributions and Hypothesis Testing
Session 4:
Topic: Basics of R and RStudio
Session 5:
Topic: Hands-On Programming with RStudio
Session 6:
Topic: Analysis of Variance (ANOVA)
Session 7:
Topic: Correlation and Regression Analysis
Session 8:
Topic: Statistical Modelling with RStudio
Session 9:
Topic: Hands-On Statistical Modelling with RStudio
Session 10:
Topic: Logistic and Nonlinear Regression with RStudio
Session 11:
Topic: Pattern Classification and Bayesian Classifier
Session 12:
Topic: Similarity Measures and Sensitivity Analysis
Session 13:
Topic: Unsupervised Learning - Clustering Techniques
Session 14:
Topic: Supervised Learning - kNN and Decision Trees
Session 15:
Topic: Artificial Neural Networks and Deep Learning
Session 16:
Topic: Machine Learning and Deep Learning using Python
Session 17:
Topic: Hands-On Programming ML and DL with Python
Session 18:
Topic: Time Series Forecasting
Session 19:
Topic: Ensemble Method: XGBoost Algorithm
Session 20:
Topic: Unsupervised Learning using R