About this Course

This course is designed to build a strong foundation for a career in using data to create smart information systems, over 4 weeks of classroom instruction (more than three times the contact hours of a typical university course) and 2 weeks of project work. The instructional philosophy of this course is practice-driven: material is motivated through applications, understanding is solidified through hands-on implementation and exercises, and clarity is gained by linking intuition to theory.

Objectives

By the end of the course, students will have:
1. Learnt Tableau at the level of the Tableau Desktop Qualified Associate
2. Gained a technical foundation for analytics and computational modelling that enables one to understand existing methods and underpins the synthesis needed to create new ones
3. Obtained a solid introduction to the methods of unsupervised, supervised, and reinforcement learning and analytics, including neural networks, ensemble methods, online optimization, optimization modelling, simulation, and more
4. Built data pipelines and machine learning services using tools like Spark, Kafka and others

This course is aligned with the new Skills Framework for Infocomm Technology, rolled out by the IMDA following 18 months of consultation with industry. The course is designed to build the Technical Skills & Competencies needed to succeed as a Data Analyst, and to chart a course towards gaining the skill sets demanded of a Data Scientist.

(The programming language of instruction will be Python, and students will be assumed to have fluency in its basic syntax.)

Phase 1: Foundations (1 Week)

The course begins with the foundations of data aggregation (parallelism and filter/map/reduce/etc. operations) and visualization in Tableau. Students will master the features of Tableau ranging from drag-and-drop aggregation and visualization, to creating interactive visualizations, to cross-sheet interactions, to table calculations and LOD expressions, to linking Tableau to Python.
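The filter/map/reduce operations mentioned above can be previewed in plain Python. The following is a minimal illustrative sketch (toy data, not course material):

```python
from functools import reduce

# Toy sales records: (region, amount)
sales = [("East", 120.0), ("West", 80.0), ("East", 200.0), ("West", 50.0)]

# filter: keep only records from the East region
east = filter(lambda rec: rec[0] == "East", sales)

# map: extract just the amounts
amounts = map(lambda rec: rec[1], east)

# reduce: aggregate the amounts into a single sum
total = reduce(lambda acc, x: acc + x, amounts, 0.0)

print(total)  # 320.0
```

The same pattern, applied in parallel across partitions of a dataset, is the conceptual core of the aggregation engines covered later in the course.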

A foundation in numerical linear algebra and classical statistical inference will then be built to underpin subsequent content in machine learning and analytics. This will include: linear systems, linear operators, matrix decomposition, hypothesis testing, maximum likelihood and mathematical optimization. Students will gain intuition on the relevant mechanics through use of libraries like numpy and scipy.
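As a small taste of the mechanics involved, the sketch below solves a linear system with numpy and runs a one-sample t-test with scipy (an illustrative example with synthetic data, not course code):

```python
import numpy as np
from scipy import stats

# Solve the linear system Ax = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Hypothesis test: is the mean of a synthetic sample consistent with 0?
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=100)
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
```

A small p-value here would lead us to reject the null hypothesis that the population mean is 0, linking the maximum-likelihood and hypothesis-testing ideas above to a few lines of code.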

Phase 2: Broad Methodology Exposure (2 Weeks)

Students will subsequently be guided on a grand tour of analytics and machine learning methodologies, framed in the context of the problem domains in which they arise. They will learn methods in unsupervised learning, supervised learning, mathematical optimization, simulation, Bayesian modelling and inference, and online optimization/learning. Special attention will be given to the increasingly important areas of reinforcement learning and neural networks.

Coverage will include:
● Descriptive/Predictive/Prescriptive Analytics:
    ○ Optimization and Gradient Methods
    ○ Mixed Integer Linear Optimization Modelling
    ○ Linear Regression, Logistic Regression and other General Linear Models
    ○ Simulation and Importance Sampling
● Unsupervised Learning:
    ○ Clustering
    ○ Principal Component Analysis (PCA)
    ○ Singular Value Decomposition (SVD)
● Supervised Learning:
    ○ Support Vector Machines
    ○ Decision Trees and Random Forests
    ○ Ensemble Methods (and Bagging & Boosting)
    ○ Bayesian Models and Gibbs Sampling
    ○ Neural Networks (via Keras atop TensorFlow)
● Reinforcement Learning:
    ○ Adaptive A/B Testing (Bandit methodologies)
    ○ From Value Iteration to Q-learning and Temporal Difference Learning
● Other Topics:
    ○ Spatial Data Structures
    ○ Collaborative Filtering
    ○ Cross Validation and Parameter Tuning
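To connect two of the topics above, here is a minimal sketch of fitting a linear regression by gradient descent in numpy (synthetic data; an illustration of the idea, not course code):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 2*x + 1 + noise
X = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * X + 1.0 + rng.normal(0.0, 0.1, size=200)

# Design matrix with an intercept column
A = np.column_stack([X, np.ones(len(X))])
w = np.zeros(2)  # [slope, intercept]

lr = 0.1
for _ in range(500):
    residual = A @ w - y
    grad = A.T @ residual / len(y)  # gradient of (1/2) * mean squared error
    w -= lr * grad

print(w)  # approximately [2.0, 1.0]
```

The same gradient-based update, generalized and combined with backpropagation, is what trains the neural networks covered in this phase.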

Phase 3: Building Data Solutions (3 Weeks)

This phase begins with instruction on building APIs and on the data engineering needed to support the deployment of machine learning services. Students will learn to create web servers and will study data architectures and the infrastructure needed to support them, making use of tools like Kafka, Spark, Spark Streaming and TensorFlow.
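The chained-transformation style of a streaming pipeline can be previewed with plain Python generators. The sketch below is a conceptual stand-in only: real Kafka topics and Spark jobs distribute these stages across machines.

```python
def source(events):
    # Stand-in for a message stream (e.g. a Kafka topic): yields raw records
    yield from events

def parse(stream):
    # map stage: split "user,action" strings into tuples
    for line in stream:
        user, action = line.split(",")
        yield user, action

def clicks_only(stream):
    # filter stage: keep only click events
    for user, action in stream:
        if action == "click":
            yield user

def count_by_user(stream):
    # aggregation stage: count clicks per user
    counts = {}
    for user in stream:
        counts[user] = counts.get(user, 0) + 1
    return counts

events = ["alice,click", "bob,view", "alice,click", "bob,click"]
result = count_by_user(clicks_only(parse(source(events))))
print(result)  # {'alice': 2, 'bob': 1}
```

Because generators are lazy, each record flows through the whole pipeline one at a time, mirroring the lazy-evaluation model students will meet in Spark.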

Students will be grouped into teams of 2 to 5 to work on a project. Projects will be proposed either by students or by industry partners.

Project work will be supplemented by interleaved instruction in the various aspects of software project management. Students will be exposed to issues ranging from agile development, to project planning and control, to stakeholder management, to reliability/availability/maintainability/safety, to life-cycle management, to programme management, all framed in the lean start-up context.

This phase will culminate in a Demo Day where students will present their work to industry partners from the venture capital and software engineering communities.