Data Science: Sound Foundations

This course is designed to build a strong foundation for a career in using data to create smart information systems. The course comprises 4 weeks of classroom instruction (about three times the contact hours of a typical university course) and 2 weeks of supported project work.

Attendees will be guided through analytics and machine learning methodologies, each framed in the context of its relevant problem domain. They will learn methods ranging from supervised learning, to unsupervised learning, to mathematical optimization.

The course will culminate in a capstone project where attendees will conceptualize and develop a data-driven web service, building the foundations for a career in data science.

By the end of the course, attendees will have:
1. Gained a technical foundation in analytics and computational modelling that enables them to understand existing methods and underpins the synthesis needed to create new ones
2. Obtained a solid introduction to the methods of data science including neural networks, ensemble methods, optimization modelling, and more
3. Gained fluency in the use of Python for data science tasks
4. Built a data-driven web service
5. Learnt Tableau to the level of the Tableau Desktop Qualified Associate certification

The instructional philosophy of this course is practice-driven. Material is motivated through applications, understanding is solidified through hands-on implementation and exercises, and clarity is gained by linking intuition to theory.

This course is aligned with the new Skills Framework for Infocomm Technology, rolled out by the IMDA following 18 months of consultation with industry. The course is designed to build the Technical Skills & Competencies needed to succeed as a Data Analyst, and to chart a course towards gaining the skill sets demanded of a Data Scientist.

(The programming language of instruction will be Python. Attendees are assumed to have knowledge of the basic syntax, and we will build on that gradually to the point of fluency.)

Broad Methodology Exposure

The course begins with an introduction to the machine learning world with the classical tools of supervised/unsupervised learning and data wrangling, culminating in a look at the world of recommendation systems.

Subsequently, we will explore the foundations of business intelligence via data aggregation and visualization. Attendees will master the features of Tableau, the leading business intelligence and visualization tool on the market, ranging from drag-and-drop aggregation and visualization, to creating interactive visualizations, to cross-sheet interactions, to table calculations and LOD expressions. Attendees will learn to build on Tableau using Python, an advantage in the competitive analytics space.

Next, we will look into the question of scaling data aggregation and machine learning using Dask, which supports efficient distributed computation and is competitive with Spark (while staying within the Python ecosystem). Following that, we will dive into the exciting world of neural networks and experiment with applications in computer vision, text mining and more.

In addition, attendees will be exposed to mathematical optimization, an unsung hero of modern business. Attendees will learn how to formulate constrained decision problems as optimization models and use powerful solvers to obtain solutions.

Methodological coverage will include:
● Supervised Learning:
  ○ Linear Regression, Logistic Regression and other Generalized Linear Models
  ○ Support Vector Machines
  ○ Decision Trees and Random Forests
  ○ Neural Networks
● Unsupervised Learning:
  ○ Clustering
  ○ Principal Component Analysis (PCA) & Singular Value Decomposition (SVD)
  ○ Association Rule Learning
● Other Topics:
  ○ Mixed Integer Linear Optimization Modelling
  ○ Collaborative Filtering
  ○ Cross Validation and Parameter Tuning
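
To give a flavour of what formulating a constrained decision problem looks like, here is a minimal Python sketch of a toy project-selection model: choose a subset of projects (binary decision variables) to maximise total value without exceeding a budget. The project data, the budget, and the function name are hypothetical and chosen purely for illustration. The sketch also simply enumerates every possible selection, which only works for tiny instances; it is a stand-in for the solvers used in the course, which search far more cleverly.

```python
from itertools import product

# Hypothetical data: (name, cost, value) for each candidate project.
PROJECTS = [
    ("A", 4, 10),
    ("B", 3, 7),
    ("C", 2, 5),
    ("D", 5, 13),
]
BUDGET = 9  # hypothetical budget limit


def solve_by_enumeration(projects, budget):
    """Maximise total value subject to total cost <= budget.

    Each decision variable x_i is binary: 1 if project i is selected.
    Every 0/1 assignment is tried, so this is only viable for small inputs.
    """
    best_value, best_choice = 0, ()
    for x in product((0, 1), repeat=len(projects)):
        cost = sum(xi * c for xi, (_, c, _) in zip(x, projects))
        if cost > budget:
            continue  # this selection violates the budget constraint
        value = sum(xi * v for xi, (_, _, v) in zip(x, projects))
        if value > best_value:
            best_value, best_choice = value, x
    chosen = [name for xi, (name, _, _) in zip(best_choice, projects) if xi]
    return best_value, chosen


if __name__ == "__main__":
    value, chosen = solve_by_enumeration(PROJECTS, BUDGET)
    print(f"Best value {value} from projects {chosen}")
```

The same three ingredients (decision variables, an objective, and constraints) carry over to the mixed integer linear models listed above; only the solution method changes.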

Capstone Project: Building Data-Driven Solutions

A practical course in data science would not be complete without attendees being able to roll solutions out. Attendees will build web services to support the deployment of machine learning. Attendees will work on a project individually or in teams. Projects may be proposed by attendees or by industry partners.

Technical preparation for project work will include additional instruction on Python, data scraping, HTTP & HTML, web services, databases, and infrastructure-as-a-service (IaaS).

In addition, attendees will be exposed to project management, covering issues ranging from agile development, to project planning and control, to stakeholder management, to RAMS (Reliability, Availability, Maintainability, and Safety), to life cycle management, to programme management framed in the lean start-up context.

The course will culminate in a Demo Day where attendees will present their work to industry partners from the venture capital and software engineering communities.