BS6219 - Biostatistics & Data Analytics for Omics Applications

Summary of course content

Statistics and data science have become indispensable since the omics era (genomics, transcriptomics, proteomics, and metabolomics) began at the start of this millennium. The omics field has been growing rapidly, creating many job opportunities worldwide, yet data analysis skills remain in short supply. This course aims to equip students with the basic concepts and know-how in statistics and data analytics needed to handle high-throughput datasets in general, with a specific focus on biological applications. It is intended especially for students who are new to, or interested in, statistics or data science. Geared towards beginners with little background knowledge, the course guides each student to understand the basic principles, learn the fundamentals, and apply the methods taught to real-life datasets. It does this through lectures covering cross-disciplinary content and through hands-on experience in performing basic statistical analyses of large gene expression datasets. Students will be required to demonstrate an understanding of various statistical approaches, and of why and when to use them.

 

Aims and objectives

By the end of this course, you should be able to:

1. Critically evaluate and analyze large datasets:

  • Look out for standard errors and variability
  • Understand and form statistical distributions
  • Understand linear and non-linear correlations, PCA, noise, and clustering
  • Perform differential expression analyses
  • Investigate the kind of analytics to be used according to set goals
  • Identify the assumptions used to formulate solutions
  • Assess the validity of scientific analysis

2. Develop skills to communicate data and concepts to the scientific community

  • Display and explain scientific results clearly and persuasively to peers both verbally and in writing (includes the ability to graph data appropriately and accurately)
  • Demonstrate an understanding of the recursive nature of science, where new results continually modify previous knowledge
  • Demonstrate an understanding of the history of statistics and data science and their recent development in biology

3. Develop communication, creative and critical thinking skills for life-long learning

  • Learn independently and share knowledge with others
  • Demonstrate critical thinking skills such as analysis, discrimination, logical reasoning, prediction and transforming knowledge
  • Demonstrate good observation skills and a curiosity about the world

 

Syllabus

  • Introduction to Biostatistics and Data Analytics (current review in biological applications)
  • Basic Principles of Probability
  • Descriptive Statistics
  • Inferential Statistics
  • Preprocessing & Normalization of Omics Data
  • Linear/nonlinear Correlations
  • PCA
  • Noise and Variability Analysis
  • Statistical Clustering I (k-means, hierarchical)
  • Statistical Clustering II (t-SNE)
  • Differential Gene Expression (DGE) Analysis
  • Gene Enrichment (GE) Analysis
  • Summary/Revision 

 

Assessment

Assignments (Individual): 50%
Written Final Exam (Individual): 50%
Total: 100%