This unit explores the statistical modelling foundations that underlie the analytic aspects of Data Science. It covers:
• Data: collection and sampling, data quality.
• Analytic tasks: statistical hypothesis testing, exploratory and confirmatory analysis.
• Probability distributions: dependence and independence, multivariate Gaussian, Poisson, Dirichlet, random number generation and simulation of distributions, simulation of samples (bootstrap).
• Predictive models: linear and logistic regression, and Bayesian classification.
• Estimation: parameter and function estimation, maximum likelihood and minimum cost estimators, Monte Carlo estimators, inverse probabilities and Bayes theorem, bias versus variance and sample size effects, cross validation, estimation of model performance.
The minimum total expected workload to achieve the learning outcomes for this unit is 144 hours per semester, typically comprising a mixture of scheduled online and face-to-face learning activities and independent study. Independent study may include associated reading and preparation for scheduled teaching activities.
Produce models for predictive statistical analysis;
Implement a model for data analysis through programming and scripting;
Perform fundamental random sampling, simulation and hypothesis testing for required scenarios;
Perform exploratory data analysis with descriptive statistics on given datasets;
Construct models for inferential statistical analysis;
Interpret results for a variety of models.
