MATH 3080 Foundations of Data Science
- Division: Natural Science and Math
- Department: Mathematics
- Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0
- Prerequisites: Math 1210 and (either Math 2040 or Math 3040) with a C or better in each course
- Semesters Offered: Fall, Spring
- Semester Approved: Fall 2025
- Five-Year Review Semester: Summer 2030
- End Semester: Summer 2031
- Optimum Class Size: 20
- Maximum Class Size: 25
Course Description
Students will be introduced to a programming language for data analysis, data analysis tools, and the statistics needed to acquire, clean, explore, analyze, and visualize real-life data sets. Using statistics, students will learn to make data-driven inferences and decisions and to communicate those results effectively. Students interested in this course who do not meet the prerequisites should consider taking Math 2080 instead.
Justification
Data collection and analysis are ubiquitous and are fast becoming a prerequisite to economic success for businesses. This course provides a subset of the tools necessary to leverage data for prediction. It will support the bachelor's degree in software engineering by providing relevant mathematics coursework.
Student Learning Outcomes
- Students will be able to acquire data through web scraping and data APIs.
- Students will be able to clean and reshape messy datasets.
- Students will be able to employ techniques of data visualization using computational software, including interactive visualization.
- Students will be able to use statistical software to deploy statistical methods including generalized linear regression and classification.
- Students will be able to explain the mathematics behind the models addressed in the class.
- Students will be able to identify strengths and weaknesses inherent in models, understand the behavior of each model, and determine when a given model is appropriate to use.
- Students will be able to discuss the ethical issues of modern data science.
Course Content
This course will include an introduction to data analysis tools in a data analysis programming language, descriptive statistics, data structures in the language chosen for the course, introductory hypothesis testing and statistical inference, data acquisition via web scraping and APIs, data visualization, generalized linear regression, classification methods such as logistic regression and naive Bayes, and ethical principles of data science. The class will also look at the mathematical processes on which these models are based.
Key Performance Indicators
Student learning will be evaluated through:
Representative Text and/or Supplies
- McKinney, W. (current edition). Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython. Sebastopol, CA: O'Reilly Media.
- Bruce, P., et al. (current edition). Practical Statistics for Data Scientists. Sebastopol, CA: O'Reilly Media.
- A computer and statistical software are required for this course. Free software such as Python or R is recommended, but subscription software (e.g., SAS, SPSS) may be used at the discretion of the instructor.
Pedagogy Statement
Instructional Mediums
- Lecture
- Hybrid