Course Syllabus

Course: MATH 3280

Division: Natural Science and Math
Department: Mathematics
Title: Data Mining

Semester Approved: Spring 2020
Five-Year Review Semester: Summer 2025
End Semester: Fall 2025

Catalog Description: Students will learn to efficiently find structures and patterns in large data sets. Topics will include acquiring data sets and cleaning messy and noisy raw data sets into structured and abstract forms; applying scalable and probabilistic algorithms to these well-structured abstract data sets; and formally modeling and analyzing the error inherent in these methods. Students will consider data representations and trade-offs between accuracy and scalability.

Semesters Offered: Fall
Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0

Prerequisites: Math 3080 and (either Math 2270 or Math 2250) with a C or better in each course.

Justification: Data collection and analysis are ubiquitous and are fast becoming a prerequisite to economic success for businesses. This course is a necessity for any data scientist. It will also support the bachelor’s degree in software engineering by providing relevant mathematics coursework.


Student Learning Outcomes:
Students will understand the basic data structures in Python (or similar software). Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will understand how to visualize, explain, and present data using Python (or similar software). Students will be assessed through assignments, projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will understand how to use web scraping tools, APIs, and other methods to acquire data. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will understand how to clean, structure, and explore data using Python (or similar software). Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.


Content:
This course will include a survey of data acquisition and cleaning tools; similarity search, clustering, regression/dimensionality reduction, graph analysis, PageRank, and small-space summaries; and recent developments in these topics and their modern applications, often relating to large internet-based companies.

Key Performance Indicators:
Student learning will be evaluated through:

Homework assignments 20 to 40%

Quizzes 0 to 20%

Periodic examinations 20 to 30%

Final exam 15 to 20%

Oral/Written/Computer projects 10 to 30%

Class group activities 0 to 15%


Representative Text and/or Supplies:
Grus, J. (current edition). Data science from scratch: first principles with Python. Sebastopol (CA): O’Reilly Media.

A computer and data analytics software are required for this course. Python is recommended, but similar software (e.g., R, SAS, SPSS) may be used at the discretion of the instructor. The instructor may also find tools like OpenRefine (formerly Google Refine) useful.


Pedagogy Statement:
John Dewey stated that “education should not revolve around the acquisition of a pre-determined set of skills, but rather the realization of one’s full potential and the ability to use those skills for the greater good.” Applying this idea to the pedagogy of this course, the teacher will help students learn both theory and application in a modern curriculum. By the end of the course, students should know how to use technology to apply specific skills and to analyze the results of their work.

Instructional Mediums:
Lecture

IVC

Hybrid

Maximum Class Size: 25
Optimum Class Size: 20