MATH 3280 Data Mining
- Division: Natural Science and Math
- Department: Mathematics
- Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0
- Prerequisites: Math 3080 and either Math 2270 or Math 2250, with a grade of C or better in each course.
- Semesters Offered: Fall
- Semester Approved: Spring 2020
- Five-Year Review Semester: Summer 2025
- End Semester: Fall 2025
- Optimum Class Size: 20
- Maximum Class Size: 25
Course Description
Students will learn to efficiently find structures and patterns in large data sets. Topics will include acquiring data sets and cleaning messy, noisy raw data into structured, abstract forms; applying scalable and probabilistic algorithms to these well-structured abstract data sets; and formally modeling and analyzing the error inherent in these methods. Students will consider data representations and the trade-offs between accuracy and scalability.
Justification
Data collection and analysis are ubiquitous and are fast becoming a prerequisite to economic success for businesses. This course is a necessity for any data scientist. This course will support the bachelor’s degree in software engineering by providing relevant mathematics coursework.
Student Learning Outcomes
- Students will understand the basic data structures in Python (or similar software).
- Students will understand how to visualize, explain, and present data using Python (or similar software).
- Students will understand how to use web scraping tools, APIs, and other methods to acquire data.
- Students will understand how to clean, structure, and explore data using Python (or similar software).
Course Content
This course will include a survey of data acquisition and cleaning tools; similarity search, clustering, regression/dimensionality reduction, graph analysis, PageRank, and small space summaries; and recent developments in these topics and their modern applications, often relating to large internet-based companies.
Key Performance Indicators
Student learning will be evaluated through:
- Homework assignments: 20 to 40%
- Quizzes: 0 to 20%
- Periodic examinations: 20 to 30%
- Final exam: 15 to 20%
- Oral/Written/Computer projects: 10 to 30%
- Class group activities: 0 to 15%
Representative Text and/or Supplies
Grus, J. (current edition). Data science from scratch: first principles with Python. Sebastopol (CA): O’Reilly Media.
A computer and data analytics software are required for this course. Python is recommended, but similar software (e.g., R, SAS, SPSS) may be used at the discretion of the instructor. The instructor may also find tools like OpenRefine (formerly Google Refine) useful.
Pedagogy Statement
John Dewey stated that “education should not revolve around the acquisition of a pre-determined set of skills, but rather the realization of one’s full potential and the ability to use those skills for the greater good.” Applying this idea to the pedagogy of this course, the teacher will help students learn both theory and application in a modern curriculum. By the end of the course, students should know how to use technology to apply specific skills and to analyze the results of their work.
Instructional Mediums
- Lecture
- IVC
- Hybrid