CSI 4352, Introduction to Data Mining

Fall 2017


Class Hours and Classroom: MWF 9:05am - 9:55am @ Cashion 319

Professor: Dr. Young-Rae Cho    (email: Young-Rae_Cho@baylor.edu)
       Office: Cashion 304
       Office Hours: Mon 2:30pm - 4:00pm

TA: Collin Rapp    (email: Collin_Rapp@baylor.edu)

Course Web Page: web.ecs.baylor.edu/faculty/cho/4352/


Description
Introduction to the concepts, techniques and applications of data warehousing and data mining. Topics include (1) design and implementation of data warehouse and OLAP operations; (2) data mining concepts and methods such as association rule mining, pattern mining, classification and clustering; and (3) applications of data mining techniques to complex types of data in various fields.

Objectives
  • To understand the basic concepts and techniques of data mining.
  • To develop skills in implementing data mining algorithms to solve practical problems.
  • To gain experience in carrying out a research project on data mining.
Textbook
  • Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han, et al., Morgan Kaufmann
Prerequisites
  • CSI 3335, Database Design & Application
  • CSI 3344, Introduction to Algorithms
  • Proficiency in Java or Python programming
Assignments
  • 7 programming assignments
  • Implementing data mining algorithms and analyzing the results
  • Submission of results and source code via Canvas
Exams
  • 5 exams in class
  • Closed-book, short-answer questions
Grading (Undergraduate Students)
  • Assignments: 47%   (5% for the first assignment, 7% for the other assignments)
  • Exams: 53%   (9% for the first exam, 11% for the other exams)
  • Extra credit: 1% (awarded upon proof of online course evaluation submission)
  • Grading scale: A (93% ~), A- (90% ~), B+ (87% ~), B (83% ~), B- (80% ~), C+ (77% ~), C (73% ~), C- (70% ~), D+ (67% ~), D (63% ~), D- (60% ~), F (below 60%)
Grading (Graduate Students)
  • Assignments: 42%   (6% each)
  • Exams: 50%   (10% each)
  • Project: 8% (final report submitted as an email attachment, due 12/6)
  • Extra credit: 1% (awarded upon proof of online course evaluation submission)
  • Grading scale: Same as for undergraduate students
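
As an illustration of how the two grading schemes above combine into a final grade, here is a minimal Python sketch. It is not an official grade calculator. The weights and letter cutoffs are taken from the lists above; everything else is an assumption made for illustration: the function and variable names, the representation of each score as a fraction between 0 and 1, the absence of rounding, and the treatment of extra credit as a flat 1 percentage point added to the total.

```python
# Weights are percentages of the final grade, copied from the grading lists.
UNDERGRAD_WEIGHTS = {
    "assignments": [5, 7, 7, 7, 7, 7, 7],  # 47% in total
    "exams": [9, 11, 11, 11, 11],          # 53% in total
}
GRAD_WEIGHTS = {
    "assignments": [6] * 7,                # 42% in total
    "exams": [10] * 5,                     # 50% in total
    "project": [8],                        # 8%
}

# Lower cutoff (in percent) for each letter grade; anything below 60 is an F.
SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_percentage(weights, scores, extra_credit=False):
    """Return the weighted total in percent.

    `scores` maps each category name to a list of fractions in [0, 1],
    in the same order as the weights (an assumed representation).
    """
    total = 0.0
    for category, category_weights in weights.items():
        for weight, score in zip(category_weights, scores[category]):
            total += weight * score
    # Assumption: extra credit is a flat 1 percentage point.
    return total + (1.0 if extra_credit else 0.0)


def letter_grade(percentage):
    """Map a percentage to a letter grade using the cutoffs above."""
    for cutoff, letter in SCALE:
        if percentage >= cutoff:
            return letter
    return "F"
```

For example, `letter_grade(final_percentage(UNDERGRAD_WEIGHTS, scores))` would give the letter grade for an undergraduate whose `scores` dictionary holds seven assignment fractions and five exam fractions.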
Policies
  • According to university policy, more than 7 absences will result in an F as the final grade for the course, regardless of the scores obtained on exams and assignments.
  • Programming assignments are due at 11:59pm on the specified date. Late submissions receive a penalty of 10% per day, including Saturdays and Sundays.
  • Discussion of programming assignments is allowed, but all programming assignments must be independent work. Any form of cheating on the programming assignments, project or exams will result in an F as the final grade, in accordance with university regulations.
Topics & Tentative Schedule

Week  Topic                               Reading     Exam
1     Introduction                        Chapter 1
2     Frequent Pattern Mining             Chapter 6
3     (continued)                         Chapter 7   Exam 1 (9/8)
4     Clustering                          Chapter 10
5     (continued)                         Chapter 10
6     Classification                      Chapter 8   Exam 2 (9/29)
7     (continued)                         Chapter 9
8     Graph Data Mining                   Chapter 11
9     (continued)                         Chapter 11  Exam 3 (10/20)
10    Sequence Data Mining                Chapter 13
11    Data Warehouse and OLAP Operations  Chapter 4
12    (continued)                         Chapter 4   Exam 4 (11/10)
13    Data Cube Computation               Chapter 5
14    Data Preprocessing                  Chapter 2
15    (continued)                         Chapter 3   Exam 5 (12/4)