Data Mining

Syllabus

Course Description

A course covering data mining concepts, methodologies, and programming. Topics include decision tables and trees, classification and association rules, clustering, pattern analysis, and linear and statistical modeling. Additional topics may include data cleaning and warehousing and techniques for text and web mining.

Required Text

Machine Learning: An Algorithmic Perspective by Stephen Marsland (2nd edition). Required.

Prerequisites

CSCI 221, MATH 207, MATH 250

Contact Information

Professor: Dr. Paul Anderson
Office: 313 Harbor Walk East (but I’m in my lab most of the time)
Office Hours: Monday, Wednesday, Friday from 12:30 to 1:30.
Office Hour Policy: Drop by anytime. I’ll let you know if I can’t talk, but most of the time I can.
E-mail: andersonpe2@cofc.edu
Phone: 843-953-8151

Course Times

MWF: 8:30 - 9:20 AM in Harbor Walk East 301

Learning Outcomes

  1. Know the meaning of data mining, some of the application areas and disciplines that use data mining, and understand some of the current major challenges in data mining.
  2. Recognize that data mining is part of a larger process and be able to describe the various stages of that process.
  3. Understand the need for, and techniques for, carrying out data cleaning and other data pre-processing activities, and apply them to real-world data sets.
  4. Understand and apply a wide range of the fundamental classification and prediction algorithms, including algorithms for decision trees and rule-based classifiers, Bayes classification methods, and other classification approaches such as logistic regression, k-nearest neighbor, and neural networks.
  5. Examine and apply metrics for classifier performance and selection.
  6. Examine and apply metrics for association pattern evaluation.
  7. Understand and apply several clustering algorithms including k-means clustering and BIRCH clustering.
  8. Examine and apply metrics for cluster evaluation such as clustering tendency, number of clusters, and clustering quality.
  9. Examine and apply metrics for attribute selection.
  10. Recognize some of the current data mining trends and research frontiers.
  11. Explore the use of data mining techniques on different datasets using software packages.

Grading Policy

Exams - 40%
Written Homework - 20%
Programming Assignments - 20%
Literature Questions - 10%
Final Project - 10%

Grading Scale

A: 90-100; B: 80-89; C: 70-79; D: 65-69; F: <65. Plusses and minuses will be used at the discretion of the instructor.
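
As a rough illustration of how the weights and scale above combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes each category is scored as a percentage from 0 to 100, that the lowest weekly exam is dropped before averaging (per the Exam Policy), and that letter cutoffs are applied to the unrounded total with no plus or minus. The function names and the sample scores are hypothetical.

```python
# Category weights from the Grading Policy section (they sum to 1.0).
WEIGHTS = {
    "exams": 0.40,
    "written_homework": 0.20,
    "programming_assignments": 0.20,
    "literature_questions": 0.10,
    "final_project": 0.10,
}

# Lower bounds from the Grading Scale section, checked from highest to lowest.
# Assumption: no rounding, and plus/minus grades are not modeled.
LETTER_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (65, "D")]


def exam_average(exam_scores):
    """Average the weekly exam scores after dropping the single lowest one."""
    if len(exam_scores) < 2:
        raise ValueError("need at least two exams to drop the lowest")
    kept = sorted(exam_scores)[1:]
    return sum(kept) / len(kept)


def course_percentage(category_scores):
    """Weighted sum of per-category percentages (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter grade using the cutoffs above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical student, for illustration only.
scores = {
    "exams": exam_average([70, 85, 90, 95]),  # the 70 is dropped
    "written_homework": 100,  # assumed: share of pass/fail homework passed
    "programming_assignments": 75,
    "literature_questions": 90,
    "final_project": 85,
}
total = course_percentage(scores)
print(f"{total:.1f} -> {letter_grade(total)}")
```

With these sample numbers the exams contribute 36 points, homework 20, programming 15, literature questions 9, and the project 8.5, so the total lands in the B range.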

Grading Guidelines

Submitted work requires Analysis, Evaluation, and Creation of ideas, concepts, and materials into various deliverables (e.g., see revised Bloom’s Taxonomy (http://www.nwlink.com/~donclark/hrd/bloom.html) and reference below).

The grade of A is for work that involves high-quality achievement in all three Bloom areas.
The grade of B is for work that involves high-quality achievement in at least two Bloom areas, and medium-level achievement in the other.
The grade of C is for work that involves high-quality achievement in at least one Bloom area, and medium-level achievement in the others.
The grade of F is for work that does not meet the above criteria.

Reference: Errol Thompson, Andrew Luxton-Reilly, Jacqueline L. Whalley, Minjie Hu, and Phil Robbins. 2008. Bloom’s taxonomy for CS assessment. In Proceedings of the Tenth Conference on Australasian Computing Education, Volume 78 (ACE ’08), Simon Hamilton and Margaret Hamilton (Eds.). Australian Computer Society, Inc., Darlinghurst, Australia, 155-161.

Policies

Exam Policy: Student performance will be assessed through weekly exams. Each exam will cover the material from the previous week and will last no more than 15 minutes. No makeup exams will be allowed. Your lowest exam grade will be dropped.

Homework Policy: Homework will be assigned each week and submitted on OAKS. Homework will be graded pass or fail and is designed to help you study for the exams. For the majority of the classes, the worksheets will double as assigned homework for you to turn in before the following exam.

Programming Assignments: There will be a variety of programming assignments throughout the semester. Most of these assignments will focus on implementing the algorithms discussed in class.

Programming Project: You will have a final practical programming project where you will be free to use any available Python library and produce a full data mining write-up.

Your write-up must have the following headings and you must answer the following:
a.) What were the steps you carried out to arrive at your answer? i.e., describe your process.
b.) Please provide documentation of the correctness of your answer (screenshots, code examples, etc).
c.) Are there any portions of your answer that are not completed, wrong, or otherwise a problem?

Literature Questions: Throughout the semester, we will dig into a seminal modern machine learning/data mining research paper every Friday. After the quiz, I will lead a discussion about a segment of the paper and then give you a question/prompt that builds on this discussion. You’ll have the rest of the class to discuss it in a group, and then each student will type up a response to the question/prompt and submit it to OAKS. Your individual submission may overlap somewhat with those of your group mates, but it should be augmented with your own perspective.

Honor Code: Lying, cheating, attempted cheating, and plagiarism are violations of our Honor Code that, when identified, are investigated. Each instance is examined to determine the degree of deception involved.

Incidents where the professor believes the student’s actions are clearly related more to ignorance, miscommunication, or uncertainty, can be addressed by consultation with the student. We will craft a written resolution designed to help prevent the student from repeating the error in the future. The resolution, submitted by form and signed by both the professor and the student, is forwarded to the Dean of Students and remains on file.

Cases of suspected academic dishonesty will be reported directly to the Dean of Students. A student found responsible for academic dishonesty will receive an XF in the course, indicating failure of the course due to academic dishonesty. This grade will appear on the student’s transcript for two years, after which the student may petition for the X to be expunged. The student may also be placed on disciplinary probation, suspended (temporary removal), or expelled (permanent removal) from the College by the Honor Board.

It is important for students to remember that unauthorized collaboration (working together without permission) is a form of cheating. Unless a professor specifies that students can work together on an assignment and/or test, no collaboration is permitted. Other forms of cheating include possessing or using an unauthorized study aid (such as a PDA), copying from another’s exam, fabricating data, and giving unauthorized assistance.

Remember, research conducted and/or papers written for other classes cannot be used in whole or in part for any assignment in this class without obtaining prior permission from the professor.

Students can find a complete version of the Honor Code and all related processes in the Student Handbook at http://www.cofc.edu/studentaffairs/general_info/studenthandbook.html.

Classroom Policies:

  • The beginning of the class will be devoted to introducing new material. Numerous studies have shown that even having a laptop open during this type of instruction negatively affects not only your learning but also that of those around you. Because of this, I recommend you put your laptop away for this portion. I will do my best to record all lectures and post them in a timely fashion (sorry, the room we are in doesn’t support this).
  • You are expected to take good notes during class.
  • You are expected to participate in class with questions and invited discussion.
  • You are expected to attend all classes. The grade ‘WA’ will be given for excessive (>= 3) absences. If you miss class, you must get an absence memo from the Associate Dean of Students Office (http://studentaffairs.cofc.edu/general_info/absence/); also, you are responsible for announcements made in class, assignment due dates, etc.
  • In summary, you should contribute positively to the classroom learning experience and respect your classmates’ right to learn (see the College of Charleston Student Handbook (http://studentaffairs.cofc.edu/honorsystem/studenthandbook/index.php), section on Classroom Code of Conduct (p. 58)).

Late Policy: No late days will be allowed without an excuse. This is an upper-level course, and it will move very fast. Falling behind on assignments will make it difficult to achieve the learning outcomes of this course.

Final Exam: No final exam.

Disability Accommodations: Any student who feels he or she may need an accommodation based on the impact of a disability should contact me individually to discuss specific needs. Also, please contact the College of Charleston Center for Disability Services (http://www.cofc.edu/~cds/) for additional help.

Facebook Group: Everyone must join the Facebook group for class discussions: https://www.facebook.com/groups/212428835983837/. If you do not join, you might miss important announcements and discussions.