Introduction to Data Science: DATA 101

Syllabus

Official Course Description

Introduction to the use of computer-based tools for the analysis of large data sets for the purpose of knowledge discovery. Students will learn to understand the Data Science process and the difference between deductive hypothesis-driven and inductive data-driven modeling. Students will have hands-on experience with various on-line analytical processing and data mining software and complete a project using real data.

Required Text

There is no required textbook to purchase for this course, but there will be a lot of required reading and online tutorials and guides.
Reference book: https://cran.r-project.org/doc/manuals/R-intro.pdf.
R basics in a data science context: R for Data Science.

Our Plan

Each time I teach DATA 101, I search online for a suitable real-world problem for students to study. We pair this with an internal competition judged by local judges from industry. It is a lot of fun. Ideally, this would line up with the end of the semester, but alas, this almost never happens. Here is a breakdown of our general plan:

You won’t be graded on how well you do in the competition, but you will be graded on the progress you make every week. A big part of being a data scientist is communicating your methods, progress, and results to various stakeholders. To this end, each group will need to submit a report each week. We will be using RMarkdown for this purpose (http://rmarkdown.rstudio.com/).

After the study ends, we will hold our own judging at the end of the semester with judges from local tech companies.

If we are not focusing on the aforementioned real-world problem, we will revert to a more standard lecture+lab course setup, with targeted data science lessons on concepts discussed in class. The topics will vary from week to week. They are designed to build up your skills to accomplish the next two tasks. Each week will have a consistent schedule. The first class will be traditional lecture style with a heavy emphasis on interactive discussion, where I will go over the theory behind the algorithms and concepts. The second class will be mostly lab style. My goal is to be your guide as you gain experience being a data scientist. It is during this second class, and outside the classroom, that you will gain additional practical experience as a data scientist.

If you have laptops, please bring them to each class.

Course Details

Contact Information

  • Professor: Dr. Paul Anderson
  • Office: 313 HWEA
  • Office Hours: My preferred method of e-contact is the Facebook group, where I can respond to questions quickly and everyone can benefit from the answers. If you would like to use e-mail, I will endeavor to respond within 48 hours.
  • E-mail: andersonpe2@cofc.edu
  • Office Phone: 843-953-8151 (I never pick this up, but it does exist :)
  • Section 01 -
  • Section 02 -

Course (learning) outcomes

  • To gain an overview of the field of knowledge discovery and data science
  • To be able to distinguish and translate between data, information, and knowledge
  • To apply algorithms for inductive and deductive reasoning
  • To learn introductory and state-of-the-art machine learning algorithms, including supervised and unsupervised learning, and clustering
  • To apply data mining, statistical inference, and machine learning algorithms to a variety of datasets, including text, image, biological, and health
  • To learn and apply a data science programming language (e.g., R, Python) to real world data
  • To apply artificial intelligence concepts to real world datasets
  • To understand the social, ethical, and legal issues of informatics and data science

Grading Policy

  • Project/Competition Reports - 40%
  • Exam - 20%
  • Homework - 10%
  • Programming Assignments - 30%

Grading Scale: A: 90-100; B: 80-89; C: 70-79; F: <70. Plusses will be used at the discretion of the instructor.
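As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes every component is scored on a 0-100 scale and that no rounding is applied before the scale is read off; the function names and the sample scores are hypothetical, and pluses remain at the instructor's discretion, so the sketch returns only the base letter.

```python
# Weights from the Grading Policy list, as percentages of the final grade.
WEIGHTS = {
    "reports": 40,      # Project/Competition Reports
    "exam": 20,         # Exam
    "homework": 10,     # Homework
    "programming": 30,  # Programming Assignments
}


def final_score(scores):
    """Return the weighted average of component scores (each 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a final score to a base letter (assumption: cutoffs at 90, 80, 70)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "F"


# Hypothetical component scores for one student.
example = {"reports": 92, "exam": 78, "homework": 85, "programming": 88}
total = final_score(example)  # (40*92 + 20*78 + 10*85 + 30*88) / 100
print(total, letter_grade(total))
```

With these sample scores, the weighted sum is 3680 + 1560 + 850 + 2640 = 8730, so the final score is 87.3, which falls in the B range.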

Grading Guidelines: Submitted work requires Analysis, Evaluation, and Creation of ideas, concepts, and materials into various deliverables (e.g., see revised Bloom’s Taxonomy and reference below).

  • The grade of A is for work that involves high-quality achievement in all three Bloom areas.
  • The grade of B is for work that involves high-quality achievement in at least two Bloom areas, and medium-level achievement in the other.
  • The grade of C is for work that involves high-quality achievement in at least one Bloom area, and medium-level achievement in the others.
  • The grade of F is for work that does not meet above criteria.

Reference: Errol Thompson, Andrew Luxton-Reilly, Jacqueline L. Whalley, Minjie Hu, and Phil Robbins. 2008. Bloom’s taxonomy for CS assessment. In Proceedings of the tenth conference on Australasian computing education - Volume 78 (ACE ’08), Simon Hamilton and Margaret Hamilton (Eds.), Vol. 78. Australian Computer Society, Inc., Darlinghurst, Australia, 155-161.

Feedback will be given as quickly as possible with a goal of within a week of the assignment due date.

Homework Policy

Written homework must be placed under my office door by 5 PM on the due date. No late homework will be accepted. Cheating/sharing will result in a zero on the assignment and a report to the judicial board.

Programming Assignments

Most programming assignments will be submitted through the Learn2Mine environment. There will be a combination of in-class lab assignments and out-of-class programming assignments.

Honor Code

Lying, cheating, attempted cheating, and plagiarism are violations of our Honor Code that, when identified, are investigated. Each instance is examined to determine the degree of deception involved.

Incidents where the professor believes the student’s actions are clearly related more to ignorance, miscommunication, or uncertainty, can be addressed by consultation with the student. We will craft a written resolution designed to help prevent the student from repeating the error in the future. The resolution, submitted by form and signed by both the professor and the student, is forwarded to the Dean of Students and remains on file.

Cases of suspected academic dishonesty will be reported directly to the Dean of Students. A student found responsible for academic dishonesty will receive an XF in the course, indicating failure of the course due to academic dishonesty. This grade will appear on the student’s transcript for two years, after which the student may petition for the X to be expunged. The student may also be placed on disciplinary probation, suspended (temporary removal), or expelled (permanent removal) from the College by the Honor Board.

It is important for students to remember that unauthorized collaboration (working together without permission) is a form of cheating. Unless a professor specifies that students can work together on an assignment and/or test, no collaboration is permitted. Other forms of cheating include possessing or using an unauthorized study aid (such as a PDA), copying from another’s exam, fabricating data, and giving unauthorized assistance.

Remember, research conducted and/or papers written for other classes cannot be used in whole or in part for any assignment in this class without obtaining prior permission from the professor.

Students can find a complete version of the Honor Code and all related processes in the Student Handbook at http://www.cofc.edu/studentaffairs/general_info/studenthandbook.html.

Disability Accommodations

Any student who feels he or she may need an accommodation based on the impact of a disability should contact me individually to discuss his or her specific needs. Also, please contact the College of Charleston Center for Disability Services (http://www.cofc.edu/~cds/) for additional help.

Late Policy

No late days will be allowed.