Lab 1 - CSC 466

Preface

The labs and all material throughout this course will have two complementary components.

  1. KDD Method Implementation - write original code that performs all or part of an algorithm. You will be asked to submit code and test code or output when appropriate.
  2. Insight, a.k.a. data analysis - we will provide datasets for analysis in each lab. Sometimes the objectives will be specified, and other times you will be asked to define and tackle your own objectives.

Providing data-driven insight as well as implementations (code) is probably new to many students. It is very important to realize that insight matters just as much for the learning objectives of this course as writing the best code. To put it bluntly, submitting the best code for a lab with no insight or data analysis will most likely result in a 50%.

Mechanics

  • This is a pair programming assignment. You may pick your own partner for this lab. Working in a pair programming environment requires joint work on all aspects of the project without delegation. It is not divide and conquer. If there is an odd number of students, there may be one team of three, but that team is expected to go above and beyond the normal level of work for the lab (see me for more information). My goal is to provide an environment for discussion, and I cannot say this enough times: do not delegate work.

Bayesian Classifier

In this lab you will:

  • Implement a Naive Bayes classifier in Python. You must implement this from scratch, using standard libraries such as NumPy and Pandas.
  • Apply your algorithm to a dataset and evaluate the results.
  • Provide a rudimentary study of variable importance.
  • Dataset: https://www.kaggle.com/kaggle/us-baby-names#NationalNames.csv
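
To make the first two tasks above concrete, here is a minimal sketch of a Naive Bayes classifier for categorical features, together with a simple accuracy check. It is an illustration under stated assumptions, not the required solution: the class name CategoricalNaiveBayes, the accuracy helper, the add-one (Laplace) smoothing, and the toy weather-style data are all hypothetical choices made for this example. It uses only the Python standard library to stay self-contained; a lab submission would more likely build on NumPy and Pandas.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-alpha smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is Laplace smoothing

    def fit(self, rows, labels):
        # rows: list of tuples of categorical values; labels: list of class labels
        self.n_features = len(rows[0])
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[c][j] counts the values of feature j among rows of class c
        self.value_counts = {
            c: [Counter() for _ in range(self.n_features)]
            for c in self.class_counts
        }
        # Distinct values seen per feature, used in the smoothing denominator
        self.feature_values = [set() for _ in range(self.n_features)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[label][j][value] += 1
                self.feature_values[j].add(value)
        return self

    def log_posteriors(self, row):
        # Unnormalized log P(c | row) = log P(c) + sum_j log P(x_j | c)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.total)
            for j, value in enumerate(row):
                count = self.value_counts[c][j][value]  # 0 if never seen with c
                k = len(self.feature_values[j])
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


def accuracy(model, rows, labels):
    # Fraction of rows whose predicted class matches the true label
    correct = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return correct / len(labels)


# Hypothetical toy data: (outlook, windy) -> activity
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay", "play", "stay"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("sunny", "yes")))
print(accuracy(model, rows, labels))
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied. Note that the accuracy printed here is measured on the training rows themselves; a meaningful evaluation would use held-out data.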