DATA 301 Summer 2021


Professor Information

Paul Anderson, PhD


Office: 222 Building 14

Office hours:

  • By appointment (contact me via Slack or e-mail)


Most of the course content is on Canvas. I keep the syllabus and schedule publicly available.

Course Prerequisites and Goals

Data science is often described as the intersection of computer science and statistics. While data science is much more than this, it is true that you need to know both computer science and statistics to be a successful data scientist. We assume that everyone has taken at least STAT 302/312 and CPE 102.

This class is taught in Python, which for most of you was the language that was used in CPE 101. We will not go over basic Python syntax in this class. For example, I will assume that you know how lists and dicts are used in Python. If you do not know Python or need a refresher, please go through the Codecademy course.

The type of programming you will be doing in this class is likely very different from the programming you do in CS classes. Rather than writing long, complex programs and then testing them, you will constantly iterate between writing code and running it to see what it does.

This course is not an introduction to machine learning or artificial intelligence. We have other classes in this sequence that cover those topics. This course will treat machine learning mostly as a black box. Instead, the focus is on data manipulation, summarization, processing, pipelining, analytics, etc.

Course Learning Objectives

  1. Identify different types of data used in data science and know their properties

  2. Write programs implementing key data manipulation, management, and analytic tasks

  3. Choose techniques to solve common data analytic tasks, and perform appropriate (simple) data analyses with given data

  4. Visualize, demonstrate and explain the results of data analyses to customers/data owners

Textbook and Other Material

There is no print textbook to buy for this course. We will be using Principles of Data Science by Dennis Sun, which is freely available online. It was designed with DATA 301 in mind, so it is a perfect choice for us. At times, I will supplement with other material.


All grades are subject to being prorated due to illness. For example, if you are sick for a week, you will not have to make up that week; your grade will just be computed out of fewer total points. You are responsible for making sure that this is correctly reflected in Canvas, which is our official record of grades. Missing weeks will be indicated not with a 0, but with a -1. Note that the final grade Canvas calculates is not correct. I always calculate final grades in Excel, which allows me to handle absences because of illness, etc., using the scores recorded in Canvas as the source record.
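The proration idea can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used for final grades: the function name `prorated_percentage` and the sample scores are hypothetical, and the only detail taken from the policy above is that a score of -1 marks an excused item that is left out of both the points earned and the points possible.

```python
EXCUSED = -1  # marker for an excused (prorated) item, as described above


def prorated_percentage(scores, max_points):
    """Return the percentage earned, skipping items marked as excused.

    scores and max_points are parallel lists: scores[i] is what was earned
    on item i, and max_points[i] is what that item was worth.
    Returns None when every item is excused (nothing to compute).
    """
    earned = 0.0
    possible = 0.0
    for score, maximum in zip(scores, max_points):
        if score == EXCUSED:
            continue  # excused: count neither the score nor the maximum
        earned += score
        possible += maximum
    if possible == 0:
        return None
    return 100.0 * earned / possible


# Hypothetical example: five 3-point weeks, with week 3 excused.
weekly_scores = [3, 3, EXCUSED, 2, 3]
weekly_max = [3, 3, 3, 3, 3]
print(prorated_percentage(weekly_scores, weekly_max))
```

In this example the excused week is dropped, so the grade is 11 points out of 12 rather than 11 out of 15.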

Grading philosophy: Mastery learning

  • I approach grading as student-focused. I don’t believe students should be penalized if it takes them two weeks to do an assignment when other students can finish it in a week.
  • I enjoy a challenge, and I want you to enjoy a challenge as well. I want students to say, “Wow. This is challenging, and that is what makes it fun.”
  • There is this notion that great scientists/mathematicians/human beings are born great. Not true.
  • A lot of my own thinking is driven by my own experiences. The education system almost completely missed me. I did not focus or engage with school until fortune intervened on my behalf. I moved from Maryland to Ohio in my junior year of HS, and things changed only because of a conversation while onboarding at that new school. They asked me to select my own classes for the first time in my life. I had the power because I was a transfer student. I thought what the hell, I’ll jump into all these honors and AP classes. And they were hard. But C’s in hard classes became B’s in hard classes became A’s in college. It wasn’t that my brain wasn’t as good as other students’. I just hadn’t exercised it as much as some of my peers. We can all grow and improve. I still try to grow in the same way as before: to push myself and not just choose things that are easy for me. Choose the path that is hard. Spend time on what interests you.

Labs and assignments: 50% (5 labs in total)

  • The primary form of labs and assignments will be programming exercises with open-ended questions throughout.

  • Unless specified otherwise, labs and assignments will be submitted to GitHub Classroom. A link to each lab and assignment will be available on Canvas.
  • Submitting late will not affect your grade, but falling behind will make it harder to catch up at the end AND do well on the project.
  • Mastery interpretation: You are attempting to master topic modules. Some may be harder than others based on your background. I am not taking lab points away from you. You are pushing your grade up from 0% on the labs to a max of 50%. This part of the class is like going to the gym. It is at times not as exciting as a project, but it is here where we build your core.

Project: 30%

  • All projects are student centered and student driven. I am not assigning or pushing structure upon you. Mastery is only achieved on the project by taking ownership of your learning through knowledge creation.
  • You are to select an ongoing competition on Kaggle and compete. You will be graded based on your weekly summaries of progress (10 of the 30 percentage points) and on your final report (20 of the 30 percentage points).
  • Your weekly summaries will be done via video. Links will be posted in Canvas.

Participation: 20%

  • We will be conducting this class using a variety of technologies (Slack, JupyterHub, etc.). It is important that you contribute to the class on these platforms.
  • The biggest way to participate in this class is to show up to class, and to measure this I am requiring you to submit your answers to the questions and exercises completed in class online through Canvas. We will develop answers for these in class, so they are graded for participation. I will set the due dates in Canvas, and we will stick to those :)
  • A typical Chapter will be worth 3 participation points. You will get 3/3 for good participation, etc.
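As a rough sketch of how the three components above combine, the following Python snippet computes an overall course percentage from the stated weights (labs 50%, project 30% split into weekly summaries 10% and final report 20%, participation 20%). The dictionary keys, the function name `course_percentage`, and the sample numbers are hypothetical; final grades are actually calculated separately, so treat this as a way to estimate your standing rather than the official computation.

```python
# Weights taken from the syllabus; the key names are made up for this sketch.
WEIGHTS = {
    "labs": 0.50,
    "project_weekly": 0.10,
    "project_report": 0.20,
    "participation": 0.20,
}


def course_percentage(component_percentages):
    """Combine per-component percentages (0 to 100) into one course percentage."""
    missing = set(WEIGHTS) - set(component_percentages)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_percentages[name] for name in WEIGHTS)


# Hypothetical student: strong participation, labs still in progress.
example = {
    "labs": 90,
    "project_weekly": 100,
    "project_report": 85,
    "participation": 100,
}
print(round(course_percentage(example), 2))
```

For the hypothetical student above, the pieces contribute 45, 10, 17, and 20 percentage points respectively.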

Overall, we are trying to encourage a growth mindset. Take time to master things. Work on improving what you aren’t good at. Maybe that is communication. Maybe that is programming. Maybe it is creating new knowledge (i.e., the project).

Grading Scale

  • A 100% to 93%
  • A- < 93% to 90%
  • B+ < 90% to 87%
  • B < 87% to 83%
  • B- < 83% to 80%
  • C+ < 80% to 77%
  • C < 77% to 73%
  • C- < 73% to 70%
  • D+ < 70% to 67%
  • D < 67% to 63%
  • D- < 63% to 60%
  • F < 60% to 0%
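The scale above reads as a simple lookup: each letter has an inclusive lower bound, and anything below 60% is an F. A minimal sketch in Python, assuming no rounding is applied before the lookup (the syllabus does not say whether percentages are rounded), with `GRADE_FLOORS` and `letter_grade` as hypothetical names:

```python
# (lower bound in percent, letter), ordered from highest to lowest.
GRADE_FLOORS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter using the scale above."""
    for floor, letter in GRADE_FLOORS:
        if percentage >= floor:
            return letter  # first floor reached is the highest one earned
    return "F"


for pct in (95, 92.99, 80, 59.9):
    print(pct, letter_grade(pct))
```

Because the lower bounds are inclusive, exactly 93% maps to an A, while 92.99% maps to an A-.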

Honor Code

Lying, cheating, attempted cheating, and plagiarism are violations of our Honor Code that, when identified, are investigated. Each instance is examined to determine the degree of deception involved.

Incidents where the professor believes the student’s actions are clearly related more to ignorance, miscommunication, or uncertainty can be addressed by consultation with the student. We will craft a written resolution designed to help prevent the student from repeating the error in the future. The resolution, submitted by form and signed by both the professor and the student, is forwarded to the Dean of Students and remains on file.

Cases of suspected academic dishonesty will be reported directly to the Dean of Students. A student found responsible for academic dishonesty will receive an XF in the course, indicating failure of the course due to academic dishonesty. This grade will appear on the student’s transcript for two years, after which the student may petition for the X to be expunged. The student may also be placed on disciplinary probation, suspended (temporary removal), or expelled (permanent removal) from the College by the Honor Board.

It is important for students to remember that unauthorized collaboration (working together without permission) is a form of cheating. Unless a professor specifies that students can work together on an assignment and/or test, no collaboration is permitted. Other forms of cheating include possessing or using an unauthorized study aid (such as a PDA), copying from another’s exam, fabricating data, and giving unauthorized assistance.

Remember, research conducted and/or papers written for other classes cannot be used in whole or in part for any assignment in this class without obtaining prior permission from the professor.

Diversity Statement (Cal Poly official statement)

At Cal Poly we believe that academic freedom, a cornerstone value, is exercised best when there is understanding and respect for our diversity of experiences, identities, and world views. Consequently, we create learning environments that allow for meaningful development of self-awareness, knowledge, and skills alongside attention to others who may have experiences, worldviews, and values that are different from our own. In so doing, we encourage our students, faculty, and staff to seek out opportunities to engage with others who are both similar and different from them, thereby increasing their capacity for knowledge, empathy, and conscious participation in local and global communities.

In the spirit of educational equity, and in acknowledgement of the significant ways in which a university education can transform the lives of individuals and communities, we strive to increase the diversity at Cal Poly. As an institution that serves the state of California within a global context, we support the recruitment, retention, and success of talented students, faculty, and staff from across all societies, including people who are from historically and societally marginalized and underrepresented groups.

Cal Poly is an inclusive community that embraces differences in people and thoughts. By being open to new ideas and showing respect for diverse points of view, we support a climate that allows all students, faculty, and staff to feel valued, which in turn facilitates the recruitment and retention of a diverse campus population. We are a culturally invested university whose members take personal responsibility for fostering excellence in our own and others’ endeavors. To this end, we support an increased awareness and understanding of how one’s own identity facets (such as race, ethnicity, gender, sexual orientation, religion, age, disability, social class, and nation of origin) and the combinations of these identities and experiences that may accompany them can affect our different worldviews.

Disability Accommodations

If you feel you may need an accommodation based on the impact of a disability, please contact me individually to discuss your specific needs. Also, please contact the Disability Resource Center.