NICO 101-0 Introduction to Programming for Big Data

Fall 2017 - Professors Luis Amaral and Adam Pah

Lectures: September 6-8 and 12-15 from 9:30am-12:00pm & 1:30pm-4:30pm in L361.

Prerequisites: None

Overview: Our digital, connected, sensor-rich world is generating extraordinary amounts of data (“Big Data”) that are being used for purposes as diverse as teaching a computer to win at Jeopardy and offering alternatives to taxis. The skills needed to go from data to knowledge and application, collectively known as Data Science, are in high demand in industry, government, and academia. This course provides an introduction to the foundational skills needed by data scientists. Prior knowledge of programming is not needed.

Restrictions: Intended primarily for undergraduate students. Other students must contact the instructor. Students will need an up-to-date laptop running Linux, OS X, or Windows 7 or higher. Chromebooks will not be permitted. Prior to the start of the course, students must install several packages and verify that they run properly on their machines.

Texts: Lecture materials are available online.

Requirements: There will be about six homework assignments that involve writing Python code to solve specific problems. Students’ solutions will be uploaded to a server where they will be unit tested. All students will be expected to attend lectures and complete in-class assignments.

TOPICS:

  • Examples of problems amenable to computation
  • Overview of computer hardware & different filesystems
  • The Zen of Python: Code style & commenting
  • Using IPython notebook
  • Basic Python data types: Integers, floats, strings, & lists
  • Flow control: Loops, conditionals, exceptions
  • Input & output
  • Functions & code modularity
  • The Python standard library: string, math, sys, & so on
  • Sophisticated data types: tuples, sets, & dictionaries
  • Data visualization using matplotlib
  • Numerical computing using numpy & scipy
  • Example: Image processing using numpy
  • Retrieving data from the web using requests & splinter
  • Text analysis & intro to regular expressions
  • Example: Computing with Shakespeare
  • Computing with dates & times
  • Analyzing tabular data using pandas
  • Example: Time series analysis of stock prices
  • Numerical precision & algorithm scaling
  • Statistical analysis with statsmodels
  • Finding other resources

Enrollment begins May 15, 2017. Visit CAESAR to register for the course.

To learn more about Data Science at Northwestern, visit the Northwestern Data Science Initiative.