A new undergraduate certificate program in Statistics and Machine Learning was approved by the University at a faculty meeting last week. According to the 2013-14 Undergraduate Announcement, students seeking the certificate will be required to take five courses: one on foundations of statistics, one on foundations of machine learning and three electives.
The foundations of statistics requirement may be satisfied by any of the following: ECO 202: Statistics and Data Analysis for Economics; MOL 355: Introduction to Statistics for Biology; ORF 245: Fundamentals of Engineering Statistics; POL 345: Quantitative Analysis and Politics; or PSY 251: Quantitative Methods. The foundations of machine learning requirement may be satisfied by COS 424: Interacting with Data or ORF 350: Analysis of Big Data. Electives may be drawn from participating departments, including the above departments as well as mathematics, electrical engineering and geosciences.
Students will also be required to complete a thesis or independent project on a topic related to machine learning or statistics and participate in a culminating poster session, according to associate professor of politics Kosuke Imai, who will serve as director of the certificate program.
The certificate program will be supervised by an executive committee consisting of associate professor of computer science David Blei, professor of computer science Robert Schapire, operations research and financial engineering professor Jianqing Fan and professor of molecular biology John Storey.
Blei noted that part of the impetus for the creation of the certificate program was the notable increase in student interest in statistics and machine learning. Storey explained that many students have approached him expressing interest in data sciences and seeking more formal training in the subject.
“We found that students were trying to navigate through our courses on their own [to find] a curriculum that teaches them about data,” Blei explained. “We wanted to start by creating a trajectory through courses at Princeton where, when you’re done, you are trained in the craft and science of learning from and using data to solve problems.”
Fan explained that students have realized the increasing importance of processing large data sets in many industries, most notably in the tech industry.
Additionally, several fields in the social sciences that were previously “less quantitative” are becoming more data-oriented, Imai explained. For example, in the fields of policy-making and education, tests involving randomized trials and data analysis are often required before new policies are implemented.
“In the social sciences, the ability to analyze data is a huge advantage for students,” Imai said.
Blei explained that the initial idea for an interdisciplinary statistics and machine learning program developed two years ago, after he and Storey organized a symposium at which Princeton students and faculty in statistics and machine learning discussed their research.

“There was just a lot of energy in the room about this area, particularly about the great people that are working on this at Princeton,” Blei said.
Although the attendees of the symposium were working in different disciplines, they were united by their use of statistical methods and large data sets, Storey explained. The symposium led the members of the executive committee to recognize the need for centralized coordination in both teaching and research.
There are currently six introductory statistics courses offered to undergraduates. While the courses cover very similar material, Imai explained, the participating departments currently do not coordinate with one another.
“It turns out that there are a lot of statistics people around campus across different departments, but we didn’t have a place to get together or coordinate courses,” he said. Imai noted that the University used to have a statistics department, but it was eventually incorporated into the math department.
The executive committee has met three times since the symposium to coordinate the implementation of the certificate program, part of an effort to create a central hub of resources in statistics and machine learning, Imai explained.
However, Imai said that the creation of an undergraduate certificate program is only the first step in a long-term effort to create a center for statistics and machine learning. The center, which would eventually offer a Ph.D. program, is currently under consideration by the University.
While the program will rely on the University’s existing courses for the time being, Blei explained that one of the long-term goals of the program is to create new courses to fit within the certificate’s curriculum. The executive committee aims to consolidate the introductory-level statistics courses so that it can redirect resources toward the creation of more specialized and advanced courses.
“It’s really important that Princeton undergrads have the opportunity to become literate in data analysis, regardless of what your major is,” Storey said. “We hope that this certificate program helps to make that possible.”