‘Building a Web Scraper’ Will Help Students Learn How to Collect Data Sets from the Internet
By Caroline Murray
Columbia, Mo. (Sept. 30, 2013) – Three Missouri School of Journalism alumni – all experts in advanced data journalism – will help current students learn these skills at an Oct. 10-13 workshop on the University of Missouri campus.
The “Building a Web Scraper” seminar will teach students how to write code that collects data sets from the Web in an organized way.
Data journalism allows reporters to sift through potentially billions of records to analyze trends and patterns. Stories might share information about clusters of key words in U.S. government reports, hospital billing records, electrical rate comparisons between cities of the same size, patterns of unsolved murders and more.
The leaders of the four-day workshop all work in data journalism. Chase Davis, BJ ’06, is a member of the New York Times data team. Jackie Kazil, BJ ’05, MA ’08, is a White House Presidential Innovation Fellow and was formerly an app developer for the Washington Post. Matt Wynn, BJ ’07, is a Django developer and investigative reporter at the Omaha (Neb.) World-Herald.
“There is an immense amount of data that’s available easily through the Web,” said Mike Jenner, professor and the Houston Harte Chair in Journalism. “There are public sets of data, but you have to know what to do with them. These tools can help journalists capture and analyze data and present it in a way people can understand.”
The course will be taught using Python, but the concepts will be applicable to any programming language. Topics will include how to write and run code, how computer programs are structured and how data journalists use these tools in real-world newsrooms. The workshop also will teach basic data-cleaning skills needed to prepare a dataset of public records for analysis.
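For readers curious what such a program looks like, the following is a minimal sketch in Python, not material from the workshop itself. It assumes a small, made-up HTML table of electricity rates (the `SAMPLE_HTML` text, the `TableScraper` class and the `scrape_rates` function are all hypothetical names invented for this illustration), pulls the rows out of the page and then cleans them by trimming stray spaces and converting the dollar figures to numbers.

```python
from html.parser import HTMLParser

# Hypothetical sample page with invented values. A real scraper would
# first download the HTML from a website instead of using a string.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Rate</th></tr>
  <tr><td> Columbia </td><td>$0.09</td></tr>
  <tr><td>Omaha</td><td> $0.11 </td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # the row currently being read, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row is not None:
            self._row[-1] += data


def scrape_rates(html):
    """Scrape the table, then clean each row into a tidy record."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    _header, *body = parser.rows  # the first row holds column names
    cleaned = []
    for city, rate in body:
        cleaned.append({
            "city": city.strip(),                       # trim stray spaces
            "rate": float(rate.strip().lstrip("$")),    # "$0.09" -> 0.09
        })
    return cleaned


records = scrape_rates(SAMPLE_HTML)
for record in records:
    print(record["city"], record["rate"])
```

The sketch uses only Python's standard library so it runs without extra installation; working scrapers often rely on third-party libraries for downloading and parsing pages.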
Students who complete the course will have many career opportunities.
“We get more phone calls for people with these skills than we can fill,” Jenner said. “There is a big demand for journalists who are also able to program stuff.”
The seminar is just one offering available to Missouri students who want to learn data journalism. Students can also take Fundamentals of Data Reporting to learn simple spreadsheet analysis, Computer-Assisted Reporting to learn more in-depth database analysis and Mapping Data for Stories and Graphics.
Davis emphasized the value of the Building a Web Scraper seminar and related course opportunities offered at the School.
“The best way to learn something is to just go out there and do it. Dive in head-first,” he said. “That’s one of the most important things I learned at Mizzou while reporting at the Missourian and working for IRE. If you really want to learn, you have to get your hands dirty. And that’s what this class is all about, too.”
The National Institute for Computer-Assisted Reporting provides graduate students with opportunities to learn even more in-depth skills. NICAR database library students help obtain and prepare federal government databases used by working journalists. In addition, they help analyze data for professional news organizations. NICAR is a joint program of the School of Journalism and Investigative Reporters and Editors, a membership organization based on campus. Associate Professor David Herzog serves as NICAR’s academic adviser, and Associate Professor Mark Horvit serves as IRE’s executive director.