The New York Times integrates data science not only into its digital business but also into its print operations. Sending the optimal number of newspapers to each of our sales locations is a long-standing problem that we are now addressing with a modeling and experimentation platform deployed on Google Cloud services.
Our models combine custom time series modeling with analytical solutions while also incorporating qualitative business concerns. In particular, we probabilistically account for censored data (demand is unknown when the paper sells out) and perform a constrained optimization to maximize profit while minimizing any decrease in circulation.
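The abstract does not describe the actual models, but the two ideas in the paragraph above (treating sellouts as censored demand, then choosing a draw under a circulation constraint) can be illustrated with a minimal sketch. It assumes, purely for illustration, that a store's daily demand is Poisson with a single rate; a sellout (sold == draw) only tells us demand was at least the draw. The function names, the price and cost figures, and the 2% cap on lost expected sales are all hypothetical stand-ins, not the team's method.

```python
import math


def poisson_pmf(k, lam):
    """P(D = k) for Poisson demand with rate lam (assumes lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_sf(k, lam):
    """P(D >= k), computed as 1 - P(D <= k - 1), clamped at 0."""
    return max(1.0 - sum(poisson_pmf(j, lam) for j in range(k)), 0.0)


def censored_log_likelihood(lam, history):
    """history: list of (draw, sold) pairs for one store (hypothetical format).

    sold < draw: demand observed exactly.
    sold == draw: the paper sold out, so demand is only known to be >= draw.
    """
    ll = 0.0
    for draw, sold in history:
        if sold < draw:
            ll += math.log(poisson_pmf(sold, lam))
        else:  # sellout: right-censored observation
            ll += math.log(max(poisson_sf(draw, lam), 1e-300))
    return ll


def fit_rate(history, step=0.05):
    """Grid-search maximum-likelihood estimate of the Poisson rate."""
    upper = 3 * max(draw for draw, _ in history) + 1
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda lam: censored_log_likelihood(lam, history))


def expected_sales(draw, lam):
    """E[min(D, draw)] = sum over k = 1..draw of P(D >= k)."""
    return sum(poisson_sf(k, lam) for k in range(1, draw + 1))


def choose_draw(lam, price, unit_cost, baseline_draw, max_sales_loss=0.02):
    """Maximize expected profit, keeping expected sales within
    max_sales_loss (a fraction) of what the baseline draw achieves."""
    floor = (1.0 - max_sales_loss) * expected_sales(baseline_draw, lam)
    max_draw = max(baseline_draw, int(3 * lam) + 10)
    best_draw, best_profit = baseline_draw, None
    for q in range(max_draw + 1):
        sales = expected_sales(q, lam)
        if sales < floor:
            continue  # this draw would cut circulation too much
        profit = price * sales - unit_cost * q
        if best_profit is None or profit > best_profit:
            best_draw, best_profit = q, profit
    return best_draw


if __name__ == "__main__":
    # Hypothetical (draw, sold) history for one store; 3 of 5 days sold out.
    history = [(10, 10), (10, 10), (10, 8), (10, 10), (10, 9)]
    lam = fit_rate(history)
    naive = sum(s for _, s in history) / len(history)
    print(f"naive mean of sales: {naive:.2f}")
    print(f"censored estimate of demand rate: {lam:.2f}")
    print("suggested draw:", choose_draw(lam, price=1.0, unit_cost=0.4, baseline_draw=10))
```

Because sellout days contribute P(D >= draw) rather than P(D = sold), the censored estimate comes out above the naive average of sales. The grid search only keeps the sketch dependency-free; a production system would more plausibly use a numerical optimizer and a richer time series model, as the abstract suggests.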
The algorithms are tested using paired treatment and control stores, which let us directly compare profits and sales. This "single copy" modeling must run regularly and robustly, because it drives our weekly sales in many stores throughout the country. These concerns informed our recent redesign of the system as part of our company's move to Google Cloud Platform. This is one of the group's longest-running projects, and I will share some surprising lessons we've learned along the way.
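The abstract does not say which statistic is used to compare paired stores. One simple, standard option is a paired t-test on the per-pair differences, sketched below under the assumption of one treatment value and one control value (for example, weekly profit) per matched pair. The function name and sample numbers are hypothetical.

```python
import math
import statistics


def paired_difference(pairs):
    """pairs: list of (treatment_value, control_value), one per matched pair.

    Returns (mean difference, standard error, t statistic) for
    treatment minus control. Requires at least two pairs.
    """
    diffs = [t - c for t, c in pairs]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_err, mean_diff / std_err


if __name__ == "__main__":
    # Hypothetical weekly profit per store pair: (treatment, control).
    pairs = [(105, 100), (98, 97), (110, 103), (101, 100)]
    mean_diff, se, t = paired_difference(pairs)
    print(f"mean lift {mean_diff:.2f}, SE {se:.2f}, t = {t:.2f}")
```

Pairing removes store-to-store variation that both stores in a pair share, which is what makes the direct comparison meaningful; the t statistic would then be compared against a t distribution with n - 1 degrees of freedom.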
Anne is Lead Data Scientist at The New York Times