Failures in machine learning systems have recently received substantial media attention. Here I will present some of the challenges that Predata has faced in building predictive products and give brief overviews of a few techniques for combating them. This is by no means an exhaustive catalogue of ML failure modes, but the nature of our prediction problem and our data has led us to face challenges including class imbalance, non-stationarity, seasonality, concept drift, and difficulty establishing good metrics and loss functions.
John is Lead Data Engineer at Predata, working at the intersection of data science and data engineering. Predata transforms social and web traffic data into descriptive and predictive signals that provide insight into financial markets and political volatility. John previously worked as a Forward Deployed Engineer at Palantir following the acquisition of Poptip, where he built machine learning and natural language processing systems on streaming social data. He holds a B.S. in Electrical Engineering from Princeton University.