pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs to help data scientists prepare, analyze, and explore their data. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on moderately large datasets. We present Modin, a fast, scalable drop-in replacement for pandas. By changing just a single line of code, Modin seamlessly speeds up pandas workflows on a laptop or in a cluster. Modin has over 6.6k GitHub stars and 1.7 million downloads, and is deployed at many data-centric organizations to accelerate dataframe workflows.
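As a rough illustration of the single-line change described above, the sketch below swaps the pandas import for Modin's pandas-compatible module and leaves the rest of the workflow untouched. The toy data and the `city` and `sales` column names are hypothetical, and the `ImportError` fallback is only an assumption added so the snippet still runs where Modin is not installed.

```python
# The one-line change: import Modin's pandas-compatible module
# in place of "import pandas as pd".
try:
    import modin.pandas as pd
except ImportError:
    # Assumption for this sketch: fall back to plain pandas if Modin is absent.
    import pandas as pd

# Hypothetical toy data; everything below is ordinary pandas-style code.
df = pd.DataFrame(
    {
        "city": ["A", "B", "A", "C"],
        "sales": [10, 20, 30, 40],
    }
)

# Group by city and sum the sales, exactly as one would with pandas.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Because only the import line differs, existing pandas code can usually be tried against Modin without restructuring it, although any speedup will depend on the dataset size and the execution engine available.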
For more details, see: https://github.com/modin-
Devin Petersohn is the lead developer of Modin and the co-founder and CTO of Ponder. Devin recently completed his Ph.D. at the UC Berkeley RISE Lab, where he did research on distributed systems for data science. As part of this work, he created Modin, a system for enabling scalable interactive data science.