Data cleansing and exploration made simple.

Prepare, process and explore your Big Data with the fastest open source library on the planet, built on Apache Spark and Python (PySpark).

$ pip install optimuspyspark

Improve your Big Data preparation and Data Science workflow with Optimus


Simple and Robust

Prepare, explore and visualize your data in a few lines of code.


Apache Spark and Python

Easy, fast, parallelized and scalable data cleansing and exploration.


Local or Cloud

On your laptop, on a local cluster or in the cloud.

Some snippets

Data Cleaning

In just 10 lines you can remove white spaces and accents in all columns, lowercase all column data, drop a "dummyCol" column, transform a date format, calculate age, convert integers to "None", and replace "taco" with "taaaccoo" and "pizza" with "pizzza" 😀
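To make those steps concrete, here is a minimal plain-Python sketch of a subset of them (trim, strip accents, lowercase, drop a column, replace values, reformat a date, compute age). It is not the Optimus API: the helper names `strip_accents` and `clean_row`, the column names `name`, `food` and `birth`, and the input date format `%d/%m/%Y` are all assumptions made for illustration, and it works on a single dictionary rather than a Spark DataFrame.

```python
import unicodedata
from datetime import date, datetime


def strip_accents(text):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_row(row, today):
    # Hypothetical helper: cleans one record represented as a dict.
    cleaned = {}
    for key, value in row.items():
        if key == "dummyCol":
            continue  # drop the unwanted column
        if isinstance(value, str):
            # trim white space, remove accents, lowercase
            value = strip_accents(value.strip()).lower()
        cleaned[key] = value

    # replace specific values in the (assumed) "food" column
    replacements = {"taco": "taaaccoo", "pizza": "pizzza"}
    cleaned["food"] = replacements.get(cleaned["food"], cleaned["food"])

    # transform the (assumed) day/month/year date into ISO format
    born = datetime.strptime(cleaned["birth"], "%d/%m/%Y").date()
    cleaned["birth"] = born.isoformat()

    # age in whole years: subtract one if the birthday has not happened yet
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    cleaned["age"] = today.year - born.year - (0 if had_birthday else 1)
    return cleaned


row = {"name": "  José ", "food": "Taco", "birth": "12/08/1990", "dummyCol": "x"}
result = clean_row(row, today=date(2020, 1, 1))
print(result)
```

Passing `today` explicitly keeps the age calculation reproducible; in practice it would usually be `date.today()`.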

Outlier Detection

Detect and remove outliers using MAD (median absolute deviation)
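As a rough illustration of the technique, the plain-Python sketch below flags outliers with the MAD-based modified z-score. It is not the Optimus API: the function name `mad_outliers`, the 0.6745 scaling constant and the 3.5 cut-off are assumptions taken from the common modified z-score convention, and Optimus may use a different rule or threshold.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    # Hypothetical helper: returns the values flagged as outliers.
    med = median(values)
    # MAD: the median of the absolute deviations from the median
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    # 0.6745 makes the score comparable to a z-score for normal data.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]
outliers = mad_outliers(readings)
kept = [v for v in readings if v not in outliers]
print(outliers, kept)
```

Because both the centre and the spread are medians, a single extreme value such as 95 barely shifts the cut-off, which is the main reason to prefer MAD over a mean and standard deviation rule.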

Missing data imputation

Use mean or median to fill the missing data
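The idea can be sketched in a few lines of plain Python. This is not the Optimus API: the function name `impute`, its `strategy` parameter and the use of `None` to mark missing values are assumptions made for illustration.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    # Hypothetical helper: fills None entries with the mean or median
    # of the observed (non-missing) values.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 8.0]
filled_mean = impute(column, "mean")
filled_median = impute(column, "median")
print(filled_mean, filled_median)
```

The median is usually the safer choice when the column is skewed or contains outliers, since the mean is pulled toward extreme values.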


The BBVA Data & Analytics group in Mexico has been using Optimus over the past months, and we have boosted our performance in cleansing, exploring and analyzing our data by a factor of 10.


Data & Analytics  

Help us shape Optimus' future

Just take a couple of minutes to help us shape Optimus' roadmap


Want to know about new releases and how you can help Optimus?