Agile Data Science Workflows made easy with Python and Spark.

Prepare and explore Big Data, and create machine learning models from it, with the fastest open source library on the planet.

$ pip install optimuspyspark

Simple and Robust

Prepare, explore, and visualize your data in a few lines of code.

Apache Spark and Python

Easy, fast, parallelized, and scalable data cleansing, exploration, and machine learning model creation.

Local or Cloud

On your laptop, on a local cluster, or in the cloud.

Easy API

In a little more than 10 lines you can trim whitespace in all columns, remove accents and special characters, lowercase column values and column names, drop rows and the "dummyCol" column, transform a date format, sort by a column, replace integer values with a string, and replace "taaaccoo" with "taco" and "piza" or "pizzza" with "pizza":

from optimus import Optimus

op = Optimus()

# This is a custom function
def func(value, arg):
    return "this was a number"

df = op.load.url("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/foo.csv")

df\
    .rows.sort("product", "desc")\
    .cols.lower(["firstName", "lastName"])\
    .cols.date_transform("birth", "new_date", "yyyy/MM/dd", "dd-MM-yyyy")\
    .cols.years_between("birth", "years_between", "yyyy/MM/dd")\
    .cols.remove_accents("lastName")\
    .cols.remove_special_chars("lastName")\
    .cols.replace("product", "taaaccoo", "taco")\
    .cols.replace("product", ["piza", "pizzza"], "pizza")\
    .rows.drop(df["id"] < 7)\
    .cols.drop("dummyCol")\
    .cols.rename(str.lower)\
    .cols.apply_by_dtypes("product", func, "string", data_type="integer")\
    .cols.trim("*")\
    .show()
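
To make the per-value steps in this chain concrete, here is a minimal plain-Python sketch of trimming, lowercasing, accent removal, and misspelling replacement applied to single strings. It is not Optimus's implementation and does not use Spark; the helper names `remove_accents` and `clean_value`, the `REPLACEMENTS` table, and the sample strings are assumptions made for illustration.

```python
import unicodedata

# Hypothetical helper: decompose characters (NFKD) and drop the combining
# marks, so that "é" becomes "e".
def remove_accents(text):
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Misspellings mapped to their corrected form, mirroring the replace steps above.
REPLACEMENTS = {"taaaccoo": "taco", "piza": "pizza", "pizzza": "pizza"}

# Hypothetical helper combining the cleaning steps for one value.
def clean_value(text):
    text = text.strip()           # trim surrounding whitespace
    text = text.lower()           # lowercase the value
    text = remove_accents(text)   # strip accents
    return REPLACEMENTS.get(text, text)  # fix known misspellings

print(clean_value("  Pizzza "))  # prints "pizza"
print(clean_value("Café"))       # prints "cafe"
```

In Optimus the equivalent operations run column-wise on a distributed DataFrame; the sketch only shows the effect on one value at a time.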

Machine Learning

To apply a random forest, you only need to import the ML library and write one line of code.

from optimus import Optimus
from optimus.ml.models import ML

op = Optimus()
ml = ML()

# df_cancer and columns are assumed to be defined earlier (they are not shown here)
df_predict, rf_model = ml.random_forest(df_cancer, columns, "diagnosis")
['label',
'diagnosis',
'radius_mean',
'texture_mean',
'perimeter_mean',
'area_mean',
'smoothness_mean',
'compactness_mean',
'concavity_mean',
'concave points_mean',
'symmetry_mean',
'fractal_dimension_mean',
'features',
'rawPrediction',
'probability',
'prediction']
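
The last three names in this list match the standard output columns of Spark ML classifiers (`rawPrediction`, `probability`, `prediction`). As a rough illustration of how such values can relate in a voting ensemble, here is a simplified plain-Python sketch. It is not Spark's or Optimus's implementation: the three hand-written "trees", their thresholds, the sample row, and the class labels "B" and "M" are all made up for the example.

```python
from collections import Counter

# Hypothetical stand-ins for trained trees: each maps a row to a class label.
# The thresholds are invented for illustration only.
def tree_a(row):
    return "M" if row["radius_mean"] > 15 else "B"

def tree_b(row):
    return "M" if row["area_mean"] > 700 else "B"

def tree_c(row):
    return "M" if row["concavity_mean"] > 0.1 else "B"

TREES = [tree_a, tree_b, tree_c]
CLASSES = ["B", "M"]  # assumed labels for the "diagnosis" column

def predict(row):
    votes = Counter(tree(row) for tree in TREES)
    raw = [votes[c] for c in CLASSES]            # vote count per class
    probability = [v / len(TREES) for v in raw]  # votes normalized to sum to 1
    prediction = CLASSES[raw.index(max(raw))]    # class with the most votes
    return raw, probability, prediction

row = {"radius_mean": 17.9, "area_mean": 1001.0, "concavity_mean": 0.05}
raw, probability, prediction = predict(row)
print(raw, prediction)  # prints "[1, 2] M"
```

Two of the three sketch trees vote "M" for this row, so the normalized votes favor "M" and it becomes the prediction.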

Data Enrichment

You can connect to any external API to enrich your data using Optimus.

import requests

def func_request(params):
    # You can use here whatever header or auth info you need to send.
    # For more information see the requests library
    url = "https://jsonplaceholder.typicode.com/todos/" + str(params["rank"])
    return requests.get(url)

def func_response(response):
    # Here you can parse the response
    return response["title"]

df_result = op.enrich(df, func_request=func_request, func_response=func_response)
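
The request/response callback pattern used here can be sketched offline in plain Python. The example below is an assumption-laden illustration, not Optimus's `enrich` implementation: `enrich_rows` is a hypothetical driver, `FAKE_API` is a lookup table standing in for the remote service, and rows are plain dictionaries rather than a Spark DataFrame.

```python
# Stand-in for the remote API so the sketch runs without network access.
FAKE_API = {1: {"title": "first todo"}, 2: {"title": "second todo"}}

def func_request(params):
    # In real use this would issue the HTTP request for the given row.
    return FAKE_API[params["rank"]]

def func_response(response):
    # Pick the field to keep from the response.
    return response["title"]

# Hypothetical driver, not part of Optimus: calls func_request once per row,
# parses the result with func_response, and stores it under a new key.
def enrich_rows(rows, func_request, func_response, target="title"):
    enriched = []
    for row in rows:
        response = func_request(row)
        enriched.append({**row, target: func_response(response)})
    return enriched

rows = [{"name": "a", "rank": 1}, {"name": "b", "rank": 2}]
print(enrich_rows(rows, func_request, func_response))
```

Keeping the request and the parsing in separate functions lets you change authentication or the extracted field independently of each other.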

Used by Forward-Thinking Companies

Here are a few of our favourites!

“The group of BBVA Data & Analytics in Mexico has been using Optimus for the past months, and we have boosted our performance for cleansing, exploring and analyzing our data by 10x factor.”

BBVA Data & Analytics
