Torbjorn Zetterlund

Sun 13 2022

What is Reverse ETL?

by bernt & torsten

The data engineering industry has evolved from Extract, Transform, Load (ETL) to ELT, where raw data is copied from the source system, loaded into a data warehouse or data lake, and then transformed. The current trend is to adopt a new approach called “reverse ETL”: the process of moving data from a data warehouse into third-party systems to make the data operational.

In this type of data stack, the data warehouse becomes the single source of truth for data, including customer data that can be spread across different systems. Solutions that have enabled this new architecture include Fivetran, Airbyte, and Cloud Functions for extract and load (EL), dbt for transform (T), and BigQuery, Snowflake, and Redshift for the data warehouse.
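The flow described above (load raw data, transform it inside the warehouse, then move modeled data back out) can be sketched in a few lines. This is a minimal illustration, not a real pipeline: an in-memory SQLite database stands in for the warehouse, and the table names, column names, and the `build_outbound_records` helper are assumptions made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")

# EL: copy raw source rows into the warehouse unchanged.
raw_orders = [
    ("c1", "ada@example.com", 120.0),
    ("c1", "ada@example.com", 80.0),
    ("c2", "bo@example.com", 35.5),
]
warehouse.execute(
    "CREATE TABLE raw_orders (customer_id TEXT, email TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# T: transform inside the warehouse with SQL (the role dbt plays in the stack).
warehouse.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer_id, email, SUM(amount) AS total_spend
    FROM raw_orders
    GROUP BY customer_id, email
    """
)


def build_outbound_records(conn):
    """Reverse ETL step: read the modeled table and shape rows for a third-party system."""
    rows = conn.execute(
        "SELECT customer_id, email, total_spend "
        "FROM customer_totals ORDER BY customer_id"
    )
    return [
        {"external_id": cid, "email": email, "total_spend": spend}
        for cid, email, spend in rows
    ]


outbound = build_outbound_records(warehouse)
print(outbound)
```

In a real stack the last step would hand `outbound` to a connector for the destination system rather than print it.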

Traditionally, data stored in data warehouses was used for analytical workloads and business intelligence applications like Looker and Superset. New use cases have appeared in which that data is put to further use for operational analytics, which drives action by automatically delivering real-time data to the parts of your organization where it matters.

There are many use cases for reverse ETL. One is keeping a consistent view of the customer across all systems: mirroring product usage data can help improve customer interactions by supporting personalized messages that include product metrics. By pushing data to Salesforce, you can have an up-to-date list of high-lifetime-value customers, of customers that spend more than a defined amount, or of how each customer interacts with your organization.

Syncing customer data into your support portal can save time when responding to support requests or automatically prioritize messages when they come in.
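As a rough sketch of the support use case, the snippet below labels incoming tickets using customer attributes that were synced from the warehouse. The `Ticket` class, the `synced_customers` dictionary, the `prioritize` function, and the lifetime-value threshold are all hypothetical and stand in for whatever your support tool exposes.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    customer_id: str
    subject: str


# Hypothetical customer attributes synced from the warehouse into the support tool.
synced_customers = {
    "c1": {"lifetime_value": 5400.0},
    "c2": {"lifetime_value": 120.0},
}


def prioritize(tickets, customers, high_value_threshold=1000.0):
    """Return (priority, ticket) pairs with high-value customers first."""
    labeled = []
    for ticket in tickets:
        # Unknown customers default to a lifetime value of 0.
        value = customers.get(ticket.customer_id, {}).get("lifetime_value", 0.0)
        priority = "high" if value >= high_value_threshold else "normal"
        labeled.append((priority, ticket))
    # Stable sort: "high" first, arrival order preserved within each group.
    labeled.sort(key=lambda pair: pair[0] != "high")
    return labeled


incoming = [
    Ticket("t1", "c2", "Question about invoice"),
    Ticket("t2", "c1", "Cannot log in"),
    Ticket("t3", "c9", "Feature request"),
]
queue = prioritize(incoming, synced_customers)
for priority, ticket in queue:
    print(priority, ticket.ticket_id, ticket.subject)
```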

Write your own Data Connectors

You could write your own API connectors to extract data from the data warehouse and pipe it into operational systems like Salesforce, Marketo, or HubSpot. Such data pipeline connectors can be built with Cloud Functions. The downside of going this route is that the connectors can be hard to write, because endpoints may be brittle and most APIs are not built to handle real-time data transfer.

Data teams must set up batching, retries, and checkpointing to stay within rate limits. Mapping fields from the data warehouse to SaaS products can take time. From there, it can be challenging to maintain the connectors over time because API specs change.
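To give a feel for the plumbing a hand-written connector needs, here is a minimal sketch of batching, retry with exponential backoff, and checkpointing. Everything in it is an assumption for illustration: `RateLimitError`, `sync_rows`, and the `send_batch` callable are hypothetical, and a real connector would call the destination API and persist the checkpoint somewhere durable.

```python
import time


class RateLimitError(Exception):
    """Raised by the (hypothetical) sender when the destination API throttles us."""


def chunked(rows, size):
    """Yield (offset, batch) pairs of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield start, rows[start:start + size]


def sync_rows(rows, send_batch, checkpoint, batch_size=2,
              max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Send rows in batches, retrying throttled batches and recording progress.

    `checkpoint` is a dict holding the index of the next unsent row, so a
    restarted run resumes where it stopped instead of resending everything.
    """
    start_at = checkpoint.get("next_index", 0)
    for start, batch in chunked(rows[start_at:], batch_size):
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                break
            except RateLimitError:
                if attempt == max_retries:
                    raise
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
        checkpoint["next_index"] = start_at + start + len(batch)
    return checkpoint


# Usage with a fake sender that is throttled once, on its second call.
sent = []
calls = {"n": 0}


def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RateLimitError("slow down")
    sent.extend(batch)


state = sync_rows(list(range(5)), flaky_send, checkpoint={}, base_delay=0.0)
print(sent, state)
```

The checkpoint is updated only after a batch succeeds, so a crash mid-run can resend at most one batch. Whether that is acceptable depends on whether the destination tolerates duplicate writes.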

Why use a Reverse ETL tool?

Reverse ETL solutions offer out-of-the-box connectors to numerous systems, so teams no longer need to write and maintain their own connectors. Teams that build connectors in-house with Cloud Functions might have written only a few, for systems like Salesforce, Marketo, and HubSpot, because each one takes time. Once connectors go live, time is spent on maintenance, including regular checks that the API specs have not changed.

With a reverse ETL tool, you can create customer segments, audiences, and lead scores through a visual analysis interface or from dbt model outputs, and push them downstream. Your data team can now push data into more systems and get better use of the data. Reverse ETL tools provide a visual interface for choosing which query output columns populate standard and custom fields, and they let you sync continuously or define what triggers a sync between the systems.
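Behind the visual interface, the column-to-field mapping described above amounts to a lookup table applied to each row of query output. The sketch below shows one way that could look. The `FIELD_MAPPING` names, including the `__c` suffix used to suggest custom fields, and the `map_row` helper are illustrative assumptions, not any tool's actual configuration format.

```python
# Hypothetical mapping from warehouse column names to destination field names.
# The "__c" suffix imitates a custom field; all names are illustrative only.
FIELD_MAPPING = {
    "email": "Email",
    "total_spend": "Lifetime_Value__c",
    "lead_score": "Lead_Score__c",
}


def map_row(row, mapping):
    """Rename mapped columns and drop columns that have no destination field."""
    missing = [col for col in mapping if col not in row]
    if missing:
        raise KeyError(f"query output is missing columns: {missing}")
    return {dest: row[src] for src, dest in mapping.items()}


query_output = [
    {"email": "ada@example.com", "total_spend": 200.0, "lead_score": 87, "internal_id": 1},
    {"email": "bo@example.com", "total_spend": 35.5, "lead_score": 12, "internal_id": 2},
]
payload = [map_row(row, FIELD_MAPPING) for row in query_output]
print(payload)
```

Failing loudly on a missing column is a deliberate choice here: a silently dropped field is harder to notice downstream than a failed sync.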

For example, a sync can be triggered after a dbt job has run. Reverse ETL solutions log and monitor sync status and progress, and notify teams when a sync needs attention.
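The trigger, log, and notify behavior can be pictured as a small orchestration function. This is a sketch under stated assumptions: `run_pipeline` and its three callables are hypothetical. In practice `run_transform` might wrap a `dbt run` invocation and `notify` might post to a chat channel, but neither is shown here.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("reverse_etl_sketch")


def run_pipeline(run_transform, run_sync, notify):
    """Run the transform job, then the sync, and notify on any failure.

    All three arguments are hypothetical callables supplied by the caller:
    run_transform() returns True on success, run_sync() returns the number
    of records synced, and notify(message) alerts the team.
    """
    if not run_transform():
        log.error("transform job failed; sync skipped")
        notify("transform job failed; sync skipped")
        return "skipped"
    try:
        synced = run_sync()
    except Exception as exc:
        log.exception("sync failed")
        notify(f"sync failed: {exc}")
        return "failed"
    log.info("sync finished: %d records", synced)
    return "succeeded"


# Usage with stand-in callables.
alerts = []
status = run_pipeline(lambda: True, lambda: 42, alerts.append)
print(status, alerts)
```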

Using a reverse ETL tool allows data teams to maintain a single data pipeline instead of several. They no longer have to write scripts, and they gain visibility into and control over syncs. Sales, marketing, growth, and analytics teams can analyze and act upon the same consistent, reliable data. Data consistency helps create continuity across the business, since functional teams work off the same data even when they use different tools such as Salesforce, Marketo, and HubSpot, and it accelerates decision-making.

There are over 300 companies, from startups and open-source projects to commercial vendors, that offer ETL, ELT, or reverse ETL. As you can see from the table below, it is not easy to find the solution that fits your needs.

Data Solutions

Name Website Cat SubCat Type Deployment Started HQ Description
Abacus AI All-in-one AutoML Commercial 2019 USA Abacus.AI makes it effortless to create large-scale customizable deep learning systems. Accurate predictions generated by our system can be easily and securely incorporated into all aspects of your customer experience and business processes
Accord Modeling & Training Framework Commercial 2012 France Machine learning, computer vision, statistics and general scientific computing for .NET
Actian Data pipeline Data management Commercial On Premises Actian DataConnect aggregates data from any source, whether on premises or in the cloud, in a database, or in a SaaS application.
Adeptia Integration Suite Data pipeline Data management Commercial On Premises Adeptia offers self-service ETL capabilities to business users and data scientists. Developers can use it for data validations, cleansing, routing, exception-handling, and back-end connectivity.
Aible All-in-one Serving Commercial 2018 USA Create AI that delivers impact, not accuracy, with cost-benefit tradeoffs & operational constraints, in a friendly, intuitive UI designed for real business.
AIMET Modeling & Training Model compression Open Source 2020 USA AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Airbyte Data pipeline ETL Platform/Tools Commercial / Open Source SaaS/On Premises Get all your ELT data pipelines running in minutes, even your custom ones. Let your team focus on insights and innovation
Aircloak Data pipeline Privacy 2012 Germany Aircloak's unique approach ensures the existing primary database is not modified in any way. Aircloak handles all data types including unstructured text.
Airflow Infrastructure Workflow orchestration Open Source Hybrid and multi-cloud 2015 USA Airflow is a modern platform that designs, creates, and tracks workflows. It is an open-source Google Cloud ETL tool. It supports integration with cloud services, including Google Cloud Platform, Azure, and AWS. It offers a user-friendly interface and provides clear visualization. Scaling becomes very easy with Airflow due to its modular structure.
Alectio Modeling & Training Active learning 2019 USA Not all data is created equal You can build better models with less data. We can show you how.
Algorithmia Serving Serving 2013 USA Algorithmia makes applications smarter, by building a community around algorithm development, where state of the art algorithms are always live and accessible to anyone
Alink Modeling & Training Framework 2018 China Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Allegro AI/TRAINS Modeling & Training Experiment tracking 2016 Israel Deep learning platform tailored for computer vision. Allegro AI offers the first end-to-end machine learning product life-cycle management solution.
AllenNLP Modeling & Training NLP 2016 USA AllenNLP is an open-source NLP research library, built on PyTorch.
Alluxio Data pipeline Data management 2015 USA an open source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud.
Alooma Data pipeline Data management Commercial SaaS Alooma is a real-time data pipeline that lets you integrate any data source – databases, applications, and any API - with your data warehouse.
Alteryx Data pipeline Data management Commercial On Premises 2011 USA Alteryx allows you to prep, blend, and analyze data using a repeatable workflow, then deploy and share analytics for deeper insights in hours, not weeks.
Amazon Redshift Data pipeline Data warehouse 2012 USA Amazon Redshift is a fast, fully managed, and cost-effective data warehouse that gives you petabyte scale data warehousing and exabyte scale data lake analytics together in one service. Amazon Redshift is up to ten times faster than traditional on-premises data warehouses.
Amundsen Data pipeline Database/Query 2019 USA Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Angel ML Modeling & Training Distributed 2017 China A Flexible and Powerful Parameter Server for large-scale machine learning
Anodot Data pipeline Data monitoring 2014 Israel We monitor your business. Anodot monitors all your data in real time for lightning fast detection of the incidents that impact your revenue
Anyscale Infrastructure Cloud management 2019 USA From the creators of Ray, a framework for building machine learning applications at any scale originating from the UC Berkeley RISELab.
Anyverse All-in-one Commercial Accelerate advanced perception system development with hyperspectral synthetic data that mimics exactly what your sensors see
Apache Druid Data pipeline Database/Query 2012 USA Apache Druid is a high performance real-time analytics database
Apache Flink Serving Stream processing 2011 Germany Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities
Apache Hudi Data pipeline Data warehouse 2016 USA Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores)
Apache Kafka Serving Stream storage Open Source ETL 2011 USA Apache Kafka is an open-source distributed event streaming platform used by many companies to develop high-performance data pipelines, perform streaming analytics and data integration.
Apache Mahout Modeling & Training Framework 2008 Remote Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends.
Apache MXNet Modeling & Training Framework 2015 A flexible and efficient library for deep learning.
Apache NiFi Data pipeline ETL Platform/Tools Open Source ETL Apache NiFi is an open-source ETL tool and is free for use. It allows you to visually assemble programs from boxes and run them without writing code. So, it is ideal for anyone without a background in coding. It can work with numerous different sources, including RabbitMQ, JDBC query, Hadoop, MQTT, UDP socket, etc. You can use it to filter, adjust, join, split, enhance, and verify data.
Apache ORC Data pipeline File format 2013 the smallest, fastest columnar storage for Hadoop workloads.
Apache Spark Data pipeline ETL Platform/Tools Open Source SaaS/On Premises Apache Spark is an excellent ETL tool for Python-based automation for people and enterprises that work with streaming data. Growth in data volume is proportional to business scalability, making automation necessary and relentless with Spark ETL.
Apache Superset All-in-one Open Source Apache Superset is a modern, enterprise-ready business intelligence web application. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple pie charts to highly detailed geospatial charts.
Apache TVM Serving Inference 2017 Apache TVM (incubating) is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks, and the performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end to end compilation to different backends.
Aparavi Data pipeline Data management 2016 USA Aparavi's highly scalable data intelligence and automation solutions enable organizations to easily discover, classify, protect, and optimize their data.
ApatarForge Data pipeline ETL Platform/Tools Open Source Hybrid and multi-cloud Apatar is an open source data integration and ETL tool written in Java.
AresDB Data pipeline Database/Query Open Source 2019 USA A GPU-powered real-time analytics storage and query engine.
Argo Serving CI/CD Open Source 2018 USA Get stuff done with Kubernetes. Open source Kubernetes native workflows, events, CI and CD
Arize AI Serving Monitoring 2019 USA Arize AI is the watcher, troubleshooter and the guardrail on deployed AI
Arthur AI Serving Monitoring 2018 USA Always-on Explainability, Bias, and Performance Monitoring for AI, ML, and analytics. Get up and running in minutes and start sleeping better at night. Dedicated. Innovative.
Ascend Data pipeline Data management 2015 USA Experience continuously optimized data pipelines with less code and fewer breakages. Enter the new era of data engineering with Ascend's autonomous dataflow service.
Astera Centerprise Data pipeline ETL Platform/Tools Commercial On Premises Centerprise ETL offers data warehouse loading functionality, including the Slowly Changing Dimension (SCD) transformation.
Astronomer Data pipeline ETL Platform/Tools Commercial Hybrid and multi-cloud Build, run, and manage data pipelines-as-code at enterprise scale with Apache Airflow, the most popular open source orchestrator.
AtScale Data pipeline Data management 2013 USA Freedom of choice for the enterprise. Break free the complexities and security risks associated with cloud migration and self-service analytics with Intelligent Data Virtualization—no matter where dat.
Backend AI Infrastructure Workflow orchestration 2016 South Korea Backend.AI: Minute-made GPU clustering solution for Machine Learning.
BentoML Modeling & Training Pretrained models 2018 USA BentoML makes it easy to serve and deploy machine learning models in the cloud. It is an open source framework for building cloud-native model serving services. BentoML supports most popular ML training frameworks and deployment platforms, including major cloud providers and docker/kubernetes.
Blaize Hardware Edge devices 2010 USA Intelligence at the edge of everywhere. Blaize unleashes the potential of AI to drive leaps in the value that technology delivers to transform markets and improve the way we all work and live.
Blendo Data pipeline ETL Platform/Tools Commercial SaaS Blendo provides a data management platform that connects, reshapes, and delivers actionable data, with a focus on simple integration procedures and automated data collection.
Bonobo Data pipeline ETL Platform/Tools Open Source SaaS/On Premises Bonobo is an open-source, Python-based ETL pipeline deployment and data extraction tool. You can leverage its CLI to extract data from SQL, CSV, JSON, XML, and many other sources.
Boruta Modeling & Training Feature engineering Open Source 2010 Python implementations of the Boruta all-relevant feature selection method.
Boulder AI Hardware Edge devices 2017 USA Human insight and decision making on a visual sensor.
BrainChip Hardware Edge devices 2006 USA BrainChip brings artificial intelligence to the edge with a high-performance, small, ultra-low power solution that enables continuous learning and inference.
Bubbles Data pipeline ETL Platform/Tools Open Source SaaS/On Premises Bubbles is a Python framework for data processing and data quality measurement. Basic concepts are abstract data objects, operations and dynamic operation dispatch.
Flow Data pipeline ETL Platform/Tools Commercial SaaS Flow is a drag-and-drop tool for building enterprise integrations.
Cadence Infrastructure Workflow orchestration 2017 USA Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
Caffe Modeling & Training Framework 2013 USA Caffe: a fast open framework for deep learning.
Cambricon Hardware Accelerator 2016 China Cambricon Technologies builds core processor chips for intelligent cloud servers, intelligent terminals, and intelligent robots.
Catalyst Modeling & Training Framework Open Source 2018 Russia PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write another regular train loop.
Cazena Data pipeline Data management 2014 USA First Data Lake with a SaaS Experience. Cazena empowers enterprises to collect, store and analyze any data in the cloud, without any DevOps resources or admin time. Cazena's Data Lake as a Service includes everything, and is delivered as secure SaaS, ready to load, store and analyze data with any method: SQL, Spark, R, Python, and many more.
CDAP Data pipeline ETL Platform/Tools Open Source ETL Hybrid and multi-cloud Interoperability across on-premises and Cloud environments; Support for all major public cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform.
CData Software Data pipeline ETL Platform/Tools Commercial SaaS CData Software offers data integration solutions for real-time access to online or on-prem applications, databases, and Web APIs. The vendor specializes in providing access to data through established data standards and application platforms such as ODBC, JDBC, ADO.NET, SSIS, BizTalk, and Microsoft Excel. CData Software products are broken down into six categories: driver technologies, enterprise connectors, data visualization, ETL and ELT solutions,
Cerebras Hardware Accelerator 2016 USA AI insights, faster Cerebras is a computer systems company dedicated to accelerating deep learning. The pioneering Wafer-Scale Engine (WSE) – the largest chip ever built – is at the heart of our deep learning system, the Cerebras CS-1.
Chainer Modeling & Training Framework 2015 Japan A Powerful, Flexible, and Intuitive Framework for Neural Networks
Civis Analytics All-in-one Analytics/Ai Commercial Civis Turns Data Into Campaigns That Drive Action
ClearSky Data Data pipeline Storage 2014 USA ClearSky Data offers enterprise storage as a hybrid cloud service delivering on-demand primary storage, offsite backup, and DR as a single service.
CleverHans Modeling & Training Adversarial robustness 2017 USA An adversarial example library for constructing attacks, building defenses, and benchmarking both
Clipper Serving Web 2017 USA Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.
Cloudera Infrastructure Cloud management 2008 USA Cloudera delivers an Enterprise Data Cloud for any data, anywhere, from the Edge to AI.
CloverETL All-in-one ETL Platform/Tools Open Source On Premises CloverETL is a data integration software suite for data migration and data warehousing, and for feeding data into business intelligence and reporting applications.
Cohesity Data pipeline Data management 2013 USA Eliminate mass data fragmentation with Cohesity's modern approach to data management, beginning with backup. Gain instant recovery. Learn more today.
Colab Modeling & Training Notebook 2017 USA Colab notebooks allow you to combine executable code and rich text in a single document, along with images, HTML, LaTeX and more.
Comet Modeling & Training Experiment tracking 2017 USA Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects.
Confluent Data pipeline Stream processing 2014 USA Confluent is a fully managed Kafka service and enterprise stream processing platform. Real-time data streaming for AWS, GCP, Azure or serverless. Try free!
Core ML Serving Mobile 2017 USA Use Core ML to integrate machine learning models into your app. Core ML provides a unified representation for all models.
Cortex Serving Web 2019 USA Cortex is an open source platform for deploying machine learning models as production web services.
Cubonacci All-in-one AI Apps platform 2018 Netherlands Machine learning lifecycle management Cubonacci enables organizations to focus on developing custom machine learning models without having to worry about peripheral matters. The Cubonacci platform manages deployment, versioning, infrastructure, monitoring and lineage for you, eliminating risk and minimizing time-to-market.
cuDF Data pipeline Data processing 2018 USA Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
DAGsHub Modeling & Training Versioning 2019 Israel DAGsHub is a platform for data version control and collaboration for data scientists and machine learning engineers.
DarwinAI Modeling & Training Explainability 2017 Canada DarwinAI’s Generative Synthesis 'AI building AI' technology enables optimized and explainable deep learning.
Dash Serving App interface 2015 Canada Dash Enterprise is the end-to-end development & deployment platform for low-code AI Dash applications.
Dask Data pipeline Data processing 2015 Remote Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
Databricks All-in-one Data management Commercial 2013 USA All your data, analytics and AI on one lakehouse platform
Dataddo All-in-one ETL + Analytics Commercial SaaS Your data, from any source, to any destination
Datadog Infrastructure Cloud management 2010 USA See inside any stack, any app, at any scale, anywhere.
Datagrok All-in-one Data processing 2019 USA Datagrok: Swiss Army Knife for Data. A platform for turning data into actionable insights
Dataiku All-in-one AI Apps platform 2013 USA Dataiku's single, collaborative platform powers both self-service analytics and the operationalization of machine learning models in production.
DataRobot All-in-one AI Apps platform 2012 USA DataRobot combines a trusted enterprise AI platform and a trusted AI-native strategic partnership for global enterprises that want to harness the power of AI and their existing teams to succeed in today's Intelligence Revolution.
Datatable Data pipeline Data processing 2017 USA Python library for efficient multi-threaded data processing, with the support for out-of-memory datasets.
Datatron Serving Monitoring 2016 USA Production AI Model Management at Scale. Automate the standardized deployment, monitoring, governance, and validation of all your models to be developed in any environment.
Dataturks Data pipeline Labeling 2018 India ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
DataVirtuality Data pipeline ETL Platform/Tools Commercial SaaS Rapid data integration for analytics: Integrates multiple data sources, web services, and front ends in a snap.
Datera Data pipeline Storage 2013 USA Get sub-200µS latency & millions of IOPS with 100% software-defined data automation. Save up to 70% on data infrastructure total-cost-of-ownership.
Datmo Modeling & Training Experiment tracking 2016 USA Be as effective as AI engineers at Google and Facebook. Workflow tools to help you experiment, deploy, and scale. By data scientists, for data scientists.
Datorama Data pipeline ETL Platform/Tools Commercial SaaS
DAWNBench Modeling & Training Benchmarking 2018 USA DAWNBench is a benchmark suite for end-to-end deep learning training and inference.
Deeplite Serving Model compression 2020 Canada Enabling faster, smaller and more energy-efficient DNNs to run on edge devices and in the cloud
DeepNote Modeling & Training Notebook 2019 USA The notebook you’ll love to use Deepnote is a new kind of data science notebook. Jupyter-compatible with real-time collaboration and easy deployment. Oh, and it's free.
DefinedCrowd Data pipeline Data generation 2015 USA Leverage machine learning technology and human intelligence to source, structure, and enrich high quality training data in speech, NLP, and computer vision.
Dell Boomi All-in-one ETL Platform/Tools Commercial SaaS Boomi AtomSphere lets you configure and deploy integrations at a fraction of the cost and time of traditional approaches, all from a single interface.
Delta Lake Data pipeline Data warehouse 2019 USA Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
Dessa All-in-one Monitoring 2016 Canada Create more with machine learning. Build, run & monitor 1000s of ML experiments with Foundations
Determined AI Modeling & Training AutoML 2016 USA Our AutoML platform streamlines your deep learning workflows, tracks your work, and manages your GPU clusters.
Dialogflow Modeling & Training NLU 2014 USA Dialogflow is a Google service that runs on Google Cloud Platform, letting you scale to hundreds of millions of users. Optimized for the Google Assistant.
Doccano Data pipeline Labeling 2018 Japan Text annotation for Human. Just create project, upload data and start annotation. You can build dataset in hours.
Dockship Modeling & Training Pretrained models 2019 India Dockship is a marketplace for AI models and datasets. Publish your models on Dockship for people all over the world.
Dolt Data pipeline Versioning 2018 USA Liquidata's mission is to make data move more efficiently. We built Dolt, an open-source version-controlled SQL database with Git-like semantics.
Domino Data Lab Infrastructure Cloud management 2013 USA Deliver winning models. One place for your data science tools, apps, results, models, and knowledge
Domo Data pipeline ETL Platform/Tools Commercial SaaS With Domo, you can use data and insights delivered in data experiences to multiply your business impact and drive your business forward.
dotData All-in-one Feature engineering 2018 USA When AutoML is enhanced with AI-powered feature engineering, the result is dotData. We focus on delivering data science automation for the enterprise. End-to-end data science automation platform accelerates, democratizes, and operationalizes the entire data science process.
Dremio Data pipeline Data management 2015 USA Get more value from your data, faster. Dremio makes your data engineers more productive, and your data consumers more self-sufficient.
DVC - Data pipeline Versioning 2017 USA Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.
EdgeQ Hardware Edge devices 2018 USA EdgeQ is an information technology company that specializes in the fields of 5G chip systems.
Eight Wire Conductor Data pipeline ETL Platform/Tools Commercial SaaS Conductor, from New Zealand-based Eight Wire, offers point-and-click data integrations.
Elastifile Data pipeline Storage 2013 USA Elastifile's cloud-native file storage helps organizations adapt and accelerate their business in the cloud era. Powered by a scalable, enterprise-grade distributed file system with intelligent object tiering, Elastifile augments existing public cloud services with a scalable, POSIX-compliant NAS, facilitating frictionless cloud adoption. With Elastifile, organizations enjoy low-touch file storage services, or deploy and manage cloud-native file storage themselves, eliminating the need for manual storage management and IT forecasting. Elastifile's unique combination of features and flexibility empowers organizations to seamlessly integrate cloud resources, with no application refactoring… thereby modernizing their infrastructure and achieving IT agility and efficiency goals.
Elementl All-in-one Workflow orchestration 2018 USA Building Dagster, the data orchestrator. Dagster is a data orchestrator for machine learning, analytics, and ETL
Elixir Repertoire Data ETL Data pipeline ETL Platform/Tools Commercial On Premises Elixir Data ETL provides on-demand, self-service data manipulation. It lets you design, test, and implement data extraction, aggregation, and transformation.
erwin Data pipeline Data management 2016 USA Integrated enterprise architecture, business process and data modeling with data cataloging and data literacy for risk management and digital transformation.
Etleap Data pipeline ETL Platform/Tools Commercial SaaS Etleap is a Redshift ETL tool that makes it easy to bring data from disparate data sources into a Redshift data warehouse.
Etlworks Data pipeline ETL Platform/Tools Commercial SaaS/On Premises Etlworks Integrator is a powerful and easy-to-use cloud data integration service that can work with structured and semi-structured data of any type and size.
Evidently AI Serving Monitoring 2020 Russia Open-source tools to analyze, monitor, and debug machine learning model in production
Excelero Data pipeline Storage 2014 USA Local NVMe performance at data center scale through true convergence. Software-defined block storage for Cloud and Enterprise applications at any scale.
ExplainX Modeling & Training Interpretability 2020 USA ExplainX enables you to explain, present, and monitor how your AI models work. We make sure your models never fail in the real-world.
Facets Data pipeline Visualization Open Source 2017 USA Facets: An Open Source Visualization Tool for Machine Learning Training Data
fastText Modeling & Training NLP 2016 USA Library for fast text representation and classification.
FEAST Data pipeline Feature engineering 2019 Asia Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
Featuretools Modeling & Training Feature engineering 2018 USA An open source python library for automated feature engineering
FedAI (FATE) Modeling & Training Framework 2019 China FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure computing framework to support the federated AI ecosystem. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). It supports federated learning architectures and secure computation of various machine learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.
Fiddler Labs Modeling & Training Interpretability 2018 USA AI with trust, visibility, and insights built in. Fiddler is a breakthrough AI engine with explainability at its heart.
Figure Eight Data pipeline Labeling 2008 USA Figure Eight combines the best of human and machine intelligence to provide high-quality annotated training data that powers the world's most innovative machine learning and business solutions
Fivetran Data pipeline ETL Platform/Tools Commercial SaaS All your data organized in a full data warehouse in minutes, not months.
flair Modeling & Training NLP Open Source 2018 Germany A very simple framework for state-of-the-art Natural Language Processing (NLP)
FloydHub Infrastructure Cloud management 2016 USA FloydHub is a zero setup Deep Learning platform for productive data science teams.
Fluree Data pipeline Database/Query 2017 USA Welcome to better data management. The Fluree platform organizes blockchain-secured data in a highly-scalable, highly-insightful graph database.
Flyte Infrastructure Workflow orchestration 2019 USA Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced
Formant Serving Robotics 2019 USA Deploy faster. Improve uptime. Achieve scale.
Fritz AI Serving Mobile 2017 USA Fritz AI is the machine learning platform for iOS and Android developers. Teach your mobile apps to see, hear, sense, and think.
Gemini Data Data pipeline Data management 2015 USA Gemini Data provides Data Availability for AI/ML driven analysis and applications to enable unified enterprise knowledge and access.
Gensim Modeling & Training Framework 2012 Czech Topic Modelling for Humans. Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
Git LFS Data pipeline Versioning Open Source 2014 Remote Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like or GitHub Enterprise.
Gluent Data pipeline Visualization 2014 USA Data virtualization software eliminates data silos. Gluent's transparent data virtualization provides virtual access to all enterprise data, with zero code changes.
GluonCV Modeling & Training Pretrained models 2018 USA GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It aims to help engineers, researchers, and students quickly prototype products, validate new ideas and learn computer vision.
Google Cloud Data Fusion All-in-one ETL Platform/Tools Commercial SaaS Google Cloud Data Fusion is a cloud-native data integration tool. It is a fully managed Google Cloud ETL tool that allows data integration at any scale. It is built on an open-source core, CDAP, for pipeline portability. It offers a visual point-and-click interface that allows code-free deployment of your ETL/ELT data pipelines. Apart from native integration with Google Cloud Services, it also offers 150+ pre-configured connectors and transformations at zero additional cost.
Gradio Serving App interface 2018 USA Gradio allows you to quickly create customizable UI components around your TensorFlow or PyTorch models, or even arbitrary Python functions. Mix and match
Graphcore Hardware Accelerator 2016 UK Graphcore has built a new type of processor for machine intelligence to accelerate machine learning and AI applications for a world of intelligent machines.
Graviti Data Platform Data pipeline Data management Commercial SaaS 2019 China As a platform for unstructured data management, Graviti Data Platform provides services in data hosting, version control, data visualization, and collaboration. You can also integrate Graviti Data Platform into your own pipeline using developer tools.
GreenWaves Technologies Hardware Edge devices 2014 France GreenWaves' GAP8 is the industry's first ultra-low-power processor enabling battery-operated AI in IoT applications.
Gretel AI Data pipeline Privacy 2019 USA The first and only APIs to enable you to balance, anonymize, and share your data. With privacy guarantees.
Grid AI Modeling & Training Distributed training 2020 USA Seamlessly train hundreds of Machine Learning models on the cloud from your laptop. Focus on machine learning, not infrastructure.
Groq Hardware Accelerator 2016 USA The Next Generation of Computing is here.
H2O All-in-one AI Apps platform 2012 USA is the creator of H2O the leading open source machine learning and artificial intelligence platform trusted by data scientists across 14K enterprises
Habana Labs Hardware Edge devices 2016 Israel Habana Labs was founded in 2016 to create world-class AI Processors, developed from the ground-up and optimized for training deep neural networks and for inference deployment in production environments.
Hailo Hardware Edge devices 2017 Israel The World’s Top Performing AI Processor for Edge Devices. Hailo offers a breakthrough microprocessor uniquely designed to accelerate embedded AI applications on edge devices. Breathe life into your edge AI product today with Hailo-8.
Hammerspace Data pipeline Database/Query 2015 USA Hammerspace allows data to move freely, like the air you breathe, across clouds and services. Make data accessible exactly where you need it, when you need it – on demand.
Heartex Label Studio Data pipeline Labeling 2018 USA Label Studio is a multi-type data labeling and annotation tool with standardized output format
Hevo Data Data pipeline Data management Commercial SaaS Hevo Data is a No-code Data Pipeline that offers a fully-managed solution to set up data integration from Google Cloud Platform and 100+ data sources (including 30+ free data sources) and will let you directly load data to a Data Warehouse such as Snowflake, Amazon Redshift, Google BigQuery, etc
Hitachi Vantara Data pipeline Data management Commercial SaaS Hitachi Vantara’s Pentaho platform for data integration and analytics offers traditional capabilities and big data connectivity. The solution supports the latest Hadoop distributions from Cloudera, Hortonworks, MapR, and Amazon Web Services. However, one of the tool’s shortcomings is that its big data focus takes attention away from other use cases. Pentaho can be deployed on-prem, in the cloud, or via a hybrid model.
HIVE All-in-one Labeling 2013 USA Hive is a full-stack deep learning company focused on solving visual intelligence problems. Let us help you join the AI Revolution. End-To-End Solutions. Full-Stack Approach.
Horovod Modeling & Training Distributed 2017 USA Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.
Hugging Face Modeling & Training NLP 2016 USA We're on a journey to solve and democratize artificial intelligence through natural language.
HYCU Infrastructure Cloud management 2009 USA Keep hyper-converged infrastructure running with HYCU's powerful, simple backup & recovery and monitoring solutions. Deploy in seconds for superior results.
HyperOpt Modeling & Training Hyperparameter tuning Open Source 2013 Canada Distributed Asynchronous Hyperparameter Optimization in Python - hyperopt/hyperopt
IBM InfoSphere DataStage All-in-one ETL Platform/Tools Commercial On Premises IBM InfoSphere Information Server is a data integration platform that helps businesses understand, cleanse, transform, and deliver trusted information.
IBM Infosphere Information Server All-in-one Data management Commercial On Premises Information Server is a branch of IBM’s product that revolves around data warehousing and data integration. It’s an enterprise product for large organizations that supports integration with cloud data storage, including Google Cloud, AWS S3, etc.
Igneous Data pipeline Data management 2013 USA Igneous Unstructured Data Protection offers the scalability to handle hundreds of file systems, billions of files, and exabytes of enterprise data requiring backup
Iguazio All-in-one AI Apps platform 2014 Israel The Iguazio Data Science Platform automates your machine learning pipeline, transforming AI projects into real-world business outcomes.
iMerit Data pipeline Labeling 2012 USA iMerit specializes in data labeling and annotation for purposes of training models for Machine Learning and Artificial Intelligence.
Imply Data pipeline Data management 2015 USA Imply delivers real-time analytics powered by Apache Druid. ... Stream or batch load data into Druid for high performance, ad-hoc analytic queries.
Improvado All-in-one Analytics/AI Commercial SaaS
Incorta Data pipeline Data processing 2013 USA Incorta aggregates large complex business data in real time, eliminating the need to reshape it. No Data Warehouse. No Transformations. Real-Time Insight.
Inferrd Serving Deployment 2020 USA You build the model, we handle the deployment. Inferrd is the easiest, cheapest and the most performant hosting provider for ML models.
Informatica All-in-one ETL Platform/Tools Commercial SaaS Informatica is an enterprise on-premise Google Cloud ETL tool that can build enterprise warehouses. It also supports integration with various traditional databases. It has the capability of delivering data on-demand. Some of its key features include advanced transformation, dynamic partitioning, zero downtime, universal connectivity, data masking, etc.
All-in-one ETL Platform/Tools Commercial SaaS Turn your data warehouse into a data platform that powers all company decision making and operational systems.
InterpretML Modeling & Training Interpretability 2019 USA Fit interpretable machine learning models. Explain blackbox machine learning
Jaspersoft Data pipeline Data management Commercial SaaS
JAX Modeling & Training Framework 2018 USA Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Keboola All-in-one ETL Platform/Tools Commercial SaaS Keboola is a cloud-based data integration platform that connects data sources to analytics platforms. It supports the entire data workflow process, from the point of data extraction, preparation, cleansing, warehousing, and all the way to its integration, enrichment, and loading.
Kedro All-in-one AI Apps platform 2019 UK Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best-practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.
Kimono Labs Data pipeline Data generation 2014 USA Kimono Labs is an online platform that allows its users to convert their websites into APIs.
Kneron Hardware Edge devices 2015 USA Kneron develops an application-specific integrated circuit and software that offers artificial intelligence-based tools.
Koalas Data pipeline Data processing Open Source 2019 USA The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.
Komprise Data pipeline Storage 2014 USA In 15 minutes, our free data management software trial will show you how you can save 70% on data management costs, on-premises and in the cloud.
Kubeflow Serving Deployment 2018 USA The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Kyvos Insights Data pipeline Database/Query 2015 USA Kyvos accelerates BI on trillions of rows of data on the cloud and on-premise platforms with a semantic layer powered by its next-generation OLAP technology.
Labelbox Data pipeline Labeling 2018 USA A complete solution for your training data problem with fast labeling tools, human workforce, data management, a powerful API and automation features.
LabelImg Data pipeline Labeling Open Source 2016 Canada LabelImg is a graphical image annotation tool for labeling object bounding boxes in images
LeapMind Hardware Edge devices 2012 Japan Ultra-low power consumption AI inference accelerator IP specialized for inference arithmetic processing of CNN that operates as a circuit on FPGA device or ASIC device.
Lightelligence Hardware Accelerator 2017 USA Accelerate AI, Neuromorphic, AI Chip, Optical Computing, Lightmatter
LightGBM Modeling & Training Framework Open Source 2016 USA A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
LIME Modeling & Training Interpretability 2016 USA Lime: Explaining the predictions of any machine learning classifier
Losswise Serving Monitoring 2017 USA Turn your GPUs into monitored build servers from a git push with Losswise. Interactive visualization, logs, smart notifications, and more. Start free today.
Ludwig Modeling & Training Framework 2019 USA Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.
Luigi Infrastructure Workflow orchestration Open Source SaaS/On Premises 2012 Sweden Luigi is a lightweight, well-functioning Python ETL framework tool that supports data visualization, CLI integration, data workflow management, ETL task success/failure monitoring, and dependency resolution.
Luminous Computing Hardware Accelerator 2018 USA Hardware is bottlenecked by data movement & compute. We use photonics to solve both
Materialize Data pipeline Stream processing 2019 USA Materialize delivers SQL exploration for streaming events and real-time data. Incrementally updated materialized views - in ANSI Standard SQL and in real-time. Micro-batching.
Matillion All-in-one ETL Platform/Tools Commercial SaaS Matillion offers data integration software for cloud data warehouses, and was designed for Amazon Redshift, Snowflake, and Google BigQuery.
Matroid Modeling & Training Computer vision 2016 USA Computer vision made simple. Deploy computer vision solutions in minutes, not months.
Metaflow Infrastructure Workflow orchestration 2019 USA Metaflow makes it quick and easy to build and manage real-life data science projects. Metaflow is built for data scientists, not just for machines.
Metl All-in-one ETL Platform/Tools Open Source SaaS/On Premises Metl or Mito-ETL is a fast-proliferating Python ETL development platform used to develop bespoke code components. These code components can range from RDBMS data integrations, Flat file data integrations, API/Service-based data integrations, and Pub/Sub (Queue-based) data integrations.
Michelangelo All-in-one Workflow orchestration 2015 USA Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep learning models that span a myriad of use cases ranging from generating marketplace forecasts, responding to customer support tickets, to calculating accurate estimated times of arrival (ETAs) and powering our One-Click Chat feature using natural language processing (NLP) models on the driver app.
Microsoft (SQL Server Integration) Data pipeline Database/Query Commercial On Premises USA Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions.
Milvus Data pipeline Database/Query 2019 China Milvus is an open source similarity search engine for massive feature vectors. Designed with heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources.
Mindspore Modeling & Training Framework 2020 China MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios
ML Kit Serving Mobile 2018 USA ML Kit beta brings Google's machine learning expertise to mobile developers in a powerful and easy-to-use package.
ML.NET Modeling & Training Framework 2018 USA ML.NET is an open source and cross-platform machine learning framework for .NET
MLFlow All-in-one Experiment tracking 2018 USA An open source platform for the machine learning lifecycle
MLlib Modeling & Training Framework 2010 MLlib is Apache Spark's scalable machine learning library.
MLPerf Modeling & Training Benchmarking 2018 USA Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.
MMdnn Serving Compatibility 2017 USA MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
MNN Serving Inference 2019 China MNN is a lightweight deep neural network inference engine.
Modin Data pipeline Data processing Open Source 2018 Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
Mona Labs Serving Monitoring 2018 USA PRODUCTION MONITORING FOR AI. With Mona, you gain complete transparency into how your data and models behave in the real world.
Mozart Data Data pipeline Analytics/AI Commercial SaaS Mozart isn’t strictly an ETL tool, but it can help you automate the process of extracting, transforming, and loading your data into a warehouse all in one central tool.
Mythic Hardware Edge devices 2012 USA An architecture built from the ground up for AI. Mythic has developed a truly unique AI compute platform that enables smart camera systems, intelligent appliances, brilliant robotics, and more.
Naveego Data pipeline Data processing 2014 USA A leading provider of cloud-first, distributed data accuracy solutions for seamless, end-to-end data cleansing, Naveego enables organizations to proactively manage, detect and eliminate data accuracy issues across all enterprise data sources in real-time–regardless of structure or schema.
ncnn Serving Mobile 2017 USA ncnn is a high-performance neural network inference framework optimized for the mobile platform
NeMo Modeling & Training NLU 2019 USA NeMo: a toolkit for conversational AI
Neptune Modeling & Training Experiment tracking 2017 Poland All experiment-related objects relevant to your projects organized, ready to be analyzed, discussed and shared with your team.
Netron Modeling & Training Visualization 2011 USA Netron is a viewer for neural network, deep learning and machine learning models.
Neural Network Distiller Serving Model compression 2018 USA Distiller is an open-source Python package for neural network compression research. Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.
nteract Modeling & Training Notebook 2015 USA nteract is an open-source organization committed to creating fantastic interactive computing experiences that allow people to collaborate with ease. We build SDKs, applications, and libraries that help you and your team make the most of interactive (particularly Jupyter) notebooks and REPLs.
Nuvia Hardware Accelerator 2019 USA Silicon design reimagined for a compute-intensive world.
Obviously AI All-in-one AI Apps platform 2018 USA The entire process of running Data Science - building Machine Learning algorithm, explaining results and predicting outcomes, packed in one single click.
OctoML Serving Deployment 2019 USA Optimize machine learning and deep learning models for deployment. From the creators of Apache TVM, XGBoost and Apache MxNet, OctoML brings the cutting edge of AI, Systems, programming languages, compilers and architecture to make machine learning systems easier to optimize and deploy.
Octopai Data pipeline Data management 2015 Israel An automated, centralized, cross-platform metadata search engine that enables BI groups to quickly and precisely discover and govern shared metadata.
ONNX Serving Compatibility 2018 ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
OpenBridge All-in-one ETL Platform/Tools Commercial SaaS Openbridge is a data logistics platform that manages the real-time flow of consumer data, big or small, delivering it exactly where it needs to be to create value for customers.
OpenSeq2Seq Modeling & Training NLP 2017 USA Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
OpenText Integration Center All-in-one ETL Platform/Tools Commercial On Premises A native integration platform to extract, enhance, transform, integrate, and migrate data and content across the enterprise.
Oracle Data Integrator All-in-one ETL Platform/Tools Commercial On Premises Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements, including batch loads, integration processes, and SOA-enabled data services.
Owox All-in-one ETL + Analytics SaaS
Pachyderm Data pipeline Versioning 2014 USA Data Lineage with End-to-End Pipelines on Kubernetes, engineered for the enterprise. And… It's open source!
Paddle Modeling & Training Distributed 2016 China PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice
Pandas All-in-one ETL Platform/Tools Open Source SaaS/On Premises Pandas is an ETL batch processing library with Python-written data structures and analysis tools. Python's Pandas expedites processing of unstructured/semi-structured data. The library is used for low-intensity ETL tasks including data cleansing and working with small structured datasets post-transformation from semi or unstructured sets.
Panoply All-in-one ETL Platform/Tools Commercial SaaS Panoply automates data management tasks associated with running big data in the cloud. Its Smart Data Warehouse requires no schema, modeling, or configuration. Panoply features an ETL-less integration pipeline that can connect to structured and semi-structured data sources. It also offers columnar storage and automatic data backup to a redundant S3 storage framework.
papermill Modeling & Training Notebook 2017 USA Papermill is a tool for parameterizing and executing Jupyter Notebooks.
Paperspace Infrastructure Cloud management 2014 USA GPU cloud tools built for developers. Powering next-generation workflows and the future of intelligent applications.
Apache Parquet Data pipeline File format 2013 USA Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
Paxata All-in-one ETL Platform/Tools Commercial SaaS Paxata is the first interactive, self-service data preparation solution built for everyone who works with data, from business analysts to data scientists.
Peltarion All-in-one AI Apps platform 2005 Sweden A single AI platform, for real world deployments, without code. Fast & Efficient Production of AI Applications. Rich data capability. Develop AI Services fast. Usable & Affordable AI.
PerceptiLabs Modeling & Training Visual modeling 2019 USA PerceptiLabs takes the process of building and training a machine learning model to warp speed. We not only accelerate machine learning, we advance explainability in AI
Pervasive Data Integrator All-in-one ETL Platform/Tools Commercial On Premises Pervasive Data Integrator supports both data integration and application integration, and runs on premises, in the cloud, or hybrid.
Petl Data pipeline ETL Platform/Tools Open Source SaaS/On Premises Petl is a stream processing engine ideal for handling mixed quality data. This Python ETL tool helps data analysts with little to no prior coding experience quickly analyze datasets stored in CSV, XML, JSON, and many other data formats. You can sort, join, and aggregate transformations with minimal effort.
Petuum All-in-one Data management 2016 USA Petuum accelerates and simplifies AI solutions so your enterprise can deploy it easily and maintain it effortlessly.
Picsell.ia All-in-one Computer Vision 2020 France Picsell.ia is a development platform dedicated to Computer Vision. From open-source to business, you can create and review datasets, track your experiments and follow your project in a Lean AI mode.
Pilosa Data pipeline Database/Query 2017 USA Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets.
PlaidML Modeling & Training Hardware compatibility 2017 USA PlaidML is a framework for making deep learning work everywhere
Playment Data pipeline Labeling 2015 India Build high-quality ground truth datasets with ML-assisted tools, sophisticated project management software, expert human workforce, and much more.
Plotly Serving App interface 2013 Canada Plotly is a data science and AI company that makes it easy to create and deploy interactive web apps in any programming language.
Polyaxon All-in-one Serving 2016 Germany A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications.
Precisely Data pipeline ETL Platform/Tools Commercial SaaS Precisely offers its data integration capabilities via two product families, Precisely Connect and Precisely Ironstream. The company’s flagship application and data integration tools are the Precisely Connect product family.
PredictionIO Serving Web 2013 USA Apache PredictionIO is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, querying predictive results via REST APIs. It is based on scalable open source services like Hadoop, HBase (and other DBs), Elasticsearch, Spark and implements what is called a Lambda Architecture.
Prefect Infrastructure Workflow orchestration 2018 USA The Global Leader in Dataflow Automation
Presto Data pipeline Database/Query 2012 USA Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Prodigy Data pipeline Labeling 2017 Germany Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. ... With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection.
Prometheus Data pipeline Monitoring 2012 Germany An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Prophesee Hardware Edge devices 2014 France With the world’s most advanced Event-Based Vision systems, inspired by human vision and built on the foundation of neuromorphic engineering. PROPHESEE is the revolutionary system that gives Metavision to machines, revealing what was previously invisible to them.
pygrametl Data pipeline ETL Platform/Tools Open Source On Premises pygrametl allows for ETL programming in Python.
Pyro Modeling & Training Programming language 2017 USA Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch
PySyft Modeling & Training Privacy 2017 UK PySyft is a Python library for secure and private Deep Learning. PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Multi-Party Computation (MPC) within the main Deep Learning frameworks like PyTorch and TensorFlow.
Pythia Modeling & Training Framework 2018 USA A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
PyTorch Modeling & Training Framework 2015 USA Tools & Libraries. A rich ecosystem of tools and libraries extends PyTorch and supports development in computer vision, NLP and more
PyTorch Lightning Modeling & Training Framework 2019 USA The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Qlik Data Integration Data pipeline ETL Platform/Tools Commercial SaaS Deliver analytics-ready data to the cloud in real-time with modern DataOps for analytics from Qlik.
Qri Data pipeline Versioning 2016 USA Bigger than a spreadsheet, smaller than a database, datasets are all around us. Use Qri to browse, download, create, fork, & publish datasets across a network of peers.
Quilt Data Data pipeline Versioning 2015 USA Quilt is a versioned data portal for AWS
Quobyte Data pipeline Storage 2013 Germany Quobyte is software defined storage that turns commodity servers into a reliable and highly automated data center file system.
Rasa Modeling & Training NLU 2016 Germany Build contextual AI assistants and chatbots in text and voice with our open source machine learning framework. Scale it with our enterprise grade platform.
Ray Modeling & Training Distributed 2016 USA Ray is a fast and simple framework for building and running distributed applications.
Relational Junction ETL Manager All-in-one ETL Platform/Tools Commercial On Premises Relational Junction ETL Manager lets you extract, transform, and load production data into your data warehouse.
RelicX Serving CI/CD 2020 USA RelicX is a venture funded startup building an AI DevOps platform that brings CX intelligence into the CI/CD pipeline to ensure software release readiness based on real user behavior and customer experience.
Replicate Modeling & Training Versioning 2020 USA Version control for machine learning
Riko All-in-one ETL Platform/Tools Open Source SaaS/On Premises Riko is an apt replacement for Yahoo Pipes. It continues to be ideal for startups possessing low technological expertise.
River Modeling & Training Online learning 2017 France A Python package for online/streaming machine learning.
Rivery All-in-one ETL Platform/Tools Commercial SaaS Rivery is a SaaS integration tool that lets you consolidate all your data from both internal and external sources into a single data platform in the cloud.
Robust AI All-in-one Robotics 2019 USA Robust.AI: Creating a New Foundation for the Future of Robotics.
Rockset Data pipeline Database/Query 2016 USA Rockset: The Real-Time Indexing Database in the Cloud Rockset allows you to build data-driven applications on MongoDB, DynamoDB, ... AI. Test, validate and deploy models faster by analyzing live data in real-time.
Rubrik Data pipeline Data management 2013 USA We provide a powerful, policy-driven platform to simplify recovery and unlock insights from data residing in the data center and cloud.
RudderStack All-in-one ELT & Reverse-ETL Commercial SaaS All your customer data pipelines in one platform
Sagent Data Flow All-in-one ETL Platform/Tools Commercial On Premises Sagent Data Flow from Pitney Bowes Software is a powerful and flexible integration engine that collates data from disparate sources and provides data transformation tools.
SambaNova Hardware Accelerator 2017 USA SambaNova Systems is a computing startup focused on building machine learning and big data analytics platforms.
SAP BusinessObjects Data Services All-in-one ETL Platform/Tools Commercial On Premises Unlock meaning from all of your organization’s data – structured or unstructured – with data integration, quality, cleansing, and more.
SAS Data Management All-in-one ETL Platform/Tools Commercial On Premises SAS Data Management helps transform, integrate, govern, and secure data while improving its overall quality and reliability.
Scale AI Data pipeline Data generation 2016 USA Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.
scikit-learn Modeling & Training Framework 2010 Remote Machine Learning in Python
Scrapinghub Data pipeline Data generation 2010 Ireland Turn websites into data with the world's leading web scraping services & tools from the creators of Scrapy. Data extraction trusted by industry leaders.
scribble Data Modeling & Training Feature engineering 2016 India The feature store for your ML engineering needs
Scriptella All-in-one ETL Platform/Tools Open Source On Premises Scriptella is an open source ETL and script execution tool written in Java.
Segment All-in-one ETL Platform/Tools Commercial SaaS Segment collects user data with one API and sends it to hundreds of tools or a data warehouse.
Data pipeline Labeling 2020 Belgium Deep learning-fueled labeling technology with a focus on instance and semantic segmentation.
Seldon Serving Serving 2011 UK Manage, serve and scale models built in any framework on Kubernetes. Take your ML projects from POC to production.
SHAP Modeling & Training Interpretability 2017 USA A game theoretic approach to explain the output of any machine learning model.
SigOpt Modeling & Training Hyperparameter tuning 2014 USA SigOpt is a standardized, scalable, enterprise-grade optimization platform and API designed to unlock the potential of your modeling pipelines.
Hardware Edge devices 2018 USA Is your ML Green?™ We believe that the future of compute is high performance machine learning at the edge – and today, power is the limiter.
Singer All-in-one ETL Platform/Tools Open Source SaaS/On Premises Singer is an open source standard for writing scripts that move data.
Sisu Data pipeline Analytics platform 2018 USA Sisu is the fastest, most comprehensive augmented analytics platform letting you ... You can't keep up with changing metrics using manual data exploration.
Skyvia All-in-one ETL Platform/Tools Commercial SaaS Skyvia’s Data Integration tool contains a wide range of data-related scenarios which can be created directly from the user interface.
SnapLogic Elastic Integration Platform All-in-one ETL Platform/Tools Commercial SaaS SnapLogic Elastic Integration Platform handles both structured and unstructured data, with point-to-point integration functionality in hybrid integration use cases.
Snorkel Data pipeline Labeling 2016 USA Programmatically Building and Managing Training Data
Snorkel AI All-in-one AI Apps platform 2019 USA Programmatically Building and Managing Training Data
spaCy Modeling & Training NLP 2014 Germany spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
Spark Data pipeline Data processing 2009 Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Spell Modeling & Training Experiment tracking 2017 USA Spell is a powerful platform for building and managing machine learning projects. Spell takes care of infrastructure, making machine learning projects easier to start, faster to get results, more organized and safer than managing infrastructure on your own.
SQLFlow Data pipeline Database/Query 2019 China Extends SQL to support AI. Extract knowledge from Data. Currently supports MySQL, Apache Hive, Alibaba MaxCompute, XGBoost and TensorFlow.
Starburst Data Data pipeline Database/Query 2017 USA Limitless Queries. Break boundaries and harness the power of the world's fastest SQL query engine.
Starfish All-in-one ETL Platform/Tools Commercial SaaS
Stitch All-in-one ETL Platform/Tools Commercial SaaS Stitch is a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. Running on a scalable, fault-tolerant cloud platform, Stitch integrates data from dozens of different sources.
Storbyte Data pipeline Storage 2014 DC Storbyte designs and manufactures all-flash & hybrid flash enterprise storage arrays that offer performance, power management, availability, reliability, density, efficiency, flexibility, expandability, and affordability. Storbyte is providing innovative data storage solutions and has not lost sight of what is important to end users: a responsible, cost-correct price point.
Stradigi AI All-in-one AI apps platform 2017 Canada Stradigi AI's powerful AI business platform, Kepler, fuels tangible results for enterprises. No AI or machine learning experience required.
Streamlit Modeling & Training App interface 2018 USA Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in hours, not weeks. All in pure Python.
StreamSets All-in-one ETL Platform/Tools Commercial SaaS StreamSets is a DataOps and real-time Google Cloud ETL tool. It provides data monitoring and supports a variety of data sources and destinations for data integration. Many enterprises use it to integrate dozens of data sources for analysis. It supports data protectors with data security guidelines like GDPR and HIPAA.
StreamSets Data Collector All-in-one ETL Platform/Tools Commercial On Premises The StreamSets Data Collector is a low-latency ingest infrastructure tool that lets you create continuous data ingest pipelines using a drag and drop UI within an integrated development environment (IDE).
Striim Data pipeline ETL Platform/Tools Commercial SaaS Unify your data in Google Cloud with a full suite of real-time data integration solutions. Whether it's automated database migrations to Google Cloud or data integration for BigQuery, Striim will help you get there faster.
Superb AI Data pipeline Data management 2018 USA Create, label and manage ML training data efficiently so you can build AI faster. Fully managed workforce. Powerful labeling tools. Training data quality control.
Supermetrics All-in-one ETL Platform/Tools Commercial SaaS Supermetrics is a managed data pipeline that makes it easy for marketers, data analysts, and data engineers to move any marketing metrics into a data warehouse in Snowflake, BigQuery, or Azure Synapse Analytics.
Supervisely All-in-one Computer vision 2017 USA First available ecosystem to cover all aspects of training data development. Manage, annotate, validate and experiment with your data without coding.
Serving Monitoring 2019 Israel Monitor your AI from the moment it meets reality so you can finally trust every model.
Syncsort DMX All-in-one ETL Platform/Tools Commercial On Premises DMX supports mainframe, legacy, and big data sources, and provides a no-code approach to join datasets.
Synthetaic Data pipeline Data generation 2019 USA We grow high-quality data that unlocks impossible AI. What if edge cases no longer existed? What if training data was no longer a constraint?
Syntiant Hardware Edge devices 2017 USA Always-On Voice powered by custom AI Silicon
Talend All-in-one ETL Platform/Tools Open Source On Premises Talend is big data and cloud data integration software built on the Eclipse graphical environment. It scales to massive data sets and supports advanced data analytics. It has partnered with leading cloud service providers, analytics platforms, and data warehouses such as Google Cloud Platform, Amazon Web Services (AWS), and Snowflake. It also acts as a connector to other SaaS software.
talos Modeling & Training Hyperparameter tuning 2018 Finland Hyperparameter Optimization for TensorFlow, Keras and PyTorch
Tamr Data pipeline Data management 2012 USA Tamr's leading data management system and services work to create a data migration strategy that simplifies your data unification process. Talk with us today.
TAZI Modeling & Training AutoML 2015 Turkey TAZI’s Automated Machine Learning is understandable continuous machine learning from data and humans, and enables business domain experts to use machine learning to make predictions and take actions. It also helps data analysts and scientists with their daily model creation and deployment.
Tecton All-in-one Deployment 2019 USA The Data Platform for Machine Learning. Build a library of great features. Serve them in production. Do it at scale.
Tensorboard Modeling & Training Experiment tracking 2015 USA TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, projecting embeddings to a lower dimensional space, and much more.
TensorFlow Modeling & Training Framework 2015 USA An end-to-end open source machine learning platform for everyone. Discover TensorFlow's flexible ecosystem of tools, libraries and community resources
TensorFlow Extended Serving Deployment 2019 USA TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines
TensorFlow Lite Serving Mobile 2019 USA TensorFlow Lite is an open source deep learning framework for on-device inference.
TensorRT Serving Inference 2019 USA NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
TerminusDB Data pipeline Database/Query 2017 Ireland TerminusDB is an open source model driven graph database for knowledge graph representation designed specifically for the web-age.
Textur Data pipeline ETL Platform/Tools Commercial SaaS Textur unifies data from your data silos and provides a powerful SQL interface for modeling your business.
Theano Modeling & Training Framework 2008 Canada Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
TPOT Modeling & Training AutoML 2016 USA A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
TransmogrifAI Modeling & Training AutoML 2017 USA An AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Treasure Data All-in-one ETL Platform/Tools Commercial SaaS Treasure Data connects data and teams together with a full suite of tools that automate data collection and processing.
Trifacta All-in-one ETL Platform/Tools Commercial SaaS Trifacta is an interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning.
Truera Modeling & Training Explainability 2019 USA The Truera Model Intelligence Platform powered by Enterprise-Class AI Explainability eliminates the machine learning black box with Model Intelligence.
tsfresh Modeling & Training Feature engineering 2008 Germany Automatic extraction of relevant features from time series
Tumult Labs Data pipeline Privacy 2019 USA Unleashing the power of data with ironclad privacy protection
Tune Modeling & Training Hyperparameter tuning 2017 USA Tune is a Python library for hyperparameter tuning at any scale.
Turi Create Modeling & Training Framework Open Source 2018 USA Turi Create simplifies the development of custom machine learning models.
Unravel Data Serving Monitoring 2013 USA Unravel provides full-stack visibility and AI-powered guidance to help you understand and optimize the performance of your data-driven applications.
V7Labs Data pipeline Labeling 2018 UK Create the Sense of Sight. Label, train, and deploy artificial intelligence that effortlessly learns new objects from your data.
Vaex Data pipeline Data processing 2015 Netherlands Power up your business with our data driven solutions. With our unique, state-of-the-art technology, we provide fast and scalable solutions that will make you more agile, while limiting unnecessary resources.
Valohai Infrastructure Workflow orchestration 2016 Finland The MLOps platform for the whole team. Valohai takes you from POC to production while managing the whole model lifecycle.
Vearch Data pipeline Database/Query Open Source 2019 China Vearch is the vector search infrastructure for deep learning and AI applications.
VertaAI Serving Monitoring 2019 USA Verta.AI is a Palo Alto-based startup building software infrastructure to help enterprise data science and machine learning (ML) teams rapidly develop and deploy ML models.
Vexata Data pipeline Storage 2014 USA Vexata is an active data infrastructure company that accelerates database and analytic platforms via groundbreaking storage solutions.
Vowpal Wabbit Modeling & Training Online learning 2010 Vowpal Wabbit provides a fast, flexible, online, and active learning solution that empowers you to solve complex interactive machine learning problems
Voxel51 // Scoop Data pipeline Data quality 2018 USA We build software that enables ML engineers to build better models, more quickly. Try FiftyOne, our powerful platform for dataset curation, analysis, and model
Waterline Data Data pipeline Data management 2013 USA Waterline's enterprise data catalog enables data professionals to discover, govern, and rationalize an organization's data lake.
Wave Computing Hardware Accelerator 2008 USA Wave Computing is revolutionizing AI and deep learning with its dataflow-based systems and embedded solutions.
Weights & Biases Modeling & Training Experiment tracking 2017 USA We're building developer tools for deep learning. Add a couple lines of code to your training script and we'll keep track of your hyperparameters, system metrics, and outputs so you can compare experiments,
XGBoost Modeling & Training Framework 2014 XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
Serving Model compression 2016 USA Transform your business with on-device AI.
Xpanse AI All-in-one AI Apps platform 2015 Ireland The power of AI at the click of a button. Xpanse AI brings easy to use and lightning fast analytics to your business.
Xplenty All-in-one ETL Platform/Tools Commercial SaaS Xplenty's data integration platform streamlines data processing, reducing time spent and allowing businesses to focus on insight over preparation.
Yellowbrick Data Data pipeline Data warehouse 2014 USA The ultimate solution for your data warehouse. Quick to deploy, easy to expand, and simple to manage. Yellowbrick Data can solve your data problems.
Zero ASIC Hardware Edge devices 2020 USA Removing the Barrier to Custom Silicon
Zilliz Data pipeline Database/Query 2017 China The company specializes in the development of open-source, AI-powered unstructured data analysis software, and is the initiator and primary contributor to the vector similarity search project Milvus.
Snowflake All-in-one Data management 2012 USA
Google BigQuery All-in-one Data management 2010 USA
dbt Data transformation ELT Tool 2016 USA
Looker BI Tool 2011 USA
Mode BI Tool 2012 USA
Census All-in-one ELT & Reverse-ETL 2018 USA
Hightouch All-in-one ELT & Reverse-ETL 2018 USA
Grouparoo All-in-one ELT & Reverse-ETL Open Source USA
Polytomic All-in-one ELT & Reverse-ETL USA Data within companies is fragmented. Sales, Marketing, Support, Finance, and Operations teams spend enormous amounts of time repeatedly hunting for data that lives outside of their home systems.
Rudderstack All-in-one ELT & Reverse-ETL 2019 USA RudderStack elegantly handles every piece of data from every source and syncs it with every tool in your stack.
Seekwell ELT & Reverse-ETL USA Get your SQL data in the places you need it like Google Sheets, Salesforce, Zendesk, and Slack.

Before selecting a data stack, first define your use cases: what you want to do with the data you have. Once that is clear, you can select the tools.