What is Reverse ETL?

Sun 13 2022


by bernt & torsten

The data engineering industry has evolved from Extract, Transform, Load (ETL) to ELT, in which raw data is copied from the source system, loaded into a data warehouse or data lake, and only then transformed. The current trend is to adopt a new approach called “reverse ETL”: the process of moving data from a data warehouse into third-party systems to make that data operational.
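Here is a minimal Python sketch of the ELT and reverse ETL steps described above. It makes several assumptions. An in-memory SQLite database stands in for the warehouse, the source rows are hard-coded, and a hypothetical `push` callable replaces a real SaaS API. The `raw_orders` and `customer_totals` tables are illustrative names and are not part of any particular tool.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Iterable

def elt_then_reverse_etl(
    source_rows: Iterable[tuple[str, str, float]],
    push: Callable[[dict], None],
) -> int:
    """Load raw orders, transform them in the warehouse, then push the results out."""
    # In-memory SQLite stands in for BigQuery, Snowflake or Redshift.
    with closing(sqlite3.connect(":memory:")) as wh:
        # E + L: copy raw rows from the source system into the warehouse unchanged.
        wh.execute("CREATE TABLE raw_orders (customer_id TEXT, email TEXT, amount REAL)")
        wh.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

        # T: transform inside the warehouse (the step a dbt model would own).
        wh.execute(
            """
            CREATE TABLE customer_totals AS
            SELECT customer_id, email, SUM(amount) AS total_spend
            FROM raw_orders
            GROUP BY customer_id, email
            """
        )

        # Reverse ETL: read the modeled table and send each row to an operational tool.
        sent = 0
        for customer_id, email, total in wh.execute(
            "SELECT customer_id, email, total_spend FROM customer_totals ORDER BY customer_id"
        ):
            push({"customer_id": customer_id, "email": email, "total_spend": total})
            sent += 1
        return sent
```

For example, you could pass `sent.append` as `push` along with three order rows for two customers. The function then returns 2 and leaves one aggregated record per customer in `sent`. The design point is that the transformation runs inside the warehouse, so the reverse ETL step only ever reads finished results.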

In this type of data stack, the data warehouse becomes the single source of truth for data, including customer data that would otherwise be spread across different systems. Solutions that have enabled this architecture include Fivetran, Airbyte and Cloud Functions for extract and load (EL), dbt for transformation (T), and BigQuery, Snowflake and Redshift as the data warehouse.

Traditionally, data stored in data warehouses was used for analytical workloads and business intelligence applications such as Looker and Superset. New use cases have appeared in which the same data also powers operational analytics, which drives action by automatically delivering real-time data to the places in your organization where it matters.

There are many use cases for reverse ETL. Mirroring product usage data into every system gives a consistent view of the customer. That consistent view can improve customer interactions by supporting personalized messages that include product metrics. By pushing data to Salesforce, you can keep an up-to-date list of high-lifetime-value customers, of customers who spend more than a defined amount, or of how each customer interacts with your organization.
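A high-lifetime-value segment like this is often just a parameterized query over a modeled table. The sketch below rests on a few assumptions. It uses a hypothetical `customer_totals` table with `customer_id`, `email`, `total_spend` and `active_days_30d` columns, and it runs on SQLite so it works anywhere. The spend threshold is a bound parameter, so it can change without anyone editing the SQL.

```python
import sqlite3

# Hypothetical modeled table; in practice this would be a dbt model in the warehouse.
SEGMENT_SQL = """
SELECT customer_id, email, total_spend, active_days_30d
FROM customer_totals
WHERE total_spend > ?
ORDER BY total_spend DESC
"""

def high_value_customers(conn: sqlite3.Connection, min_spend: float) -> list[dict]:
    """Return customers who spent more than min_spend, as records ready to sync to a CRM."""
    rows = conn.execute(SEGMENT_SQL, (min_spend,)).fetchall()
    return [
        {
            "customer_id": cid,
            "email": email,
            "lifetime_value": spend,
            # Product-usage metric that personalized messages could reference.
            "active_days_30d": active_days,
        }
        for cid, email, spend, active_days in rows
    ]
```

Usage: call `high_value_customers(conn, min_spend=1000)` on a connection to the warehouse, then hand the resulting list to whatever pushes records into the CRM.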

Syncing customer data into your support portal can save time when responding to support requests and makes it possible to prioritize incoming messages automatically.
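Once attributes such as plan tier or lifetime value have been synced into the support tool, prioritizing an incoming ticket can be a simple lookup. This sketch assumes hypothetical synced fields `plan` and `lifetime_value`, and it uses an arbitrary threshold of 1,000. A real support portal would express the same logic through its own routing rules.

```python
from dataclasses import dataclass
from typing import Mapping

@dataclass
class Ticket:
    customer_email: str
    subject: str

# Hypothetical attributes that a reverse ETL sync has written into the support tool.
SYNCED_PROFILES = {
    "a@example.com": {"plan": "enterprise", "lifetime_value": 12000.0},
    "b@example.com": {"plan": "free", "lifetime_value": 0.0},
    "c@example.com": {"plan": "pro", "lifetime_value": 300.0},
}

def priority(ticket: Ticket, profiles: Mapping[str, dict], high_ltv: float = 1000.0) -> str:
    """Return 'high', 'normal' or 'low' for an incoming ticket based on synced customer data."""
    profile = profiles.get(ticket.customer_email)
    if profile is None:
        return "normal"  # no synced data for this sender, fall back to the default queue
    if profile["plan"] == "enterprise" or profile["lifetime_value"] >= high_ltv:
        return "high"
    if profile["plan"] == "free":
        return "low"
    return "normal"
```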

Write your own Data Connectors

You could write your own API connectors to extract data or to push data from the data warehouse, e.g. to pipe it into operational systems such as Salesforce, Marketo and HubSpot. Custom data pipeline connectors can be built with Cloud Functions, but this route has a downside. The connectors can be hard to write, because endpoints may be brittle and most APIs are not built to handle real-time data transfer.

Data teams must set up batching, retries and checkpointing to work within API rate limits. Mapping fields from the data warehouse to SaaS products takes time. Once the connectors are live, they can be challenging to maintain because API specs change.
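Batching, retries and checkpointing are the parts of a connector that teams tend to underestimate. Below is a minimal sketch under several assumptions. A hypothetical `send_batch` function raises a hypothetical `RateLimited` exception when the API rejects a call, for example with HTTP 429. A local JSON file serves as the checkpoint store. A production connector would also honor `Retry-After` headers and handle partially accepted batches.

```python
import json
import time
from pathlib import Path
from typing import Callable, Sequence

class RateLimited(Exception):
    """Hypothetical error that send_batch raises when the API rejects a call (e.g. HTTP 429)."""

def sync_with_checkpoint(
    records: Sequence[dict],
    send_batch: Callable[[list], None],
    checkpoint: Path,
    batch_size: int = 100,
    max_retries: int = 5,
    base_delay: float = 1.0,
) -> int:
    """Push records in batches, back off on rate limits, and resume from the last checkpoint.

    Returns the number of records sent during this call.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    # Resume after the last batch that the destination confirmed.
    start = json.loads(checkpoint.read_text())["offset"] if checkpoint.exists() else 0
    for offset in range(start, len(records), batch_size):
        batch = list(records[offset : offset + batch_size])
        for attempt in range(max_retries):
            try:
                send_batch(batch)
                break  # batch accepted
            except RateLimited:
                if attempt == max_retries - 1:
                    raise RuntimeError(f"batch at offset {offset} still rate-limited after {max_retries} attempts")
                time.sleep(base_delay * 2**attempt)  # exponential backoff: 1x, 2x, 4x, ... base_delay
        # Advance the checkpoint only after the batch was accepted.
        checkpoint.write_text(json.dumps({"offset": offset + len(batch)}))
    return max(0, len(records) - start)
```

Because the checkpoint advances only after a batch succeeds, a crashed run can simply be restarted. It will resume from the first unconfirmed batch rather than resending everything.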

Why use a Reverse ETL tool

Reverse ETL solutions offer out-of-the-box connectors to numerous systems, so teams no longer need to write and maintain their own. Building connectors in-house with Cloud Functions takes time, so a team may only have written a few, for systems like Salesforce, Marketo and HubSpot. Once those connectors go live, further time goes into maintenance, including regular checks that the APIs' specs have not changed.
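Part of that maintenance is noticing when a destination's API has drifted. One way to catch drift before a sync fails is to compare the fields a connector writes against the fields the destination currently reports. The sketch assumes a hypothetical `described` set of field names, such as a CRM's object-describe call might return. It is not tied to any vendor API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DriftReport:
    missing: frozenset[str]     # fields the connector writes but the API no longer exposes
    unexpected: frozenset[str]  # fields the API now exposes that the connector does not use

    @property
    def breaking(self) -> bool:
        # Writing to a field that no longer exists would make the sync fail.
        return bool(self.missing)

def check_drift(expected: set[str], described: set[str]) -> DriftReport:
    """Compare the connector's field list with what the destination reports today."""
    return DriftReport(
        missing=frozenset(expected - described),
        unexpected=frozenset(described - expected),
    )
```

Running a check like this on a schedule turns a silent spec change into an alert. It does not fix the connector, but it tells the team where to look.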

You can create customer segments, audiences and lead scores through a visual analysis interface or from dbt model outputs, and push them downstream. With a reverse ETL tool, your data team can push data into more systems and get more use out of it. Reverse ETL tools provide a visual interface for choosing which query output columns populate standard and custom fields. They let you either sync continuously or define what triggers a sync between systems.
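Underneath the visual interface, a column-to-field mapping amounts to a rename. The sketch below uses hypothetical column and field names. It follows the Salesforce convention that custom field API names end in `__c`, while standard fields such as `Email` do not.

```python
from typing import Any, Mapping

# Hypothetical mapping from query output columns to destination fields.
FIELD_MAP = {
    "email": "Email",                       # standard field
    "first_name": "FirstName",              # standard field
    "lifetime_value": "Lifetime_Value__c",  # custom field (note the __c suffix)
    "lead_score": "Lead_Score__c",          # custom field
}

def map_row(row: Mapping[str, Any], field_map: Mapping[str, str] = FIELD_MAP) -> dict[str, Any]:
    """Rename warehouse columns to destination fields, dropping unmapped columns and nulls."""
    return {
        dest: row[col]
        for col, dest in field_map.items()
        if col in row and row[col] is not None
    }
```

Dropping nulls is a design choice. It avoids overwriting values that users entered directly in the destination with blanks from the warehouse. Some teams prefer the opposite, so that the warehouse always wins.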

For example, a sync can be triggered as soon as a dbt job finishes. Reverse ETL solutions log and monitor sync status and progress, and they notify teams when a sync needs attention.
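If you orchestrate this yourself, the core logic is to run the sync only when the dbt run succeeds and then record the outcome. The sketch below shells out to the `dbt run` command. `run_sync` and `notify` are hypothetical callables that stand in for a reverse ETL tool's sync API and a Slack or email alert. The 5% failure threshold is an arbitrary example.

```python
import logging
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

log = logging.getLogger("reverse_etl")

@dataclass
class SyncResult:
    rows_synced: int
    rows_failed: int

def sync_after_dbt(
    run_sync: Callable[[], SyncResult],
    notify: Callable[[str], None],
    dbt_cmd: Sequence[str] = ("dbt", "run"),
    max_failure_rate: float = 0.05,
) -> Optional[SyncResult]:
    """Run dbt, then trigger the sync only if the models built successfully."""
    build = subprocess.run(list(dbt_cmd))
    if build.returncode != 0:
        notify(f"dbt run failed (exit {build.returncode}); sync skipped")
        return None

    result = run_sync()
    total = result.rows_synced + result.rows_failed
    log.info("synced %d rows, %d failed", result.rows_synced, result.rows_failed)
    # Alert the team when too many rows were rejected by the destination.
    if total and result.rows_failed / total > max_failure_rate:
        notify(f"sync needs attention: {result.rows_failed}/{total} rows failed")
    return result
```

With the default `dbt_cmd`, the `dbt` executable must be on the `PATH`. Otherwise `subprocess.run` raises `FileNotFoundError`. Making the command a parameter also lets you substitute a harmless command when testing the control flow.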

Using a reverse ETL tool lets data teams maintain a single data pipeline instead of many. They no longer have to write scripts, and they gain visibility and control over syncs. Sales, marketing, growth and analytics teams can analyze and act on the same consistent, reliable data. This consistency creates continuity across the business, because functional teams work from the same data even when they use different tools such as Salesforce, Marketo or HubSpot. It also accelerates decision-making.

More than 300 companies, including startups, open-source projects and commercial vendors, offer ETL, ELT or reverse ETL tooling. As the table below shows, it is not easy to find the solutions that fit your needs.

Data Solutions

Name	Website	Category	Subcategory	Type	Deployment	Started	HQ	Description
Abacus AI		All-in-one	AutoML	Commercial		2019	USA	Abacus.AI makes it effortless to create large-scale customizable deep learning systems. Accurate predictions generated by our system can be easily and securely incorporated into all aspects of your customer experience and business processes
Accord		Modeling & Training	Framework	Commercial		2012	France	Machine learning, computer vision, statistics and general scientific computing for .NET
Actian		Data pipeline	Data management	Commercial	On Premises			Actian DataConnect aggregates data from any source, whether on premises or in the cloud, in a database, or in a SaaS application.
Adeptia Integration Suite		Data pipeline	Data management	Commercial	On Premises			Adeptia offers self-service ETL capabilities to business users and data scientists. Developers can use it for data validations, cleansing, routing, exception-handling, and back-end connectivity.
Aible		All-in-one	Serving	Commercial		2018	USA	Create AI that delivers impact, not accuracy, with cost-benefit tradeoffs & operational constraints, in a friendly, intuitive UI designed for real business.
AIMET		Modeling & Training	Model compression	Open Source		2020	USA	AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Airbyte		Data pipeline	ETL Platform/Tools	Commercial / Open Source	SaaS/On Premises			Get all your ELT data pipelines running in minutes, even your custom ones. Let your team focus on insights and innovation.
Aircloak		Data pipeline	Privacy			2012	Germany	Aircloak's unique approach ensures the existing primary database is not modified in any way. Aircloak handles all data types including unstructured text.
Airflow		Infrastructure	Workflow orchestration	Open Source	Hybrid and multi-cloud	2015	USA	Airflow is a platform to programmatically author, schedule and monitor workflows. It is an open-source tool that integrates with cloud services including Google Cloud Platform, Azure and AWS, and Google Cloud offers a managed version (Cloud Composer). It offers a user-friendly interface with clear visualization of workflows, and its modular structure makes scaling straightforward.
Alectio		Modeling & Training	Active learning			2019	USA	Not all data is created equal. You can build better models with less data. We can show you how.
Algorithmia		Serving	Serving			2013	USA	Algorithmia makes applications smarter, by building a community around algorithm development, where state of the art algorithms are always live and accessible to anyone
Alink		Modeling & Training	Framework			2018	China	Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Allegro AI/TRAINS		Modeling & Training	Experiment tracking			2016	Israel	Deep learning platform tailored for computer vision. Allegro AI offers the first end-to-end machine learning product life-cycle management solution.
AllenNLP		Modeling & Training	NLP			2016	USA	AllenNLP is an open-source NLP research library, built on PyTorch.
Alluxio		Data pipeline	Data management			2015	USA	an open source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud.
Alooma		Data pipeline	Data management	Commercial	SaaS			Alooma is a real-time data pipeline that lets you integrate any data source – databases, applications, and any API - with your data warehouse.
Alteryx		Data pipeline	Data management	Commercial	On Premises	2011	USA	Alteryx allows you to prep, blend, and analyze data using a repeatable workflow, then deploy and share analytics for deeper insights in hours, not weeks.
Amazon Redshift		Data pipeline	Data warehouse			2012	USA	Amazon Redshift is a fast, fully managed, and cost-effective data warehouse that gives you petabyte scale data warehousing and exabyte scale data lake analytics together in one service. Amazon Redshift is up to ten times faster than traditional on-premises data warehouses.
Amundsen		Data pipeline	Database/Query			2019	USA	Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Angel ML		Modeling & Training	Distributed			2017	China	A Flexible and Powerful Parameter Server for large-scale machine learning
Anodot		Data pipeline	Data monitoring			2014	Israel	We monitor your business. Anodot monitors all your data in real time for lightning fast detection of the incidents that impact your revenue
Anyscale		Infrastructure	Cloud management			2019	USA	From the creators of Ray, a framework for building machine learning applications at any scale originating from the UC Berkeley RISELab.
Anyverse		All-in-one		Commercial				Accelerate advanced perception system development with hyperspectral synthetic data that mimics exactly what your sensors see
Apache Druid		Data pipeline	Database/Query			2012	USA	Apache Druid is a high performance real-time analytics database
Apache Flink		Serving	Stream processing			2011	Germany	Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities
Apache Hudi		Data pipeline	Data warehouse			2016	USA	Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores)
Apache Kafka		Serving	Stream storage	Open Source ETL		2011	USA	Apache Kafka is an open-source distributed event streaming platform used by many companies to develop high-performance data pipelines, perform streaming analytics and data integration.
Apache Mahout		Modeling & Training	Framework			2008	Remote	Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends.
Apache MXNet		Modeling & Training	Framework			2015		A flexible and efficient library for deep learning.
Apache NiFi		Data pipeline	ETL Platform/Tools	Open Source ETL				Apache NiFi is an open-source ETL tool and is free for use. It allows you to visually assemble programs from boxes and run them without writing code, which makes it suitable for users without a coding background. It can work with numerous different sources, including RabbitMQ, JDBC query, Hadoop, MQTT, UDP socket, etc. You can use it to filter, adjust, join, split, enhance, and verify data.
Apache ORC		Data pipeline	File format			2013		the smallest, fastest columnar storage for Hadoop workloads.
Apache Spark		Data pipeline	ETL Platform/Tools	Open Source	SaaS/On Premises			Apache Spark is a distributed data processing engine that is widely used for ETL. It supports Python-based automation (PySpark) and both batch and streaming data, so ETL jobs can scale out as data volumes grow with the business.
Apache Superset		All-in-one		Open Source				Apache Superset is a modern, enterprise-ready business intelligence web application. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple pie charts to highly detailed deck.gl geospatial charts.
Apache TVM		Serving	Inference			2017		Apache TVM (incubating) is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks, and the performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end to end compilation to different backends.
Aparavi		Data pipeline	Data management			2016	USA	Aparavi's highly scalable data intelligence and automation solutions enable organizations to easily discover, classify, protect, and optimize their data.
ApatarForge		Data pipeline	ETL Platform/Tools	Open Source	Hybrid and multi-cloud			Apatar is an open source data integration and ETL tool written in Java.
AresDB		Data pipeline	Database/Query	Open Source		2019	USA	A GPU-powered real-time analytics storage and query engine.
Argo		Serving	CI/CD	Open Source		2018	USA	Get stuff done with Kubernetes. Open source Kubernetes native workflows, events, CI and CD
Arize AI		Serving	Monitoring			2019	USA	Arize AI is the watcher, troubleshooter and the guardrail on deployed AI
Arthur AI		Serving	Monitoring			2018	USA	Always-on Explainability, Bias, and Performance Monitoring for AI, ML, and analytics. Get up and running in minutes and start sleeping better at night. Dedicated. Innovative.
Ascend.io		Data pipeline	Data management			2015	USA	Experience continuously optimized data pipelines with less code and fewer breakages. Enter the new era of data engineering with Ascend's autonomous dataflow service.
Astera Centerprise		Data pipeline	ETL Platform/Tools	Commercial	On Premises			Centerprise ETL offers data warehouse loading functionality, including the Slowly Changing Dimension (SCD) transformation.
Astronomer		Data pipeline	ETL Platform/Tools	Commercial	Hybrid and multi-cloud			Build, run, and manage data pipelines-as-code at enterprise scale with Apache Airflow, the most popular open source orchestrator.
AtScale		Data pipeline	Data management			2013	USA	Freedom of choice for the enterprise. Break free from the complexities and security risks associated with cloud migration and self-service analytics with Intelligent Data Virtualization—no matter where dat.
Backend AI		Infrastructure	Workflow orchestration			2016	South Korea	Backend.AI: Minute-made GPU clustering solution for Machine Learning.
BentoML		Modeling & Training	Pretrained models			2018	USA	BentoML makes it easy to serve and deploy machine learning models in the cloud. It is an open source framework for building cloud-native model serving services. BentoML supports most popular ML training frameworks and deployment platforms, including major cloud providers and docker/kubernetes.
Blaize		Hardware	Edge devices			2010	USA	Intelligence at the edge of everywhere. Blaize unleashes the potential of AI to drive leaps in the value that technology delivers to transform markets and improve the way we all work and live.
Blendo		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Blendo provides a data management platform that connects, reshapes, and delivers actionable data, with a focus on simple integration procedures and automated data collection.
Bonobo		Data pipeline	ETL Platform/Tools	Open Source	SaaS/On Premises			Bonobo is an open-source, Python-based ETL pipeline deployment and data extraction tool. You can leverage its CLI to extract data from SQL, CSV, JSON, XML, and many other sources.
Boruta		Modeling & Training	Feature engineering	Open Source		2010		Python implementations of the Boruta all-relevant feature selection method.
Boulder AI		Hardware	Edge devices			2017	USA	Human insight and decision making on a visual sensor.
BrainChip		Hardware	Edge devices			2006	USA	BrainChip brings artificial intelligence to the edge with a high-performance, small, ultra-low power solution that enables continuous learning and inference.
Bubbles		Data pipeline	ETL Platform/Tools	Open Source	SaaS/On Premises			Bubbles is a Python framework for data processing and data quality measurement. Its basic concepts are abstract data objects, operations and dynamic operation dispatch.
Built.io Flow		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Built.io Flow is a drag-and-drop tool for building enterprise integrations.
Cadence		Infrastructure	Workflow orchestration			2017	USA	Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
Caffe		Modeling & Training	Framework			2013	USA	Caffe: a fast open framework for deep learning.
Cambricon		Hardware	Accelerator			2016	China	Cambricon Technologies builds core processor chips for intelligent cloud servers, intelligent terminals, and intelligent robots.
Catalyst		Modeling & Training	Framework	Open Source		2018	Russia	PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write another regular train loop.
Cazena		Data pipeline	Data management			2014	USA	First Data Lake with a SaaS Experience. Cazena empowers enterprises to collect, store and analyze any data in the cloud, without any DevOps resources or admin time. Cazena's Data Lake as a Service includes everything, and is delivered as secure SaaS, ready to load, store and analyze data with any method: SQL, Spark, R, Python, and many more.
CDAP		Data pipeline	ETL Platform/Tools	Open Source ETL	Hybrid and multi-cloud			Interoperability across on-premises and cloud environments; support for all major public cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform.
CData Software		Data pipeline	ETL Platform/Tools	Commercial	SaaS			CData Software offers data integration solutions for real-time access to online or on-prem applications, databases, and Web APIs. The vendor specializes in providing access to data through established data standards and application platforms such as ODBC, JDBC, ADO.NET, SSIS, BizTalk, and Microsoft Excel. CData Software products are broken down into six categories: driver technologies, enterprise connectors, data visualization, ETL and ELT solutions,
Cerebras		Hardware	Accelerator			2016	USA	AI insights, faster. Cerebras is a computer systems company dedicated to accelerating deep learning. The pioneering Wafer-Scale Engine (WSE) – the largest chip ever built – is at the heart of our deep learning system, the Cerebras CS-1.
Chainer		Modeling & Training	Framework			2015	Japan	A Powerful, Flexible, and Intuitive Framework for Neural Networks
Civis Analytics		All-in-one	Analytics/AI	Commercial				Civis Turns Data Into Campaigns That Drive Action
ClearSky Data		Data pipeline	Storage			2014	USA	ClearSky Data offers enterprise storage as a hybrid cloud service delivering on-demand primary storage, offsite backup, and DR as a single service.
CleverHans		Modeling & Training	Adversarial robustness			2017	USA	An adversarial example library for constructing attacks, building defenses, and benchmarking both
Clipper		Serving	Web			2017	USA	Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.
Cloudera		Infrastructure	Cloud management			2008	USA	Cloudera delivers an Enterprise Data Cloud for any data, anywhere, from the Edge to AI.
CloverETL		All-in-one	ETL Platform/Tools	Open Source	On Premises			CloverETL is a data integration software suite for data migration and data warehousing, and for feeding data into business intelligence and reporting applications.
Cohesity		Data pipeline	Data management			2013	USA	Eliminate mass data fragmentation with Cohesity's modern approach to data management, beginning with backup. Gain instant recovery. Learn more today.
Colab		Modeling & Training	Notebook			2017	USA	Colab notebooks allow you to combine executable code and rich text in a single document, along with images, HTML, LaTeX and more.
Comet		Modeling & Training	Experiment tracking			2017	USA	Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects.
Confluent		Data pipeline	Stream processing			2014	USA	Confluent is a fully managed Kafka service and enterprise stream processing platform. Real-time data streaming for AWS, GCP, Azure or serverless. Try free!
Core ML		Serving	Mobile			2017	USA	Use Core ML to integrate machine learning models into your app. Core ML provides a unified representation for all models.
Cortex		Serving	Web			2019	USA	Cortex is an open source platform for deploying machine learning models as production web services.
Cubonacci		All-in-one	AI Apps platform			2018	Netherlands	Machine learning lifecycle management. Cubonacci enables organizations to focus on developing custom machine learning models without having to worry about peripheral matters. The Cubonacci platform manages deployment, versioning, infrastructure, monitoring and lineage for you, eliminating risk and minimizing time-to-market.
cuDF		Data pipeline	Data processing			2018	USA	Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
DAGsHub		Modeling & Training	Versioning			2019	Israel	DAGsHub is a platform for data version control and collaboration for data scientists and machine learning engineers.
DarwinAI		Modeling & Training	Explainability			2017	Canada	DarwinAI’s Generative Synthesis 'AI building AI' technology enables optimized and explainable deep learning.
Dash		Serving	App interface			2015	Canada	Dash Enterprise is the end-to-end development & deployment platform for low-code AI Dash applications.
Dask		Data pipeline	Data processing			2015	Remote	Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
Databricks		All-in-one	Data management	Commercial		2013	USA	All your data, analytics and AI on one lakehouse platform
Dataddo		All-in-one	ETL + Analytics	Commercial	SaaS			Your data, from any source, to any destination
Datadog		Infrastructure	Cloud management			2010	USA	See inside any stack, any app, at any scale, anywhere.
Datagrok		All-in-one	Data processing			2019	USA	Datagrok: Swiss Army Knife for Data. A platform for turning data into actionable insights
Dataiku		All-in-one	AI Apps platform			2013	USA	Dataiku's single, collaborative platform powers both self-service analytics and the operationalization of machine learning models in production.
DataRobot		All-in-one	AI Apps platform			2012	USA	DataRobot combines a trusted enterprise AI platform and a trusted AI-native strategic partnership for global enterprises that want to harness the power of AI and their existing teams to succeed in today's Intelligence Revolution.
Datatable		Data pipeline	Data processing			2017	USA	Python library for efficient multi-threaded data processing, with the support for out-of-memory datasets.
Datatron		Serving	Monitoring			2016	USA	Production AI Model Management at Scale. Automate the standardized deployment, monitoring, governance, and validation of all your models to be developed in any environment.
Dataturks		Data pipeline	Labeling			2018	India	ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
DataVirtuality		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Rapid data integration for analytics: Integrates multiple data sources, web services, and front ends in a snap.
Datera		Data pipeline	Storage			2013	USA	Get sub-200µs latency & millions of IOPS with 100% software-defined data automation. Save up to 70% on data infrastructure total-cost-of-ownership.
Datmo		Modeling & Training	Experiment tracking			2016	USA	Be as effective as AI engineers at Google and Facebook. Workflow tools to help you experiment, deploy, and scale. By data scientists, for data scientists.
Datorama		Data pipeline	ETL Platform/Tools	Commercial	SaaS			
DAWNBench		Modeling & Training	Benchmarking			2018	USA	DAWNBench is a benchmark suite for end-to-end deep learning training and inference.
Deeplite		Serving	Model compression			2020	Canada	Enabling faster, smaller and more energy-efficient DNNs to run on edge devices and in the cloud
DeepNote		Modeling & Training	Notebook			2019	USA	The notebook you’ll love to use. Deepnote is a new kind of data science notebook. Jupyter-compatible with real-time collaboration and easy deployment. Oh, and it's free.
DefinedCrowd		Data pipeline	Data generation			2015	USA	Leverage machine learning technology and human intelligence to source, structure, and enrich high quality training data in speech, NLP, and computer vision.
Dell Boomi		All-in-one	ETL Platform/Tools	Commercial	SaaS			Boomi AtomSphere lets you configure and deploy integrations at a fraction of the cost and time of traditional approaches, all from a single interface.
Delta Lake		Data pipeline	Data warehouse			2019	USA	Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
Dessa		All-in-one	Monitoring			2016	Canada	Create more with machine learning. Build, run & monitor 1000s of ML experiments with Foundations
Determined AI		Modeling & Training	AutoML			2016	USA	Our AutoML platform streamlines your deep learning workflows, tracks your work, and manages your GPU clusters.
Dialogflow		Modeling & Training	NLU			2014	USA	Dialogflow is a Google service that runs on Google Cloud Platform, letting you scale to hundreds of millions of users. Optimized for the Google Assistant.
Doccano		Data pipeline	Labeling			2018	Japan	Text annotation for humans. Just create a project, upload data and start annotating. You can build a dataset in hours.
Dockship		Modeling & Training	Pretrained models			2019	India	Dockship.io is a marketplace for AI models and datasets. Publish your models on Dockship for people all over the world.
Dolt		Data pipeline	Versioning			2018	USA	Liquidata's mission is to make data move more efficiently. We built Dolt, an open-source version-controlled SQL database with Git-like semantics.
Domino Data Lab		Infrastructure	Cloud management			2013	USA	Deliver winning models. One place for your data science tools, apps, results, models, and knowledge
Domo		Data pipeline	ETL Platform/Tools	Commercial	SaaS			With Domo, you can use data and insights delivered in data experiences to multiply your business impact and drive your business forward.
dotData		All-in-one	Feature engineering			2018	USA	When AutoML is enhanced with AI-powered feature engineering, the result is dotData. We focus on delivering data science automation for the enterprise. End-to-end data science automation platform accelerates, democratizes, and operationalizes the entire data science process.
Dremio		Data pipeline	Data management			2015	USA	Get more value from your data, faster. Dremio makes your data engineers more productive, and your data consumers more self-sufficient.
DVC - Iterative.ai		Data pipeline	Versioning			2017	USA	Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.
EdgeQ		Hardware	Edge devices			2018	USA	EdgeQ is an information technology company that specializes in the fields of 5G chip systems.
Eight Wire Conductor		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Conductor, from New Zealand-based Eight Wire, offers point-and-click data integrations.
Elastifile		Data pipeline	Storage			2013	USA	Elastifile's cloud-native file storage helps organizations adapt and accelerate their business in the cloud era. Powered by a scalable, enterprise-grade distributed file system with intelligent object tiering, Elastifile augments existing public cloud services with a scalable, POSIX-compliant NAS, facilitating frictionless cloud adoption. With Elastifile, organizations enjoy low-touch file storage services, or deploy and manage cloud-native file storage themselves, eliminating the need for manual storage management and IT forecasting. Elastifile's unique combination of features and flexibility empowers organizations to seamlessly integrate cloud resources, with no application refactoring… thereby modernizing their infrastructure and achieving IT agility and efficiency goals.
Elementl		All-in-one	Workflow orchestration			2018	USA	Building Dagster, the data orchestrator. Dagster is a data orchestrator for machine learning, analytics, and ETL
Elixir Repertoire Data ETL		Data pipeline	ETL Platform/Tools	Commercial	On Premises			Elixir Data ETL provides on-demand, self-service data manipulation. It lets you design, test, and implement data extraction, aggregation, and transformation.
erwin		Data pipeline	Data management			2016	USA	Integrated enterprise architecture, business process and data modeling with data cataloging and data literacy for risk management and digital transformation.
Etleap		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Etleap is a Redshift ETL tool that makes it easy to bring data from disparate data sources into a Redshift data warehouse.
Etlworks		Data pipeline	ETL Platform/Tools	Commercial	SaaS/On Premises			Etlworks Integrator is a powerful and easy-to-use cloud data integration service that can work with structured and semi-structured data of any type and size.
Evidently AI		Serving	Monitoring			2020	Russia	Open-source tools to analyze, monitor, and debug machine learning model in production
Excelero		Data pipeline	Storage			2014	USA	Local NVMe performance at data center scale through true convergence. Software-defined block storage for Cloud and Enterprise applications at any scale.
explainX.ai		Modeling & Training	Interpretability			2020	USA	ExplainX enables you to explain, present, and monitor how your AI models work. We make sure your models never fail in the real-world.
Facets		Data pipeline	Visualization	Open Source		2017	USA	Facets: An Open Source Visualization Tool for Machine Learning Training Data
fastText		Modeling & Training	NLP			2016	USA	Library for fast text representation and classification.
FEAST		Data pipeline	Feature engineering			2019	Asia	Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
Featuretools		Modeling & Training	Feature engineering			2018	USA	An open source python library for automated feature engineering
FedAI (FATE)		Modeling & Training	Framework			2019	China	FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure computing framework to support the federated AI ecosystem. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). It supports federated learning architectures and secure computation of various machine learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.
Fiddler Labs		Modeling & Training	Interpretability			2018	USA	AI with trust, visibility, and insights built in. Fiddler is a breakthrough AI engine with explainability at its heart.
Figure Eight		Data pipeline	Labeling			2008	USA	Figure Eight combines the best of human and machine intelligence to provide high-quality annotated training data that powers the world's most innovative machine learning and business solutions
Fivetran		Data pipeline	ETL Platform/Tools	Commercial	SaaS			All your data organized in a full data warehouse in minutes, not months.
flair		Modeling & Training	NLP	Open Source		2018	Germany	A very simple framework for state-of-the-art Natural Language Processing (NLP)
FloydHub		Infrastructure	Cloud management			2016	USA	FloydHub is a zero setup Deep Learning platform for productive data science teams.
Fluree		Data pipeline	Database/Query			2017	USA	Welcome to better data management. The Fluree platform organizes blockchain-secured data in a highly-scalable, highly-insightful graph database.
Flyte		Infrastructure	Workflow orchestration			2019	USA	Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced
Formant		Serving	Robotics			2019	USA	Deploy faster. Improve uptime. Achieve scale.
Fritz AI		Serving	Mobile			2017	USA	Fritz AI is the machine learning platform for iOS and Android developers. Teach your mobile apps to see, hear, sense, and think.
Gemini Data		Data pipeline	Data management			2015	USA	Gemini Data provides Data Availability for AI/ML driven analysis and applications to enable unified enterprise knowledge and access.
Gensim		Modeling & Training	Framework			2012	Czech	Topic Modelling for Humans. Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
Git LFS		Data pipeline	Versioning	Open Source		2014	Remote	Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
Gluent		Data pipeline	Visualization			2014	USA	Data virtualization software eliminates data silos. Gluent's transparent data virtualization provides virtual access to all enterprise data, with zero code changes.
GluonCV		Modeling & Training	Pretrained models			2018	USA	GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It aims to help engineers, researchers, and students quickly prototype products, validate new ideas and learn computer vision.
Google Cloud Data Fusion		All-in-one	ETL Platform/Tools	Commercial	SaaS			Google Cloud Data Fusion is a cloud-native data integration tool. It is a fully managed Google Cloud ETL tool that allows data integration at any scale. It is built on an open-source core, CDAP, which makes pipelines portable. It offers a visual point-and-click interface for code-free deployment of ETL/ELT data pipelines. Apart from native integration with Google Cloud services, it also offers 150+ preconfigured connectors and transformations at no additional cost.
Gradio		Serving	App interface			2018	USA	Gradio allows you to quickly create customizable UI components around your TensorFlow or PyTorch models, or even arbitrary Python functions. Mix and match
Graphcore		Hardware	Accelerator			2016	UK	Graphcore has built a new type of processor for machine intelligence to accelerate machine learning and AI applications for a world of intelligent machines.
Graviti Data Platform		Data pipeline	Data management	Commercial	SaaS	2019	China	As a platform for unstructured data management, Graviti Data Platform provides services in data hosting, version control, data visualization, and collaboration. You can also integrate Graviti Data Platform into your own pipeline using developer tools.
GreenWaves Technologies		Hardware	Edge devices			2014	France	GreenWaves' GAP8 is the industry's first ultra-low-power processor enabling battery-operated AI in IoT applications.
Gretel AI		Data pipeline	Privacy			2019	USA	The first and only APIs to enable you to balance, anonymize, and share your data. With privacy guarantees.
Grid AI		Modeling & Training	Distributed training			2020	USA	Seamlessly train hundreds of Machine Learning models on the cloud from your laptop. Focus on machine learning, not infrastructure.
Groq		Hardware	Accelerator			2016	USA	The Next Generation of Computing is here.
H2O		All-in-one	AI Apps platform			2012	USA	H2O.ai is the creator of H2O, the leading open-source machine learning and artificial intelligence platform, trusted by data scientists across 14K enterprises.
Habana Labs		Hardware	Edge devices			2016	Israel	Habana Labs was founded in 2016 to create world-class AI Processors, developed from the ground-up and optimized for training deep neural networks and for inference deployment in production environments.
Hailo		Hardware	Edge devices			2017	Israel	The World’s Top Performing AI Processor for Edge Devices. Hailo offers a breakthrough microprocessor uniquely designed to accelerate embedded AI applications on edge devices. Breathe life into your edge AI product today with Hailo-8.
Hammerspace		Data pipeline	Database/Query			2015	USA	Hammerspace allows data to move freely, like the air you breathe, across clouds and services. Make data accessible exactly where you need it, when you need it – on demand.
Heartex Label Studio		Data pipeline	Labeling			2018	USA	Label Studio is a multi-type data labeling and annotation tool with standardized output format
Hevo Data		Data pipeline	Data management	Commercial	SaaS			Hevo Data is a no-code data pipeline that offers a fully managed solution for setting up data integration from Google Cloud Platform and 100+ data sources (including 30+ free data sources) and loads data directly into a data warehouse such as Snowflake, Amazon Redshift, or Google BigQuery.
Hitachi Vantara		Data pipeline	Data management	Commercial	SaaS			Hitachi Vantara’s Pentaho platform for data integration and analytics offers traditional capabilities and big data connectivity. The solution supports the latest Hadoop distributions from Cloudera, Hortonworks, MapR, and Amazon Web Services. However, one of the tool’s shortcomings is that its big data focus takes attention away from other use cases. Pentaho can be deployed on-prem, in the cloud, or via a hybrid model.
HIVE		All-in-one	Labeling			2013	USA	Hive is a full-stack deep learning company focused on solving visual intelligence problems. Let us help you join the AI Revolution. End-To-End Solutions. Full-Stack Approach.
Horovod		Modeling & Training	Distributed			2017	USA	Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.
Hugging Face		Modeling & Training	NLP			2016	USA	We're on a journey to solve and democratize artificial intelligence through natural language.
HYCU		Infrastructure	Cloud management			2009	USA	Keep hyper-converged infrastructure running with HYCU's powerful, simple backup & recovery and monitoring solutions. Deploy in seconds for superior results.
HyperOpt		Modeling & Training	Hyperparameter tuning	Open Source		2013	Canada	Distributed Asynchronous Hyperparameter Optimization in Python.
IBM InfoSphere DataStage		All-in-one	ETL Platform/Tools	Commercial	On Premises			IBM InfoSphere Information Server is a data integration platform that helps businesses understand, cleanse, transform, and deliver trusted information.
IBM Infosphere Information Server		All-in-one	Data management	Commercial	On Premises			Information Server is a branch of IBM’s product that revolves around data warehousing and data integration. It’s an enterprise product for large organizations that supports integration with cloud data storage, including Google Cloud, AWS S3, etc.
Igneous		Data pipeline	Data management			2013	USA	Igneous Unstructured Data Protection offers the scalability to handle hundreds of file systems, billions of files, and exabytes of enterprise data requiring backup
Iguazio		All-in-one	AI Apps platform			2014	Israel	The Iguazio Data Science Platform automates your machine learning pipeline, transforming AI projects into real-world business outcomes.
iMerit		Data pipeline	Labeling			2012	USA	iMerit specializes in data labeling and annotation for purposes of training models for Machine Learning and Artificial Intelligence.
Imply		Data pipeline	Data management			2015	USA	Imply delivers real-time analytics powered by Apache Druid. ... Stream or batch load data into Druid for high performance, ad-hoc analytic queries.
Improvado		All-in-one	Analytics/Ai	Commercial	SaaS
Incorta		Data pipeline	Data processing			2013	USA	Incorta aggregates large complex business data in real time, eliminating the need to reshape it. No Data Warehouse. No Transformations. Real-Time Insight.
Inferrd		Serving	Deployment			2020	USA	You build the model, we handle the deployment. Inferrd is the easiest, cheapest and the most performant hosting provider for ML models.
Informatica		All-in-one	ETL Platform/Tools	Commercial	SaaS			Informatica is an enterprise ETL tool, usable on-premises or with Google Cloud, for building enterprise data warehouses. It also integrates with various traditional databases and can deliver data on demand. Key features include advanced transformation, dynamic partitioning, zero downtime, universal connectivity, and data masking.
integrate.io		All-in-one	ETL Platform/Tools	Commercial	SaaS			Turn your data warehouse into a data platform that powers all company decision making and operational systems.
InterpretML		Modeling & Training	Interpretability			2019	USA	Fit interpretable machine learning models. Explain black-box machine learning.
Jaspersoft		Data pipeline	Data management	Commercial	SaaS
JAX		Modeling & Training	Framework			2018	USA	Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Keboola		All-in-one	ETL Platform/Tools	Commercial	SaaS			Keboola is a cloud-based data integration platform that connects data sources to analytics platforms. It supports the entire data workflow, from extraction, preparation, cleansing, and warehousing all the way to integration, enrichment, and loading.
Kedro		All-in-one	AI Apps platform			2019	UK	Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering best practice and applies them to machine learning code; applied concepts include modularity, separation of concerns, and versioning.
Kimono Labs		Data pipeline	Data generation			2014	USA	Kimono Labs is an online platform that allows its users to convert their websites into APIs.
Kneron		Hardware	Edge devices			2015	USA	Kneron develops an application-specific integrated circuit and software that offers artificial intelligence-based tools.
Koalas		Data pipeline	Data processing	Open Source		2019	USA	The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.
Komprise		Data pipeline	Storage			2014	USA	In 15 minutes, our free data management software trial will show you how you can save 70% on data management costs, on-premises and in the cloud.
Kubeflow		Serving	Deployment			2018	USA	The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Kyvos Insights		Data pipeline	Database/Query			2015	USA	Kyvos accelerates BI on trillions of rows of data on the cloud and on-premise platforms with a semantic layer powered by its next-generation OLAP technology.
Labelbox		Data pipeline	Labeling			2018	USA	A complete solution for your training data problem with fast labeling tools, human workforce, data management, a powerful API and automation features.
LabelImg		Data pipeline	Labeling	Open Source		2016	Canada	LabelImg is a graphical image annotation tool for labeling object bounding boxes in images.
LeapMind		Hardware	Edge devices			2012	Japan	Ultra-low-power AI inference accelerator IP, specialized for CNN inference arithmetic, that operates as a circuit on an FPGA or ASIC device.
Lightelligence		Hardware	Accelerator			2017	USA	Accelerate AI, Neuromorphic, AI Chip, Optical Computing, Lightmatter
LightGBM		Modeling & Training	Framework	Open Source		2016	USA	A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
LIME		Modeling & Training	Interpretability			2016	USA	Lime: Explaining the predictions of any machine learning classifier
Losswise		Serving	Monitoring			2017	USA	Turn your GPUs into monitored build servers from a git push with Losswise. Interactive visualization, logs, smart notifications, and more. Start free today.
Ludwig		Modeling & Training	Framework			2019	USA	Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without writing code.
Luigi		Infrastructure	Workflow orchestration	Open Source	SaaS/On Premises	2012	Sweden	Luigi is a lightweight Python ETL framework that supports data visualization, CLI integration, data workflow management, ETL task success/failure monitoring, and dependency resolution.
Luminous Computing		Hardware	Accelerator			2018	USA	Hardware is bottlenecked by data movement & compute. We use photonics to solve both
Materialize		Data pipeline	Stream processing			2019	USA	Materialize delivers SQL exploration for streaming events and real-time data. Incrementally updated materialized views - in ANSI Standard SQL and in real-time. Micro-batching.
Matillion		All-in-one	ETL Platform/Tools	Commercial	SaaS			Matillion offers data integration software for cloud data warehouses and was designed for Amazon Redshift, Snowflake, and Google BigQuery.
Matroid		Modeling & Training	Computer vision			2016	USA	Computer vision made simple. Deploy computer vision solutions in minutes, not months.
Metaflow		Infrastructure	Workflow orchestration			2019	USA	Metaflow makes it quick and easy to build and manage real-life data science projects. Metaflow is built for data scientists, not just for machines.
Metl		All-in-one	ETL Platform/Tools	Open Source	SaaS/On Premises			Metl (Mito-ETL) is a fast-growing Python ETL development platform used to develop bespoke code components. These components range from RDBMS, flat-file, and API/service-based data integrations to Pub/Sub (queue-based) data integrations.
Michelangelo		All-in-one	Workflow orchestration			2015	USA	Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep learning models that span a myriad of use cases ranging from generating marketplace forecasts, responding to customer support tickets, to calculating accurate estimated times of arrival (ETAs) and powering our One-Click Chat feature using natural language processing (NLP) models on the driver app.
Microsoft (SQL Server Integration)		Data pipeline	Database/Query	Commercial	On Premises		USA	Microsoft SQL Server Integration Services is a platform for building enterprise-level data integration and data transformation solutions.
Milvus		Data pipeline	Database/Query			2019	China	Milvus is an open source similarity search engine for massive feature vectors. Designed with heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources.
Mindspore		Modeling & Training	Framework			2020	China	MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios
ML Kit		Serving	Mobile			2018	USA	ML Kit beta brings Google's machine learning expertise to mobile developers in a powerful and easy-to-use package.
ML.NET		Modeling & Training	Framework			2018	USA	ML.NET is an open source and cross-platform machine learning framework for .NET
MLFlow		All-in-one	Experiment tracking			2018	USA	An open source platform for the machine learning lifecycle
MLlib		Modeling & Training	Framework			2010		MLlib is Apache Spark's scalable machine learning library.
MLPerf		Modeling & Training	Benchmarking			2018	USA	Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.
MMdnn		Serving	Compatibility			2017	USA	MMdnn is a set of tools that helps users interoperate among different deep learning frameworks, e.g. through model conversion and visualization. It converts models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.
MNN		Serving	Inference			2019	China	MNN is a lightweight deep neural network inference engine.
Modin		Data pipeline	Data processing	Open Source		2018		Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
Mona Labs		Serving	Monitoring			2018	USA	PRODUCTION MONITORING FOR AI. With Mona, you gain complete transparency into how your data and models behave in the real world.
Mozart Data		Data pipeline	Analytics/Ai	Commercial	SaaS			Mozart isn’t strictly an ETL tool, but it can help you automate the process of extracting, transforming, and loading your data into a warehouse all in one central tool.
Mythic		Hardware	Edge devices			2012	USA	An architecture built from the ground up for AI. Mythic has developed a truly unique AI compute platform that enables smart camera systems, intelligent appliances, brilliant robotics, and more.
Naveego		Data pipeline	Data processing			2014	USA	A leading provider of cloud-first, distributed data accuracy solutions for seamless, end-to-end data cleansing, Naveego enables organizations to proactively manage, detect and eliminate data accuracy issues across all enterprise data sources in real-time–regardless of structure or schema.
ncnn		Serving	Mobile			2017	USA	ncnn is a high-performance neural network inference framework optimized for the mobile platform
NeMo		Modeling & Training	NLU			2019	USA	NeMo: a toolkit for conversational AI
Neptune		Modeling & Training	Experiment tracking			2017	Poland	All experiment-related objects relevant to your projects organized, ready to be analyzed, discussed and shared with your team.
Netron		Modeling & Training	Visualization			2011	USA	Netron is a viewer for neural network, deep learning and machine learning models.
Neural Network Distiller		Serving	Model compression			2018	USA	Distiller is an open-source Python package for neural network compression research. Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.
nteract		Modeling & Training	Notebook			2015	USA	nteract is an open-source organization committed to creating fantastic interactive computing experiences that allow people to collaborate with ease. We build SDKs, applications, and libraries that help you and your team make the most of interactive (particularly Jupyter) notebooks and REPLs.
Nuvia		Hardware	Accelerator			2019	USA	Silicon design reimagined for a compute-intensive world.
Obliviously AI		All-in-one	AI Apps platform			2018	USA	The entire process of running Data Science - building Machine Learning algorithm, explaining results and predicting outcomes, packed in one single click.
OctoML		Serving	Deployment			2019	USA	Optimize machine learning and deep learning models for deployment. From the creators of Apache TVM, XGBoost and Apache MXNet, OctoML brings the cutting edge of AI, systems, programming languages, compilers and architecture to make machine learning systems easier to optimize and deploy.
Octopai		Data pipeline	Data management			2015	Israel	An automated, centralized, cross-platform metadata search engine that enables BI groups to quickly and precisely discover and govern shared metadata.
ONNX		Serving	Compatibility			2018		ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
OpenBridge		All-in-one	ETL Platform/Tools	Commercial	SaaS			Openbridge is a data logistics platform that manages the real-time flow of consumer data, big or small, delivering it exactly where it needs to be to create value for customers.
OpenSeq2Seq		Modeling & Training	NLP			2017	USA	Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
OpenText Integration Center		All-in-one	ETL Platform/Tools	Commercial	On Premises			A native integration platform to extract, enhance, transform, integrate, and migrate data and content across the enterprise.
Oracle Data Integrator		All-in-one	ETL Platform/Tools	Commercial	On Premises			Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements, including batch loads, integration processes, and SOA-enabled data services.
Owox		All-in-one	ETL + Analytics		SaaS
Pachyderm		Data pipeline	Versioning			2014	USA	Data Lineage with End-to-End Pipelines on Kubernetes, engineered for the enterprise. And… It's open source!
Paddle		Modeling & Training	Distributed			2016	China	PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice
Pandas		All-in-one	ETL Platform/Tools	Open Source	SaaS/On Premises			Pandas is a Python library for batch ETL processing that provides data structures and analysis tools. It speeds up processing of unstructured and semi-structured data. The library is used for low-intensity ETL tasks, including data cleansing and working with small structured datasets after they have been transformed from semi-structured or unstructured sources.
Panoply.io		All-in-one	ETL Platform/Tools	Commercial	SaaS			Panoply automates data management tasks associated with running big data in the cloud. Its Smart Data Warehouse requires no schema, modeling, or configuration. Panoply features an ETL-less integration pipeline that can connect to structured and semi-structured data sources. It also offers columnar storage and automatic data backup to a redundant S3 storage framework.
papermill		Modeling & Training	Notebook			2017	USA	Papermill is a tool for parameterizing and executing Jupyter Notebooks.
Paperspace		Infrastructure	Cloud management			2014	USA	GPU cloud tools built for developers. Powering next-generation workflows and the future of intelligent applications.
Apache Parquet		Data pipeline	File format			2013	USA	Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
Paxata		All-in-one	ETL Platform/Tools	Commercial	SaaS			Paxata is the first interactive, self-service data preparation solution built for everyone who works with data, from business analysts to data scientists.
Peltarion		All-in-one	AI Apps platform			2005	Sweden	A single AI platform, for real world deployments, without code. Fast & Efficient Production of AI Applications. Rich data capability. Develop AI Services fast. Usable & Affordable AI.
PerceptiLabs		Modeling & Training	Visual modeling			2019	USA	PerceptiLabs takes the process of building and training a machine learning model to warp speed. We not only accelerate machine learning, we advance explainability in AI
Pervasive Data Integrator		All-in-one	ETL Platform/Tools	Commercial	On Premises			Pervasive Data Integrator supports both data integration and application integration, and runs on premises, in the cloud, or in a hybrid setup.
Petl		Data pipeline	ETL Platform/Tools	Open Source	SaaS/On Premises			Petl is a stream processing engine ideal for handling mixed-quality data. This Python ETL tool helps data analysts with little to no prior coding experience quickly analyze datasets stored in CSV, XML, JSON, and many other data formats. You can apply sort, join, and aggregation transformations with minimal effort.
Petuum		All-in-one	Data management			2016	USA	Petuum accelerates and simplifies AI solutions so your enterprise can deploy it easily and maintain it effortlessly.
Picsell.ia		All-in-one	Computer Vision			2020	France	Picsell.ia is a development platform dedicated to Computer Vision. From open-source to business, you can create and review datasets, track your experiments and follow your project in a Lean AI mode.
Pilosa		Data pipeline	Database/Query			2017	USA	Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets.
PlaidML		Modeling & Training	Hardware compatibility			2017	USA	PlaidML is a framework for making deep learning work everywhere
Playment		Data pipeline	Labeling			2015	India	Build high-quality ground truth datasets with ML-assisted tools, sophisticated project management software, expert human workforce, and much more.
Plotly		Serving	App interface			2013	Canada	Plotly is a data science and AI company that makes it easy to create and deploy interactive web apps in any programming language.
Polyaxon		All-in-one	Serving			2016	Germany	A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications.
Precisely		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Precisely offers its data integration capabilities via two product families, Precisely Connect and Precisely Ironstream. The company’s flagship application and data integration tools are the Precisely Connect product family.
PredictionIO		Serving	Web			2013	USA	Apache PredictionIO is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, querying predictive results via REST APIs. It is based on scalable open source services like Hadoop, HBase (and other DBs), Elasticsearch, Spark and implements what is called a Lambda Architecture.
Prefect		Infrastructure	Workflow orchestration			2018	USA	The Global Leader in Dataflow Automation
Presto		Data pipeline	Database/Query			2012	USA	Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Prodigy		Data pipeline	Labeling			2017	Germany	Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. ... With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection.
Prometheus		Data pipeline	Monitoring			2012	Germany	An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Prophesee		Hardware	Edge devices			2014	France	With the world’s most advanced Event-Based Vision systems, inspired by human vision and built on the foundation of neuromorphic engineering. PROPHESEE is the revolutionary system that gives Metavision to machines, revealing what was previously invisible to them.
pygrametl		Data pipeline	ETL Platform/Tools	Open Source	On Premises			pygrametl allows for ETL programming in Python.
Pyro		Modeling & Training	Programming language			2017	USA	Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch
PySyft		Modeling & Training	Privacy			2017	UK	PySyft is a Python library for secure and private Deep Learning. PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Multi-Party Computation (MPC) within the main Deep Learning frameworks like PyTorch and TensorFlow.
Pythia		Modeling & Training	Framework			2018	USA	A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
PyTorch		Modeling & Training	Framework			2015	USA	A rich ecosystem of tools and libraries extends PyTorch and supports development in computer vision, NLP, and more.
PyTorch Lightning		Modeling & Training	Framework			2019	USA	The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Qlik Data Integration		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Deliver analytics-ready data to the cloud in real time with modern DataOps for analytics from Qlik.
Qri		Data pipeline	Versioning			2016	USA	Bigger than a spreadsheet, smaller than a database, datasets are all around us. Use Qri to browse, download, create, fork, & publish datasets across a network of peers.
Quilt Data		Data pipeline	Versioning			2015	USA	Quilt is a versioned data portal for AWS
Quobyte		Data pipeline	Storage			2013	Germany	Quobyte is software defined storage that turns commodity servers into a reliable and highly automated data center file system.
Rasa		Modeling & Training	NLU			2016	Germany	Build contextual AI assistants and chatbots in text and voice with our open source machine learning framework. Scale it with our enterprise grade platform.
Ray		Modeling & Training	Distributed			2016	USA	Ray is a fast and simple framework for building and running distributed applications.
Relational Junction ETL Manager		All-in-one	ETL Platform/Tools	Commercial	On Premises			Relational Junction ETL Manager lets you extract, transform, and load production data into your data warehouse.
RelicX		Serving	CI/CD			2020	USA	RelicX is a venture funded startup building an AI DevOps platform that brings CX intelligence into the CI/CD pipeline to ensure software release readiness based on real user behavior and customer experience.
Replicate		Modeling & Training	Versioning			2020	USA	Version control for machine learning
Riko		All-in-one	ETL Platform/Tools	Open Source	SaaS/On Premises			Riko is an apt replacement for Yahoo Pipes. It remains well suited to startups with limited technical expertise.
River		Modeling & Training	Online learning			2017	France	A Python package for online/streaming machine learning.
Rivery		All-in-one	ETL Platform/Tools	Commercial	SaaS			Rivery is a SaaS integration tool that lets you consolidate all your data from both internal and external sources into a single data platform in the cloud.
Robust AI		All-in-one	Robotics			2019	USA	Robust.AI: Creating a New Foundation for the Future of Robotics.
Rockset		Data pipeline	Database/Query			2016	USA	Rockset: The Real-Time Indexing Database in the Cloud. Rockset allows you to build data-driven applications on MongoDB, DynamoDB, ... AI. Test, validate and deploy models faster by analyzing live data in real time.
Rubrik		Data pipeline	Data management			2013	USA	We provide a powerful, policy-driven platform to simplify recovery and unlock insights from data residing in the data center and cloud.
RudderStack		All-in-one	ELT & Reverse-ETL	Commercial	SaaS			All your customer data pipelines in one platform
Sagent Data Flow		All-in-one	ETL Platform/Tools	Commercial	On Premises			Sagent Data Flow from Pitney Bowes Software is a powerful and flexible integration engine that collates data from disparate sources and provides data transformation tools.
SambaNova		Hardware	Accelerator			2017	USA	SambaNova Systems is a computing startup focused on building machine learning and big data analytics platforms.
SAP BusinessObjects Data Services		All-in-one	ETL Platform/Tools	Commercial	On Premises			Unlock meaning from all of your organization’s data – structured or unstructured – with data integration, quality, cleansing, and more.
SAS Data Management		All-in-one	ETL Platform/Tools	Commercial	On Premises			SAS Data Management helps transform, integrate, govern, and secure data while improving its overall quality and reliability.
Scale AI		Data pipeline	Data generation			2016	USA	Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.
scikit-learn		Modeling & Training	Framework			2010	Remote	Machine Learning in Python
Scrapinghub		Data pipeline	Data generation			2010	Ireland	Turn websites into data with the world's leading web scraping services & tools from the creators of Scrapy. Data extraction trusted by industry leaders.
scribble Data		Modeling & Training	Feature engineering			2016	India	The feature store for your ML engineering needs
Scriptella		All-in-one	ETL Platform/Tools	Open Source	On Premises			Scriptella is an open-source ETL and script execution tool written in Java.
Segment		All-in-one	ETL Platform/Tools	Commercial	SaaS			Segment collects user data with one API and sends it to hundreds of tools or a data warehouse.
Segments.ai		Data pipeline	Labeling			2020	Belgium	Deep learning-fueled labeling technology with a focus on instance and semantic segmentation.
Seldon		Serving	Serving			2011	UK	Manage, serve and scale models built in any framework on Kubernetes. Take your ML projects from POC to production.
SHAP		Modeling & Training	Interpretability			2017	USA	A game theoretic approach to explain the output of any machine learning model.
SigOpt		Modeling & Training	Hyperparameter tuning			2014	USA	SigOpt is a standardized, scalable, enterprise-grade optimization platform and API designed to unlock the potential of your modeling pipelines.
SiMa.ai		Hardware	Edge devices			2018	USA	Is your ML Green?™ We believe that the future of compute is high performance machine learning at the edge – and today, power is the limiter.
Singer		All-in-one	ETL Platform/Tools	Open Source	SaaS, On Premises			Singer is an open source standard for writing scripts that move data.
Sisu		Data pipeline	Analytics platform			2018	USA	Sisu is the fastest, most comprehensive augmented analytics platform letting you ... You can't keep up with changing metrics using manual data exploration.
Skyvia		All-in-one	ETL Platform/Tools	Commercial	SaaS			Skyvia’s Data Integration tool supports a wide range of data-related scenarios that can be created directly from the user interface.
SnapLogic Elastic Integration Platform		All-in-one	ETL Platform/Tools	Commercial	SaaS			SnapLogic Elastic Integration Platform handles both structured and unstructured data, with point-to-point integration functionality in hybrid integration use cases.
Snorkel		Data pipeline	Labeling			2016	USA	Programmatically Building and Managing Training Data
Snorkel AI		All-in-one	AI Apps platform			2019	USA	Programmatically Building and Managing Training Data
spaCy		Modeling & Training	NLP			2014	Germany	spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
Spark		Data pipeline	Data processing			2009		Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Spell		Modeling & Training	Experiment tracking			2017	USA	Spell is a powerful platform for building and managing machine learning projects. Spell takes care of infrastructure, making machine learning projects easier to start, faster to get results, more organized and safer than managing infrastructure on your own.
SQLFlow		Data pipeline	Database/Query			2019	China	Extends SQL to support AI. Extract knowledge from data. Currently supports MySQL, Apache Hive, Alibaba MaxCompute, XGBoost and TensorFlow.
Starburst Data		Data pipeline	Database/Query			2017	USA	Limitless Queries. Break boundaries and harness the power of the world's fastest SQL query engine.
Starfish		All-in-one	ETL Platform/Tools	Commercial	SaaS			
Stitch		All-in-one	ETL Platform/Tools	Commercial	SaaS			Stitch is a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. Running on a scalable, fault-tolerant cloud platform, Stitch integrates data from dozens of different sources.
Storbyte		Data pipeline	Storage			2014	DC	Storbyte designs and manufactures all-flash & hybrid flash enterprise storage arrays that offer performance, power management, availability, reliability, density, efficiency, flexibility, expandability, and affordability. Storbyte is providing innovative data storage solutions and has not lost sight of what is important to end users: a responsible, cost-correct price point.
Stradigi AI		All-in-one	AI apps platform			2017	Canada	Stradigi AI's powerful AI business platform, Kepler, fuels tangible results for enterprises. No AI or machine learning experience required.
Streamlit		Modeling & Training	App interface			2018	USA	Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in hours, not weeks. All in pure Python.
StreamSets		All-in-one	ETL Platform/Tools	Commercial	SaaS			StreamSets is a DataOps and real-time Google Cloud ETL tool. It provides data monitoring and supports a variety of data sources and destinations for data integration. Many enterprises use it to integrate dozens of data sources for analysis. It supports data protection in line with regulations such as GDPR and HIPAA.
StreamSets Data Collector		All-in-one	ETL Platform/Tools	Commercial	On Premises			The StreamSets Data Collector is a low-latency ingest infrastructure tool that lets you create continuous data ingest pipelines using a drag-and-drop UI within an integrated development environment (IDE).
Striim		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Unify your data in Google Cloud with a full suite of real-time data integration solutions. Whether it's automated database migrations to Google Cloud or data integration for BigQuery, Striim will help you get there faster.
Superb AI		Data pipeline	Data management			2018	USA	Create, label and manage ML training data efficiently so you can build AI faster. Fully managed workforce. Powerful labeling tools. Training data quality control.
Supermetrics		All-in-one	ETL Platform/Tools	Commercial	SaaS			Supermetrics is a managed data pipeline that makes it easy for marketers, data analysts, and data engineers to move any marketing metrics into a data warehouse in Snowflake, BigQuery, or Azure Synapse Analytics.
Supervisely		All-in-one	Computer vision			2017	USA	First available ecosystem to cover all aspects of training data development. Manage, annotate, validate and experiment with your data without coding.
superwise.ai		Serving	Monitoring			2019	Israel	Monitor your AI from the moment it meets reality so you can finally trust every model
Syncsort DMX		All-in-one	ETL Platform/Tools	Commercial	On Premises			DMX supports mainframe, legacy, and big data sources, and provides a no-code approach to joining datasets.
Synthetaic		Data pipeline	Data generation			2019	USA	We grow high-quality data that unlocks impossible AI. What if edge cases no longer existed? What if training data was no longer a constraint?
Syntiant		Hardware	Edge devices			2017	USA	Always-On Voice powered by custom AI Silicon
Talend		All-in-one	ETL Platform/Tools	Open Source	On Premises			Talend is big data and cloud data integration software built on the Eclipse graphical environment. It supports scaling to massive data sets and advanced data analytics. It has partnered with leading cloud service providers, analytics platforms, and data warehouses such as Google Cloud Platform, Amazon Web Services (AWS), and Snowflake. It also acts as a connector to SaaS applications.
talos		Modeling & Training	Hyperparameter tuning			2018	Finland	Hyperparameter Optimization for TensorFlow, Keras and PyTorch
Tamr		Data pipeline	Data management			2012	USA	Tamr's leading data management system and services work to create a data migration strategy that simplifies your data unification process. Talk with us today.
Tazi.ai		Modeling & Training	AutoML			2015	Turkey	TAZI’s Automated Machine Learning is understandable, continuous machine learning from data and humans; it enables business domain experts to use machine learning to make predictions and take actions. It also helps data analysts and scientists with their daily model creation and deployment.
Tecton		All-in-one	Deployment			2019	USA	The Data Platform for Machine Learning. Build a library of great features. Serve them in production. Do it at scale.
TensorBoard		Modeling & Training	Experiment tracking			2015	USA	TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, projecting embeddings to a lower dimensional space, and much more.
TensorFlow		Modeling & Training	Framework			2015	USA	An end-to-end open source machine learning platform for everyone. Discover TensorFlow's flexible ecosystem of tools, libraries and community resources
TensorFlow Extended		Serving	Deployment			2019	USA	TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines
TensorFlow Lite		Serving	Mobile			2019	USA	TensorFlow Lite is an open source deep learning framework for on-device inference.
TensorRT		Serving	Inference			2019	USA	NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
TerminusDB		Data pipeline	Database/Query			2017	Ireland	TerminusDB is an open source model driven graph database for knowledge graph representation designed specifically for the web-age.
Textur		Data pipeline	ETL Platform/Tools	Commercial	SaaS			Textur unifies data from your data silos and provides a powerful SQL interface for modeling your business.
Theano		Modeling & Training	Framework			2008	Canada	Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
TPOT		Modeling & Training	AutoML			2016	USA	A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
TransmogrifAI		Modeling & Training	AutoML			2017	USA	An AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Treasure Data		All-in-one	ETL Platform/Tools	Commercial	SaaS			Treasure Data connects data and teams together with a full suite of tools that automate data collection and processing.
Trifacta		All-in-one	ETL Platform/Tools	Commercial	SaaS			Trifacta is an interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning.
Truera		Modeling & Training	Explainability			2019	USA	The Truera Model Intelligence Platform powered by Enterprise-Class AI Explainability eliminates the machine learning black box with Model Intelligence.
tsfresh		Modeling & Training	Feature engineering			2008	Germany	Automatic extraction of relevant features from time series
Tumult Labs		Data pipeline	Privacy			2019	USA	Unleashing the power of data with ironclad privacy protection
Tune		Modeling & Training	Hyperparameter tuning			2017	USA	Tune is a Python library for hyperparameter tuning at any scale.
Turi Create		Modeling & Training	Framework	Open Source		2018	USA	Turi Create simplifies the development of custom machine learning models.
Unravel Data		Serving	Monitoring			2013	USA	Unravel provides full-stack visibility and AI-powered guidance to help you understand and optimize the performance of your data-driven applications.
V7Labs		Data pipeline	Labeling			2018	UK	Create the sense of sight. Label, train, and deploy artificial intelligence that effortlessly learns new objects from your data.
Vaex		Data pipeline	Data processing			2015	Netherlands	Power up your business with our data driven solutions. With our unique, state-of-the-art technology, we provide fast and scalable solutions that will make you more agile, while limiting unnecessary resources.
Valohai		Infrastructure	Workflow orchestration			2016	Finland	The MLOps platform for the whole team. Valohai takes you from POC to production while managing the whole model lifecycle.
Vearch		Data pipeline	Database/Query	Open Source		2019	China	Vearch is the vector search infrastructure for deep learning and AI applications.
VertaAI		Serving	Monitoring			2019	USA	Verta.AI is a Palo Alto-based startup building software infrastructure to help enterprise data science and machine learning (ML) teams rapidly develop and deploy ML models.
Vexata		Data pipeline	Storage			2014	USA	Vexata is an active data infrastructure company that accelerates database and analytic platforms via groundbreaking storage solutions.
Vowpal Wabbit		Modeling & Training	Online learning			2010		Vowpal Wabbit provides a fast, flexible, online, and active learning solution that empowers you to solve complex interactive machine learning problems
Voxel51 // Scoop		Data pipeline	Data quality			2018	USA	We build software that enables ML engineers to build better models, more quickly. Try FiftyOne, our powerful platform for dataset curation, analysis, and model
Waterline Data		Data pipeline	Data management			2013	USA	Waterline's enterprise data catalog enables data professionals to discover, govern, and rationalize an organization's data lake.
Wave Computing		Hardware	Accelerator			2008	USA	Wave Computing is revolutionizing AI and deep learning with its dataflow-based systems and embedded solutions.
Weights & Biases		Modeling & Training	Experiment tracking			2017	USA	We're building developer tools for deep learning. Add a couple lines of code to your training script and we'll keep track of your hyperparameters, system metrics, and outputs so you can compare experiments,
XGBoost		Modeling & Training	Framework			2014		XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
Xnor.ai		Serving	Model compression			2016	USA	Transform your business with on-device AI.
Xpanse AI		All-in-one	AI Apps platform			2015	Ireland	The power of AI at the click of a button. Xpanse AI brings easy to use and lightning fast analytics to your business.
Xplenty		All-in-one	ETL Platform/Tools	Commercial	SaaS			Xplenty's data integration platform streamlines data processing, reducing time spent and allowing businesses to focus on insight over preparation.
Yellowbrick Data		Data pipeline	Data warehouse			2014	USA	The ultimate solution for your data warehouse. Quick to deploy, easy to expand, and simple to manage. Yellowbrick Data can solve your data problems.
Zero ASIC		Hardware	Edge devices			2020	USA	Removing the Barrier to Custom Silicon
Zilliz		Data pipeline	Database/Query			2017	China	The company specializes in the development of open-source, AI-powered unstructured data analysis software, and is the initiator and primary contributor to the vector similarity search project Milvus.
Snowflake		All-in-one	Data management			2012	USA
Google BigQuery		All-in-one	Data management			2010	USA
dbt		Data transformation	ELT Tool			2016	USA
Looker		BI Tool				2011	USA
Mode		BI Tool				2012	USA
Census		All-in-one	ELT & Reverse-ETL			2018	USA
Hightouch		All-in-one	ELT & Reverse-ETL			2018	USA
Grouparoo		All-in-one	ELT & Reverse-ETL	Open Source			USA
Polytomic		All-in-one	ELT & Reverse-ETL				USA	Data within companies is fragmented. Sales, Marketing, Support, Finance, and Operations teams spend enormous amounts of time repeatedly hunting for data that lives outside of their home systems.
RudderStack		All-in-one	ELT & Reverse-ETL			2019	USA	RudderStack elegantly handles every piece of data from every source and syncs it with every tool in your stack.
Seekwell			ELT & Reverse-ETL				USA	Get your SQL data in the places you need it like Google Sheets, Salesforce, Zendesk, and Slack.
Workato			ELT & Reverse-ETL				USA	Integrate your stack. Automate your work. A single platform for integration and workflow automation across your organization.
Name	Website	Cat	SubCat	Type	Deployment	Started	HQ	Description

Before selecting a data stack, first define your use cases: decide what you want to do with the data you have. Once that is clear, you can select the tools.