The data engineering industry has evolved from Extract, Transform, Load (ETL) to ELT, where raw data is copied from the source system, loaded into a data warehouse or data lake, and then transformed. The current trend is to adopt a newer approach called “reverse ETL”: the process of moving data from a data warehouse into third-party systems to make the data operational.
In this type of data stack, the data warehouse becomes the single source of truth for data, including customer data that would otherwise be spread across different systems. Solutions that have enabled this architecture include Fivetran, Airbyte, and Cloud Functions for extract and load (EL); dbt for transform (T); and BigQuery, Snowflake, and Redshift for the data warehouse.
Traditionally, data stored in data warehouses was used for analytical workloads and business intelligence applications like Looker and Superset. New use cases have appeared in which that data is further utilized for operational analytics, which drives action by automatically delivering real-time data to the parts of your organization where it matters.
There are many use cases for reverse ETL. Having a consistent view of the customer across all systems and mirroring product usage data can help improve customer interactions by supporting personalized messages that include product metrics. By pushing data to Salesforce, you can keep an up-to-date list of high-lifetime-value customers, customers who spend more than a defined amount, or details of how each customer interacts with your organization.
Syncing customer data into your support portal can save time when responding to support requests or automatically prioritize messages when they come in.
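As a rough illustration of the "customers that spend more than a defined amount" use case, the sketch below builds such a list with a SQL query. It uses Python's built-in sqlite3 module as a stand-in for the data warehouse; the orders table, its columns, and the high_value_customers function are hypothetical names chosen for this example, not part of any tool mentioned here.

```python
import sqlite3


def high_value_customers(conn, min_spend):
    """Return (customer_id, total_spend) rows for customers at or above min_spend."""
    cursor = conn.execute(
        """
        SELECT customer_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) >= ?
        ORDER BY total_spend DESC
        """,
        (min_spend,),
    )
    return cursor.fetchall()


# In-memory database standing in for the warehouse (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 400.0), ("c1", 700.0), ("c2", 50.0), ("c3", 1000.0)],
)

audience = high_value_customers(conn, 1000)
```

The resulting rows are the kind of query output a reverse ETL sync would then push into a system such as Salesforce.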
Write your own Data Connectors
You could write your own API connectors to extract data from the data warehouse and push it into operational systems such as Salesforce, Marketo, and HubSpot. You can write these pipeline connectors with Cloud Functions, but there is a downside to this route: the connectors can be hard to write because endpoints may be brittle and most APIs are not built to handle real-time data transfer.
Data teams must set up batching, retries, and checkpointing to avoid rate limits. Mapping fields from the data warehouse to SaaS products can take time. From there, it can be challenging to maintain the connectors over time because API specs change.
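To give a sense of what batching, retries, and checkpointing involve, here is a minimal sketch of a hand-written sync loop. Everything in it is an assumption made for illustration: RateLimited is a hypothetical exception that your own send_batch function would raise when the destination API reports a rate limit, and the checkpoint is simply the count of rows already delivered.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by send_batch when the API signals a rate limit."""


def sync_rows(rows, send_batch, checkpoint=0, batch_size=2,
              max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Send rows[checkpoint:] in batches and return the new checkpoint.

    The checkpoint is the number of rows delivered so far, so a later
    run can resume from it instead of resending everything.
    """
    position = checkpoint
    while position < len(rows):
        batch = rows[position:position + batch_size]
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                break  # batch delivered, move on to the next one
            except RateLimited:
                if attempt == max_retries:
                    # Give up for now; the caller can persist this checkpoint.
                    return position
                # Exponential backoff before retrying the same batch.
                sleep(base_delay * 2 ** attempt)
        position += len(batch)
    return position
```

A real connector would also need to persist the checkpoint between runs and handle errors other than rate limits, which is part of why maintaining connectors in-house takes time.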
Why use a Reverse ETL tool
Reverse ETL solutions offer out-of-the-box connectors to numerous systems, so teams no longer need to write and maintain their own. A team doing this in-house with Cloud Functions might have written only a few connectors, for systems like Salesforce, Marketo, and HubSpot, because each one takes time. Once connectors go live, time is spent on maintenance, including regular checks that the API specs have not changed.
With a reverse ETL tool, you can create customer segments, audiences, and lead scores through a visual analysis interface or from dbt model outputs, and push them downstream. Your data team can now push data into more systems and get better use of the data. Reverse ETL tools provide a visual interface for choosing which query output columns populate standard and custom fields, and they let you sync continuously or define what triggers a sync between the systems.
For example, a sync can be triggered after a dbt job runs. Reverse ETL solutions log and monitor sync status and progress, and notify teams when a sync needs attention.
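The mapping of query output columns to destination fields mentioned above can be pictured as a simple lookup. The sketch below is only an illustration of the idea, not how any particular reverse ETL product implements it; FIELD_MAP, map_row, and the column and field names are hypothetical (the "__c" suffix follows Salesforce's naming convention for custom fields).

```python
# Hypothetical mapping: warehouse column name -> destination field name.
FIELD_MAP = {
    "email": "Email",            # standard field
    "lifetime_value": "LTV__c",  # custom field
    "plan": "Plan__c",           # custom field
}


def map_row(row, field_map):
    """Build a destination record from one query output row.

    Columns missing from the mapping are dropped, and columns whose
    value is None are skipped so they do not overwrite existing data.
    """
    return {
        dest: row[src]
        for src, dest in field_map.items()
        if src in row and row[src] is not None
    }


record = map_row(
    {"email": "a@example.com", "lifetime_value": 1200, "plan": None, "extra": 1},
    FIELD_MAP,
)
```

Skipping None values is a design choice made for this example; a sync that should clear fields in the destination would need to pass them through instead.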
Using a reverse ETL tool allows data teams to maintain a single data pipeline instead of several. They no longer have to write scripts, and they gain visibility into and control over syncs. Sales, marketing, growth, and analytics teams can analyze and act upon the same consistent, reliable data. Data consistency helps create continuity across the business, since functional teams work off the same data even when they use different tools such as Salesforce, Marketo, and HubSpot, and it accelerates decision-making.
More than 300 vendors, from startups to open-source projects and commercial companies, offer ETL, ELT, or reverse ETL. As the table below shows, it is not easy to find the solutions that fit your needs.
Data Solutions
Name
Website
Category
Subcategory
Type
Deployment
Started
HQ
Description
Abacus AI
All-in-one
AutoML
Commercial
2019
USA
Abacus.AI makes it effortless to create large-scale customizable deep learning systems. Accurate predictions generated by our system can be easily and securely incorporated into all aspects of your customer experience and business processes
Accord
Modeling & Training
Framework
Commercial
2012
France
Machine learning, computer vision, statistics and general scientific computing for .NET
Actian
Data pipeline
Data management
Commercial
On Premises
Actian DataConnect aggregates data from any source, whether on premises or in the cloud, in a database, or in a SaaS application.
Adeptia Integration Suite
Data pipeline
Data management
Commercial
On Premises
Adeptia offers self-service ETL capabilities to business users and data scientists. Developers can use it for data validations, cleansing, routing, exception-handling, and back-end connectivity.
Aible
All-in-one
Serving
Commercial
2018
USA
Create AI that delivers impact, not accuracy, with cost-benefit tradeoffs & operational constraints, in a friendly, intuitive UI designed for real business.
AIMET
Modeling & Training
Model compression
Open Source
2020
USA
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Airbyte
Data pipeline
ETL Platform/Tools
Commercial / Open Source
SaaS/On Premises
Get all your ELT data pipelines running in minutes, even your custom ones. Let your team focus on insights and innovation
Aircloak
Data pipeline
Privacy
2012
Germany
Aircloak's unique approach ensures the existing primary database is not modified in any way. Aircloak handles all data types including unstructured text.
Airflow
Infrastructure
Workflow orchestration
Open Source
Hybrid and multi-cloud
2015
USA
Airflow is a modern platform that designs, creates, and tracks workflows. It is an open-source Google Cloud ETL tool. It supports integration with cloud services, including Google Cloud Platform, Azure, and AWS. It offers a user-friendly interface and provides clear visualization. Scaling becomes very easy with Airflow due to its modular structure.
Alectio
Modeling & Training
Active learning
2019
USA
Not all data is created equal. You can build better models with less data. We can show you how.
Algorithmia
Serving
Serving
2013
USA
Algorithmia makes applications smarter, by building a community around algorithm development, where state of the art algorithms are always live and accessible to anyone
Alink
Modeling & Training
Framework
2018
China
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Allegro AI/TRAINS
Modeling & Training
Experiment tracking
2016
Israel
Deep learning platform tailored for computer vision. Allegro AI offers the first end-to-end machine learning product life-cycle management solution.
AllenNLP
Modeling & Training
NLP
2016
USA
AllenNLP is an open-source NLP research library, built on PyTorch.
Alluxio
Data pipeline
Data management
2015
USA
an open source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud.
Alooma
Data pipeline
Data management
Commercial
SaaS
Alooma is a real-time data pipeline that lets you integrate any data source – databases, applications, and any API - with your data warehouse.
Alteryx
Data pipeline
Data management
Commercial
On Premises
2011
USA
Alteryx allows you to prep, blend, and analyze data using a repeatable workflow, then deploy and share analytics for deeper insights in hours, not weeks.
Amazon Redshift
Data pipeline
Data warehouse
2012
USA
Amazon Redshift is a fast, fully managed, and cost-effective data warehouse that gives you petabyte scale data warehousing and exabyte scale data lake analytics together in one service. Amazon Redshift is up to ten times faster than traditional on-premises data warehouses.
Amundsen
Data pipeline
Database/Query
2019
USA
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Angel ML
Modeling & Training
Distributed
2017
China
A Flexible and Powerful Parameter Server for large-scale machine learning
Anodot
Data pipeline
Data monitoring
2014
Israel
We monitor your business. Anodot monitors all your data in real time for lightning fast detection of the incidents that impact your revenue
Anyscale
Infrastructure
Cloud management
2019
USA
From the creators of Ray, a framework for building machine learning applications at any scale originating from the UC Berkeley RISELab.
Anyverse
All-in-one
Commercial
Accelerate advanced perception system development with hyperspectral synthetic data that mimics exactly what your sensors see
Apache Druid
Data pipeline
Database/Query
2012
USA
Apache Druid is a high performance real-time analytics database
Apache Flink
Serving
Stream processing
2011
Germany
Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities
Apache Hudi
Data pipeline
Data warehouse
2016
USA
Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores)
Apache Kafka
Serving
Stream storage
Open Source ETL
2011
USA
Apache Kafka is an open-source distributed event streaming platform used by many companies to develop high-performance data pipelines, perform streaming analytics and data integration.
Apache Mahout
Modeling & Training
Framework
2008
Remote
Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends.
Apache MXNet
Modeling & Training
Framework
2015
A flexible and efficient library for deep learning.
Apache NiFi
Data pipeline
ETL Platform/Tools
Open Source ETL
Apache NiFi is an open-source ETL tool and is free for use. It allows you to visually assemble programs from boxes and run them without writing code. So, it is ideal for anyone without a background in coding. It can work with numerous different sources, including RabbitMQ, JDBC query, Hadoop, MQTT, UDP socket, etc. You can use it to filter, adjust, join, split, enhance, and verify data.
Apache ORC
Data pipeline
File format
2013
the smallest, fastest columnar storage for Hadoop workloads.
Apache Spark
Data pipeline
ETL Platform/Tools
Open Source
SaaS/On Premises
Apache Spark is an excellent ETL tool for Python-based automation for people and enterprises that work with streaming data. Growth in data volume is proportional to business scalability, making automation necessary and relentless with Spark ETL.
Apache Superset
All-in-one
Open Source
Apache Superset is a modern, enterprise-ready business intelligence web application. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple pie charts to highly detailed deck.gl geospatial charts.
Apache TVM
Serving
Inference
2017
Apache TVM (incubating) is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks, and the performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end to end compilation to different backends.
Aparavi
Data pipeline
Data management
2016
USA
Aparavi's highly scalable data intelligence and automation solutions enable organizations to easily discover, classify, protect, and optimize their data.
ApatarForge
Data pipeline
ETL Platform/Tools
Open Source
Hybrid and multi-cloud
Apatar is an open source data integration and ETL tool written in Java.
AresDB
Data pipeline
Database/Query
Open Source
2019
USA
A GPU-powered real-time analytics storage and query engine.
Argo
Serving
CI/CD
Open Source
2018
USA
Get stuff done with Kubernetes. Open source Kubernetes native workflows, events, CI and CD
Arize AI
Serving
Monitoring
2019
USA
Arize AI is the watcher, troubleshooter and the guardrail on deployed AI
Arthur AI
Serving
Monitoring
2018
USA
Always-on Explainability, Bias, and Performance Monitoring for AI, ML, and analytics. Get up and running in minutes and start sleeping better at night. Dedicated. Innovative.
Ascend.io
Data pipeline
Data management
2015
USA
Experience continuously optimized data pipelines with less code and fewer breakages. Enter the new era of data engineering with Ascend's autonomous dataflow service.
Astera Centerprise
Data pipeline
ETL Platform/Tools
Commercial
On Premises
Centerprise ETL offers data warehouse loading functionality, including the Slowly Changing Dimension (SCD) transformation.
Astronomer
Data pipeline
ETL Platform/Tools
Commercial
Hybrid and multi-cloud
Build, run, and manage data pipelines-as-code at enterprise scale with Apache Airflow, the most popular open source orchestrator.
AtScale
Data pipeline
Data management
2013
USA
Freedom of choice for the enterprise. Break free the complexities and security risks associated with cloud migration and self-service analytics with Intelligent Data Virtualization—no matter where dat.
Backend AI
Infrastructure
Workflow orchestration
2016
South Korea
Backend.AI: Minute-made GPU clustering solution for Machine Learning.
BentoML
Modeling & Training
Pretrained models
2018
USA
BentoML makes it easy to serve and deploy machine learning models in the cloud. It is an open source framework for building cloud-native model serving services. BentoML supports most popular ML training frameworks and deployment platforms, including major cloud providers and docker/kubernetes.
Blaize
Hardware
Edge devices
2010
USA
Intelligence at the edge of everywhere. Blaize unleashes the potential of AI to drive leaps in the value that technology delivers to transform markets and improve the way we all work and live.
Blendo
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Blendo provides a data management platform that connects, reshapes, and delivers actionable data, with a focus on simple integration procedures and automated data collection.
Bonobo
Data pipeline
ETL Platform/Tools
Open Source
SaaS/On Premises
Bonobo is an open-source, Python-based ETL pipeline deployment and data extraction tool. You can leverage its CLI to extract data from SQL, CSV, JSON, XML, and many other sources.
Boruta
Modeling & Training
Feature engineering
Open Source
2010
Python implementations of the Boruta all-relevant feature selection method.
Boulder AI
Hardware
Edge devices
2017
USA
Human insight and decision making on a visual sensor.
BrainChip
Hardware
Edge devices
2006
USA
BrainChip brings artificial intelligence to the edge with a high-performance, small, ultra-low power solution that enables continuous learning and inference.
Bubbles
Data pipeline
ETL Platform/Tools
Open Source
SaaS/On Premises
Bubbles is a Python framework for data processing and data quality measurement. Its basic concepts are abstract data objects, operations, and dynamic operation dispatch.
Built.io Flow
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Built.io Flow is a drag-and-drop tool for building enterprise integrations.
Cadence
Infrastructure
Workflow orchestration
2017
USA
Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
Caffe
Modeling & Training
Framework
2013
USA
Caffe: a fast open framework for deep learning.
Cambricon
Hardware
Accelerator
2016
China
Cambricon Technologies builds core processor chips for intelligent cloud servers, intelligent terminals, and intelligent robots.
Catalyst
Modeling & Training
Framework
Open Source
2018
Russia
PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write another regular train loop.
Cazena
Data pipeline
Data management
2014
USA
First Data Lake with a SaaS Experience. Cazena empowers enterprises to collect, store and analyze any data in the cloud, without any DevOps resources or admin time. Cazena's Data Lake as a Service includes everything, and is delivered as secure SaaS, ready to load, store and analyze data with any method: SQL, Spark, R, Python, and many more.
CDAP
Data pipeline
ETL Platform/Tools
Open Source ETL
Hybrid and multi-cloud
Interoperability across on-premises and Cloud environments; Support for all major public cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform.
CData Software
Data pipeline
ETL Platform/Tools
Commercial
SaaS
CData Software offers data integration solutions for real-time access to online or on-prem applications, databases, and Web APIs. The vendor specializes in providing access to data through established data standards and application platforms such as ODBC, JDBC, ADO.NET, SSIS, BizTalk, and Microsoft Excel. CData Software products are broken down into six categories: driver technologies, enterprise connectors, data visualization, ETL and ELT solutions,
Cerebras
Hardware
Accelerator
2016
USA
AI insights, faster. Cerebras is a computer systems company dedicated to accelerating deep learning. The pioneering Wafer-Scale Engine (WSE) – the largest chip ever built – is at the heart of our deep learning system, the Cerebras CS-1.
Chainer
Modeling & Training
Framework
2015
Japan
A Powerful, Flexible, and Intuitive Framework for Neural Networks
Civis Analytics
All-in-one
Analytics/Ai
Commercial
Civis Turns Data Into Campaigns That Drive Action
ClearSky Data
Data pipeline
Storage
2014
USA
ClearSky Data offers enterprise storage as a hybrid cloud service delivering on-demand primary storage, offsite backup, and DR as a single service.
CleverHans
Modeling & Training
Adversarial robustness
2017
USA
An adversarial example library for constructing attacks, building defenses, and benchmarking both
Clipper
Serving
Web
2017
USA
Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.
Cloudera
Infrastructure
Cloud management
2008
USA
Cloudera delivers an Enterprise Data Cloud for any data, anywhere, from the Edge to AI.
CloverETL
All-in-one
ETL Platform/Tools
Open Source
On Premises
CloverETL is a data integration software suite for data migration and data warehousing, and for feeding data into business intelligence and reporting applications.
Cohesity
Data pipeline
Data management
2013
USA
Eliminate mass data fragmentation with Cohesity's modern approach to data management, beginning with backup. Gain instant recovery. Learn more today.
Colab
Modeling & Training
Notebook
2017
USA
Colab notebooks allow you to combine executable code and rich text in a single document, along with images, HTML, LaTeX and more.
Comet
Modeling & Training
Experiment tracking
2017
USA
Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects.
Confluent
Data pipeline
Stream processing
2014
USA
Confluent is a fully managed Kafka service and enterprise stream processing platform. Real-time data streaming for AWS, GCP, Azure or serverless. Try free!
Core ML
Serving
Mobile
2017
USA
Use Core ML to integrate machine learning models into your app. Core ML provides a unified representation for all models.
Cortex
Serving
Web
2019
USA
Cortex is an open source platform for deploying machine learning models as production web services.
Cubonacci
All-in-one
AI Apps platform
2018
Netherlands
Machine learning lifecycle management. Cubonacci enables organizations to focus on developing custom machine learning models without having to worry about peripheral matters. The Cubonacci platform manages deployment, versioning, infrastructure, monitoring and lineage for you, eliminating risk and minimizing time-to-market.
cuDF
Data pipeline
Data processing
2018
USA
Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
DAGsHub
Modeling & Training
Versioning
2019
Israel
DAGsHub is a platform for data version control and collaboration for data scientists and machine learning engineers.
DarwinAI
Modeling & Training
Explainability
2017
Canada
DarwinAI’s Generative Synthesis 'AI building AI' technology enables optimized and explainable deep learning.
Dash
Serving
App interface
2015
Canada
Dash Enterprise is the end-to-end development & deployment platform for low-code AI Dash applications.
Dask
Data pipeline
Data processing
2015
Remote
Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
Databricks
All-in-one
Data management
Commercial
2013
USA
All your data, analytics and AI on one lakehouse platform
Dataddo
All-in-one
ETL + Analytics
Commercial
SaaS
Your data, from any source, to any destination
Datadog
Infrastructure
Cloud management
2010
USA
See inside any stack, any app, at any scale, anywhere.
Datagrok
All-in-one
Data processing
2019
USA
Datagrok: Swiss Army Knife for Data. A platform for turning data into actionable insights
Dataiku
All-in-one
AI Apps platform
2013
USA
Dataiku's single, collaborative platform powers both self-service analytics and the operationalization of machine learning models in production.
DataRobot
All-in-one
AI Apps platform
2012
USA
DataRobot combines a trusted enterprise AI platform and a trusted AI-native strategic partnership for global enterprises that want to harness the power of AI and their existing teams to succeed in today's Intelligence Revolution.
Datatable
Data pipeline
Data processing
2017
USA
Python library for efficient multi-threaded data processing, with the support for out-of-memory datasets.
Datatron
Serving
Monitoring
2016
USA
Production AI Model Management at Scale. Automate the standardized deployment, monitoring, governance, and validation of all your models to be developed in any environment.
Dataturks
Data pipeline
Labeling
2018
India
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
DataVirtuality
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Rapid data integration for analytics: Integrates multiple data sources, web services, and front ends in a snap.
Datera
Data pipeline
Storage
2013
USA
Get sub-200µS latency & millions of IOPS with 100% software-defined data automation. Save up to 70% on data infrastructure total-cost-of-ownership.
Datmo
Modeling & Training
Experiment tracking
2016
USA
Be as effective as AI engineers at Google and Facebook. Workflow tools to help you experiment, deploy, and scale. By data scientists, for data scientists.
Datorama
Data pipeline
ETL Platform/Tools
Commercial
SaaS
DAWNBench
Modeling & Training
Benchmarking
2018
USA
DAWNBench is a benchmark suite for end-to-end deep learning training and inference.
Deeplite
Serving
Model compression
2020
Canada
Enabling faster, smaller and more energy-efficient DNNs to run on edge devices and in the cloud
DeepNote
Modeling & Training
Notebook
2019
USA
The notebook you’ll love to use. Deepnote is a new kind of data science notebook. Jupyter-compatible with real-time collaboration and easy deployment. Oh, and it's free.
DefinedCrowd
Data pipeline
Data generation
2015
USA
Leverage machine learning technology and human intelligence to source, structure, and enrich high quality training data in speech, NLP, and computer vision.
Dell Boomi
All-in-one
ETL Platform/Tools
Commercial
SaaS
Boomi AtomSphere lets you configure and deploy integrations at a fraction of the cost and time of traditional approaches, all from a single interface.
Delta Lake
Data pipeline
Data warehouse
2019
USA
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
Dessa
All-in-one
Monitoring
2016
Canada
Create more with machine learning. Build, run & monitor 1000s of ML experiments with Foundations
Determined AI
Modeling & Training
AutoML
2016
USA
Our AutoML platform streamlines your deep learning workflows, tracks your work, and manages your GPU clusters.
Dialogflow
Modeling & Training
NLU
2014
USA
Dialogflow is a Google service that runs on Google Cloud Platform, letting you scale to hundreds of millions of users. Optimized for the Google Assistant.
Doccano
Data pipeline
Labeling
2018
Japan
Text annotation for Human. Just create project, upload data and start annotation. You can build dataset in hours.
Dockship
Modeling & Training
Pretrained models
2019
India
Dockship.io is a marketplace for AI models and datasets. Publish your models on Dockship for people all over the world.
Dolt
Data pipeline
Versioning
2018
USA
Liquidata's mission is to make data move more efficiently. We built Dolt, an open-source version-controlled SQL database with Git-like semantics.
Domino Data Lab
Infrastructure
Cloud management
2013
USA
Deliver winning models. One place for your data science tools, apps, results, models, and knowledge
Domo
Data pipeline
ETL Platform/Tools
Commercial
SaaS
With Domo, you can use data and insights delivered in data experiences to multiply your business impact and drive your business forward.
dotData
All-in-one
Feature engineering
2018
USA
When AutoML is enhanced with AI-powered feature engineering, the result is dotData. We focus on delivering data science automation for the enterprise. End-to-end data science automation platform accelerates, democratizes, and operationalizes the entire data science process.
Dremio
Data pipeline
Data management
2015
USA
Get more value from your data, faster. Dremio makes your data engineers more productive, and your data consumers more self-sufficient.
DVC - Iterative.ai
Data pipeline
Versioning
2017
USA
Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.
EdgeQ
Hardware
Edge devices
2018
USA
EdgeQ is an information technology company that specializes in the fields of 5G chip systems.
Eight Wire Conductor
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Conductor, from New Zealand-based Eight Wire, offers point-and-click data integrations.
Elastifile
Data pipeline
Storage
2013
USA
Elastifile's cloud-native file storage helps organizations adapt and accelerate their business in the cloud era. Powered by a scalable, enterprise-grade distributed file system with intelligent object tiering, Elastifile augments existing public cloud services with a scalable, POSIX-compliant NAS, facilitating frictionless cloud adoption. With Elastifile, organizations enjoy low-touch file storage services, or deploy and manage cloud-native file storage themselves, eliminating the need for manual storage management and IT forecasting. Elastifile's unique combination of features and flexibility empowers organizations to seamlessly integrate cloud resources, with no application refactoring… thereby modernizing their infrastructure and achieving IT agility and efficiency goals.
Elementl
All-in-one
Workflow orchestration
2018
USA
Building Dagster, the data orchestrator. Dagster is a data orchestrator for machine learning, analytics, and ETL
Elixir Repertoire Data ETL
Data pipeline
ETL Platform/Tools
Commercial
On Premises
Elixir Data ETL provides on-demand, self-service data manipulation. It lets you design, test, and implement data extraction, aggregation, and transformation.
erwin
Data pipeline
Data management
2016
USA
Integrated enterprise architecture, business process and data modeling with data cataloging and data literacy for risk management and digital transformation.
Etleap
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Etleap is a Redshift ETL tool that makes it easy to bring data from disparate data sources into a Redshift data warehouse.
Etlworks
Data pipeline
ETL Platform/Tools
Commercial
SaaS/On Premises
Etlworks Integrator is a powerful and easy-to-use cloud data integration service that can work with structured and semi-structured data of any type and size.
Evidently AI
Serving
Monitoring
2020
Russia
Open-source tools to analyze, monitor, and debug machine learning models in production
Excelero
Data pipeline
Storage
2014
USA
Local NVMe performance at data center scale through true convergence. Software-defined block storage for Cloud and Enterprise applications at any scale.
explainX.ai
Modeling & Training
Interpretability
2020
USA
ExplainX enables you to explain, present, and monitor how your AI models work. We make sure your models never fail in the real-world.
Facets
Data pipeline
Visualization
Open Source
2017
USA
Facets: An Open Source Visualization Tool for Machine Learning Training Data
fastText
Modeling & Training
NLP
2016
USA
Library for fast text representation and classification.
FEAST
Data pipeline
Feature engineering
2019
Asia
Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
Featuretools
Modeling & Training
Feature engineering
2018
USA
An open source python library for automated feature engineering
FedAI (FATE)
Modeling & Training
Framework
2019
China
FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure computing framework to support the federated AI ecosystem. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). It supports federated learning architectures and secure computation of various machine learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.
Fiddler Labs
Modeling & Training
Interpretability
2018
USA
AI with trust, visibility, and insights built in. Fiddler is a breakthrough AI engine with explainability at its heart.
Figure Eight
Data pipeline
Labeling
2008
USA
Figure Eight combines the best of human and machine intelligence to provide high-quality annotated training data that powers the world's most innovative machine learning and business solutions
Fivetran
Data pipeline
ETL Platform/Tools
Commercial
SaaS
All your data organized in a full data warehouse in minutes, not months.
flair
Modeling & Training
NLP
Open Source
2018
Germany
A very simple framework for state-of-the-art Natural Language Processing (NLP)
FloydHub
Infrastructure
Cloud management
2016
USA
FloydHub is a zero setup Deep Learning platform for productive data science teams.
Fluree
Data pipeline
Database/Query
2017
USA
Welcome to better data management. The Fluree platform organizes blockchain-secured data in a highly-scalable, highly-insightful graph database.
Flyte
Infrastructure
Workflow orchestration
2019
USA
Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced
Formant
Serving
Robotics
2019
USA
Deploy faster. Improve uptime. Achieve scale.
Fritz AI
Serving
Mobile
2017
USA
Fritz AI is the machine learning platform for iOS and Android developers. Teach your mobile apps to see, hear, sense, and think.
Gemini Data
Data pipeline
Data management
2015
USA
Gemini Data provides Data Availability for AI/ML driven analysis and applications to enable unified enterprise knowledge and access.
Gensim
Modeling & Training
Framework
2012
Czech
Topic Modelling for Humans. Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
Git LFS
Data pipeline
Versioning
Open Source
2014
Remote
Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
Gluent
Data pipeline
Visualization
2014
USA
Data virtualization software eliminates data silos. Gluent's transparent data virtualization provides virtual access to all enterprise data, with zero code changes.
GluonCV
Modeling & Training
Pretrained models
2018
USA
GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It aims to help engineers, researchers, and students quickly prototype products, validate new ideas and learn computer vision.
Google Cloud Data Fusion
All-in-one
ETL Platform/Tools
Commercial
SaaS
Google Cloud Data Fusion is a cloud-native data integration tool. It is a fully managed Google Cloud ETL tool that allows data integration at any scale. It is built on an open-source core, CDAP, for pipeline portability. It offers a visual point-and-click interface that allows code-free deployment of your ETL/ELT data pipelines. Apart from native integration with Google Cloud services, it also offers 150+ pre-configured connectors and transformations at zero additional cost.
Gradio
Serving
App interface
2018
USA
Gradio allows you to quickly create customizable UI components around your TensorFlow or PyTorch models, or even arbitrary Python functions. Mix and match
Graphcore
Hardware
Accelerator
2016
UK
Graphcore has built a new type of processor for machine intelligence to accelerate machine learning and AI applications for a world of intelligent machines.
Graviti Data Platform
Data pipeline
Data management
Commercial
SaaS
2019
China
As a platform for unstructured data management, Graviti Data Platform provides services in data hosting, version control, data visualization, and collaboration. You can also integrate Graviti Data Platform into your own pipeline using developer tools.
GreenWaves Technologies
Hardware
Edge devices
2014
France
GreenWaves' GAP8 is the industry's first ultra-low-power processor enabling battery-operated AI in IoT applications.
Gretel AI
Data pipeline
Privacy
2019
USA
The first and only APIs to enable you to balance, anonymize, and share your data. With privacy guarantees.
Grid AI
Modeling & Training
Distributed training
2020
USA
Seamlessly train hundreds of Machine Learning models on the cloud from your laptop. Focus on machine learning, not infrastructure.
Groq
Hardware
Accelerator
2016
USA
The Next Generation of Computing is here.
H2O
All-in-one
AI Apps platform
2012
USA
H2O.ai is the creator of H2O, the leading open source machine learning and artificial intelligence platform, trusted by data scientists across 14K enterprises.
Habana Labs
Hardware
Edge devices
2016
Israel
Habana Labs was founded in 2016 to create world-class AI Processors, developed from the ground-up and optimized for training deep neural networks and for inference deployment in production environments.
Hailo
Hardware
Edge devices
2017
Israel
The World’s Top Performing AI Processor for Edge Devices. Hailo offers a breakthrough microprocessor uniquely designed to accelerate embedded AI applications on edge devices. Breathe life into your edge AI product today with Hailo-8.
Hammerspace
Data pipeline
Database/Query
2015
USA
Hammerspace allows data to move freely, like the air you breathe, across clouds and services. Make data accessible exactly where you need it, when you need it – on demand.
Heartex Label Studio
Data pipeline
Labeling
2018
USA
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Hevo Data
Data pipeline
Data management
Commercial
SaaS
Hevo Data is a no-code data pipeline that offers a fully managed solution for setting up data integration from Google Cloud Platform and 100+ data sources (including 30+ free data sources) and lets you load data directly into a data warehouse such as Snowflake, Amazon Redshift, or Google BigQuery.
Hitachi Vantara
Data pipeline
Data management
Commercial
SaaS
Hitachi Vantara’s Pentaho platform for data integration and analytics offers traditional capabilities and big data connectivity. The solution supports the latest Hadoop distributions from Cloudera, Hortonworks, MapR, and Amazon Web Services. However, one of the tool’s shortcomings is that its big data focus takes attention away from other use cases. Pentaho can be deployed on-prem, in the cloud, or via a hybrid model.
HIVE
All-in-one
Labeling
2013
USA
Hive is a full-stack deep learning company focused on solving visual intelligence problems. Let us help you join the AI Revolution. End-To-End Solutions. Full-Stack Approach.
Horovod
Modeling & Training
Distributed
2017
USA
Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.
Hugging Face
Modeling & Training
NLP
2016
USA
We're on a journey to solve and democratize artificial intelligence through natural language.
HYCU
Infrastructure
Cloud management
2009
USA
Keep hyper-converged infrastructure running with HYCU's powerful, simple backup & recovery and monitoring solutions. Deploy in seconds for superior results.
HyperOpt
Modeling & Training
Hyperparameter tuning
Open Source
2013
Canada
Distributed Asynchronous Hyperparameter Optimization in Python
IBM InfoSphere DataStage
All-in-one
ETL Platform/Tools
Commercial
On Premises
IBM InfoSphere Information Server is a data integration platform that helps businesses understand, cleanse, transform, and deliver trusted information.
IBM InfoSphere Information Server
All-in-one
Data management
Commercial
On Premises
Information Server is an IBM product line that revolves around data warehousing and data integration. It is an enterprise product for large organizations and supports integration with cloud data storage, including Google Cloud and AWS S3.
Igneous
Data pipeline
Data management
2013
USA
Igneous Unstructured Data Protection offers the scalability to handle hundreds of file systems, billions of files, and exabytes of enterprise data requiring backup
Iguazio
All-in-one
AI Apps platform
2014
Israel
The Iguazio Data Science Platform automates your machine learning pipeline, transforming AI projects into real-world business outcomes.
iMerit
Data pipeline
Labeling
2012
USA
iMerit specializes in data labeling and annotation for purposes of training models for Machine Learning and Artificial Intelligence.
Imply
Data pipeline
Data management
2015
USA
Imply delivers real-time analytics powered by Apache Druid. ... Stream or batch load data into Druid for high performance, ad-hoc analytic queries.
Improvado
All-in-one
Analytics/AI
Commercial
SaaS
Incorta
Data pipeline
Data processing
2013
USA
Incorta aggregates large complex business data in real time, eliminating the need to reshape it. No Data Warehouse. No Transformations. Real-Time Insight.
Inferrd
Serving
Deployment
2020
USA
You build the model, we handle the deployment. Inferrd is the easiest, cheapest and the most performant hosting provider for ML models.
Informatica
All-in-one
ETL Platform/Tools
Commercial
SaaS
Informatica is an enterprise on-premises Google Cloud ETL tool that can build enterprise warehouses. It also supports integration with various traditional databases and can deliver data on demand. Key features include advanced transformation, dynamic partitioning, zero downtime, universal connectivity, and data masking.
integrate.io
All-in-one
ETL Platform/Tools
Commercial
SaaS
Turn your data warehouse into a data platform that powers all company decision making and operational systems.
InterpretML
Modeling & Training
Interpretability
2019
USA
Fit interpretable machine learning models. Explain blackbox machine learning
Jaspersoft
Data pipeline
Data management
Commercial
SaaS
JAX
Modeling & Training
Framework
2018
USA
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Keboola
All-in-one
ETL Platform/Tools
Commercial
SaaS
Keboola is a cloud-based data integration platform that connects data sources to analytics platforms. It supports the entire data workflow process, from the point of data extraction, preparation, cleansing, warehousing, and all the way to its integration, enrichment, and loading.
Kedro
All-in-one
AI Apps platform
2019
UK
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best-practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.
Kimono Labs
Data pipeline
Data generation
2014
USA
Kimono Labs is an online platform that allows its users to convert their websites into APIs.
Kneron
Hardware
Edge devices
2015
USA
Kneron develops an application-specific integrated circuit and software that offers artificial intelligence-based tools.
Koalas
Data pipeline
Data processing
Open Source
2019
USA
The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.
Komprise
Data pipeline
Storage
2014
USA
In 15 minutes, our free data management software trial will show you how you can save 70% on data management costs, on-premises and in the cloud.
Kubeflow
Serving
Deployment
2018
USA
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Kyvos Insights
Data pipeline
Database/Query
2015
USA
Kyvos accelerates BI on trillions of rows of data on the cloud and on-premise platforms with a semantic layer powered by its next-generation OLAP technology.
Labelbox
Data pipeline
Labeling
2018
USA
A complete solution for your training data problem with fast labeling tools, human workforce, data management, a powerful API and automation features.
LabelImg
Data pipeline
Labeling
Open Source
2016
Canada
LabelImg is a graphical image annotation tool for labeling object bounding boxes in images.
LeapMind
Hardware
Edge devices
2012
Japan
Ultra-low power consumption AI inference accelerator IP specialized for inference arithmetic processing of CNNs that operates as a circuit on an FPGA or ASIC device.
Lightelligence
Hardware
Accelerator
2017
USA
Accelerate AI, Neuromorphic, AI Chip, Optical Computing, Lightmatter
LightGBM
Modeling & Training
Framework
Open Source
2016
USA
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
LIME
Modeling & Training
Interpretability
2016
USA
Lime: Explaining the predictions of any machine learning classifier
Losswise
Serving
Monitoring
2017
USA
Turn your GPUs into monitored build servers from a git push with Losswise. Interactive visualization, logs, smart notifications, and more. Start free today.
Ludwig
Modeling & Training
Framework
2019
USA
Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.
Luigi
Infrastructure
Workflow orchestration
Open Source
SaaS/On Premises
2012
Sweden
Luigi is a lightweight, well-functioning Python ETL framework tool that supports data visualization, CLI integration, data workflow management, ETL task success/failure monitoring, and dependency resolution.
Luminous Computing
Hardware
Accelerator
2018
USA
Hardware is bottlenecked by data movement & compute. We use photonics to solve both
Materialize
Data pipeline
Stream processing
2019
USA
Materialize delivers SQL exploration for streaming events and real-time data. Incrementally updated materialized views - in ANSI Standard SQL and in real-time. Micro-batching.
Matillion
All-in-one
ETL Platform/Tools
Commercial
SaaS
Matillion offers data integration software for cloud data warehouses, and was designed for Amazon Redshift, Snowflake, and Google BigQuery.
Matroid
Modeling & Training
Computer vision
2016
USA
Computer vision made simple. Deploy computer vision solutions in minutes, not months.
Metaflow
Infrastructure
Workflow orchestration
2019
USA
Metaflow makes it quick and easy to build and manage real-life data science projects. Metaflow is built for data scientists, not just for machines.
Metl
All-in-one
ETL Platform/Tools
Open Source
SaaS/On Premises
Metl or Mito-ETL is a fast-proliferating Python ETL development platform used to develop bespoke code components. These code components can range from RDBMS data integrations, Flat file data integrations, API/Service-based data integrations, and Pub/Sub (Queue-based) data integrations.
Michelangelo
All-in-one
Workflow orchestration
2015
USA
Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep learning models that span a myriad of use cases ranging from generating marketplace forecasts, responding to customer support tickets, to calculating accurate estimated times of arrival (ETAs) and powering our One-Click Chat feature using natural language processing (NLP) models on the driver app.
Microsoft (SQL Server Integration)
Data pipeline
Database/Query
Commercial
On Premises
USA
Microsoft Integration Services is a platform for building enterprise-level data integration and data transformation solutions.
Milvus
Data pipeline
Database/Query
2019
China
Milvus is an open source similarity search engine for massive feature vectors. Designed with heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources.
MindSpore
Modeling & Training
Framework
2020
China
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios
ML Kit
Serving
Mobile
2018
USA
ML Kit beta brings Google's machine learning expertise to mobile developers in a powerful and easy-to-use package.
ML.NET
Modeling & Training
Framework
2018
USA
ML.NET is an open source and cross-platform machine learning framework for .NET
MLflow
All-in-one
Experiment tracking
2018
USA
An open source platform for the machine learning lifecycle
MLlib
Modeling & Training
Framework
2010
MLlib is Apache Spark's scalable machine learning library.
MLPerf
Modeling & Training
Benchmarking
2018
USA
Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services.
MMdnn
Serving
Compatibility
2017
USA
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
MNN
Serving
Inference
2019
China
MNN is a lightweight deep neural network inference engine.
Modin
Data pipeline
Data processing
Open Source
2018
Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
Mona Labs
Serving
Monitoring
2018
USA
PRODUCTION MONITORING FOR AI. With Mona, you gain complete transparency into how your data and models behave in the real world.
Mozart Data
Data pipeline
Analytics/AI
Commercial
SaaS
Mozart isn’t strictly an ETL tool, but it can help you automate the process of extracting, transforming, and loading your data into a warehouse all in one central tool.
Mythic
Hardware
Edge devices
2012
USA
An architecture built from the ground up for AI. Mythic has developed a truly unique AI compute platform that enables smart camera systems, intelligent appliances, brilliant robotics, and more.
Naveego
Data pipeline
Data processing
2014
USA
A leading provider of cloud-first, distributed data accuracy solutions for seamless, end-to-end data cleansing, Naveego enables organizations to proactively manage, detect and eliminate data accuracy issues across all enterprise data sources in real time, regardless of structure or schema.
ncnn
Serving
Mobile
2017
USA
ncnn is a high-performance neural network inference framework optimized for the mobile platform
NeMo
Modeling & Training
NLU
2019
USA
NeMo: a toolkit for conversational AI
Neptune
Modeling & Training
Experiment tracking
2017
Poland
All experiment-related objects relevant to your projects organized, ready to be analyzed, discussed and shared with your team.
Netron
Modeling & Training
Visualization
2011
USA
Netron is a viewer for neural network, deep learning and machine learning models.
Neural Network Distiller
Serving
Model compression
2018
USA
Distiller is an open-source Python package for neural network compression research. Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.
nteract
Modeling & Training
Notebook
2015
USA
nteract is an open-source organization committed to creating fantastic interactive computing experiences that allow people to collaborate with ease. We build SDKs, applications, and libraries that help you and your team make the most of interactive (particularly Jupyter) notebooks and REPLs.
Nuvia
Hardware
Accelerator
2019
USA
Silicon design reimagined for a compute-intensive world.
Obviously AI
All-in-one
AI Apps platform
2018
USA
The entire process of running data science (building machine learning algorithms, explaining results, and predicting outcomes) packed into a single click.
OctoML
Serving
Deployment
2019
USA
Optimize machine learning and deep learning models for deployment. From the creators of Apache TVM, XGBoost and Apache MXNet, OctoML brings the cutting edge of AI, systems, programming languages, compilers and architecture to make machine learning systems easier to optimize and deploy.
Octopai
Data pipeline
Data management
2015
Israel
An automated, centralized, cross-platform metadata search engine that enables BI groups to quickly and precisely discover and govern shared metadata.
ONNX
Serving
Compatibility
2018
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
OpenBridge
All-in-one
ETL Platform/Tools
Commercial
SaaS
Openbridge is a data logistics platform that manages the real-time flow of consumer data, big or small, delivering it exactly where it needs to be to create value for customers.
OpenSeq2Seq
Modeling & Training
NLP
2017
USA
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
OpenText Integration Center
All-in-one
ETL Platform/Tools
Commercial
On Premises
A native integration platform to extract, enhance, transform, integrate, and migrate data and content across the enterprise.
Oracle Data Integrator
All-in-one
ETL Platform/Tools
Commercial
On Premises
Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements, including batch loads, integration processes, and SOA-enabled data services.
Owox
All-in-one
ETL + Analytics
SaaS
Pachyderm
Data pipeline
Versioning
2014
USA
Data Lineage with End-to-End Pipelines on Kubernetes, engineered for the enterprise. And… It's open source!
Paddle
Modeling & Training
Distributed
2016
China
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice
Pandas
All-in-one
ETL Platform/Tools
Open Source
SaaS/On Premises
Pandas is an ETL batch processing library with Python-written data structures and analysis tools. Python's Pandas expedites processing of unstructured/semi-structured data. The library is used for low-intensity ETL tasks including data cleansing and working with small structured datasets post-transformation from semi-structured or unstructured sets.
Panoply.io
All-in-one
ETL Platform/Tools
Commercial
SaaS
Panoply automates data management tasks associated with running big data in the cloud. Its Smart Data Warehouse requires no schema, modeling, or configuration. Panoply features an ETL-less integration pipeline that can connect to structured and semi-structured data sources. It also offers columnar storage and automatic data backup to a redundant S3 storage framework.
papermill
Modeling & Training
Notebook
2017
USA
Papermill is a tool for parameterizing and executing Jupyter Notebooks.
Paperspace
Infrastructure
Cloud management
2014
USA
GPU cloud tools built for developers. Powering next-generation workflows and the future of intelligent applications.
Apache Parquet
Data pipeline
File format
2013
USA
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
Paxata
All-in-one
ETL Platform/Tools
Commercial
SaaS
Paxata is the first interactive, self-service data preparation solution built for everyone who works with data, from business analysts to data scientists.
Peltarion
All-in-one
AI Apps platform
2005
Sweden
A single AI platform, for real world deployments, without code. Fast & Efficient Production of AI Applications. Rich data capability. Develop AI Services fast. Usable & Affordable AI.
PerceptiLabs
Modeling & Training
Visual modeling
2019
USA
PerceptiLabs takes the process of building and training a machine learning model to warp speed. We not only accelerate machine learning, we advance explainability in AI
Pervasive Data Integrator
All-in-one
ETL Platform/Tools
Commercial
On Premises
Pervasive Data Integrator supports both data integration and application integration, and runs on premises, in the cloud, or hybrid.
Petl
Data pipeline
ETL Platform/Tools
Open Source
SaaS/On Premises
Petl is a stream processing engine ideal for handling mixed quality data. This Python ETL tool helps data analysts with little to no prior coding experience quickly analyze datasets stored in CSV, XML, JSON, and many other data formats. You can sort, join, and aggregate transformations with minimal effort.
Petuum
All-in-one
Data management
2016
USA
Petuum accelerates and simplifies AI solutions so your enterprise can deploy it easily and maintain it effortlessly.
Picsell.ia
All-in-one
Computer Vision
2020
France
Picsell.ia is a development platform dedicated to Computer Vision. From open-source to business, you can create and review datasets, track your experiments and follow your project in a Lean AI mode.
Pilosa
Data pipeline
Database/Query
2017
USA
Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets.
PlaidML
Modeling & Training
Hardware compatibility
2017
USA
PlaidML is a framework for making deep learning work everywhere
Playment
Data pipeline
Labeling
2015
India
Build high-quality ground truth datasets with ML-assisted tools, sophisticated project management software, expert human workforce, and much more.
Plotly
Serving
App interface
2013
Canada
Plotly is a data science and AI company that makes it easy to create and deploy interactive web apps in any programming language.
Polyaxon
All-in-one
Serving
2016
Germany
A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications.
Precisely
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Precisely offers its data integration capabilities via two product families, Precisely Connect and Precisely Ironstream. The company’s flagship application and data integration tools are the Precisely Connect product family.
PredictionIO
Serving
Web
2013
USA
Apache PredictionIO is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, querying predictive results via REST APIs. It is based on scalable open source services like Hadoop, HBase (and other DBs), Elasticsearch, Spark and implements what is called a Lambda Architecture.
Prefect
Infrastructure
Workflow orchestration
2018
USA
The Global Leader in Dataflow Automation
Presto
Data pipeline
Database/Query
2012
USA
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Prodigy
Data pipeline
Labeling
2017
Germany
Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. ... With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection.
Prometheus
Data pipeline
Monitoring
2012
Germany
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Prophesee
Hardware
Edge devices
2014
France
With the world’s most advanced Event-Based Vision systems, inspired by human vision and built on the foundation of neuromorphic engineering. PROPHESEE is the revolutionary system that gives Metavision to machines, revealing what was previously invisible to them.
pygrametl
Data pipeline
ETL Platform/Tools
Open Source
On Premises
pygrametl allows for ETL programming in Python.
Pyro
Modeling & Training
Programming language
2017
USA
Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch
PySyft
Modeling & Training
Privacy
2017
UK
PySyft is a Python library for secure and private Deep Learning. PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Multi-Party Computation (MPC) within the main Deep Learning frameworks like PyTorch and TensorFlow.
Pythia
Modeling & Training
Framework
2018
USA
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
PyTorch
Modeling & Training
Framework
2015
USA
Tools & Libraries. A rich ecosystem of tools and libraries extends PyTorch and supports development in computer vision, NLP and more
PyTorch Lightning
Modeling & Training
Framework
2019
USA
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Qlik Data Integration
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Deliver analytics-ready data to the cloud in real-time with modern DataOps for analytics from Qlik.
Qri
Data pipeline
Versioning
2016
USA
Bigger than a spreadsheet, smaller than a database, datasets are all around us. Use Qri to browse, download, create, fork, & publish datasets across a network of peers.
Quilt Data
Data pipeline
Versioning
2015
USA
Quilt is a versioned data portal for AWS
Quobyte
Data pipeline
Storage
2013
Germany
Quobyte is software defined storage that turns commodity servers into a reliable and highly automated data center file system.
Rasa
Modeling & Training
NLU
2016
Germany
Build contextual AI assistants and chatbots in text and voice with our open source machine learning framework. Scale it with our enterprise grade platform.
Ray
Modeling & Training
Distributed
2016
USA
Ray is a fast and simple framework for building and running distributed applications.
Relational Junction ETL Manager
All-in-one
ETL Platform/Tools
Commercial
On Premises
Relational Junction ETL Manager lets you extract, transform, and load production data into your data warehouse.
RelicX
Serving
CI/CD
2020
USA
RelicX is a venture funded startup building an AI DevOps platform that brings CX intelligence into the CI/CD pipeline to ensure software release readiness based on real user behavior and customer experience.
Replicate
Modeling & Training
Versioning
2020
USA
Version control for machine learning
Riko
All-in-one
ETL Platform/Tools
Open Source
SaaS/On Premises
Riko is an apt replacement for Yahoo Pipes. It continues to be ideal for startups possessing low technological expertise.
River
Modeling & Training
Online learning
2017
France
A Python package for online/streaming machine learning.
Rivery
All-in-one
ETL Platform/Tools
Commercial
SaaS
Rivery is a SaaS integration tool that lets you consolidate all your data from both internal and external sources into a single data platform in the cloud.
Robust AI
All-in-one
Robotics
2019
USA
Robust.AI: Creating a New Foundation for the Future of Robotics.
Rockset
Data pipeline
Database/Query
2016
USA
Rockset: The Real-Time Indexing Database in the Cloud. Rockset allows you to build data-driven applications on MongoDB, DynamoDB, ... AI. Test, validate and deploy models faster by analyzing live data in real-time.
Rubrik
Data pipeline
Data management
2013
USA
We provide a powerful, policy-driven platform to simplify recovery and unlock insights from data residing in the data center and cloud.
RudderStack
All-in-one
ELT & Reverse-ETL
Commercial
SaaS
All your customer data pipelines in one platform
Sagent Data Flow
All-in-one
ETL Platform/Tools
Commercial
On Premises
Sagent Data Flow from Pitney Bowes Software is a powerful and flexible integration engine that collates data from disparate sources and provides data transformation tools.
SambaNova
Hardware
Accelerator
2017
USA
SambaNova Systems is a computing startup focused on building machine learning and big data analytics platforms.
SAP BusinessObjects Data Services
All-in-one
ETL Platform/Tools
Commercial
On Premises
Unlock meaning from all of your organization’s data – structured or unstructured – with data integration, quality, cleansing, and more.
SAS Data Management
All-in-one
ETL Platform/Tools
Commercial
On Premises
SAS Data Management helps transform, integrate, govern, and secure data while improving its overall quality and reliability.
Scale AI
Data pipeline
Data generation
2016
USA
Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.
scikit-learn
Modeling & Training
Framework
2010
Remote
Machine Learning in Python
Scrapinghub
Data pipeline
Data generation
2010
Ireland
Turn websites into data with the world's leading web scraping services & tools from the creators of Scrapy. Data extraction trusted by industry leaders.
Scribble Data
Modeling & Training
Feature engineering
2016
India
The feature store for your ML engineering needs
Scriptella
All-in-one
ETL Platform/Tools
Open Source
On Premises
Scriptella is an open source ETL and script execution tool written in Java.
Segment
All-in-one
ETL Platform/Tools
Commercial
SaaS
Segment collects user data with one API and sends it to hundreds of tools or a data warehouse.
Segments.ai
Data pipeline
Labeling
2020
Belgium
Deep learning-fueled labeling technology with a focus on instance and semantic segmentation.
Seldon
Serving
Serving
2011
UK
Manage, serve and scale models built in any framework on Kubernetes. Take your ML projects from POC to production.
SHAP
Modeling & Training
Interpretability
2017
USA
A game theoretic approach to explain the output of any machine learning model.
SigOpt
Modeling & Training
Hyperparameter tuning
2014
USA
SigOpt is a standardized, scalable, enterprise-grade optimization platform and API designed to unlock the potential of your modeling pipelines.
SiMa.ai
Hardware
Edge devices
2018
USA
Is your ML Green?™ We believe that the future of compute is high performance machine learning at the edge – and today, power is the limiter.
Singer
All-in-one
ETL Platform/Tools
Open Source
SaaS, On Premises
Singer is an open source standard for writing scripts that move data.
Sisu
Data pipeline
Analytics platform
2018
USA
Sisu is the fastest, most comprehensive augmented analytics platform letting you ... You can't keep up with changing metrics using manual data exploration.
Skyvia
All-in-one
ETL Platform/Tools
Commercial
SaaS
Skyvia’s Data Integration tool contains a wide range of data-related scenarios which can be created directly from the user interface.
SnapLogic Elastic Integration Platform
All-in-one
ETL Platform/Tools
Commercial
SaaS
SnapLogic Elastic Integration Platform handles both structured and unstructured data, with point-to-point integration functionality in hybrid integration use cases.
Snorkel
Data pipeline
Labeling
2016
USA
Programmatically Building and Managing Training Data
Snorkel AI
All-in-one
AI Apps platform
2019
USA
Programmatically Building and Managing Training Data
spaCy
Modeling & Training
NLP
2014
Germany
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
Spark
Data pipeline
Data processing
2009
Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Spell
Modeling & Training
Experiment tracking
2017
USA
Spell is a powerful platform for building and managing machine learning projects. Spell takes care of infrastructure, making machine learning projects easier to start, faster to get results, more organized and safer than managing infrastructure on your own.
SQLFlow
Data pipeline
Database/Query
2019
China
Extends SQL to support AI. Extract knowledge from data. Currently supports MySQL, Apache Hive, Alibaba MaxCompute, XGBoost and TensorFlow.
Starburst Data
Data pipeline
Database/Query
2017
USA
Limitless Queries. Break boundaries and harness the power of the world's fastest SQL query engine.
Starfish
All-in-one
ETL Platform/Tools
Commercial
SaaS
Stitch
All-in-one
ETL Platform/Tools
Commercial
SaaS
Stitch is a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. Running on a scalable, fault-tolerant cloud platform, Stitch integrates data from dozens of different sources.
Storbyte
Data pipeline
Storage
2014
DC
Storbyte designs and manufactures all-flash & hybrid flash enterprise storage arrays that offer performance, power management, availability, reliability, density, efficiency, flexibility, expandability, and affordability. Storbyte is providing innovative data storage solutions and has not lost sight of what is important to end users: a responsible, cost-correct price point.
Stradigi AI
All-in-one
AI apps platform
2017
Canada
Stradigi AI's powerful AI business platform, Kepler, fuels tangible results for enterprises. No AI or machine learning experience required.
Streamlit
Modeling & Training
App interface
2018
USA
Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in hours, not weeks. All in pure Python.
StreamSets
All-in-one
ETL Platform/Tools
Commercial
SaaS
StreamSets is a DataOps and real-time Google Cloud ETL tool. It provides data monitoring and supports a variety of data sources and destinations for data integration. Many enterprises use it to integrate dozens of data sources for analysis. It supports data protectors with data security guidelines like GDPR and HIPAA.
StreamSets Data Collector
All-in-one
ETL Platform/Tools
Commercial
On Premises
The StreamSets Data Collector is a low-latency ingest infrastructure tool that lets you create continuous data ingest pipelines using a drag and drop UI within an integrated development environment (IDE).
Striim
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Unify your data in Google Cloud with a full suite of real-time data integration solutions. Whether it's automated database migrations to Google Cloud or data integration for BigQuery, Striim will help you get there faster.
Superb AI
Data pipeline
Data management
2018
USA
Create, label and manage ML training data efficiently so you can build AI faster. Fully managed workforce. Powerful labeling tools. Training data quality control.
Supermetrics
All-in-one
ETL Platform/Tools
Commercial
SaaS
Supermetrics is a managed data pipeline that makes it easy for marketers, data analysts, and data engineers to move any marketing metrics into a data warehouse in Snowflake, BigQuery, or Azure Synapse Analytics.
Supervisely
All-in-one
Computer vision
2017
USA
First available ecosystem to cover all aspects of training data development. Manage, annotate, validate and experiment with your data without coding.
superwise.ai
Serving
Monitoring
2019
Israel
Monitor your AI from the moment it meets reality so you can finally trust every model
Syncsort DMX
All-in-one
ETL Platform/Tools
Commercial
On Premises
DMX supports mainframe, legacy, and big data sources, and provides a no-code approach to join datasets.
Synthetaic
Data pipeline
Data generation
2019
USA
We grow high-quality data that unlocks impossible AI. What if edge cases no longer existed? What if training data was no longer a constraint?
Syntiant
Hardware
Edge devices
2017
USA
Always-On Voice powered by custom AI Silicon
Talend
All-in-one
ETL Platform/Tools
Open Source
On Premises
Talend is big data and cloud data integration software built on the Eclipse graphical environment. It also supports scaling to massive data sets and advanced data analytics. It has partnered with leading cloud service providers, analytics platforms, and data warehouses such as Google Cloud Platform, Amazon Web Services (AWS), and Snowflake. It also acts as a connector to other SaaS software.
talos
Modeling & Training
Hyperparameter tuning
2018
Finland
Hyperparameter Optimization for TensorFlow, Keras and PyTorch
Tamr
Data pipeline
Data management
2012
USA
Tamr's leading data management system and services work to create a data migration strategy that simplifies your data unification process. Talk with us today.
Tazi.ai
Modeling & Training
AutoML
2015
Turkey
TAZI’s Automated Machine Learning offers understandable, continuous machine learning from data and humans, enabling business domain experts to use machine learning to make predictions and take actions. It also helps data analysts and scientists with their daily model creation and deployment.
Tecton
All-in-one
Deployment
2019
USA
The Data Platform for Machine Learning. Build a library of great features. Serve them in production. Do it at scale.
TensorBoard
Modeling & Training
Experiment tracking
2015
USA
TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, projecting embeddings to a lower dimensional space, and much more.
TensorFlow
Modeling & Training
Framework
2015
USA
An end-to-end open source machine learning platform for everyone. Discover TensorFlow's flexible ecosystem of tools, libraries and community resources
TensorFlow Extended
Serving
Deployment
2019
USA
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines
TensorFlow Lite
Serving
Mobile
2019
USA
TensorFlow Lite is an open source deep learning framework for on-device inference.
TensorRT
Serving
Inference
2019
USA
NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
TerminusDB
Data pipeline
Database/Query
2017
Ireland
TerminusDB is an open source model driven graph database for knowledge graph representation designed specifically for the web-age.
Textur
Data pipeline
ETL Platform/Tools
Commercial
SaaS
Textur unifies data from your data silos and provides a powerful SQL interface for modeling your business.
Theano
Modeling & Training
Framework
2008
Canada
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
TPOT
Modeling & Training
AutoML
2016
USA
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
TransmogrifAI
Modeling & Training
AutoML
2017
USA
An AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Treasure Data
All-in-one
ETL Platform/Tools
Commercial
SaaS
Treasure Data connects data and teams together with a full suite of tools that automate data collection and processing.
Trifacta
All-in-one
ETL Platform/Tools
Commercial
SaaS
Trifacta is an interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning
Truera
Modeling & Training
Explainability
2019
USA
The Truera Model Intelligence Platform powered by Enterprise-Class AI Explainability eliminates the machine learning black box with Model Intelligence.
tsfresh
Modeling & Training
Feature engineering
2008
Germany
Automatic extraction of relevant features from time series
Tumult Labs
Data pipeline
Privacy
2019
USA
Unleashing the power of data with ironclad privacy protection
Tune
Modeling & Training
Hyperparameter tuning
2017
USA
Tune is a Python library for hyperparameter tuning at any scale.
Turi Create
Modeling & Training
Framework
Open Source
2018
USA
Turi Create simplifies the development of custom machine learning models.
Unravel Data
Serving
Monitoring
2013
USA
Unravel provides full-stack visibility and AI-powered guidance to help you understand and optimize the performance of your data-driven applications.
V7Labs
Data pipeline
Labeling
2018
UK
Create the Sense of Sight. Label, train, and deploy artificial intelligence that effortlessly learns new objects from your data.
Vaex
Data pipeline
Data processing
2015
Netherlands
Power up your business with our data driven solutions. With our unique, state-of-the-art technology, we provide fast and scalable solutions that will make you more agile, while limiting unnecessary resources.
Valohai
Infrastructure
Workflow orchestration
2016
Finland
The MLOps platform for the whole team. Valohai takes you from POC to production while managing the whole model lifecycle.
Vearch
Data pipeline
Database/Query
Open Source
2019
China
Vearch is the vector search infrastructure for deep learning and AI applications.
VertaAI
Serving
Monitoring
2019
USA
Verta.AI is a Palo Alto-based startup building software infrastructure to help enterprise data science and machine learning (ML) teams rapidly develop and deploy ML models.
Vexata
Data pipeline
Storage
2014
USA
Vexata is an active data infrastructure company that accelerates database and analytic platforms via groundbreaking storage solutions.
Vowpal Wabbit
Modeling & Training
Online learning
2010
Vowpal Wabbit provides a fast, flexible, online, and active learning solution that empowers you to solve complex interactive machine learning problems
Voxel51 // Scoop
Data pipeline
Data quality
2018
USA
We build software that enables ML engineers to build better models, more quickly. Try FiftyOne, our powerful platform for dataset curation, analysis, and model
Waterline Data
Data pipeline
Data management
2013
USA
Waterline's enterprise data catalog enables data professionals to discover, govern, and rationalize an organization's data lake.
Wave Computing
Hardware
Accelerator
2008
USA
Wave Computing is revolutionizing AI and deep learning with its dataflow-based systems and embedded solutions.
Weights & Biases
Modeling & Training
Experiment tracking
2017
USA
We're building developer tools for deep learning. Add a couple lines of code to your training script and we'll keep track of your hyperparameters, system metrics, and outputs so you can compare experiments,
XGBoost
Modeling & Training
Framework
2014
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework.
Xnor.ai
Serving
Model compression
2016
USA
Transform your business with on-device AI.
Xpanse AI
All-in-one
AI Apps platform
2015
Ireland
The power of AI at the click of a button. Xpanse AI brings easy to use and lightning fast analytics to your business.
Xplenty
All-in-one
ETL Platform/Tools
Commercial
SaaS
Xplenty's data integration platform streamlines data processing, reducing time spent and allowing businesses to focus on insight over preparation.
Yellowbrick Data
Data pipeline
Data warehouse
2014
USA
The ultimate solution for your data warehouse. Quick to deploy, easy to expand, and simple to manage. Yellowbrick Data can solve your data problems.
Zero ASIC
Hardware
Edge devices
2020
USA
Removing the Barrier to Custom Silicon
Zilliz
Data pipeline
Database/Query
2017
China
The company specializes in the development of open-source, AI-powered unstructured data analysis software, and is the initiator and primary contributor to the vector similarity search project Milvus.
Snowflake
All-in-one
Data management
2012
USA
Google BigQuery
All-in-one
Data management
2010
USA
dbt
Data transformation
ELT Tool
2016
USA
Looker
BI Tool
2011
USA
Mode
BI Tool
2012
USA
Census
All-in-one
ELT & Reverse-ETL
2018
USA
Hightouch
All-in-one
ELT & Reverse-ETL
2018
USA
Grouparoo
All-in-one
ELT & Reverse-ETL
Open Source
USA
Polytomic
All-in-one
ELT & Reverse-ETL
USA
Data within companies is fragmented. Sales, Marketing, Support, Finance, and Operations teams spend enormous amounts of time repeatedly hunting for data that lives outside of their home systems.
Rudderstack
All-in-one
ELT & Reverse-ETL
2019
USA
RudderStack elegantly handles every piece of data from every source and syncs it with every tool in your stack.
Seekwell
ELT & Reverse-ETL
USA
Get your SQL data in the places you need it like Google Sheets, Salesforce, Zendesk, and Slack.
Workato
ELT & Reverse-ETL
USA
Integrate your stack. Automate your work. A SINGLE PLATFORM FOR INTEGRATION & WORKFLOW AUTOMATION ACROSS YOUR ORGANIZATION
Name
Website
Cat
SubCat
Type
Deployment
Started
HQ
Description
Before selecting a data stack, define your use cases: decide what you want to do with the data you have. Once that is clear, you can select the tools.