🦉 ML Experiments and Data Management with Git
Data Version Control (DVC) is a command-line tool and VS Code extension that helps you develop reproducible machine learning projects.
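DVC keeps large data files out of Git by tracking them through small metafiles that record a content hash (MD5) of each file. The sketch below is a rough, stdlib-only illustration of that content-addressing idea. It is not DVC's code. The `add_to_cache` helper and its two-level cache layout are hypothetical. Files with identical bytes hash to the same key, so they are stored only once.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Hypothetical helper: copy `path` into a content-addressed cache and return its hash."""
    h = file_md5(path)
    # Hypothetical layout: split the hash so no single directory grows too large.
    target = cache_dir / h[:2] / h[2:]
    if not target.exists():  # identical content is already cached, so skip the copy
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return h


# Usage: two files with identical bytes share a single cache entry.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / "cache"
    (root / "a.csv").write_bytes(b"id,label\n1,cat\n")
    (root / "b.csv").write_bytes(b"id,label\n1,cat\n")
    hash_a = add_to_cache(root / "a.csv", cache)
    hash_b = add_to_cache(root / "b.csv", cache)
    cached_files = [p for p in cache.rglob("*") if p.is_file()]
    print(hash_a == hash_b, len(cached_files))
```

Only the small hash needs to live in Git. The bytes themselves can sit in a cache or remote storage and be restored by hash.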
Seaborn: statistical data visualization.
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Transform Data in Your Warehouse. Build trusted data products faster.
Accelerate your data transformation process with dbt Cloud and start delivering data that you and your team can rely on. dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. Analysts using dbt can transform their data simply by writing SELECT statements, while dbt handles turning these statements into tables and views in a data warehouse.
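To make the "SELECT statement in, table or view out" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `materialize` helper is hypothetical. It wraps a model's SELECT in `CREATE VIEW` or `CREATE TABLE ... AS`, loosely mirroring dbt's `view` and `table` materializations. Real dbt also adds Jinja templating, dependency resolution through `ref()`, and warehouse-specific SQL, none of which is shown here.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, name: str, select_sql: str, kind: str = "view") -> None:
    """Hypothetical helper: persist a SELECT statement as a view or a table called `name`.

    `name` is interpolated directly into SQL, so it must be a trusted identifier.
    """
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    object_type = "VIEW" if kind == "view" else "TABLE"
    # Drop and recreate so that re-running the model is idempotent (a full rebuild).
    conn.execute(f"DROP {object_type} IF EXISTS {name}")
    conn.execute(f"CREATE {object_type} {name} AS {select_sql}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 30.0, "paid"), (2, "bob", 12.5, "refunded"), (3, "alice", 20.0, "paid")],
)

# Each "model" is just a SELECT; the helper decides how it is persisted.
materialize(conn, "paid_orders", "SELECT id, customer, amount FROM raw_orders WHERE status = 'paid'")
materialize(
    conn,
    "customer_revenue",
    "SELECT customer, SUM(amount) AS revenue FROM paid_orders GROUP BY customer",
    kind="table",
)
print(conn.execute("SELECT * FROM customer_revenue ORDER BY customer").fetchall())
```

The second model selects from the first. This is the kind of dependency that dbt tracks for you, so that models are built in the right order.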
Data Observability Platform for Modern Data Teams. Trust the data that powers your business.
Automated end-to-end data observability, so data teams are the first to know about data issues.
The Unified Apache Beam Model. The easiest way to do batch and streaming data processing. Write-once, run-anywhere data processing for mission-critical production workloads.
Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. It comes with a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow, and Hazelcast Jet.
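The core idea of a unified model is that a single pipeline definition can run over either bounded (batch) or unbounded (streaming) input. The sketch below illustrates only that idea, using plain Python generators. It does not use the Apache Beam SDK, and the `Pipeline` class with its `map`/`filter` methods is hypothetical. In Beam itself, the equivalent pipeline would apply `PTransform`s such as `beam.Map` and `beam.Filter` to a `PCollection` and hand it to a runner.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Iterable, Iterator, List


@dataclass
class Pipeline:
    """Hypothetical minimal pipeline: an ordered list of lazy, element-wise steps."""

    steps: List[Callable[[Iterator], Iterator]] = field(default_factory=list)

    def map(self, fn: Callable) -> "Pipeline":
        self.steps.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, pred: Callable) -> "Pipeline":
        self.steps.append(lambda items: (x for x in items if pred(x)))
        return self

    def run(self, source: Iterable) -> Iterator:
        """Chain every step lazily, so the same definition works on finite or infinite input."""
        items = iter(source)
        for step in self.steps:
            items = step(items)
        return items


# One pipeline definition...
pipeline = Pipeline().filter(lambda n: n % 2 == 0).map(lambda n: n * n)

# ...run on a bounded (batch) source...
print(list(pipeline.run([1, 2, 3, 4, 5, 6])))


# ...and on an unbounded (streaming) source, taking only the first few results.
def sensor_readings() -> Iterator[int]:
    n = 0
    while True:
        n += 1
        yield n


print(list(islice(pipeline.run(sensor_readings()), 3)))
```

Both runs produce `[4, 16, 36]`. Real streaming aggregations also need windowing, triggers, and watermarks, which Beam provides and this sketch deliberately omits.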
Google Cloud Dataflow: unified stream and batch data processing that's serverless, fast, and cost-effective.
Turns data and AI algorithms into production-ready web applications in no time. Taipy is an open-source Python library for building production-ready front ends and back ends. It supports easy, end-to-end application development and features what-if analyses, smart pipeline execution, built-in scheduling, and deployment tools.
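One reading of "smart pipeline execution" is skipping a step when its inputs have not changed since the last run. As we understand Taipy's documentation, tasks can be marked as skippable for this purpose. The sketch below shows the underlying caching idea in plain Python. The `SkippableTask` class is hypothetical and is not Taipy's API. It also assumes that inputs are picklable and that equal inputs pickle to identical bytes.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, Optional, Tuple


class SkippableTask:
    """Hypothetical task wrapper that reuses its last output when its inputs are unchanged."""

    def __init__(self, fn: Callable[..., Any]):
        self.fn = fn
        self.runs = 0  # how many times `fn` actually executed
        self._last_key: Optional[str] = None
        self._last_output: Any = None

    @staticmethod
    def _fingerprint(args: Tuple, kwargs: Dict) -> str:
        # Hash the pickled inputs (assumes picklable, deterministically pickled values).
        return hashlib.sha256(pickle.dumps((args, sorted(kwargs.items())))).hexdigest()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        key = self._fingerprint(args, kwargs)
        if key != self._last_key:  # inputs changed: execute and remember the result
            self._last_output = self.fn(*args, **kwargs)
            self._last_key = key
            self.runs += 1
        return self._last_output  # inputs unchanged: skip execution


def drop_missing(rows):
    return [r for r in rows if r is not None]


task = SkippableTask(drop_missing)
first = task([1, None, 2])
second = task([1, None, 2])    # same inputs: the cached output is reused
third = task([1, None, 2, 3])  # new inputs: the function runs again
print(first, third, task.runs)
```

Only the most recent input fingerprint is kept, which matches the "skip if nothing changed since last time" behavior. A full memoization cache would be a different design choice.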
The unified data layer
Connect your APIs, databases, and microservices to a unified API at the edge. Delight your users with fast response times globally. Deploy GraphQL APIs that are fast worldwide, with a top-notch developer experience.
The most powerful vector database for building AI applications. Open-source PostgreSQL database extension for vector data and vector search operations.
Lantern is an open-source PostgreSQL database extension to store vector data, generate embeddings, and handle vector search operations.
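At its core, a vector search ranks stored embeddings by how similar they are to a query vector. The plain-Python sketch below performs an exact, brute-force cosine-similarity search over a few made-up 3-dimensional vectors. An extension such as Lantern runs this kind of query inside PostgreSQL and relies on an approximate nearest-neighbor index (HNSW) so that it does not have to scan every row. The `top_k` helper and the sample data are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Hypothetical helper: exact k-nearest-neighbor search by cosine similarity (scans all vectors)."""
    scored = ((cosine_similarity(query, v), key) for key, v in vectors.items())
    return [(key, score) for score, key in heapq.nlargest(k, scored)]


# Made-up toy "embeddings" for illustration only.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print(results)
```

Brute force is exact but costs one similarity computation per stored vector. That linear cost is what approximate indexes trade a little accuracy to avoid.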
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: on your laptop, in Postgres and Snowflake, and as files in S3. It comes in various formats such as Parquet, CSV, and JSON. Wherever it lives, getting the insights you need always involves multiple steps spanning several destinations.
GlareDB is designed to query your data wherever it lives using SQL that you already know.
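The idea is to query data from different sources and formats with ordinary SQL. The sketch below is a stand-in illustration that uses Python's built-in `sqlite3`, not GlareDB, and all the data in it is made up. It loads a CSV source and a JSON source into one in-memory database and answers a question with a single SQL join. GlareDB aims to remove the manual loading step by querying sources where they live.

```python
import csv
import io
import json
import sqlite3

# Two made-up sources in different formats.
customers_csv = "id,name\n1,Ada\n2,Grace\n"
orders_json = '[{"customer_id": 1, "total": 40.0}, {"customer_id": 1, "total": 2.5}, {"customer_id": 2, "total": 10.0}]'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")

# csv yields strings; SQLite's INTEGER column affinity converts '1' to the integer 1.
conn.executemany(
    "INSERT INTO customers VALUES (:id, :name)",
    csv.DictReader(io.StringIO(customers_csv)),
)
conn.executemany(
    "INSERT INTO orders VALUES (:customer_id, :total)",
    json.loads(orders_json),
)

# One SQL query over data that started out as CSV and JSON.
query = """
SELECT c.name, SUM(o.total) AS spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY spent DESC
"""
print(conn.execute(query).fetchall())
```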
Dolt is Git for data. The world's first and only version-controlled SQL database.
Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a Git repository.
Connect to Dolt just like any MySQL database to read or modify schema and data. Version control functionality is exposed in SQL via system tables, functions, and procedures.
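To give a feel for what commit, branch, and diff mean when applied to table rows, here is a toy in-memory model in plain Python. It is a conceptual sketch only. The `ToyVersionedTable` class is hypothetical, and it stores each commit as a full snapshot, which is not how Dolt stores data. Dolt exposes the real operations through SQL, for example the `dolt_log` system table and the `DOLT_COMMIT()` stored procedure.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple
Snapshot = Dict[int, Row]  # primary key -> row


class ToyVersionedTable:
    """Hypothetical toy model: commits are full snapshots; branches point at commits."""

    def __init__(self) -> None:
        self.commits: List[Tuple[Optional[int], str, Snapshot]] = []  # (parent, message, rows)
        self.branches: Dict[str, Optional[int]] = {"main": None}
        self.current = "main"
        self.working: Snapshot = {}  # uncommitted rows on the current branch

    def commit(self, message: str) -> int:
        parent = self.branches[self.current]
        self.commits.append((parent, message, dict(self.working)))
        commit_id = len(self.commits) - 1
        self.branches[self.current] = commit_id
        return commit_id

    def branch(self, name: str) -> None:
        # A new branch starts at the current branch's head commit.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        # Switch branches; uncommitted changes are discarded in this toy model.
        self.current = name
        head = self.branches[name]
        self.working = dict(self.commits[head][2]) if head is not None else {}

    def diff(self, old: int, new: int) -> Dict[str, List[int]]:
        a, b = self.commits[old][2], self.commits[new][2]
        return {
            "added": sorted(k for k in b if k not in a),
            "removed": sorted(k for k in a if k not in b),
            "modified": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }


t = ToyVersionedTable()
t.working[1] = ("alice", 30)
base = t.commit("add alice")
t.branch("feature")
t.checkout("feature")
t.working[1] = ("alice", 31)
t.working[2] = ("bob", 25)
feature_head = t.commit("update alice, add bob")
print(t.diff(base, feature_head))
t.checkout("main")
print(t.working)
```

Merging is left out. A real merge has to reconcile row-level changes made on both branches since their common ancestor.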
Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in a Leaflet map via Folium.
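A common hand-off format between Python and Leaflet is GeoJSON, which `folium.GeoJson` can render on a `folium.Map`. The stdlib-only sketch below covers just the Python side. It filters some made-up rows of places and converts them into a GeoJSON FeatureCollection. The `to_feature_collection` helper is hypothetical. GeoJSON orders coordinates as [longitude, latitude], whereas Folium's `location` arguments use [latitude, longitude].

```python
import json
from typing import Dict, Iterable, List


def to_feature_collection(rows: Iterable[Dict]) -> Dict:
    """Hypothetical helper: turn rows with 'name', 'lat', 'lon' keys into a GeoJSON FeatureCollection."""
    features: List[Dict] = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) uses [longitude, latitude] order.
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data; the filtering ("data wrangling") happens in Python.
places = [
    {"name": "Harbor", "lat": 45.52, "lon": -122.67, "visits": 120},
    {"name": "Museum", "lat": 45.51, "lon": -122.68, "visits": 8},
]
popular = [p for p in places if p["visits"] >= 100]
geojson = to_feature_collection(popular)
print(json.dumps(geojson))
```

With Folium installed, the result could then be drawn with `m = folium.Map(location=[45.52, -122.67])` followed by `folium.GeoJson(geojson).add_to(m)`.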
Moses, the machine translation system.
Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest probability translation among the exponential number of choices.
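The "highest probability translation" is usually written as an argmax. Phrase-based systems such as Moses use a log-linear model. It scores a candidate translation e of a source sentence f by a weighted sum of feature functions h_i, such as phrase-translation, language-model, reordering, and penalty scores. The weights λ_i are tuned on held-out data. Because the denominator does not depend on e and exp is monotonic, maximizing the probability is the same as maximizing the weighted feature sum:

```latex
p(\mathbf{e} \mid \mathbf{f}) =
  \frac{\exp\left(\sum_{i=1}^{M} \lambda_i \, h_i(\mathbf{e}, \mathbf{f})\right)}
       {\sum_{\mathbf{e}'} \exp\left(\sum_{i=1}^{M} \lambda_i \, h_i(\mathbf{e}', \mathbf{f})\right)}
\qquad
\hat{\mathbf{e}} = \arg\max_{\mathbf{e}} \, p(\mathbf{e} \mid \mathbf{f})
                 = \arg\max_{\mathbf{e}} \sum_{i=1}^{M} \lambda_i \, h_i(\mathbf{e}, \mathbf{f})
```

The number of candidate translations grows exponentially with sentence length, so the decoder does not enumerate them. Instead, it approximates this argmax with a beam (stack) search that prunes low-scoring partial hypotheses.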
text2vec is an R package that provides an efficient framework with a concise API for text analysis and natural language processing (NLP).
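A central object in this style of text analysis is the document-term matrix (DTM): one row per document, one column per vocabulary term, and counts in the cells. text2vec builds it in R through functions such as `create_vocabulary` and `create_dtm`. The sketch below shows the same concept in plain Python instead. It uses a naive lowercase-and-split tokenizer, and it returns a dense list of lists for readability, whereas text2vec produces a sparse matrix. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokenizer (an assumption for illustration): lowercase, split on whitespace.
    return text.lower().split()


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: return (vocabulary, dense DTM) with dtm[i][j] = count of vocab[j] in docs[i]."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    index = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm


vocab, dtm = document_term_matrix(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(dtm)    # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Most cells of a real DTM are zero, which is why text2vec stores it sparsely.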
MITIE: library and tools for information extraction.
This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
Amplify the Impact of Your People, Expertise & Data.
Altair and RapidMiner share the same vision to make data analytics simple enough for all users, but scalable, governed, and safe enough for all enterprises. RapidMiner is the enterprise-ready data science platform that amplifies the collective impact of your people, expertise and data for breakthrough competitive advantage.
KNIME offers a complete platform for end-to-end data science, from creating analytic models, to deploying them and sharing insights within the organization, through to data apps and services.