wrangle data.
sq is a command line tool that provides jq-style access to structured data sources: SQL databases, or document formats like CSV or Excel. It is the lovechild of sql+jq.
An open source multi-tool for exploring and publishing data.
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
NVIDIA® TensorRT™ is an ecosystem of APIs for high-performance deep learning inference. TensorRT includes an inference runtime and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes TensorRT, TensorRT-LLM, TensorRT Model Optimizer, and TensorRT Cloud.
Upon first encountering SQL after two decades of Fortran, C, Java, and Python, I thought I had stumbled into hell. I quickly realized that was optimistic: after all, hell has rules.
I have since realized that SQL does too, and that they are no more confusing or contradictory than those of most other programming languages. They only appear so because it draws on a tradition unfamiliar to those of us raised with derivatives of C. To quote Terry Pratchett, it is not mad, just differently sane.
Welcome, then, to a world in which the strange will become familiar, and the familiar, strange. Welcome, thrice welcome, to SQL.
Turn websites into LLM-ready data.
Power your AI apps with clean data crawled from any website. It's also open-source.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ArcticDB is a DataFrame Database.
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
Built for the modern Python Data Science ecosystem, ArcticDB transforms your ability to handle complex real-world data, with incredibly fast performance proven at petabyte scale.
Digital technologies are incredibly powerful and are redefining how our society works. For organizations working in the public interest, technology can sometimes be a lever that multiplies positive impact; unfortunately, these organizations often lack the technological or human resources to accelerate their civic action. Data for Good exists to restore the balance.
Interactive SQL. Analyze petabyte-scale data where it lives with ease and flexibility.
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.
Retrieval Augmented Generation (RAG) chatbot powered by Weaviate.
Welcome to Verba: The Golden RAGtriever, an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally with HuggingFace and Ollama or through LLM providers such as OpenAI, Cohere, and Google.
📊 Cube — The Semantic Layer for Building Data Applications.
The Universal Semantic Layer.
Build trust with a semantic layer. Connect siloed data, define consistent metrics, and power AI and analytics with context.
Cube is the semantic layer for building data applications. It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application.
Cube was designed to work with all SQL-enabled data sources, including cloud data warehouses like Snowflake or Google BigQuery, query engines like Presto or Amazon Athena, and application databases like Postgres. Cube has a built-in relational caching engine to provide sub-second latency and high concurrency for API requests.
IoT & Data Science Platform, Platform as-a-Service, Kuzzle PaaS. Activate the power of the Kuzzle IoT platform online, with no commitment.
Kuzzle is a generic backend offering the basic building blocks common to every application.
Open-source Back-end, self-hostable & ready to use - Real-time, storage, advanced search - Web, Apps, Mobile, IoT
🦉 ML Experiments and Data Management with Git
Data Version Control, or DVC, is a command line tool and VS Code Extension that helps you develop reproducible machine learning projects.
statistical data visualization.
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Transform Data in Your Warehouse. Build trusted data products faster.
Accelerate your data transformation process with dbt Cloud and start delivering data that you and your team can rely on. dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. Analysts using dbt can transform their data by simply writing select statements, while dbt handles turning these statements into tables and views in a data warehouse.
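The idea of writing only a select statement and letting a tool turn it into a table or view can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of the concept, not dbt's implementation or API: the `materialize` helper, the `orders` table, and the `customer_totals` model name are all hypothetical.

```python
import sqlite3


def materialize(conn, name, select_sql, kind="view"):
    """Wrap a plain SELECT in the DDL needed to persist it (hypothetical helper)."""
    if kind not in ("view", "table"):
        raise ValueError("kind must be 'view' or 'table'")
    keyword = kind.upper()
    # Rebuild the object from scratch so the model can be re-run safely.
    conn.execute(f"DROP {keyword} IF EXISTS {name}")
    conn.execute(f"CREATE {keyword} {name} AS {select_sql}")


# Illustrative source data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ada", 10.0), (2, "ada", 5.0), (3, "bob", 7.5)],
)

# The analyst writes only this select statement.
model_sql = "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"

# The tool decides how the result is stored.
materialize(conn, "customer_totals", model_sql, kind="view")

rows = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)
```

Passing `kind="table"` instead stores the result as a physical table, which mirrors the choice between tables and views described above.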
Data Observability Platform for Modern Data Teams. Trust the data that powers your business.
Automated end-to-end data observability — so data teams are the first to know about data issues.
The Unified Apache Beam Model. The easiest way to do batch and streaming data processing. Write once, run anywhere data processing for mission-critical production workloads.
Apache Beam is a unified programming model for Batch and Streaming data processing.
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow, and Hazelcast Jet.
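The core idea of one pipeline definition serving both bounded (batch) and unbounded (streaming) inputs can be sketched with plain Python generators. This is only an illustration of the concept, not the Apache Beam SDK: the `tokenize`, `running_counts`, `pipeline`, and `endless_source` names are hypothetical, and the sketch does not model windowing, triggers, or runners.

```python
from collections import Counter
from itertools import islice


def tokenize(lines):
    """Split each incoming line into lowercase words."""
    for line in lines:
        yield from line.lower().split()


def running_counts(words):
    """Emit (word, count so far) each time a word arrives."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def pipeline(source):
    """One pipeline definition, independent of whether the source ever ends."""
    return running_counts(tokenize(source))


# Batch: a bounded source, consumed to completion.
batch = ["a b a", "b a"]
batch_result = dict(pipeline(batch))  # the last emitted count per word wins


# Streaming: an unbounded source, so only a prefix of the output is taken.
def endless_source():
    while True:
        yield "a b"


stream_first = list(islice(pipeline(endless_source()), 4))

print(batch_result)
print(stream_first)
```

Because every stage is lazy, the same `pipeline` function works on a list that ends and on a generator that never does.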
Unified stream and batch data processing that's serverless, fast, and cost-effective.
Turns data and AI algorithms into production-ready web applications in no time. Taipy is an open-source Python library for building production-ready front ends and back ends.
Taipy is an open-source Python library for easy, end-to-end application development, featuring what-if analyses, smart pipeline execution, built-in scheduling, and deployment tools.
The unified data layer
Connect your APIs, databases and microservices to a unified API at the edge. Delight your users with fast response times globally. Deploy fast GraphQL APIs worldwide with a top-notch developer experience.
The most powerful vector database for building AI applications. Open-source PostgreSQL database extension for vector data and vector search operations.
Lantern is an open-source PostgreSQL database extension to store vector data, generate embeddings, and handle vector search operations.
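What a vector search operation computes can be sketched in a few lines of plain Python. This is a brute-force illustration under stated assumptions, not Lantern's API or its indexing strategy: the `cosine_similarity` and `top_k` helpers and the three toy vectors are hypothetical, and a database extension would typically run this kind of search inside PostgreSQL with an index instead of scanning every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Toy embeddings; real ones would come from an embedding model.
vectors = {
    "cat": [1.0, 0.0, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(nearest)
```

The scan costs time proportional to the number of stored vectors, which is the cost that approximate nearest-neighbor indexes are designed to avoid.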