Search: [data-science] - Biapy Web Directory

The Data Engineering Handbook https://github.com/DataExpert-io/data-engineer-handbook

Fri Nov 22 15:29:52 2024

This repo has all the resources you need to become an amazing data engineer!

Monte Carlo https://www.montecarlodata.com/

Wed Nov 20 08:08:24 2024

Data and AI reliability. Delivered.

Data breaks. Monte Carlo ensures your team is the first to know and solve with end-to-end data observability.

Continuous Compliance Monitoring @ Mike Carpenter's Medium.

Databricks https://www.databricks.com/

Fri Nov 15 08:20:45 2024

The Databricks Data Intelligence Platform.
Databricks brings AI to your data to help you bring AI to the world.

PandasAI https://pandas-ai.com/

Wed Nov 13 07:48:05 2024

Conversational Data Analysis.

PandasAI is a Python platform that makes it easy to ask questions of your data in natural language. It helps non-technical users interact with their data more naturally, and it helps technical users save time and effort when working with data.

PandasAI is a Python library that integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.
Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

PandasAI @ GitHub.

Substrait https://substrait.io/

Fri Nov 8 08:32:59 2024

Cross-Language Serialization for Relational Algebra.
A cross-platform way to express data transformations, relational algebra, standardized record expressions, and plans.

Substrait is a format for describing compute operations on structured data. It is designed for interoperability across different languages and systems.

Dagster https://dagster.io/

Fri Nov 8 08:23:41 2024

Cloud-native orchestration of data pipelines. Ship data pipelines with extraordinary velocity.
An orchestration platform for the development, production, and observation of data assets.

Dagster is a cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability.

It is designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.

Dagster @ GitHub.

OpenMetadata https://open-metadata.org/

Fri Nov 8 08:18:46 2024

Open and unified metadata platform for data discovery, observability, and governance.

A single place for all your data and all your data practitioners to build and manage high quality data assets at scale. Built by Collate and the founders of Apache Hadoop, Apache Atlas, and Uber Databook.

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

OpenMetadata is one of the fastest-growing open-source projects, with a vibrant community and adoption by a diverse set of companies across a variety of industry verticals. Based on Open Metadata Standards and APIs and supporting connectors to a wide range of data services, OpenMetadata enables end-to-end metadata management, giving you the freedom to unlock the value of your data assets.

OpenMetadata @ GitHub.

data stack in a box https://github.com/wisemuffin/nsw-doe-data-stack-in-a-box

Fri Nov 8 08:07:34 2024

Department of Education (DOE) for New South Wales (AUS) data stack in a box.
With the push of one button you can have your own data stack up and running in 5 mins! 🏎️.

Docling https://ds4sd.github.io/docling/

Thu Nov 7 11:05:48 2024

Docling parses documents and exports them to the desired format with ease and speed.
🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.

Docling @ GitHub.

DataChain https://datachain.ai/

Tue Nov 5 14:12:48 2024

AI Data Management at Scale - Curate, Enrich, and Version Datasets.

DataChain is a modern Pythonic data-frame library designed for artificial intelligence. It is made to organize your unstructured data into datasets and wrangle it at scale on your local machine. DataChain does not abstract or hide the AI models and API calls, but helps to integrate them into the postmodern data stack.

DataChain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. DataChain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them.

DataChain @ GitHub.

CSV SQL Tool https://csvsqltool.com/

Mon Nov 4 13:46:53 2024

Run SQL queries on CSV files directly in your browser. No data leaves your browser.
Fast, private, and easy to use.
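
The general idea of querying a CSV file with SQL can be sketched locally with Python's standard library. This is only an illustration of the technique, not how CSV SQL Tool itself is implemented; the `query_csv` helper, the `data` table name, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def query_csv(csv_text: str, sql: str, table: str = "data") -> list[tuple]:
    """Load CSV text into an in-memory SQLite table and run one SQL query.

    Hypothetical helper: the first CSV row is treated as the column names,
    and every value is stored as text.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, for illustration only.
SAMPLE = "name,score\nada,90\nbob,55\n"

rows = query_csv(
    SAMPLE,
    "SELECT name FROM data WHERE CAST(score AS INTEGER) > 60",
)
print(rows)  # [('ada',)]
```

Because the database lives in memory, nothing is written to disk, which loosely mirrors the "no data leaves your browser" promise in a local setting.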

Clidey WhoDB https://whodb.clidey.com/

Mon Nov 4 09:26:12 2024

A lightweight next-gen data explorer - Postgres, MySQL, SQLite, MongoDB, Redis, MariaDB & Elasticsearch with Chat interface.

WhoDB @ GitHub.

Panel https://panel.holoviz.org/

Wed Oct 30 13:39:39 2024

The powerful data exploration & web app framework for Python.

Panel is an open-source Python library designed to streamline the development of robust tools, dashboards, and complex applications entirely within Python. With a comprehensive philosophy, Panel integrates seamlessly with the PyData ecosystem, offering powerful, interactive data tables, visualizations, and much more, to unlock, visualize, share, and collaborate on your data for efficient workflows.

Panel @ GitHub.

Taipy https://taipy.io/

Wed Oct 30 13:37:14 2024

Build Python Data & AI web applications.
Turns Data and AI algorithms into production-ready web applications in no time.

Taipy is designed for data scientists and machine learning engineers to build data & AI web applications.

From simple pilots to production-ready web applications in no time. No more compromise on performance, customization, and scalability.

Taipy @ GitHub.

Marly AI https://www.marly.ai/

Mon Oct 28 15:05:36 2024

The Data Processor for Agents.

Marly allows your agents to extract tables & text from your PDFs, PowerPoints, etc. in a structured format, making it easy for them to take subsequent actions (database call, API call, creating a chart, etc.).

Marly @ GitHub.

Anyquery https://anyquery.dev/

Fri Oct 25 13:57:07 2024

Use SQL for everything. Query anything with old-school cool SQL.

Anyquery is a CLI tool to run SQL queries on any data source, no matter if it's a file, an API, logs, or a local app.
See the integrations for the full extent of what you can do.

Anyquery @ GitHub.

Drasi https://drasi.io/

Mon Oct 21 14:04:40 2024

Drasi makes it easy and efficient to detect and react to changes in databases.

Drasi is a data processing platform that simplifies detecting changes in data and taking immediate action. It is a comprehensive solution that provides built-in capabilities to track system logs and change feeds for specific events, evaluate them for relevance, and automatically initiate appropriate reactions.

Drasi @ GitHub.
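
The detect, evaluate, react pattern described for Drasi can be illustrated with a small plain-Python sketch. It does not use Drasi or its API, and it swaps in a simple comparison of two snapshots in place of real change feeds; `Change`, `diff_snapshots`, `react`, and the order data are hypothetical names and values.

```python
from __future__ import annotations

from typing import Callable, NamedTuple, Optional


class Change(NamedTuple):
    kind: str                 # "added", "removed", or "updated"
    key: str
    before: Optional[dict]
    after: Optional[dict]


def diff_snapshots(before: dict[str, dict], after: dict[str, dict]) -> list[Change]:
    """Detect: compare two keyed snapshots and list what changed."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            changes.append(Change("added", key, None, new))
        elif new is None:
            changes.append(Change("removed", key, old, None))
        elif old != new:
            changes.append(Change("updated", key, old, new))
    return changes


def react(
    changes: list[Change],
    is_relevant: Callable[[Change], bool],
    reaction: Callable[[Change], None],
) -> int:
    """Evaluate each change for relevance and trigger the reaction if it matches."""
    handled = 0
    for change in changes:
        if is_relevant(change):
            reaction(change)
            handled += 1
    return handled


# Made-up example data: order o1 ships, order o3 is newly created.
before = {"o1": {"status": "open"}, "o2": {"status": "open"}}
after = {"o1": {"status": "shipped"}, "o2": {"status": "open"}, "o3": {"status": "open"}}

alerts: list[str] = []
handled = react(
    diff_snapshots(before, after),
    is_relevant=lambda c: c.kind == "updated" and c.after["status"] == "shipped",
    reaction=lambda c: alerts.append(f"{c.key} shipped"),
)
print(alerts)  # ['o1 shipped']
```

Snapshot diffing is the simplest way to show the idea; a platform built on change feeds avoids rescanning the whole data set on every check.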

Tabled https://github.com/VikParuchuri/tabled

Wed Oct 16 15:34:36 2024

Detect and extract tables to markdown and csv.

Tabled is a small library for detecting and extracting tables. It uses surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into markdown, csv, or html.
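
The final formatting step (cells to markdown) can be sketched in a few lines of plain Python. This is not Tabled's code or API and performs no table detection; `to_markdown` is a hypothetical helper that assumes the rows and columns have already been identified and that the first row is the header.

```python
def to_markdown(rows: list[list[str]]) -> str:
    """Render a list of rows (first row = header) as a markdown table."""
    header, *body = rows

    def line(cells: list[str]) -> str:
        # Escape pipes so cell text cannot break the column layout.
        escaped = (cell.replace("|", "\\|") for cell in cells)
        return "| " + " | ".join(escaped) + " |"

    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *(line(row) for row in body)])


# Made-up cells, for illustration only.
table = to_markdown([["item", "qty"], ["apples", "3"], ["a|b", "1"]])
print(table)
```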

Vortex https://github.com/spiraldb/vortex

Tue Oct 15 14:11:23 2024

"The LLVM of columnar file formats". A toolkit for working with compressed Arrow on-disk, in-memory, and over-the-wire.

Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously, what LLVM + Clang are to compilers): a highly extensible & extremely fast framework for building a modern columnar file format, with a state-of-the-art, "batteries included" reference implementation.

Snowflake https://www.snowflake.com/

Thu Oct 10 14:06:52 2024

The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI.
Snowflake delivers ease of use, instant elasticity, and lower TCO.

How to make Product give a shit about your architecture proposal @ Andy G's Blog.
