Search: [big-data] - Biapy Web Directory

Bufstream https://buf.build/product/bufstream

Thu Nov 14 08:33:59 2024

The best way of working with Protocol Buffers. Elastic, self-hosted Kafka with Advanced Semantic Intelligence
Guarantee streaming data quality and slash cloud costs 10x with Bufstream, a drop-in replacement for Apache Kafka®.

Bufstream is a Kafka-compatible streaming system which stores records directly in an object storage service like S3.

Apache CouchDB https://couchdb.apache.org/

Mon Nov 4 09:30:15 2024

Seamless multi-master sync, that scales from Big Data to Mobile, with an Intuitive HTTP/JSON API and designed for Reliability.

CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. CouchDB works well with modern web and mobile apps. You can distribute your data efficiently using CouchDB’s incremental replication. CouchDB supports master-master setups with automatic conflict detection.
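As a rough illustration of the HTTP/JSON API, the Python sketch below builds (without sending) the requests for writing and reading a document. It assumes a server at `http://localhost:5984` (CouchDB's default port) and a hypothetical `recipes` database; authentication is omitted, and the helper names `put_document` and `get_document` are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed server address; 5984 is CouchDB's default HTTP port.
COUCH_URL = "http://localhost:5984"


def _doc_url(db: str, doc_id: str) -> str:
    # Percent-encode the ID so characters such as "/" stay inside one path segment.
    return f"{COUCH_URL}/{db}/{urllib.parse.quote(doc_id, safe='')}"


def put_document(db: str, doc_id: str, doc: dict, rev: Optional[str] = None) -> urllib.request.Request:
    """Build a PUT /{db}/{doc_id} request that creates or updates a JSON document."""
    body = dict(doc)
    if rev is not None:
        # Updates must name the revision being replaced; a stale _rev is rejected with 409 Conflict.
        body["_rev"] = rev
    return urllib.request.Request(
        _doc_url(db, doc_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(db: str, doc_id: str) -> urllib.request.Request:
    """Build a GET /{db}/{doc_id} request; the response body is the stored JSON document."""
    return urllib.request.Request(_doc_url(db, doc_id), method="GET")
```

Passing either request to `urllib.request.urlopen` sends it. Because every update must carry the document's latest `_rev`, two clients updating from the same stale revision cannot both succeed on one node: the second receives `409 Conflict`.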

Trench https://www.trench.dev/

Tue Oct 29 14:11:08 2024

Open source analytics infrastructure. Fast and scalable. No bloat. GDPR compliant.

A single production-ready Docker image built on ClickHouse, Kafka, and Node.js for tracking events, users, page views, and interactions.

Trench @ GitHub.

Apache Hadoop https://hadoop.apache.org/

Fri Oct 11 14:04:52 2024

The Apache® Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
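The "simple programming model" most associated with Hadoop is MapReduce. The following single-process Python sketch imitates the map, shuffle/sort, and reduce phases of a word-count job; it does not use any Hadoop API, and it leaves out distribution and fault tolerance entirely. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce: all values for one key arrive together; add them up.
    return word, sum(counts)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Map phase (on a cluster, mappers run in parallel on separate input splits).
    pairs: List[Tuple[str, int]] = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: bring all pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )
```

Because mappers only see their own lines and reducers only see one key at a time, both phases can be spread across machines and simply rerun on another node if one fails.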

Apache Hudi https://hudi.apache.org/

Mon Sep 9 13:53:23 2024

An Open Source Data Lake Platform.

Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow, old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics.
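The idea behind incremental processing can be shown with a toy in-memory table: each upsert stamps records with a commit time, and an incremental read returns only the records committed after a checkpoint instead of rescanning everything. This is a conceptual sketch, not Hudi's API; the `ToyTable` class is hypothetical, although Hudi does track a per-record commit time (the `_hoodie_commit_time` metadata column).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    key: str
    value: dict
    # Fixed-width timestamp string (e.g. "20240909135323"), so string order matches time order.
    commit_time: str


class ToyTable:
    """Hypothetical in-memory table keyed by record key; not Hudi's API."""

    def __init__(self) -> None:
        self.rows: Dict[str, Record] = {}

    def upsert(self, commit_time: str, batch: Dict[str, dict]) -> None:
        # Insert new keys and overwrite existing ones, stamping each with the commit time.
        for key, value in batch.items():
            self.rows[key] = Record(key, value, commit_time)

    def incremental_read(self, since: str) -> List[Record]:
        # Return only records written by commits strictly after the checkpoint.
        changed = [r for r in self.rows.values() if r.commit_time > since]
        return sorted(changed, key=lambda r: r.key)
```

A downstream job would remember the latest commit time it has processed and pass it as `since` on its next run, so each run only handles the changes.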

Data For Good https://dataforgood.fr/

Thu Aug 22 19:04:11 2024

Digital technologies are incredibly powerful and are redefining how our society works. For organizations working in the public interest, technology can sometimes multiply their positive impact; unfortunately, however, these organizations often lack the technological or human resources to scale up their civic action. Data for Good exists to restore the balance.

Amazon Athena https://aws.amazon.com/athena/

Thu Aug 22 17:04:53 2024

Interactive SQL. Analyze petabyte-scale data where it lives with ease and flexibility.

Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.

286 - Data & Dev - Christophe Blefari @ <ifttd> (in French).

Apache DataFusion https://datafusion.apache.org/

Mon Jul 15 08:53:23 2024

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.

DataFusion is great for building projects such as domain-specific query engines, new database platforms and data pipelines, query languages, and more. It lets you start quickly from a fully working engine and then customize the features specific to your use case.

DataFusion @ GitHub.

JuiceFS https://juicefs.com/en/

Sun Jun 2 09:12:30 2024

Open Source Distributed POSIX File System for Cloud. JuiceFS is a distributed POSIX file system built on top of Redis and S3.

JuiceFS is a high-performance POSIX file system released under the Apache License 2.0, designed specifically for cloud-native environments. Data stored via JuiceFS is persisted in object storage (e.g. Amazon S3), while the corresponding metadata can be persisted in various compatible database engines, such as Redis, MySQL, and TiKV, depending on the scenario and requirements.

With JuiceFS, massive cloud storage can be directly connected to big data, machine learning, artificial intelligence, and various application platforms in production environments. Without modifying code, the massive cloud storage can be used as efficiently as local storage.
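Because JuiceFS exposes a POSIX interface, ordinary file I/O works unchanged once a volume is mounted. The sketch below uses only Python's standard file APIs; the helper names are hypothetical, and the same code runs against any local directory.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def append_line(root: PathLike, relative: str, line: str) -> Path:
    # Plain POSIX-style file operations: create directories, open in append mode, write.
    path = Path(root) / relative
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return path


def count_lines(root: PathLike, relative: str) -> int:
    # Stream the file line by line instead of loading it whole.
    with open(Path(root) / relative, encoding="utf-8") as f:
        return sum(1 for _ in f)
```

On a host where a JuiceFS volume is mounted at, say, `/mnt/jfs` (a hypothetical mount point), calling `append_line("/mnt/jfs", "events/2024-06-02.log", "signup")` would write through JuiceFS with no code changes.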

JuiceFS @ GitHub.

Trunk Data Platform (TDP) https://www.trunkdataplatform.io/

Mon Nov 27 12:01:20 2023

Open source big data platform.

Trunk Data Platform is a free, open-source Hadoop distribution.

Apache Druid https://druid.apache.org/

Thu Aug 24 10:11:00 2023

Druid is a high performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load.
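Druid can be queried with SQL over HTTP. The following sketch builds a POST request for the `/druid/v2/sql` endpoint, assuming a router at `http://localhost:8888` and a hypothetical `wikipedia` datasource with a `channel` column; check the Druid SQL API documentation for the request options your version supports.

```python
import json
import urllib.request

# Assumed router address; 8888 is the port used by Druid's quickstart configuration.
DRUID_URL = "http://localhost:8888"


def druid_sql_request(sql: str) -> urllib.request.Request:
    """Build a POST request for Druid's SQL HTTP endpoint (not sent here)."""
    payload = {"query": sql}
    return urllib.request.Request(
        f"{DRUID_URL}/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource and column; __time is Druid's built-in timestamp column.
QUERY = """
SELECT channel, COUNT(*) AS edits
FROM wikipedia
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY channel
ORDER BY edits DESC
LIMIT 10
"""
```

Sending the request with `urllib.request.urlopen` would return the result rows as JSON.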

XetHub: fast, frictionless collaboration at scale https://xethub.com/

Wed Jan 4 13:46:49 2023

XetHub brings speedy access and Git-based collaboration to large scale repositories of data, code, or any combination of files.
Our instant mount feature makes it possible to access GBs and TBs of data in seconds at the speed of localhost, while our de-duplication algorithm stores data and differences efficiently to save money and speed up development cycles.
XetHub is ideal for teams who already use Git to track their code changes, and want to leverage the power of infinite history, pull requests, and difference-based tracking for larger assets such as datasets or media files. Managing complete projects with familiar Git semantics makes change tracking and continuous integration a breeze, especially for workflows that use code to generate or augment assets.
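The de-duplication claim can be illustrated with a generic chunk-hashing store: files are split into chunks, each chunk is stored once under its SHA-256 digest, and a file version becomes a list of digests. This is not XetHub's actual algorithm (fixed-size chunks are used here for simplicity), and `ChunkStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Hypothetical content-addressed store with fixed-size chunking."""

    def __init__(self, chunk_size: int = 4) -> None:
        # Tiny default chunk size so the effect is visible on short inputs.
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        # Split into chunks, store each unseen chunk once, return the file's manifest.
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i : i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: List[str]) -> bytes:
        # Reassemble a file version from its list of chunk digests.
        return b"".join(self.chunks[h] for h in manifest)
```

Storing a second version that differs only in its last chunk adds just one new chunk. Fixed-size chunking loses this benefit when bytes are inserted near the start of a file, which is why content-defined chunking is common in practice.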

Neo4j https://neo4j.com/

Fri Oct 21 09:11:59 2022

Graph Database Management System.
Neo4j Graph Data Platform. Blazing-Fast Graph, Petabyte Scale.
With proven trillion+ entity performance, developers, data scientists, and enterprises rely on Neo4j as the top choice for high-performance, scalable analytics, intelligent app development, and advanced AI/ML pipelines.
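Neo4j is queried with the Cypher language. As a sketch, the request below targets Neo4j's HTTP transactional endpoint (`/db/{database}/tx/commit` in 4.x and later) at an assumed `http://localhost:7474`; the `Person` label, `KNOWS` relationship, and parameter values are hypothetical, and authentication is omitted.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed server address; 7474 is Neo4j's default HTTP port.
NEO4J_URL = "http://localhost:7474"


def cypher_request(statement: str, parameters: Dict[str, Any], database: str = "neo4j") -> urllib.request.Request:
    """Build a request that runs one Cypher statement in a single auto-committed transaction."""
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    return urllib.request.Request(
        f"{NEO4J_URL}/db/{database}/tx/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical graph: (:Person)-[:KNOWS]->(:Person); $name is bound from the parameters.
FRIENDS_OF = "MATCH (p:Person {name: $name})-[:KNOWS]->(friend:Person) RETURN friend.name"
```

Passing values through `parameters` rather than formatting them into the query string avoids Cypher injection and lets the server reuse the query plan.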

ClickHouse https://github.com/ClickHouse/ClickHouse

Fri Oct 14 08:48:20 2022

ClickHouse® is a free analytics DBMS for big data.
ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.
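To show what "column-oriented" means, the sketch below pivots row records into per-column lists; an aggregate such as a sum then reads a single column rather than every field of every row. It is a conceptual illustration in plain Python, not how ClickHouse is implemented or queried, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot row records into one list per column (assumes every row has the same keys).
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def column_sum(columns: Dict[str, List[Any]], name: str) -> Any:
    # Only the requested column is touched; the other columns are never read.
    return sum(columns[name])
```

Storing each column contiguously also groups similar values together, which tends to compress well; both effects favor analytical queries that scan a few columns of many rows.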

Konbert https://konbert.com/

Mon Oct 10 08:27:23 2022

Open big JSON, CSV Files: Online Viewer, Explorer and Converter.
View and convert big data files.
View large or small files right in your browser and export them in any format.
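Konbert performs its conversions in the browser. For comparison, a local JSON Lines to CSV conversion can stream one record at a time so that a large file never has to fit in memory. This sketch is independent of Konbert, and the `jsonl_to_csv` helper is hypothetical.

```python
import csv
import json
from typing import Iterable, List, TextIO


def jsonl_to_csv(lines: Iterable[str], out: TextIO, fieldnames: List[str]) -> int:
    """Convert JSON Lines to CSV one record at a time; returns the number of rows written."""
    # Missing keys become empty cells (restval=""); keys outside fieldnames are dropped.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", extrasaction="ignore")
    writer.writeheader()
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(json.loads(line))
        written += 1
    return written
```

Passing an open file object as `lines` iterates it lazily, so memory use stays roughly constant regardless of file size.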

Planet https://www.planet.com/

Wed Feb 2 10:36:38 2022

Daily Earth Data to See Change and Make Better Decisions.
Planet provides daily satellite data that helps businesses, governments, researchers, and journalists understand the physical world and take action.

Climate TRACE https://www.climatetrace.org/

Fri Jan 28 12:31:53 2022

Climate TRACE was built to collect and share data on greenhouse gas emissions from anthropogenic (human) activities to facilitate climate action.

Robtex https://www.robtex.com/

Tue Jan 25 09:55:01 2022

Robtex is used for various kinds of research on IP numbers, domain names, etc.
Robtex uses various sources to gather public information about IP numbers, domain names, host names, autonomous systems, routes, etc. It then indexes the data in a big database and provides free access to the data.
We aim to make the fastest and most comprehensive free DNS lookup tool on the Internet.
Our database now contains billions of documents of internet data collected over more than a decade.

Luna https://www.luna-lang.org/

Tue Aug 7 22:27:45 2018

A WYSIWYG language for data processing.

EveryPolitician http://everypolitician.org/

Sat Dec 31 02:38:26 2016

Political data for 233 countries.
The world’s richest open dataset on politicians.
