data-analytics
The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI. Snowflake delivers ease of use, instant elasticity, and lower TCO.
Redash helps you make sense of your data. Make Your Company Data Driven. Connect and query your data sources, build dashboards to visualize data and share them with your company.
Redash is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users leverage Redash to explore, query, visualize, and share data from any data sources. Their work in turn enables anybody in their organization to use the data. Every day, millions of users at thousands of organizations around the world use Redash to develop insights and make data-driven decisions.
Kylin is a high-concurrency, high-performance, and intelligent OLAP engine that provides a low-cost, ultimate data analytics experience.
Proof of SQL is a high-performance zero-knowledge (ZK) prover developed by the Space and Time team, which cryptographically guarantees that SQL queries were computed accurately against untampered data. It targets online latencies while proving computations over entire chain histories, an order of magnitude faster than state-of-the-art zkVMs and coprocessors.
Zircolite is a standalone tool written in Python 3. It lets you apply SIGMA rules to MS Windows EVTX (EVTX, XML, and JSONL formats), Auditd logs, Sysmon for Linux, and EVTXtract logs.
Digital technologies are incredibly powerful and are redefining how our society works. For organizations working in the public interest, technology can sometimes be a lever that multiplies positive impact. Unfortunately, these organizations often lack the technological or human resources to accelerate their civic action. Data for Good exists to restore the balance.
Interactive SQL. Analyze petabyte-scale data where it lives with ease and flexibility.
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats.
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Contribute to krishnaik06/The-Grand-Complete-Data-Science-Materials development by creating an account on GitHub.
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.
GlareDB is designed to query your data wherever it lives using SQL that you already know.
Protect your business, scale your security. Open Source Vulnerability Management Platform.
Security has two difficult tasks: designing smart ways of getting new information, and keeping track of findings to improve remediation efforts. With Faraday, you may focus on discovering vulnerabilities while we help you with the rest. Just use it in your terminal and get your work organized on the run. Faraday was made to let you take advantage of the available tools in the community in a truly multiuser way.
Faraday aggregates and normalizes the data you load, letting you explore it through different visualizations that are useful to managers and analysts alike.
CLI tool that can execute SQL queries on CSV, LTSV, JSON and TBLN. Can output to various formats.
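To make the idea of running SQL over CSV concrete, here is a minimal sketch that uses only Python's standard library. It is not how the tool above is implemented; it simply loads CSV text into an in-memory SQLite table and queries it. The helper name `query_csv`, the table name `scores`, and the sample data are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so names with spaces or quotes are safe."""
    return '"{}"'.format(name.replace('"', '""'))


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query.

    The first CSV row is treated as the header. All values are stored
    as text, so numeric columns need an explicit CAST in the query.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    con = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(quote_ident(name) for name in header)
        con.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
        placeholders = ", ".join("?" for _ in header)
        con.executemany(
            "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
            reader,
        )
        return con.execute(sql).fetchall()
    finally:
        con.close()


# Hypothetical sample data, for illustration only.
SAMPLE = "name,team,score\nada,red,90\nlin,blue,75\nsam,red,60\n"

rows = query_csv(
    SAMPLE,
    "scores",
    "SELECT team, SUM(CAST(score AS INTEGER)) FROM scores "
    "GROUP BY team ORDER BY team",
)
print(rows)
```

Because every CSV field arrives as text, the sketch casts `score` inside the query rather than guessing column types while loading. A dedicated tool would likely handle type inference and other input formats itself.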
Rapidly Search and Hunt through Windows Forensic Artefacts.
Chainsaw provides a powerful ‘first-response’ capability to quickly identify threats within Windows forensic artefacts such as Event Logs and MFTs. Chainsaw offers a generic and fast method of searching through event logs for keywords, and of identifying threats using built-in support for Sigma detection rules and custom Chainsaw detection rules.
CSVs sliced, diced & analyzed.
qsv (pronounced "Quicksilver") is a command line program for indexing, slicing, analyzing, filtering, enriching, validating & joining CSV files.
text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).
library and tools for information extraction.
This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
Amplify the Impact of Your People, Expertise & Data.
Altair and RapidMiner share the same vision to make data analytics simple enough for all users, but scalable, governed, and safe enough for all enterprises. RapidMiner is the enterprise-ready data science platform that amplifies the collective impact of your people, expertise and data for breakthrough competitive advantage.
KNIME offers a complete platform for end-to-end data science, from creating analytic models, to deploying them and sharing insights within the organization, through to data apps and services.
KNIME Analytics Platform is free and open source, which ensures users remain on the bleeding edge of data science. It offers 300+ connectors to data sources and integrations with all popular machine learning libraries.
dbt™ is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. Now anyone on the data team can safely contribute to production-grade data pipelines.
🦘 Explore multimedia datasets at scale.
Kangas is a tool for exploring, analyzing, and visualizing large-scale multimedia data. It provides a straightforward Python API for logging large tables of data, along with an intuitive visual interface for performing complex queries against your dataset.
Volatile memory extraction utility framework - An advanced memory forensics framework.
The Volatility Framework is a completely open collection of tools, implemented in Python under the GNU General Public License, for the extraction of digital artifacts from volatile memory (RAM) samples. The extraction techniques are performed completely independently of the system being investigated but offer visibility into the runtime state of the system.
StreamAlert is a serverless, real-time data analysis framework which empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define. Computer security teams use StreamAlert to scan terabytes of log data every day for incident detection and response.
DuckDB is an in-process SQL OLAP database management system.
DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and several extensions designed to make SQL easier to use.
Related contents:
- DuckDB - Le moteur SQL qui transforme vos données @ Korben :fr:.
- Why DuckDB is my first choice for data processing @ robinlinacre.
- DuckDB is Probably the Most Important Geospatial Software of the Last Decade @ dbreunig.com.
- Why Semantic Layers Matter — and How to Build One with DuckDB @ MotherDuck.
- Querying Billions of GitHub Events Using Modal and DuckDB (Part 1: Ingesting Data) @ noreasontopanic.
- DuckDB beats Polars for 1TB of data @ Confessions of a Data Guy.
- Building Your Modern Data Analytics Stack with Python, Parquet, and DuckDB @ KDnuggets.
- Building an Obsidian RAG with DuckDB and MotherDuck @ MotherDuck.
Open Source Business Intelligence
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
Keshif is a web-based tool that lets you browse and understand datasets easily.