data-analytics
Discord Analytics Made Simple
Cially is an open-source Discord Server Stats dashboard that provides real-time insights, member activity tracking, and detailed server statistics to help you understand and optimize your Discord community.
🪼Cially is a powerful, open-source dashboard designed to provide in-depth insights, real-time analytics, and detailed statistics for your Discord server. Monitor member activity, track engagement trends, and make data-driven decisions with ease.
PyPI Package Statistics & Analytics
Track downloads, analyze trends, and gain insights into the Python ecosystem
SedonaDB is an open-source single-node analytical database engine with geospatial as a first-class citizen. It aims to deliver the fastest spatial analytics query speed and the most comprehensive function coverage available.
Semantic Data Processing. Build data processing and data analysis pipelines that leverage the power of LLMs 🧠
Semlib is a Python library for building data processing and data analysis pipelines that leverage the power of large language models (LLMs). Semlib provides, as building blocks, familiar functional programming primitives like map, reduce, sort, and filter, but with a twist: Semlib's implementation of these operations are programmed with natural language descriptions rather than code. Under the hood, Semlib handles complexities such as prompting, parsing, concurrency control, caching, and cost tracking.
Python Data Analysis Library.
pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool. Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
⚡ Dynamically generated stats for your GitHub READMEs.
The best dashboards are built with code. Create fast, beautiful data apps, dashboards, and reports from the command line. Write Markdown, JavaScript, SQL, Python, R… and any language you like. Free and open-source.
A static site generator for data apps, dashboards, reports, and more. Observable Framework combines JavaScript on the front-end for interactive graphics with any language on the back-end for data analysis.
Data analysis & OSINT tool for everyone.
Warning: created by an ex-employee of the FSB.
A simple way to access various statistics in a git repository. Git quick statistics is a simple and efficient way to access various statistics in a git repository.
Any git repository may contain tons of information about commits, contributors, and files. Extracting this information is not always trivial, mostly because there are a gadzillion options to a gadzillion git commands - I don't think there is a single person alive who knows them all. Probably not even Linus Torvalds himself :).
AI Call Analytics. Clean, annotate, and summarize call transcripts with GPT-4.5.
Open Source AI Calling Transcriptions, Summaries, and Analytics built on OpenAI Whisper.
Index your Gmail account to a SQLite DB and play with the data.
This is a script to download emails from Gmail and store them in a SQLite database for further analysis. I find it extremely useful to have all my emails in a database to run queries on them. For example, I can find out how many emails I received per sender, which emails take the most space, and which emails from which sender I never read.
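The kind of queries described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `emails` and its columns (`sender`, `subject`, `size_bytes`, `is_read`) are hypothetical, and the actual script's schema may differ.

```python
import sqlite3

# Hypothetical schema for illustration; the real tool's tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (sender TEXT, subject TEXT, size_bytes INTEGER, is_read INTEGER)"
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?, ?)",
    [
        ("news@example.com", "Weekly digest", 52000, 0),
        ("news@example.com", "Special offer", 48000, 0),
        ("alice@example.com", "Lunch?", 1200, 1),
    ],
)


def emails_per_sender(conn):
    """Return (sender, count) rows, most frequent sender first."""
    return conn.execute(
        "SELECT sender, COUNT(*) AS n FROM emails "
        "GROUP BY sender ORDER BY n DESC, sender"
    ).fetchall()


def never_read_senders(conn):
    """Return senders for whom no email has ever been marked as read."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT sender FROM emails GROUP BY sender HAVING MAX(is_read) = 0"
        )
    ]


print(emails_per_sender(conn))
print(never_read_senders(conn))
```

The same pattern extends to the "which emails take the most space" question by ordering on the (assumed) `size_bytes` column.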
All-in-One Desktop App to Analyze Data Locally.
TextQuery is an all-in-one desktop app to import, query, modify, and visualize your raw data with SQL.
Look At Your Data 👀.
Data quality is the most important factor in machine learning success. Hyperparam brings exploration and analysis of massive text datasets to the browser.
Open source data warehouse for real time data analytics.
Apache Doris is an easy-to-use, high-performance and real-time analytical database based on MPP architecture, known for its extreme speed and ease of use. It only requires a sub-second response time to return query results under massive data and can support not only high-concurrency point query scenarios but also high-throughput complex analysis scenarios.
Moose lets you develop analytical backends in pure TypeScript or Python code. The developer framework for your data & analytics stack.
Moose is an open source developer framework for building analytical backends. Moose is designed to help you quickly prototype, productionize, and scale data products, data pipelines, and data APIs - on OLAP and streaming infrastructure - using native TypeScript or Python.
Quacklytics is an open-source analytics service built using DuckDB and designed to run analytical queries directly inside your browser. It provides a seamless, lightweight, and high-performance way to process your data without the need for expensive server-side compute resources.
The R Project for Statistical Computing.
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
Visualise your CSV files in seconds without sending your data anywhere.
BirdNET-Analyzer is an open source tool for analyzing bird calls using machine learning models. It can process large amounts of audio recordings and identify (bird) species based on their calls.
Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into SQLite serverless databases, and analyze the data with Jupyter notebooks.
Python scripts for parsing health data into a SQLite database and manipulating it there. SQLite is a lightweight database that doesn't require a server.
Count your code, quickly.
Tokei is a program that displays statistics about your code. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language.
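To make the reported categories concrete, here is a rough sketch of how lines might be classified as code, comments, or blanks. It is not Tokei's implementation: the function name `count_lines` is made up for this example, and it assumes a single line-comment prefix, whereas a real counter also handles block comments, strings, and per-language rules.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Classify each line as blank, comment, or code (simplified)."""
    counts = {"lines": 0, "code": 0, "comments": 0, "blanks": 0}
    for line in source.splitlines():
        stripped = line.strip()
        counts["lines"] += 1
        if not stripped:
            counts["blanks"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comments"] += 1
        else:
            # A line with code plus a trailing comment is counted as code.
            counts["code"] += 1
    return counts


sample = "# greet\n\nprint('hi')\nx = 1  # trailing comment counts as code\n"
print(count_lines(sample))
```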
Efficient data transformation and modeling framework that is backwards compatible with dbt.
SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can efficiently run and deploy data transformations written in SQL or Python with visibility and control at any size.
Research and data to make progress against the world’s largest problems.
To make progress against the pressing problems the world faces, we need to be informed by the best research and data.
Our World in Data makes this knowledge accessible and understandable, to empower those working to build a better world.
Your data tell a story. Explore. Visualize. Model. Make a difference. Better insight starts with Stata.
Stata is statistical software for data science.
Insights, Unlocked in Real Time.
Apache Pinot™: The real-time analytics open source platform for lightning-fast insights, effortless scaling, and cost-effective data-driven decisions.
structured-logprobs is an open-source Python library that enhances OpenAI's structured outputs by providing detailed information about token log probabilities.
This library is designed to offer valuable insights into the reliability of an LLM's structured outputs. It works with OpenAI's Structured Outputs, a feature that ensures the model consistently generates responses adhering to a supplied JSON Schema. This eliminates concerns about missing required keys or hallucinating invalid values.
Slice and dice log files on the command line.
Angle-grinder allows you to parse, aggregate, sum, average, min/max, percentile, and sort your data. You can see it, live-updating, in your terminal. Angle grinder is designed for when, for whatever reason, you don't have your data in graphite/honeycomb/kibana/sumologic/splunk/etc. but still want to be able to do sophisticated analytics.
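As a rough illustration of the parse-then-aggregate workflow described above, the following Python sketch groups log lines by one field and summarizes another. It does not use angle-grinder or its query syntax; the `key=value` log format and the helper names `parse_line` and `summarize` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def parse_line(line: str) -> dict:
    """Parse a 'key=value key=value' log line into a dict."""
    return dict(pair.split("=", 1) for pair in line.split())


def summarize(lines, group_key, value_key):
    """Group records by group_key; report count, average, and max of value_key."""
    groups = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        groups[record[group_key]].append(float(record[value_key]))
    rows = [(key, len(vals), mean(vals), max(vals)) for key, vals in groups.items()]
    # Sort by count, largest group first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


logs = [
    "status=200 ms=12",
    "status=200 ms=18",
    "status=500 ms=250",
]
for row in summarize(logs, "status", "ms"):
    print(row)
```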
Online PCAP Analysis and Network Traffic Insights.
Effortless PCAP File Analysis in Your Browser
Explore and analyze PCAP files online using A-Packets, designed to provide comprehensive insights into network protocols like IPv4/IPv6, HTTP, Telnet, FTP, DNS, SSDP, and WPA2. This tool allows users to easily view details of network communications and dissect layers of data transmission.
An automated document analyzer for Paperless-ngx using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically analyze and tag your documents.
It features: Automode, Manual Mode, Ollama and OpenAI support, a chat function to query your documents with AI, and a modern, intuitive web interface.
Empower your testing with AI & usage insights.
Gravity monitors real-world user behaviors and usage patterns in live production and test environments to generate quality analytics, identify test coverage gaps, and assist in prioritizing and generating test cases.
AIL-Framework is a powerful open-source project designed for online data analysis and web crawling, tailored for cybersecurity researchers and analysts.
Know Your User™
Open source user analytics for sovereign cybersecurity.
Tirreno is open-source user analytics software.
Tirreno is a universal analytic tool for monitoring online platforms, web applications, SaaS, communities, IoT, mobile applications, intranets, and e-commerce websites. It is effective against external threats associated with partners or customers, as well as internal risks posed by employees or suppliers.
System for collecting, deriving and querying facts about source code.
Glean is a system for working with facts about source code. You can use it for:
- Collecting and storing detailed information about code structure. Glean is designed around an efficient storage model that enables storing information about code at scale.
- Querying information about code, to power tools and experiences from online IDE features to offline code analysis.
Source: Indexing code at scale with Glean @ Engineering at Meta.
Data Runs Better on SDF. Transform Data Better with SDF. SDF is the fastest way to build a scalable, reliable, and optimized data warehouse.
SDF is a developer platform for data that scales SQL understanding across an organization, empowering all data teams to unlock the full potential of their data.
SDF is a multi-dialect SQL compiler, transformation framework, and analytical database engine. It natively compiles SQL dialects, like Snowflake, and connects to their corresponding data warehouses to materialize models.
Open-source SQL AI Agent. Text2SQL made Easy!
Wren AI is an open-source SQL AI Agent that empowers data, product, and business teams to access insights through AI chat, with a built-in, well-designed, intuitive UI and UX, integrating seamlessly with tools like Excel and Google Sheets.
The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
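The two ideas in this description, a columnar layout and zero-copy reads, can be illustrated conceptually with the Python standard library alone. This sketch does not use the Arrow libraries or the Arrow memory format; the variable names are made up for the example.

```python
from array import array

# Row-oriented data: one tuple per record.
rows = [(1, 9.5), (2, 7.25), (3, 8.0)]

# Columnar layout: one contiguous typed buffer per field.
ids = array("q", (row[0] for row in rows))      # 64-bit signed integers
scores = array("d", (row[1] for row in rows))   # 64-bit floats

# An analytic operation only touches the column it needs.
total = sum(scores)

# A memoryview slice shares the underlying buffer instead of copying it.
view = memoryview(scores)[1:]
scores[2] = 10.0  # a write through the array is visible through the view

print(total, view[1])
```

Because `view` references the same memory as `scores`, the update is observed without any serialization or copy, which is the property the description calls zero-copy.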
PostgreSQL log analyzer.
pgBadger is a PostgreSQL log analyzer built for speed with fully detailed reports and professional rendering.
The Databricks Data Intelligence Platform. Databricks brings AI to your data to help you bring AI to the world.
Graphic Walker is a different open-source alternative to Tableau. It allows data scientists to analyze data and visualize patterns with simple drag-and-drop / natural language query operations.
A project providing a Graphic Walker Pane for use with HoloViz Panel.
A simple way to explore your data through a Tableau-like interface directly in your Panel data applications.
panel-graphic-walker brings the power of Graphic Walker to your data science workflow, seamlessly integrating interactive data exploration into notebooks and Panel applications. Effortlessly create dynamic visualizations, analyze datasets, and build dashboards—all within a Pythonic, intuitive interface.
DataEase is an open source data visualization analysis tool that helps users quickly analyze data and gain insights into business trends, thereby improving and optimizing their business. DataEase supports a wide range of data source connections, can quickly create charts by dragging and dropping, and can be easily shared with others.
Zero-ETL data analytics with Postgres.
Simple and cost-effective cloud analytics platform automatically synced with your data sources.
BemiDB is a Postgres read replica optimized for analytics. It consists of a single binary that seamlessly connects to a Postgres database, replicates the data in a compressed columnar format, and allows you to run complex queries using its Postgres-compatible analytical query engine.
Business Intelligence as Code. Build polished data products with SQL. Build fast, interactive data visualizations in pure SQL and markdown.
Evidence is a lightweight framework for building data apps. It's open source and free to get started.
Open and unified metadata platform for data discovery, observability, and governance.
A single place for all your data and all your data practitioners to build and manage high quality data assets at scale. Built by Collate and the founders of Apache Hadoop, Apache Atlas, and Uber Databook.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration. It is one of the fastest-growing open-source projects with a vibrant community and adoption by a diverse set of companies in a variety of industry verticals. Based on Open Metadata Standards and APIs, supporting connectors to a wide range of data services, OpenMetadata enables end-to-end metadata management, giving you the freedom to unlock the value of your data assets.
Department of Education (DOE) for New South Wales (AUS) data stack in a box. With the push of one button you can have your own data stack up and running in 5 mins! 🏎️.
Use SQL for everything. Query anything with old-school cool SQL.
Anyquery is a CLI tool to run SQL queries on any data source, no matter if it's a file, an API, logs, or a local app. See the integrations for the full extent of what you can do.
The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI. Snowflake delivers ease of use, instant elasticity, and lower TCO.
Redash helps you make sense of your data. Make Your Company Data Driven. Connect and query your data sources, build dashboards to visualize data and share them with your company.
Redash is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users leverage Redash to explore, query, visualize, and share data from any data sources. Their work in turn enables anybody in their organization to use the data. Every day, millions of users at thousands of organizations around the world use Redash to develop insights and make data-driven decisions.
Kylin is a high-concurrency, high-performance and intelligent OLAP engine that provides a low-cost, ultimate data analytics experience.
Proof of SQL is a high-performance zero-knowledge (ZK) prover developed by the Space and Time team, which cryptographically guarantees SQL queries were computed accurately against untampered data. It targets online latencies while proving computations over entire chain histories, an order of magnitude faster than state-of-the-art zkVMs and coprocessors.
Zircolite is a standalone tool written in Python 3. It lets you use SIGMA rules on: MS Windows EVTX (EVTX, XML and JSONL format), Auditd logs, Sysmon for Linux and EVTXtract logs.
Digital technologies are incredibly powerful and are redefining how our society works. For those working in the public interest, technology can sometimes be a lever that multiplies positive impact; unfortunately, these actors often lack the technological or human resources to accelerate their civic action. Data for Good exists to restore the balance.
Interactive SQL. Analyze petabyte-scale data where it lives with ease and flexibility.
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats.
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Contribute to krishnaik06/The-Grand-Complete-Data-Science-Materials development by creating an account on GitHub.
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.
GlareDB is designed to query your data wherever it lives using SQL that you already know.
Protect your business, scale your security. Open Source Vulnerability Management Platform.
Security has two difficult tasks: designing smart ways of getting new information, and keeping track of findings to improve remediation efforts. With Faraday, you may focus on discovering vulnerabilities while we help you with the rest. Just use it in your terminal and get your work organized on the run. Faraday was made to let you take advantage of the available tools in the community in a truly multiuser way.
Faraday aggregates and normalizes the data you load, letting you explore it through different visualizations that are useful to managers and analysts alike.
CLI tool that can execute SQL queries on CSV, LTSV, JSON and TBLN. Can output to various formats.
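The general idea of running SQL over a CSV file can be sketched with Python's standard library by loading the rows into an in-memory SQLite table. This is not how the tool above is implemented; the helper name `query_csv` and the default table name `data` are assumptions for illustration, and all values are stored as text.

```python
import csv
import io
import sqlite3


def query_csv(csv_text: str, sql: str, table: str = "data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


csv_text = "name,city\nAda,London\nGrace,New York\nAlan,London\n"
print(query_csv(csv_text, "SELECT city, COUNT(*) FROM data GROUP BY city ORDER BY city"))
```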