Biapy's Bookmarks

Tokei

https://github.com/XAMPPRocky/tokei

Count your code, quickly.

Tokei is a program that displays statistics about your code. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language.
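To make the counting described above concrete, here is a minimal Python sketch of grouping line statistics by language. It is not Tokei's implementation: the `LANGUAGES` table, the function names, and the rule that a comment is any line starting with a single line-comment prefix are all assumptions made for illustration. Block comments and strings are ignored.

```python
from collections import Counter

# Hypothetical mapping from file extension to (language, line-comment prefix).
LANGUAGES = {
    ".py": ("Python", "#"),
    ".rs": ("Rust", "//"),
}


def classify_line(line: str, comment_prefix: str) -> str:
    """Classify one source line as 'blanks', 'comments', or 'code'."""
    stripped = line.strip()
    if not stripped:
        return "blanks"
    if stripped.startswith(comment_prefix):
        return "comments"
    return "code"


def count_source(text: str, comment_prefix: str) -> Counter:
    """Tally total lines and line kinds for one file's contents."""
    counts = Counter(lines=0, code=0, comments=0, blanks=0)
    for line in text.splitlines():
        counts["lines"] += 1
        counts[classify_line(line, comment_prefix)] += 1
    return counts


def summarize(files: dict[str, str]) -> dict[str, Counter]:
    """Group per-file tallies by language; `files` maps file name to contents."""
    report: dict[str, Counter] = {}
    for name, text in files.items():
        for ext, (language, prefix) in LANGUAGES.items():
            if name.endswith(ext):
                stats = report.setdefault(language, Counter(files=0))
                stats["files"] += 1
                stats.update(count_source(text, prefix))
    return report
```

Calling `summarize({"a.py": "# note\nx = 1\n"})` returns one `Counter` per detected language, holding the file count plus the line, code, comment, and blank totals.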

Tokei - Enfin des stats sur votre code @ Korben :fr:.

command-line continuous-integration data-analytics development foss open-source

Added 1 year ago

qsv

https://github.com/jqnatividad/qsv

CSVs sliced, diced & analyzed.

qsv (pronounced "Quicksilver") is a command line program for indexing, slicing, analyzing, filtering, enriching, validating & joining CSV files.

command-line csv data-analytics miller mlr open-source

Added 2 years ago

RapidMiner

https://rapidminer.com/

Amplify the Impact of Your People, Expertise & Data.

Altair and RapidMiner share the same vision to make data analytics simple enough for all users, but scalable, governed, and safe enough for all enterprises. RapidMiner is the enterprise-ready data science platform that amplifies the collective impact of your people, expertise and data for breakthrough competitive advantage.

commercial data-analytics data-mining data-science web-service

Added 2 years ago

Faraday Security

https://faradaysec.com/

Protect your business, scale your security. Open Source Vulnerability Management Platform.

Security has two difficult tasks: designing smart ways of getting new information, and keeping track of findings to improve remediation efforts. With Faraday, you may focus on discovering vulnerabilities while we help you with the rest. Just use it in your terminal and get your work organized on the run. Faraday was made to let you take advantage of the available tools in the community in a truly multiuser way.

Faraday aggregates and normalizes the data you load, letting you explore it through different visualizations that are useful to managers and analysts alike.

Faraday @ GitHub.

commercial data-analytics data-visualization open-source security self-hosted vulnerability web-app

Added 2 years ago

Apache Arrow

https://arrow.apache.org/

The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

columnar data-analytics data-science format foss open-source

Added 1 year ago

pepy.tech

https://pepy.tech/

PyPI Package Statistics & Analytics

Track downloads, analyze trends, and gain insights into the Python ecosystem

data-analytics python web-service

Added 6 months ago

SDF Labs

https://www.sdf.com/

Data Runs Better on SDF. Transform Data Better with SDF. SDF is the fastest way to build a scalable, reliable, and optimized data warehouse.

SDF is a developer platform for data that scales SQL understanding across an organization, empowering all data teams to unlock the full potential of their data.

SDF is a multi-dialect SQL compiler, transformation framework, and analytical database engine. It natively compiles SQL dialects, like Snowflake, and connects to their corresponding data warehouses to materialize models.

Source: Testing is Not Enough: Transforming Data Quality with Write, Audit, Publish using SDF Build @ SDF Blog.

command-line data-analytics database data-science data-transformation framework sql

Added 1 year ago

Tirreno

https://www.tirreno.com/

Know Your User™

Open source user analytics for sovereign cybersecurity.

Tirreno is open-source user analytics software.

Tirreno is a universal analytic tool for monitoring online platforms, web applications, SaaS, communities, IoT, mobile applications, intranets, and e-commerce websites. It is effective against external threats associated with partners or customers, as well as internal risks posed by employees or suppliers.

Tirreno @ GitHub.

data-analytics foss open-source secint security self-hosted web-app

Added 1 year ago

DuckDB

https://duckdb.org/

DuckDB is an in-process SQL OLAP database management system.

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and several extensions designed to make SQL easier to use.

DuckDB @ GitHub.

data-analytics database foss java mit-licensed olap open-source python r sql wasm

Added 3 years ago

KNIME Analytics Platform

https://www.knime.com/knime-analytics-platform

KNIME Analytics Platform is free and open source, which keeps users on the bleeding edge of data science, with 300+ connectors to data sources and integrations with all popular machine learning libraries.

analytics data-analytics data-science open-source software

Added 2 years ago

Anyquery

https://anyquery.dev/

Use SQL for everything. Query anything with old-school cool SQL.

Anyquery is a CLI tool to run SQL queries on any data source, no matter if it's a file, an API, logs, or a local app. See the integrations for the full extent of what you can do.

Anyquery @ GitHub.

api command-line data-analytics database data-science mysql open-source self-hosted sql

Added 1 year ago

Shaper

https://taleshape.com/shaper/docs/

Open Source, SQL-driven Data Dashboards powered by DuckDB.

Build analytics dashboards simply by writing SQL.

Shaper @ GitHub.

Related contents:

Digest #202: Terraform Claude Skills, FinOps FOCUS 1.2, AI Fatigue for Cloud Engineers, and MCP for Web Data Extraction @ DevOps Bulletin.

dashboard data-analytics data-visualization duckdb foss mpl2-licensed open-source sql web-app

Added 1 month ago

Observable Framework

https://observablehq.com/framework/

The best dashboards are built with code. Create fast, beautiful data apps, dashboards, and reports from the command line. Write Markdown, JavaScript, SQL, Python, R… and any language you like. Free and open-source.

A static site generator for data apps, dashboards, reports, and more. Observable Framework combines JavaScript on the front-end for interactive graphics with any language on the back-end for data analysis.

Observable Framework @ GitHub.

dashboard data-analytics data-visualization foss isc-licensed javascript markdown open-source python r sql static-site-generator web

Added 8 months ago

SedonaDB

https://sedona.apache.org/sedonadb/latest/

SedonaDB is an open-source single-node analytical database engine with geospatial as a first-class citizen. It aims to deliver the fastest spatial analytics query speed and the most comprehensive function coverage available.

SedonaDB @ GitHub.

Related contents:

Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen @ Apache Sedona.

apache2-licensed data-analytics database foss geospatial open-source

Added 6 months ago

Kangas

https://github.com/comet-ml/kangas

🦘 Explore multimedia datasets at scale.

Kangas is a tool for exploring, analyzing, and visualizing large-scale multimedia data. It provides a straightforward Python API for logging large tables of data, along with an intuitive visual interface for performing complex queries against your dataset.

data-analytics data-science data-visualization gui open-source python software

Added 3 years ago

Moose

https://docs.fiveonefour.com/moose

Moose lets you develop analytical backends in pure TypeScript or Python code. The developer framework for your data & analytics stack.

Moose is an open source developer framework for building analytical backends. Moose is designed to help you quickly prototype, productionize, and scale data products, data pipelines, and data APIs - on OLAP and streaming infrastructure - using native TypeScript or Python.

Moose @ GitHub.

data-analytics development foss framework mit-licensed open-source python typescript

Added 11 months ago

KNIME

https://www.knime.com/

KNIME offers a complete platform for end-to-end data science, from creating analytic models, to deploying them and sharing insights within the organization, through to data apps and services.

KNIME @ GitHub

commercial data-analytics data-mining data-science open-source self-hosted

Added 2 years ago

GIT quick statistics

https://git-quick-stats.sh/

Simple way to access various statistics in a git repository. Git quick statistics is a simple and efficient way to access various statistics in a git repository.

Any git repository may contain tons of information about commits, contributors, and files. Extracting this information is not always trivial, mostly because there are a gadzillion options to a gadzillion git commands - I don't think there is a single person alive who knows them all. Probably not even Linus Torvalds himself :).

GIT quick statistics @ GitHub.

command-line data-analytics development foss git mit-licensed open-source statistics

Added 9 months ago

Gmail to SQLite

https://github.com/marcboeker/gmail-to-sqlite

Index your Gmail account to a SQLite DB and play with the data.

This is a script to download emails from Gmail and store them in a SQLite database for further analysis. I find it extremely useful to have all my emails in a database to run queries on them. For example, I can find out how many emails I received per sender, which emails take the most space, and which emails from which sender I never read.
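The kinds of queries mentioned above can be sketched with Python's built-in `sqlite3` module. The `messages` table and its `sender`, `size`, and `is_read` columns are a hypothetical schema chosen for illustration, not necessarily the one the script creates, and the rows are made-up sample data.

```python
import sqlite3

# In-memory database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender TEXT, size INTEGER, is_read INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("alice@example.com", 1200, 1),
        ("alice@example.com", 800, 0),
        ("news@example.com", 50000, 0),
    ],
)


def emails_per_sender(conn):
    """Number of emails received from each sender, most frequent first."""
    return conn.execute(
        "SELECT sender, COUNT(*) AS n FROM messages "
        "GROUP BY sender ORDER BY n DESC, sender"
    ).fetchall()


def largest_emails(conn, limit=1):
    """The emails that take the most space."""
    return conn.execute(
        "SELECT sender, size FROM messages ORDER BY size DESC LIMIT ?", (limit,)
    ).fetchall()


def never_read_senders(conn):
    """Senders for whom no email has ever been marked as read."""
    rows = conn.execute(
        "SELECT sender FROM messages GROUP BY sender "
        "HAVING SUM(is_read) = 0 ORDER BY sender"
    )
    return [row[0] for row in rows]
```

Against a real database, the table and column names would need to match whatever the script actually writes.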

data-analytics database email foss gmail mit-licensed open-source sql sqlite

Added 11 months ago

pgBadger

https://pgbadger.darold.net/

PostgreSQL log analyzer.

pgBadger is a PostgreSQL log analyzer built for speed with fully detailed reports and professional rendering.

data-analytics database foss logs open-source optimization postgresql

Added 1 year ago

kube-opex-analytics

https://github.com/rchakode/kube-opex-analytics

Kubernetes usage analytics for CPU, Memory, and GPU — track costs and optimize cluster resources.

kube-opex-analytics is a Kubernetes usage accounting and analytics tool that helps organizations track CPU, Memory, and GPU resources consumed by their clusters over time (hourly, daily, monthly).

apache2-licensed data-analytics foss kubernetes observability open-source self-hosted web-app

Added 2 months ago

TextQuery

https://textquery.app/

All-in-One Desktop App to Analyze Data Locally.

TextQuery is an all-in-one desktop app to import, query, modify, and visualize your raw data with SQL.

commercial csv data-analytics json macos software sql windows

Added 11 months ago

Zircolite

https://wagga40.github.io/Zircolite/#/

Zircolite is a standalone tool written in Python 3. It lets you apply SIGMA rules to MS Windows EVTX (EVTX, XML and JSONL formats), Auditd logs, Sysmon for Linux and EVTXtract logs.

Zircolite @ GitHub.

data-analytics evtx logs open-source security windows

Added 1 year ago

git-of-theseus

https://github.com/erikbern/git-of-theseus

Analyze how a Git repo grows over time.

command-line data-analytics data-visualization foss git open-source python

Added 1 year ago

DataEase

https://dataease.io/

DataEase is an open source data visualization analysis tool that helps users quickly analyze data and gain insights into business trends, thereby improving and optimizing their business. DataEase supports a wide range of data source connections, can quickly create charts by dragging and dropping, and can be easily shared with others.

DataEase @ GitHub.

business-intelligence data-analytics data-visualization foss open-source self-hosted web-app

Added 1 year ago

The Grand Complete Data Science Guide With Videos And Materials

https://github.com/krishnaik06/The-Grand-Complete-Data-Science-Materials

Contribute to krishnaik06/The-Grand-Complete-Data-Science-Materials development by creating an account on GitHub.

aws data-analytics data-science deep-learning eda e-learning git github machine-learning mlops nlp pyspark python sagemaker sql

Added 2 years ago

GarminDB

https://github.com/tcgoetz/GarminDB

Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into SQLite serverless databases, and analyze the data with Jupyter notebooks.

Python scripts for parsing health data into a SQLite database and manipulating it there. SQLite is a lightweight database that doesn't require a server.

Related contents:

Episode 601: Taming the Demons @ Linux Unplugged.

data-analytics foss garmin open-source smart-watch sqlite

Added 1 year ago

Sortarr

https://github.com/Jaredharper1/Sortarr

Sonarr & Radarr Media Library Insights.

Sortarr is a lightweight web dashboard for Sonarr and Radarr that helps you understand how your media library uses storage. It is not a Plex tool, but it is useful in Plex setups for spotting oversized series or movies and comparing quality vs. size trade-offs.

data-analytics foss mit-licensed open-source radarr seedbox self-hosted sonarr storage web-app

Added 3 months ago

SQLMesh

https://sqlmesh.readthedocs.io/en/stable/

Efficient data transformation and modeling framework that is backwards compatible with dbt.

SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can efficiently run and deploy data transformations written in SQL or Python with visibility and control at any size.

SQLMesh @ GitHub.

Related contents:

Why SQLMesh Might be The Best dbt Alternative @ The Data Toolbox.

data-analytics data-pipeline data-transformation dbt foss open-source self-hosted sql web-app

Added 1 year ago

Graphic Walker

https://docs.kanaries.net/graphic-walker

Graphic Walker is a different open-source alternative to Tableau. It allows data scientists to analyze data and visualize patterns with simple drag-and-drop / natural language query operations.

Graphic Walker @ GitHub.

data-analytics data-visualization foss open-source self-hosted web-app

Added 1 year ago

First Pull Request

https://firstpr.me/

What was the first pull request you sent on GitHub?

data-analytics github web-service

Added 2 years ago

Glean

https://glean.software/

System for collecting, deriving and querying facts about source code.

Glean is a system for working with facts about source code. You can use it for:

Collecting and storing detailed information about code structure. Glean is designed around an efficient storage model that enables storing information about code at scale.
Querying information about code, to power tools and experiences from online IDE features to offline code analysis.

Glean @ GitHub.

Source: Indexing code at scale with Glean @ Engineering at Meta.

data-analytics development foss open-source static-code-analyzer

Added 1 year ago

Paperless-AI

https://clusterzx.github.io/paperless-ai/

An automated document analyzer for Paperless-ngx using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically analyze and tag your documents.

It features: Automode, Manual Mode, Ollama and OpenAI support, a chat function to query your documents with AI, and a modern, intuitive web interface.

Paperless-AI @ GitHub.

ai data-analytics foss llm ollama open-source paperless web-app

Added 1 year ago

claude-pulse

https://github.com/NoobyGains/claude-pulse

Real-time usage monitor for Claude Code — session limits, weekly limits, and plan tier with colour-coded progress bars

claude-code command-line data-analytics source-available tui

Added 3 weeks ago

Apache Spark

https://spark.apache.org/

Unified Engine for large-scale data analytics.

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Apache Spark @ GitHub.

apache2-licensed apache-spark big-data data-analytics data-science foss machine-learning open-source sql

Added 3 months ago

Lampyre :ru:

https://lampyre.io/

Data analysis & OSINT tool for everyone.

Warning: created by an ex-employee of the FSB.

Related contents:

Episode #509: Les dangers de l’OSINT @ NoLimitSecu :fr:.

commercial data-analytics osint web-service

Added 8 months ago

Chainsaw

https://github.com/WithSecureLabs/chainsaw

Rapidly Search and Hunt through Windows Forensic Artefacts.

Chainsaw provides a powerful ‘first-response’ capability to quickly identify threats within Windows forensic artefacts such as Event Logs and MFTs. Chainsaw offers a generic and fast method of searching through event logs for keywords, and of identifying threats using built-in support for Sigma detection rules and custom Chainsaw detection rules.

command-line data-analytics first-response logs open-source rust security

Added 2 years ago

nao

https://getnao.io/

The Analytics Agent built for context engineering. Build your agent context like a file system.

Deploy a chat UI for anyone to run analytics on your data.

nao @ GitHub.

Related contents:

SQL Is Solved. Here's Where Chat-BI Still Breaks @ Ju Data Engineering Newsletter.

ai apache2-licensed business-intelligence data-analytics foss llm open-source self-hosted web-app

Added 1 month ago

Angle-grinder

https://github.com/rcoh/angle-grinder

Slice and dice log files on the command line.

Angle-grinder allows you to parse, aggregate, sum, average, min/max, percentile, and sort your data. You can see it, live-updating, in your terminal. Angle grinder is designed for when, for whatever reason, you don't have your data in graphite/honeycomb/kibana/sumologic/splunk/etc. but still want to be able to do sophisticated analytics.

Related contents:

A list of new(ish) command line tools @ Julia Evans.

command-line data-analytics foss logging open-source

Added 1 year ago

Deepnote

https://deepnote.com/

Analytics and data science notebook for teams. Jupyter notebook for the AI era.

Link Snowflake, BigQuery, CSVs, and 60+ data sources
Write in Python, SQL, R — or just prompt Deepnote Agent
Build powerful data apps and dashboards with AI

Deepnote @ GitHub.

apache2-licensed bigquery csv data-analytics foss jupyter notebook open-source python r self-hosted snowflake sql

Added 5 months ago

pgit

https://github.com/ImGajeed76/pgit

Git-like version control CLI backed by PostgreSQL with pg-xpatch delta compression.

Related contents:

pgit: What If Your Git History Was a SQL Database? @ oseifert.

data-analytics foss git mit-licensed open-source postgresql

Added 4 weeks ago

MITIE

https://github.com/mit-nlp/MITIE

Library and tools for information extraction.

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.

data-analytics data-science information-extraction machine-learning nlp

Added 2 years ago

data stack in a box

https://github.com/wisemuffin/nsw-doe-data-stack-in-a-box

Department of Education (DOE) for New South Wales (AUS) data stack in a box. With the push of one button you can have your own data stack up and running in 5 mins! 🏎️.

data-analytics data-pipeline data-science data-stack data-stream open-source self-hosted

Added 1 year ago

Visprex

https://www.visprex.com/

Visualise your CSV files in seconds without sending your data anywhere.

csv data-analytics data-visualization foss open-source self-hosted web-app

Added 1 year ago

OpenMetadata

https://open-metadata.org/

Open and unified metadata platform for data discovery, observability, and governance.

A single place for all your data and all your data practitioners to build and manage high quality data assets at scale. Built by Collate and the founders of Apache Hadoop, Apache Atlas, and Uber Databook.

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration. It is one of the fastest-growing open-source projects with a vibrant community and adoption by a diverse set of companies in a variety of industry verticals. Based on Open Metadata Standards and APIs, supporting connectors to a wide range of data services, OpenMetadata enables end-to-end metadata management, giving you the freedom to unlock the value of your data assets.

OpenMetadata @ GitHub.

data-analytics data-science foss metadata open-source self-hosted web-app

Added 1 year ago

Semlib

https://semlib.anish.io/

Semantic Data Processing. Build data processing and data analysis pipelines that leverage the power of LLMs 🧠

Semlib is a Python library for building data processing and data analysis pipelines that leverage the power of large language models (LLMs). Semlib provides, as building blocks, familiar functional programming primitives like map, reduce, sort, and filter, but with a twist: Semlib's implementations of these operations are programmed with natural language descriptions rather than code. Under the hood, Semlib handles complexities such as prompting, parsing, concurrency control, caching, and cost tracking.

Semlib @ GitHub.

data-analytics data-pipeline foss library llm mit-licensed open-source python semantic

Added 7 months ago

Apache Pinot™

https://pinot.apache.org/

Insights, Unlocked in Real Time.

Apache Pinot™: The real-time analytics open source platform for lightning-fast insights, effortless scaling, and cost-effective data-driven decisions.

Apache Pinot @ GitHub.

Related contents:

Serving Millions of Apache Pinot™ Queries with Neutrino @ Uber Blog.

big-data data-analytics data-science foss open-source self-hosted

Added 1 year ago

Metabase

https://www.metabase.com/

Open Source Business Intelligence

The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋

Metabase @ GitHub.

business-intelligence data-analytics data-science data-visualization metabase open-source self-hosted web-app web-service

Added 4 years ago

Panel Graphic Walker

https://github.com/panel-extensions/panel-graphic-walker

A project providing a Graphic Walker Pane for use with HoloViz Panel.

A simple way to explore your data through a Tableau-like interface directly in your Panel data applications.

panel-graphic-walker brings the power of Graphic Walker to your data science workflow, seamlessly integrating interactive data exploration into notebooks and Panel applications. Effortlessly create dynamic visualizations, analyze datasets, and build dashboards—all within a Pythonic, intuitive interface.

data-analytics data-visualization foss open-source python

Added 1 year ago

dbt

https://www.getdbt.com/product/what-is-dbt/

dbt™ is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. Now anyone on the data team can safely contribute to production-grade data pipelines.

dbt @ GitHub.

commercial data-analytics data-science open-source sql workflow

Added 2 years ago

Apache Tika bindings for PHP

https://github.com/vaites/php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats.

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

apache content-analysis data-analytics development library open-source php tika

Added 2 years ago

Wren AI

https://getwren.ai/oss

Open-source SQL AI Agent. Text2SQL made Easy!

Wren AI is an open-source SQL AI Agent that empowers data, product, and business teams to access insights through AI chat, with a built-in, well-designed, intuitive UI and UX, integrating seamlessly with tools like Excel and Google Sheets.

Wren AI @ GitHub.

ai-agent data-analytics database foss llm open-source sql

Added 1 year ago

text2vec

https://text2vec.org/

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).

text2vec @ GitHub.

data-analytics data-science information-extraction nlp open-source r

Added 2 years ago

OpenBB

https://openbb.co/

Financial data platform for analysts, quants and AI agents. The AI Workspace for Finance.

Bridge your data with AI. Build AI-powered analytics applications, faster, securely and on your terms.

OpenBB @ GitHub.

agpl3-licensed ai business business-intelligence data-analytics finance open-source self-hosted web-app

Added 2 weeks ago

Amazon Athena

https://aws.amazon.com/athena/

Interactive SQL. Analyze petabyte-scale data where it lives with ease and flexibility.

Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.

286 - Data & Dev - Christophe Blefari @ <ifttd> :fr:.

amazon aws big-data commercial data data-analytics data-science serverless web-service

Added 1 year ago

Apache Kylin

https://kylin.apache.org/

Kylin is a high-concurrency, high-performance, intelligent OLAP engine that provides a low-cost and ultimate data analytics experience.

data-analytics data-science foss olap open-source

Added 1 year ago

Shinar

https://github.com/Chivo-Systems/Shinar/

AI Call Analytics. Clean, annotate, and summarize call transcripts with GPT-4.5.

Open Source AI Calling Transcriptions, Summaries, and Analytics built on OpenAI Whisper.

audio-transcription data-analytics foss gpl3-licensed open-source self-hosted speech-to-text web-app

Added 10 months ago

Rudel

https://rudel.ai/

Understand how your team codes with AI. Coding Agent Analytics for Claude Code.

Rudel gives engineering leaders visibility into Claude Code usage across their team. Track productivity, quantify ROI, and surface quality signals, automatically.

Rudel @ GitHub.

claude-code clickhouse data-analytics foss mit-licensed open-source self-hosted web-app

Added 1 month ago

phptop

https://github.com/bearstech/phptop

Basic PHP resource profiler (CPU/memory), safe and useful for production sites.

phptop prints per-query and average metrics comparable to 'time' (wallclock, user and system CPU time) along with memory and other resource usage.

It can be easily globally activated on a LAMP server and requires little resources and a single line configuration change in your php.ini. It has been used by Bearstech on many production servers for years without any problems.

apache command-line cpu data-analytics foss gpl3-licensed http-server memory open-source php resources

Added 2 months ago

BemiDB

https://bemidb.com/

Zero-ETL data analytics with Postgres.

Simple and cost-effective cloud analytics platform automatically synced with your data sources.

BemiDB is a Postgres read replica optimized for analytics. It consists of a single binary that seamlessly connects to a Postgres database, replicates the data in a compressed columnar format, and allows you to run complex queries using its Postgres-compatible analytical query engine.

BemiDB @ GitHub.

columnar data-analytics database open-source postgresql

Added 1 year ago