The R Project for Statistical Computing.
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
Related contents:
Visualise your CSV files in seconds without sending your data anywhere.
BirdNET-Analyzer is an open source tool for analyzing bird calls using machine learning models. It can process large amounts of audio recordings and identify (bird) species based on their calls.
Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into and analyze data in Sqlite serverless databases with Jupyter notebooks.
Python scripts for parsing health data into and manipulating data in a SQLite database. SQLite is a light weight database that doesn't require a server.
Related contents:
Count your code, quickly.
Tokei is a program that displays statistics about your code. Tokei will show the number of files, total lines within those files and code, comments, and blanks grouped by language.
Efficient data transformation and modeling framework that is backwards compatible with dbt.
SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can efficiently run and deploy data transformations written in SQL or Python with visibility and control at any size.
Related contents:
Research and data to make progress against the world’s largest problems.
To make progress against the pressing problems the world faces, we need to be informed by the best research and data.
Our World in Data makes this knowledge accessible and understandable, to empower those working to build a better world.
Insights, Unlocked in Real Time.
Apache Pinot: The real-time analytics open source platform for lightning-fast insights, effortless scaling, and cost-effective data-driven decisions.
Related contents:
structured-logprobs is an open-source Python library that enhances OpenAI's structured outputs by providing detailed information about token log probabilities.
This library is designed to offer valuable insights into the reliability of an LLM's structured outputs. It works with OpenAI's Structured Outputs, a feature that ensures the model consistently generates responses adhering to a supplied JSON Schema. This eliminates concerns about missing required keys or hallucinating invalid values.
Slice and dice log files on the command line.
Angle-grinder allows you to parse, aggregate, sum, average, min/max, percentile, and sort your data. You can see it, live-updating, in your terminal. Angle grinder is designed for when, for whatever reason, you don't have your data in graphite/honeycomb/kibana/sumologic/splunk/etc. but still want to be able to do sophisticated analytics.
Related contents:
Online PCAP Analysis and Network Traffic Insights.
Effortless PCAP File Analysis in Your Browser
Explore and analyze PCAP files online using A-Packets, designed to provide comprehensive insights into network protocols like IPv4/IPv6, HTTP, Telnet, FTP, DNS, SSDP, and WPA2. This tool allows users to easily view details of network communications and dissect layers of data transmission.
An automated document analyzer for Paperless-ngx using OpenAI API and Ollama (Mistral, llama, phi 3, gemma 2) to automatically analyze and tag your documents.
It features: Automode, Manual Mode, Ollama and OpenAI, a Chat function to query your documents with AI, a modern and intuitive Webinterface.
Analyze how a Git repo grows over time.
AIL-Framework is a powerful open-source project designed for online data analysis and web crawling, tailored for cybersecurity researchers and analysts.
Related contents:
Know Your User
Open source user analytics
for sovereign cybersecurity.
Tirreno is open-source user analytics software.
Tirreno is a universal analytic tool for monitoring online platforms, web applications, SaaS, communities, IoT, mobile applications, intranets, and e-commerce websites. It is effective against external threats associated with partners or customers, as well as internal risks posed by employees or suppliers.
System for collecting, deriving and querying facts about source code.
Glean is a system for working with facts about source code. You can use it for:
Collecting and storing detailed information about code structure. Glean is designed around an efficient storage model that enables storing information about code at scale.
Querying information about code, to power tools and experiences from online IDE features to offline code analysis.
Source: Indexing code at scale with Glean @ Engineering at Meta.
Data Runs Better on SDF. Transform Data Better with SDF.
SDF is the fastest way to build a scalable, reliable, and optimized data warehouse.
SDF is a developer platform for data that scales SQL understanding across an organization, empowering all data teams to unlock the full potential of their data.
SDF is a multi-dialect SQL compiler, transformation framework, and analytical database engine. It natively compiles SQL dialects, like Snowflake, and connects to their corresponding data warehouses to materialize models.