benchmark
A BIg Bench for Large-Scale Relational Database Grounded Text-to-SQLs.
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a pioneering cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains 12,751 unique question-SQL pairs and 95 large databases with a total size of 33.4 GB. It covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education.
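Text-to-SQL benchmarks in this family usually score a model by execution: the predicted query and the gold query are both run against the database, and the prediction counts as correct if the results match. The following is a minimal sketch of that idea using Python's built-in `sqlite3`. It assumes set-based comparison of result rows (row order and duplicates ignored). It is not BIRD's official evaluation script, and the `players` table is a hypothetical toy database.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Compare the result sets of a predicted and a gold query (row order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute is scored as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Set comparison ignores both row order and duplicate rows.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database standing in for a real benchmark database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, goals INTEGER)")
    conn.executemany("INSERT INTO players VALUES (?, ?)", [("A", 10), ("B", 25), ("C", 5)])
    pairs = [
        # Different SQL text, same result: counted as correct.
        ("SELECT name FROM players WHERE goals > 8", "SELECT name FROM players WHERE goals >= 10"),
        # Different results (3 vs 2): counted as wrong.
        ("SELECT COUNT(*) FROM players", "SELECT COUNT(*) FROM players WHERE goals > 5"),
        # Misspelled column raises an error: counted as wrong.
        ("SELECT nme FROM players", "SELECT name FROM players"),
    ]
    print(execution_accuracy(conn, pairs))  # prints 0.3333333333333333
```

Comparing executed results rather than SQL strings credits semantically equivalent queries, as in the first pair.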
Related contents:
Challenging Software Optimization Tasks for Evaluating SWE-Agents.
GSO (Global Software Optimization) is a benchmark for evaluating language models' capabilities in developing high-performance software. We present 100+ challenging optimization tasks across 10 codebases spanning diverse domains and programming languages. Each task provides a codebase and a performance test as a precise specification; agents must optimize the codebase, and their results are measured against expert developer commits.
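"Measured against expert developer commits" can be read as comparing the agent's speedup over the original code with the expert's speedup on the same performance tests. The following is a minimal sketch of such a comparison. Two details are assumptions, not GSO's documented scoring: aggregating per-test speedups with a geometric mean, and the `threshold` parameter (a hypothetical "close enough to the expert" cutoff).

```python
from statistics import geometric_mean


def aggregate_speedup(baseline_s: list[float], candidate_s: list[float]) -> float:
    """Geometric mean of per-test speedups (baseline runtime / candidate runtime)."""
    if not baseline_s or len(baseline_s) != len(candidate_s):
        raise ValueError("need one candidate runtime per baseline runtime")
    return geometric_mean([b / c for b, c in zip(baseline_s, candidate_s)])


def reaches_expert(
    baseline_s: list[float],
    agent_s: list[float],
    expert_s: list[float],
    threshold: float = 0.95,  # hypothetical cutoff, not taken from GSO
) -> bool:
    """True if the agent's speedup is at least `threshold` times the expert's speedup."""
    agent = aggregate_speedup(baseline_s, agent_s)
    expert = aggregate_speedup(baseline_s, expert_s)
    return agent >= threshold * expert


if __name__ == "__main__":
    # Illustrative runtimes in seconds for two performance tests.
    baseline = [2.0, 4.0]
    agent = [1.0, 1.0]   # per-test speedups 2x and 4x
    expert = [0.5, 1.0]  # per-test speedups 4x and 4x
    print(aggregate_speedup(baseline, agent), aggregate_speedup(baseline, expert))
    print(reaches_expert(baseline, agent, expert))  # prints False
```

A geometric mean keeps one very large per-test speedup from dominating the aggregate. That is the reason it is used in this sketch.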
What's the best JavaScript minifier?
🏃‍♂️🏃‍♀️🏃 JS minification benchmarks: babel-minify, esbuild, terser, uglify-js, swc, google closure compiler, tdewolff/minify, oxc-minify
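Minifier comparisons typically report, for each tool, the minified output size and the size after gzip, since gzip affects what is actually transferred. The following is a minimal sketch of those size metrics in Python. The minified string here is written by hand as a stand-in for real tool output, and the field names in the returned dictionary are illustrative.

```python
import gzip


def size_report(original: str, minified: str) -> dict[str, float]:
    """Byte sizes a minifier comparison typically reports (UTF-8 encoded)."""
    original_bytes = original.encode("utf-8")
    minified_bytes = minified.encode("utf-8")
    return {
        "original_bytes": len(original_bytes),
        "minified_bytes": len(minified_bytes),
        # Size after gzip, closer to what a server actually sends.
        "minified_gzip_bytes": len(gzip.compress(minified_bytes)),
        "reduction_pct": 100.0 * (1 - len(minified_bytes) / len(original_bytes)),
    }


if __name__ == "__main__":
    source = """
    // Add two numbers.
    function add(first, second) {
        return first + second;
    }
    """
    # Hand-minified stand-in for the output of a real minifier.
    minified = "function add(n,d){return n+d}"
    print(size_report(source, minified))
```

On tiny inputs like this one, gzip's header overhead can make the gzipped size larger than the minified size. Meaningful comparisons use real bundles.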
Multi-DBMS SQL Benchmarking Framework via JDBC.
BenchBase (formerly OLTPBench) is a Multi-DBMS SQL Benchmarking Framework via JDBC.
JavaScript Benchmarking. Browser-based JavaScript benchmarking tool.
Run, compare, and share JavaScript benchmarks in your browser.
Statistically Sound Performance Evaluation.
Stabilizer is a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly rerandomizing layouts of code, stack, and heap objects at runtime.
Statistically rigorous benchmark runner for the web.
tachometer is a tool for running benchmarks in web browsers. It uses repeated sampling and statistics to reliably identify even tiny differences in runtime.
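The core statistical idea behind tools of this kind is to collect repeated runtime samples for two variants and declare a winner only if a confidence interval for the difference in means excludes zero. The following is a minimal Python sketch of that idea. It uses the Welch standard error with a normal critical value (z = 1.96) as a simplifying assumption, and it is not tachometer's implementation.

```python
import math
from statistics import mean, variance


def diff_ci(a: list[float], b: list[float], z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for mean(a) - mean(b).

    Uses the Welch standard error with a normal critical value, a
    simplification that is reasonable only for larger sample counts.
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - z * se, diff + z * se


def verdict(a: list[float], b: list[float]) -> str:
    """Compare two runtime samples; name a winner only if the interval excludes zero."""
    low, high = diff_ci(a, b)
    if high < 0:
        return "a is faster"
    if low > 0:
        return "b is faster"
    return "unsure: collect more samples"


if __name__ == "__main__":
    a = [10.2, 9.8, 10.1, 9.9, 10.0]   # runtimes in ms, mean 10.0
    b = [10.6, 10.4, 10.5, 10.7, 10.3]  # runtimes in ms, mean 10.5
    print(verdict(a, b))  # prints a is faster
```

For these samples, the interval is roughly -0.5 ± 0.2 ms, so it excludes zero. When the interval straddles zero, the honest answer is to keep sampling rather than pick a winner.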
benchmark tooling that loves you ❤️
Mitata is a benchmark tooling library for JavaScript and C++ that offers accurate timing down to picoseconds, helpful visualizations, and features like automatic garbage collection and argument handling for benchmarks.
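Sub-nanosecond figures from a benchmark library generally come from timing a batch of many calls and dividing, rather than timing each call individually. The following is a minimal Python sketch of that pattern. It combines warmup, a forced garbage collection with the collector disabled during measurement, and batched timing, and it reports summary statistics. The `bench` function and its parameters are hypothetical and do not reproduce mitata's API.

```python
import gc
import time
from collections.abc import Callable
from statistics import median


def bench(fn: Callable[[], object], *, warmup: int = 100,
          samples: int = 200, batch: int = 100) -> dict[str, float]:
    """Per-call time in nanoseconds, estimated from batches of `batch` calls."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    gc.collect()  # start measurement from a freshly collected heap
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the samples
    try:
        times = []
        for _ in range(samples):
            t0 = time.perf_counter_ns()
            for _ in range(batch):
                fn()
            times.append((time.perf_counter_ns() - t0) / batch)
    finally:
        if gc_was_enabled:
            gc.enable()
    times.sort()
    return {
        "min_ns": times[0],
        "median_ns": median(times),
        "p99_ns": times[int(0.99 * (len(times) - 1))],
    }


if __name__ == "__main__":
    print(bench(lambda: sum(range(100))))
```

The per-call figures still include Python's loop and call overhead. Libraries aiming for picosecond-level reporting also estimate and subtract that baseline overhead.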
Benchmark the throughput performance of local large language models (LLMs) running via Ollama.
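Throughput for a local model can be derived from the timing fields that Ollama's `/api/generate` endpoint returns in a non-streaming response: `prompt_eval_count`/`prompt_eval_duration` for prompt processing and `eval_count`/`eval_duration` for generation, with durations in nanoseconds. The following is a minimal sketch that assumes a server on the default `http://localhost:11434` and assumes all four fields are present. The model name in the usage line is a placeholder.

```python
import json
import urllib.request

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def tokens_per_second(resp: dict) -> dict[str, float]:
    """Prompt-processing and generation throughput from a non-streaming response."""
    return {
        "prompt_tps": resp["prompt_eval_count"] * NS_PER_S / resp["prompt_eval_duration"],
        "generation_tps": resp["eval_count"] * NS_PER_S / resp["eval_duration"],
    }


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


if __name__ == "__main__":
    # Placeholder model name; use one that is pulled locally.
    print(tokens_per_second(generate("your-model", "Why is the sky blue?")))
```

Reporting prompt and generation throughput separately matters, because prompt processing is usually much faster per token than generation.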
Browser Benchmarks
Speedometer is a browser benchmark that measures the responsiveness of Web applications. It uses demo web applications to simulate user actions such as adding to-do items.
JS performance - Dev tool. Benchmark your JS snippets for optimal performance.
UserBenchmark. Speed test your PC in less than a minute.
An Open, Collaborative Testing Platform For Benchmarking & Performance Analysis
Test the processing capacities of your brain.
Tsung is a high-performance benchmark framework for various protocols, such as HTTP, XMPP, and LDAP.