As data volumes continue to grow in fields such as machine learning and scientific computing, optimizing fundamental operations like matrix multiplication becomes increasingly critical. Blosc2's chunk-based approach offers a new path to efficiency in these scenarios.
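The chunk-based idea can be illustrated with an ordinary blocked matrix multiplication: the product is assembled one output block at a time, so only a few small blocks need to be in fast memory at once. The sketch below uses plain NumPy and does not call Blosc2; the function name `chunked_matmul` and the default `chunk=64` are hypothetical choices for illustration. In a Blosc2-style setting, each block would be a compressed chunk decompressed just before use.

```python
import numpy as np


def chunked_matmul(a: np.ndarray, b: np.ndarray, chunk: int = 64) -> np.ndarray:
    """Compute a @ b block by block (hypothetical helper, not a Blosc2 API).

    Each (i, j) output block accumulates the partial products
    a[i-block, p-block] @ b[p-block, j-block] over all p-blocks.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, chunk):
        for j in range(0, m, chunk):
            # 'acc' is a view into 'out', so += updates the result in place.
            acc = out[i:i + chunk, j:j + chunk]
            for p in range(0, k, chunk):
                acc += a[i:i + chunk, p:p + chunk] @ b[p:p + chunk, j:j + chunk]
    return out


rng = np.random.default_rng(0)
a = rng.random((130, 70))
b = rng.random((70, 90))
print(np.allclose(chunked_matmul(a, b, chunk=32), a @ b))
```

Shapes that are not multiples of `chunk` work because NumPy slicing simply truncates the last block at the array edge.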
Blosc is a high-performance compressor optimized for binary data (e.g. floating-point numbers, integers and booleans, although it can handle string data too). It has been designed to transmit data to the processor cache faster than the traditional approach of fetching uncompressed data directly from memory with a plain memcpy() call. Blosc's main goal is not just to reduce the size of large datasets on disk or in memory, but also to accelerate memory-bound computations.
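One reason binary data compresses well under Blosc is its shuffle filter, which regroups the bytes of fixed-size elements by significance before a codec runs. The sketch below reproduces that idea with the standard library only: `zlib` stands in for Blosc's own codecs, and the `shuffle`/`unshuffle` helpers are illustrative, not Blosc's implementation. Sizes are printed rather than asserted, since the gain depends on the data.

```python
import struct
import zlib


def shuffle(data: bytes, typesize: int) -> bytes:
    """Regroup byte 0 of every element, then byte 1, and so on."""
    if len(data) % typesize:
        raise ValueError("length must be a multiple of typesize")
    return b"".join(data[i::typesize] for i in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Invert shuffle(): scatter each byte plane back to its element slots."""
    n = len(data) // typesize
    out = bytearray(len(data))
    for i in range(typesize):
        out[i::typesize] = data[i * n:(i + 1) * n]
    return bytes(out)


# Smoothly varying float64 values: the high-order bytes are nearly constant,
# so after shuffling they form long, highly compressible runs.
values = [1000.0 + i * 0.5 for i in range(10_000)]
raw = struct.pack(f"<{len(values)}d", *values)
plain = zlib.compress(raw, 6)
shuffled = zlib.compress(shuffle(raw, 8), 6)
print("raw:", len(raw), "zlib:", len(plain), "shuffle+zlib:", len(shuffled))
```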
The Arc Virtual Cell Atlas is a collection of high-quality, curated, open datasets assembled for the purpose of accelerating the creation of virtual cell models. The Atlas includes both observational and perturbational data from over 300 million cells (and growing).
Easy web apps for data science without the compromises.
No web development skills required.
Reproducible Data Science Environments with Nix.
{rix} is an R package that leverages Nix, a package manager focused on reproducible builds. With Nix, you can create project-specific environments with a custom version of R, its packages, and all system dependencies (e.g., GDAL). Nix ensures full reproducibility, which is crucial for research and development projects.
The R Project for Statistical Computing.
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS.
For better or for worse, LLMs are here to stay. We all read content that they produce online, most of us interact with LLM chatbots, and many of us use them to produce content of our own.
In a series of five- to ten-minute lessons, we will explain what these machines are, how they work, and how to thrive in a world where they are everywhere.
You will learn when these systems can save you a lot of time and effort. You will learn when they are likely to steer you wrong. And you will discover how to see through the hype to tell the difference.
Research and data to make progress against the world’s largest problems.
To make progress against the pressing problems the world faces, we need to be informed by the best research and data.
Our World in Data makes this knowledge accessible and understandable, to empower those working to build a better world.
A faster way to build and share data apps.
Streamlit turns data scripts into shareable web apps in minutes.
All in pure Python. No front‑end experience required.
Streamlit lets you transform Python scripts into interactive web apps in minutes, instead of weeks. Build dashboards, generate reports, or create chat apps. Once you’ve created an app, you can use our Community Cloud platform to deploy, manage, and share your app.
Multi-modal modular data ingestion and retrieval.
DataBridge is an open source library for natural language search and management of multi-modal data. Get started by installing databridge now!
DataBridge is a powerful document processing and retrieval system designed for building intelligent document-based applications. It provides a robust foundation for semantic search, document processing, and AI-powered document interactions.
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.
Open-source document processing platform built for knowledge workers.
Rowfill helps extract, analyze, and process data from complex documents, images, PDFs and more with advanced AI capabilities.
Insights, Unlocked in Real Time.
Apache Pinot: The real-time analytics open source platform for lightning-fast insights, effortless scaling, and cost-effective data-driven decisions.
High performance array computing.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
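To make "composable transformations" concrete, the sketch below composes `jax.grad`, `jax.jit` and `jax.vmap` on a small least-squares loss. It assumes JAX is installed; the loss function and the data are illustrative choices, not taken from the JAX documentation.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model x @ w against targets y."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


# Differentiate with respect to w (the first argument), then JIT-compile.
grad_loss = jax.jit(jax.grad(loss))

# Vectorize the gradient over examples: w is shared, x and y are mapped on axis 0.
per_example_grads = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))

w = jnp.array([1.0, 2.0])
x = jnp.array([[1.0, 0.0], [0.0, 1.0]])
y = jnp.array([0.0, 0.0])
print(grad_loss(w, x, y))          # batch gradient
print(per_example_grads(w, x, y))  # one gradient row per example
```

Because each transformation returns an ordinary Python function, they stack in any order; averaging the per-example gradients recovers the batch gradient here.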