Biapy's Bookmarks

Gradio

https://www.gradio.app/

Build machine learning apps in Python.

Create web interfaces for your ML models in minutes. Deploy anywhere, share with anyone.

Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio's built-in sharing features. No JavaScript, CSS, or web hosting experience needed!

Gradio @ GitHub.

Related contents:

Gradio 6 arrives for building even smoother interfaces @ Korben :fr:.

apache2-licensed development foss machine-learning open-source python web-ui

Added 5 days ago

Magika

https://securityresearch.google/magika/introduction/overview

Fast and accurate AI powered file content types detection.

Magika is a novel AI-powered file type detection tool that relies on recent advances in deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized model that weighs only a few MB and enables precise file identification within milliseconds, even when running on a single CPU. Magika has been trained and evaluated on a dataset of ~100M samples across 200+ content types (covering both binary and textual file formats), and it achieves an average accuracy of ~99% on our test set.

Magika @ GitHub.

apache2-licensed files foss machine-learning open-source

Added 3 weeks ago

OpenPCC

https://github.com/openpcc/openpcc

An open-source framework for provably private AI inference.

OpenPCC is an open-source framework for provably private AI inference, inspired by Apple’s Private Cloud Compute but fully open, auditable, and deployable on your own infrastructure. It allows anyone to run open or custom AI models without exposing prompts, outputs, or logs, enforcing privacy with encrypted streaming, hardware attestation, and unlinkable requests.

ai apache2-licensed foss machine-learning open-source

Added 4 weeks ago

Rmlx

https://hughjonesd.github.io/Rmlx/index.html

R interface to Apple’s MLX (Machine Learning eXchange) library.

Rmlx provides an R interface to Apple’s MLX framework, enabling high-performance GPU computing on Apple Silicon.

Rmlx @ GitHub.

apple apple-silicon machine-learning mit-licensed mlx open-source r

Added 1 month ago

Coral NPU

https://github.com/google-coral/coralnpu

A machine learning accelerator core designed for energy-efficient AI at the edge.

Coral NPU is a hardware accelerator for ML inferencing. Coral NPU is an Open Source IP designed by Google Research and is freely available for integration into ultra-low-power System-on-Chips (SoCs) targeting wearable devices such as hearables, augmented reality (AR) glasses and smart watches.

ai apache2-licensed foss machine-learning open-hardware open-source

Added 1 month ago

Lance

https://lancedb.github.io/lance/

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

Lance is a modern columnar data format optimized for machine learning and AI applications. It efficiently handles diverse multimodal data types while providing high-performance querying and versioning capabilities.

Lance @ GitHub.

Related contents:

Lance takes aim at Parquet in file format joust @ The Register.

apache2-licensed columnar data-science duckdb format foss lance llm machine-learning open-source pandas parquet polars pytorch

Added 1 month ago

Effort.jl

https://github.com/CosmologicalEmulators/Effort.jl

EFfective Field theORy surrogaTe.

Related contents:

When AI learns to simulate the universe on a simple laptop @ Korben :fr:.

ai foss machine-learning mit-licensed open-source science space

Added 2 months ago

Prophet

https://facebook.github.io/prophet/

Forecasting at scale.

Tool for producing high-quality forecasts for time series data that has multiple seasonalities with linear or non-linear growth.

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
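
The additive model described above is usually written y(t) = g(t) + s(t) + h(t) + ε(t): trend plus seasonality plus holiday effects plus noise, with each seasonality expressed as a Fourier series. The minimal Python sketch below only evaluates such a model for hand-picked, hypothetical parameters (a single linear trend, weekly and yearly Fourier terms, a per-day holiday table); it does not fit anything, and `fourier_seasonality` and `additive_model` are illustrative names, not Prophet's API.

```python
import math
from typing import Dict, Sequence, Tuple


def fourier_seasonality(t: float, period: float,
                        coeffs: Sequence[Tuple[float, float]]) -> float:
    """s(t) = sum over n of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P), n = 1..N."""
    total = 0.0
    for n, (a, b) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


def additive_model(t: float, k: float, m: float,
                   weekly: Sequence[Tuple[float, float]],
                   yearly: Sequence[Tuple[float, float]],
                   holidays: Dict[int, float]) -> float:
    """y(t) = g(t) + s(t) + h(t) with t in days; the noise term is omitted."""
    trend = k * t + m                                      # g(t): linear growth
    seasonal = (fourier_seasonality(t, 7.0, weekly)        # s(t): weekly part
                + fourier_seasonality(t, 365.25, yearly))  # s(t): yearly part
    holiday = holidays.get(int(t), 0.0)                    # h(t): day-level effects
    return trend + seasonal + holiday


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    weekly = [(1.0, 0.5)]
    yearly = [(3.0, 0.0), (0.5, 0.2)]
    holidays = {359: 10.0}
    for day in (0, 100, 359):
        print(day, round(additive_model(day, 0.1, 50.0, weekly, yearly, holidays), 3))
```

Prophet itself estimates these parameters from data and can use a piecewise-linear or logistic trend with changepoints; the single straight line here just keeps the sketch short.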

Prophet: Automatic Forecasting Procedure @ GitHub.

Related contents:

Predictive Autoscaling in Kubernetes with Keda and Prophet @ Minimal Devops' Medium.

forecasting foss library machine-learning mit-licensed open-source python r time-series

Added 2 months ago

The Coding Train

https://thecodingtrain.com/

Welcome to the Coding Train with Daniel Shiffman! A community dedicated to learning creative coding with beginner-friendly tutorials and projects on YouTube and more.

The Coding Train: Machine-Learning @ GitHub.

Related contents:

5 GitHub Repositories for an Instant Knowledge Boost @ Surajondev.

development e-learning machine-learning

Added 2 months ago

OpenVision 2

https://ucsc-vlaa.github.io/OpenVision2/

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning.

OpenVision 2: A Family of Generative Pretrained Visual Encoders that removes the text encoder and contrastive loss, training with caption-only supervision.

OpenVision & OpenVision 2 @ GitHub.

ai apache2-licensed computer-vision foss machine-learning open-source

Added 2 months ago

S3GD

https://github.com/WhyPhyLabs/s3gd

S3GD is a highly optimized, PyTorch-compatible Triton implementation of the Smoothed SignSGD optimizer, meant for reinforcement learning post-training.
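
The line above only names the method. Assuming "Smoothed SignSGD" means the usual Signum-style update (keep an exponential moving average of the gradients, then move each parameter by a fixed step in the direction of that average's sign), a plain-Python sketch looks like this. It is not S3GD's Triton kernel, it omits things a real optimizer would have (such as weight decay), and `SmoothedSignSGD`, `lr` and `beta` are illustrative names.

```python
from typing import List


class SmoothedSignSGD:
    """Signum-style optimizer: step each parameter by lr * sign(EMA of its gradient).

    Assumption: "smoothed" = exponential moving average (momentum) of the
    gradient. The real S3GD implementation may differ in its details.
    """

    def __init__(self, n_params: int, lr: float = 0.01, beta: float = 0.9) -> None:
        self.lr = lr
        self.beta = beta
        self.momentum: List[float] = [0.0] * n_params

    @staticmethod
    def _sign(x: float) -> float:
        return float((x > 0) - (x < 0))

    def step(self, params: List[float], grads: List[float]) -> None:
        """Update `params` in place from one gradient vector."""
        for i, g in enumerate(grads):
            # Smooth the raw gradient with an exponential moving average.
            self.momentum[i] = self.beta * self.momentum[i] + (1.0 - self.beta) * g
            # Only the sign is used: each coordinate moves by exactly lr (or not at all).
            params[i] -= self.lr * self._sign(self.momentum[i])


if __name__ == "__main__":
    # Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    opt = SmoothedSignSGD(n_params=1, lr=0.05, beta=0.9)
    w = [0.0]
    for _ in range(200):
        opt.step(w, [2.0 * (w[0] - 3.0)])
    print(round(w[0], 2))  # close to 3, with a small oscillation around it
```

Because only the sign is used, every coordinate moves by the same step size regardless of gradient magnitude; the moving average is what keeps noisy gradients from flipping the direction on every step.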

Related contents:

S3GD Optimizer Algorithm @ WhyPhyLabs.

data-science foss machine-learning mit-licensed open-source pytorch

Added 2 months ago

Chronon

https://github.com/airbnb/chronon

Chronon is a data platform for serving AI/ML applications.

Chronon is a platform that abstracts away the complexity of data computation and serving for AI/ML applications. Users define features as transformations of raw data, then Chronon can perform batch and streaming computation, scalable backfills, low-latency serving, guaranteed correctness and consistency, as well as a host of observability and monitoring tools.

It allows you to utilize all of the data within your organization, from batch tables, event streams or services to power your AI/ML projects, without needing to worry about all the complex orchestration that this would usually entail.
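
One thing that makes backfills hard to get right is point-in-time correctness: a training row labelled at time q may only use events that happened before q, otherwise the model learns from the future. The Python sketch below computes such a windowed sum with a sorted event list and prefix sums. The event data, the half-open window [q - window, q) and the function name are assumptions for illustration; this is not Chronon's API.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence, Tuple


def point_in_time_sums(events: Sequence[Tuple[int, float]],
                       query_times: Sequence[int],
                       window: int) -> List[float]:
    """Sum event values with timestamps in [q - window, q) for every query time q."""
    ordered = sorted(events)                                    # sort by timestamp
    times = [ts for ts, _ in ordered]
    prefix = [0.0, *accumulate(value for _, value in ordered)]  # prefix[i] = sum of first i values
    sums = []
    for q in query_times:
        lo = bisect_left(times, q - window)   # first event at or after q - window
        hi = bisect_left(times, q)            # first event at or after q (excluded: no leakage)
        sums.append(prefix[hi] - prefix[lo])
    return sums


if __name__ == "__main__":
    # Hypothetical purchase events: (day, amount).
    purchases = [(1, 10.0), (3, 5.0), (5, 2.0), (9, 1.0)]
    # "Amount spent in the 5 days before each label timestamp."
    print(point_in_time_sums(purchases, [3, 6, 10], window=5))  # → [10.0, 17.0, 3.0]
```

Using `bisect_left` for the upper bound is what excludes an event stamped exactly at the query time; whether such ties count is a modelling decision that a feature platform has to apply consistently between backfills and online serving.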

ai apache2-licensed data-platform data-transformation foss llm machine-learning open-source

Added 2 months ago

HAMi

https://project-hami.io/

Open, Device Virtualization, VGPU, Heterogeneous AI Computing.

HAMi (Heterogeneous AI Computing Virtualization Middleware), formerly known as k8s-vGPU-scheduler, is an 'all-in-one' chart designed to manage heterogeneous AI computing devices in a k8s cluster. It can share heterogeneous AI devices and provide resource isolation among tasks.

HAMi @ GitHub.

ai apache2-licensed foss gpu helm-chart kubernetes machine-learning open-source

Added 2 months ago

GeoAI

https://opengeoai.org/

A powerful Python package for integrating artificial intelligence with geospatial data analysis and visualization.

GeoAI @ GitHub.

Related contents:

GeoAI Workshop: Unlocking the Power of GeoAI with Python @ Open Geospatial Solutions' YouTube.

ai foss geospatial machine-learning mit-licensed open-source python

Added 2 months ago

ToddlerBot

https://toddlerbot.github.io/

Open-Source ML-Compatible Humanoid Platform for Loco-Manipulation.

ToddlerBot is a low-cost, open-source humanoid robot platform designed for scalable policy learning and research in robotics and AI.

This codebase includes low-level control, RL training, DP training, real-world deployment and basically EVERYTHING you need to run ToddlerBot in the real world!

ToddlerBot @ GitHub.

Related contents:

ToddlerBot - The $4,300 humanoid robot that makes the tech giants look ridiculous @ Korben :fr:.

ai foss machine-learning mit-licensed open-source robotics

Added 2 months ago

TimesFM (Time Series Foundation Model)

https://github.com/google-research/timesfm

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.

Related contents:

Google releases TimesFM, its AI model that predicts the future of time series @ Korben :fr:.

ai apache2-licensed foss machine-learning open-source time-series

Added 2 months ago

PassGAN

https://github.com/brannondorsey/PassGAN

A Deep Learning Approach for Password Guessing.

brute-force deep-learning foss machine-learning mit-licensed open-source pentest python security

Added 2 months ago

Enhance Lab :fr:

https://enhancelab.fr/

AI and inverse problems for a revolution in digital photography.

Related contents:

S5E21 - We welcomed the French genius who is revolutionizing computer vision @ Underscore_ :fr:.

ai commercial computer-vision france image-manipulation machine-learning photography

Added 3 months ago

spaCy

https://spacy.io/

💫 Industrial-strength Natural Language Processing (NLP) in Python.

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products.

spaCy comes with pretrained pipelines and currently supports tokenization and training for 70+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.

spaCy @ GitHub.

foss library llm machine-learning mit-licensed nlp open-source python

Added 3 months ago

Qwen/Qwen3-Embedding-0.6B @ Hugging Face

https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f

The Qwen3 Embedding model series is the latest addition to the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.
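
For retrieval, embedding models like these are typically used by encoding the query and every document into vectors, then ranking documents by cosine similarity. The sketch below shows only that ranking step, in plain Python; the three-dimensional toy vectors stand in for real model outputs (obtaining those is not shown), and `cosine` and `rank` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float],
         doc_vecs: Dict[str, Sequence[float]],
         top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real embedding-model outputs.
    docs = {"cats": [0.9, 0.1, 0.0], "dogs": [0.7, 0.3, 0.1], "tax law": [0.0, 0.2, 0.9]}
    print(rank([1.0, 0.0, 0.0], docs, top_k=2))
```

Real embeddings have hundreds or thousands of dimensions, and large collections are searched with a vector index rather than a full scan, but the scoring idea is the same.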

Related contents:

Embedding Millions of Text Documents With Qwen3 @ daft.

ai embeddings llm machine-learning qwen

Added 3 months ago

Monarch

https://github.com/pytorch-labs/monarch

PyTorch Single Controller.

Monarch is a distributed execution engine for PyTorch. Our overall goal is to deliver the high-quality user experience that people get from single-GPU PyTorch, but at cluster scale.

bsd3-licensed foss machine-learning open-source python pytorch

Added 5 months ago

Mojo 🔥

https://www.modular.com/mojo

Powerful CPU+GPU Programming. Mojo is a pythonic language for blazing-fast CPU+GPU execution without CUDA. Optionally use it with MAX for insanely fast AI inference.

Modular Platform @ GitHub.

Related contents:

Python can run Mojo now @ koaning.io.

apache2-licensed gpu language machine-learning open-source python

Added 5 months ago

Featureform

https://www.featureform.com/

The Data Layer for Agentic Enrichment and ML Features. The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

Featureform is a virtual feature store. It enables data scientists to define, manage, and serve their ML model's features. Featureform sits atop your existing infrastructure and orchestrates it to work like a traditional feature store. By using Featureform, a data science team can solve several organizational problems.

Featureform @ GitHub.

ai-agent data machine-learning mpl2-licensed open-source

Added 5 months ago

KServe

https://kserve.github.io/website/

Standardized Serverless ML Inference Platform on Kubernetes. Highly scalable and standards based Model Inference Platform on Kubernetes for Trusted AI.

KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high-level abstraction interfaces for TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformer/LLM models using standardized data plane protocols.
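
KServe's V1 inference protocol uses a TensorFlow Serving style REST call: `POST /v1/models/<model_name>:predict` with a JSON body `{"instances": [...]}`, answered with `{"predictions": [...]}` (KServe also supports the newer V2 Open Inference Protocol, not shown). The standard-library sketch below builds such a request without sending it; the host, the model name and the input row are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, List


def build_v1_predict_request(host: str, model_name: str,
                             instances: List[Any]) -> urllib.request.Request:
    """Build (but do not send) a V1-protocol predict request."""
    url = f"{host.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_v1_predictions(raw: bytes) -> List[Any]:
    """Pull the 'predictions' list out of a V1-protocol response body."""
    return json.loads(raw)["predictions"]


if __name__ == "__main__":
    # Hypothetical host and model name; adjust to your InferenceService.
    req = build_v1_predict_request("http://localhost:8080", "sklearn-iris",
                                   [[6.8, 2.8, 4.8, 1.4]])
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req).read()
```

Keeping request construction separate from the network call makes the payload easy to unit-test; `urllib.request.urlopen(req)` would send it to a running InferenceService.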

KServe @ GitHub.

Related contents:

KServe becomes a CNCF incubating project @ CNCF Blog.

ai apache2-licensed foss kubernetes llm llmops machine-learning open-source

Added 5 months ago

sports

https://github.com/roboflow/sports

Computer vision and sports.

In sports, every centimeter and every second matter. That's why Roboflow decided to use sports as a testing ground to push our object detection, image segmentation, keypoint detection, and foundational models to their limits. This repository contains reusable tools that can be applied in sports and beyond.

computer-vision foss machine-learning mit-licensed object-detection open-source python

Added 6 months ago

micrograd

https://github.com/karpathy/micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API.

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
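
The core idea fits in a few dozen lines. Below is a minimal Python sketch written from the description above (scalar nodes, a graph built as you compute, a topological sort, then the chain rule applied in reverse); it is not micrograd's own source and supports only addition, multiplication and `tanh`.

```python
import math
from typing import Callable, List, Set, Tuple, Union


class Value:
    """A scalar that remembers how it was computed, for reverse-mode autodiff."""

    def __init__(self, data: float, parents: Tuple["Value", ...] = ()) -> None:
        self.data = data
        self.grad = 0.0                        # d(output)/d(self), filled by backward()
        self._parents = parents
        self._backward: Callable[[], None] = lambda: None

    @staticmethod
    def _lift(x: Union["Value", float]) -> "Value":
        return x if isinstance(x, Value) else Value(float(x))

    def __add__(self, other: Union["Value", float]) -> "Value":
        other = Value._lift(other)
        out = Value(self.data + other.data, (self, other))
        def _backward() -> None:
            self.grad += out.grad              # d(a + b)/da = 1
            other.grad += out.grad             # d(a + b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other: Union["Value", float]) -> "Value":
        other = Value._lift(other)
        out = Value(self.data * other.data, (self, other))
        def _backward() -> None:
            self.grad += other.data * out.grad  # d(a * b)/da = b
            other.grad += self.data * out.grad  # d(a * b)/db = a
        out._backward = _backward
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self) -> "Value":
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward() -> None:
            self.grad += (1.0 - t * t) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self) -> None:
        """Topologically sort the graph, then apply the chain rule in reverse."""
        order: List[Value] = []
        seen: Set[Value] = set()
        def visit(node: Value) -> None:
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


if __name__ == "__main__":
    a, b = Value(2.0), Value(-3.0)
    c = a * b + a                   # a is used twice; its gradients accumulate
    c.backward()
    print(c.data, a.grad, b.grad)   # -4.0 -2.0 2.0
```

Note the `+=` in every backward closure: when a value feeds into several operations, as `a` does here, its gradient is the sum of the contributions from each use.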

Related contents:

Writing that changed how I think about PL @ Max Bernstein.

foss machine-learning mit-licensed neural-network open-source python

Added 6 months ago

Ktransformers

https://kvcache-ai.github.io/ktransformers/

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations.

KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies.

KTransformers is a flexible, Python-centric framework designed with extensibility at its core. By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI.

Ktransformers @ GitHub.

ai apache2-licensed foss llm machine-learning open-source optimization

Added 6 months ago

ANEMLL

https://github.com/Anemll/Anemll

Artificial Neural Engine Machine Learning Library.

ANEMLL (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

ane apple development library llm machine-learning

Added 7 months ago

Hyperparam

https://hyperparam.app/

Look At Your Data 👀.

Data quality is the most important factor in machine learning success. Hyperparam brings exploration and analysis of massive text datasets to the browser.

Hyperparam @ GitHub.

data-analytics data-explorer data-science foss machine-learning mit-licensed open-source parquet web-app

Added 7 months ago

CoRT (Chain of Recursive Thoughts) 🧠🔄

https://github.com/PhialsBasement/Chain-of-Recursive-Thoughts

I made my AI think harder by making it argue with itself repeatedly. It works stupidly well.

CoRT makes AI models recursively think about their responses, generate alternatives, and pick the best one. It's like giving the AI the ability to doubt itself and try again... and again... and again.
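
Stripped of the model calls, the loop described above is: draft an answer, generate alternatives to the current best, keep whichever scores highest, repeat. The sketch below passes the two LLM roles in as plain callables (`generate` and `score`, hypothetical names), so the control flow can be run with deterministic toy functions; how CoRT actually prompts the model and chooses the number of rounds is not reproduced here.

```python
from typing import Callable, List

Generator = Callable[[str, str], List[str]]  # (prompt, current_best) -> alternatives
Scorer = Callable[[str, str], float]         # (prompt, answer) -> score


def recursive_refine(prompt: str, generate: Generator, score: Scorer,
                     rounds: int = 3) -> str:
    """Repeatedly generate alternatives to the best answer and keep the top scorer.

    In CoRT both roles would be played by an LLM; here they are plain callables.
    """
    best = generate(prompt, "")[0]            # first draft
    for _ in range(rounds):
        candidates = [best] + generate(prompt, best)
        # max() returns the first maximal item, so the current answer is
        # kept unless an alternative scores strictly higher.
        best = max(candidates, key=lambda ans: score(prompt, ans))
    return best


if __name__ == "__main__":
    # Toy stand-ins for LLM calls (hypothetical): propose longer variants
    # and prefer answers with more exclamation marks.
    def toy_generate(prompt: str, current: str) -> List[str]:
        base = current or prompt
        return [base + "!", base + "?!"]

    def toy_score(prompt: str, answer: str) -> float:
        return float(answer.count("!"))

    print(recursive_refine("hi", toy_generate, toy_score))  # → hi!!!!
```

Keeping the current best in the candidate list means a round can never make the answer worse by the scorer's own measure; whether the scorer itself is any good is the hard part.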

ai foss llm machine-learning mit-licensed open-source prompt-engineering

Added 7 months ago

node-mlx

https://github.com/frost-beta/node-mlx

A machine learning framework for Node.js, based on MLX.

development foss framework machine-learning mit-licensed mlx nodejs open-source

Added 7 months ago

Boson

https://github.com/bosonstack/boson

A self-contained, lightweight, out-of-the-box (OOB) research platform for modern ML.

Boson is a lightweight, fully containerized, and feature-rich machine learning research platform. It centralizes essential tools to help teams keep projects lean, organized, and reproducible—while reducing overhead and boosting productivity. Think Databricks/SageMaker, but local and free.

Boson enables engineers and researchers to iterate faster without getting bogged down by infrastructure or tooling complexity.

ai bsl-licensed data-science llm llmops machine-learning open-source self-hosted web-app

Added 7 months ago

xorq

https://www.xorq.dev/

ML Pipelines From Another Planet. Build out-of-this-world ML pipelines.

Run-anywhere computational framework for Python that simplifies and accelerates ML workflows and development. xorq is a deferred computational framework for building, running, and serving pandas groupby-apply style pipelines common in ML workflows. xorq is built on top of Ibis and Apache DataFusion.

apache2-licensed foss framework machine-learning ml-pipeline open-source python workflow

Added 8 months ago

Jobset

https://jobset.sigs.k8s.io/

JobSet: a k8s-native API for distributed ML training and HPC workloads.

JobSet is a Kubernetes-native API for managing a group of k8s Jobs as a unit. It aims to offer a unified API for deploying HPC (e.g., MPI) and AI/ML training workloads (PyTorch, JAX, TensorFlow, etc.) on Kubernetes.

JobSet @ GitHub.

Related contents:

Introducing JobSet @ Kubernetes blog.

ai apache2-licensed api devops foss hpc k8s kubernetes llmops machine-learning open-source

Added 8 months ago

NVIDIA Dynamo

https://developer.nvidia.com/dynamo

A Datacenter Scale Distributed Inference Serving Framework.

NVIDIA Dynamo is a high-throughput low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLang or others) and captures LLM-specific capabilities.

Dynamo @ GitHub.

Related contents:

A closer look at Dynamo, Nvidia's 'operating system' for AI inference @ The Register.

ai apache2-licensed distributed foss genai inference llm machine-learning nvidia open-source

Added 8 months ago

Flower

https://flower.ai/

A Friendly Federated AI Framework.

A unified approach to federated learning, analytics, and evaluation. Federate any workload, any ML framework, and any programming language.
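
The aggregation step at the heart of federated learning can be illustrated with Federated Averaging (FedAvg), one of the strategies Flower provides: each client trains locally, and the server averages the returned weights, weighting each client by how many examples it trained on. The sketch below is plain Python with weights as flat lists and a hypothetical `fed_avg` function, not Flower's API; real weights are lists of arrays and arrive over the network.

```python
from typing import List, Sequence, Tuple


def fed_avg(client_updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its number of local examples."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            avg[i] += w * n / total
    return avg


if __name__ == "__main__":
    # Two hypothetical clients: (local weights, number of local examples).
    print(fed_avg([([1.0, 2.0], 1), ([3.0, 6.0], 3)]))  # → [2.5, 5.0]
```

Weighting by example count makes the result match what a single model trained on the pooled data would average to in the simplest case, while the raw data never leaves the clients.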

Flower @ GitHub.

ai apache2-licensed distributed federated foss framework machine-learning open-source python

Added 8 months ago

Kokoro Web

https://voice-generator.pages.dev/

Free & Open-Source AI Voice Generator.

A powerful, browser-based AI voice generator that lets you create natural-sounding voices without installing anything.

Use it directly in your browser or self-host it for your own applications with OpenAI API compatibility!

Kokoro Web @ GitHub.

ai foss machine-learning mit-licensed openai open-source self-hosted text-to-speech voice web-app

Added 8 months ago

WAGMIOS

https://github.com/mentholmike/wagmios

WAGMIOS is a self-hosted container management system with AI-powered automation. It enables you to efficiently manage your containers with W.I.L.L.O.W, an AI assistant that optimizes your workflow.

ai container foss machine-learning open-source self-hosted

Added 8 months ago

Evolving Agents Framework

https://github.com/matiasmolinas/evolving-agents

Evolving agents is a production-grade environment for orchestrating, evolving, and managing AI agents.

A production-grade framework for creating, managing, and evolving AI agents with intelligent agent-to-agent communication. The framework enables you to build collaborative agent ecosystems that can semantically understand requirements, evolve based on past experiences, and communicate effectively to solve complex tasks.

ai ai-agent foss framework llm machine-learning open-source python

Added 8 months ago

Letta

https://www.letta.com/

The Platform for Building Stateful Agents. Build agents with infinite context and human-like memory, that can learn from data and improve with experience. Letta (formerly MemGPT) is a framework for creating LLM services with memory.

👾 Letta is an open source framework for building stateful LLM applications. You can use Letta to build stateful agents with advanced reasoning capabilities and transparent long-term memory. The Letta framework is white box and model-agnostic.

Related contents:

Letta Filesystem @ Letta documentation.

ai ai-agent development foss framework llm machine-learning open-source python stateful

Added 8 months ago

CAMEL-AI Framework

https://camel-ai.org/

Finding the Scaling Laws of Agents. The first and the best multi-agent framework.

🐫 CAMEL is an open-source community dedicated to finding the scaling laws of agents. We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks. To facilitate research in this field, we implement and support various types of agents, tasks, prompts, models, and simulated environments.

The framework enables multi-agent systems to continuously evolve by generating data and interacting with environments. This evolution can be driven by reinforcement learning with verifiable rewards or supervised learning.

CAMEL-AI @ GitHub.

ai ai-agent foss framework llm machine-learning open-source python

Added 8 months ago

superglue

https://superglue.cloud/

superglue is an open-source server that sits as a layer between complex APIs and your application. With superglue, you always get the data that you want in the format that you expect. Fetch data from JSON and XML APIs, as well as CSV and Excel files in seconds.

superglue @ GitHub.

api-gateway data-pipeline foss legacy machine-learning open-source self-hosted web-app

Added 9 months ago

Spaces @ Hugging Face

https://huggingface.co/spaces

The AI App Directory.

Related contents:

#106 - Web dev news for February 2025 @ Double Slash :fr:.

ai hugging-face llm machine-learning search-engine web-service

Added 9 months ago

DeepEval

https://docs.confident-ai.com/

The Open-Source LLM Evaluation Framework.

DeepEval is a simple-to-use, open-source LLM evaluation framework for evaluating and testing large-language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., which use LLMs and various other NLP models that run locally on your machine for evaluation.

DeepEval @ GitHub.

foss framework llm llm-evaluation llmops machine-learning open-source python

Added 9 months ago

GenSX

https://www.gensx.com/

The TypeScript framework for agents & workflows with React-like components. Lightning-fast dev loop. Easy to learn. Easy to extend.

Build complex AI applications with React-like components. GenSX is a simple TypeScript framework for building agents and workflows with reusable React-like components. GenSX takes a lot of inspiration from React, but the programming model is very different: it’s a Node.js framework designed for data flow.

GenSX @ GitHub.

ai ai-agent development foss framework javascript llm machine-learning nodejs open-source typescript

Added 9 months ago

Zonos

https://github.com/Zyphra/Zonos

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.

Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

Related contents:

Zonos, the ultimate AI for cloning a voice? @ Choses à Savoir TECH :fr:.

foss machine-learning open-source self-hosted text-to-speech

Added 9 months ago

BirdNET-Analyzer

https://kahst.github.io/BirdNET-Analyzer/

BirdNET-Analyzer is an open source tool for analyzing bird calls using machine learning models. It can process large amounts of audio recordings and identify (bird) species based on their calls.

BirdNET-Analyzer @ GitHub.

audio data-analytics ecology foss machine-learning open-source

Added 9 months ago

OptaPlanner

https://www.optaplanner.org/

The fast, Open Source and easy-to-use solver. Solve planning and scheduling problems with OptaPlanner.

A fast, easy-to-use, open-source AI constraint solver for software developers.

OptaPlanner @ GitHub.

Related contents:

How I built an AI company to save my open source project @ timefold.

foss machine-learning open-source scheduling

Added 9 months ago

smolmodels ✨

https://github.com/plexe-ai/smolmodels

Build ML models with natural language and minimal code.

Create machine learning models with minimal code by describing what you want them to do in plain words. You explain the task, and the library builds a model for you, including data generation, feature engineering, training, and packaging.

ai foss llm machine-learning open-source

Added 9 months ago

Open LLM Lists

https://openllmlist.com/

Trending Open AI Models.

ai curated llm machine-learning web-service

Added 9 months ago

Modern-Day Oracles or Bullshit Machines?

https://thebullshitmachines.com/

For better or for worse, LLMs are here to stay. We all read content that they produce online, most of us interact with LLM chatbots, and many of us use them to produce content of our own.

In a series of five- to ten-minute lessons, we will explain what these machines are, how they work, and how to thrive in a world where they are everywhere.

You will learn when these systems can save you a lot of time and effort. You will learn when they are likely to steer you wrong. And you will discover how to see through the hype to tell the difference.

ai data-science e-learning llm machine-learning web-service

Added 9 months ago

AI by Hand ✍️ Exercises in Excel

https://github.com/ImagineAILab/ai-by-hand-excel

ai data-science e-learning excel foss machine-learning open-source

Added 9 months ago

How To Scale Your Model

https://jax-ml.github.io/scaling-book/

A Systems View of LLMs on TPUs.

This book aims to demystify the art of scaling LLMs on TPUs. We try to explain how TPUs work, how LLMs actually run at scale, and how to pick parallelism schemes during training and inference that avoid communication bottlenecks.
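
As an example of the kind of back-of-the-envelope estimate the book builds on, here is the generic bandwidth-only cost of a ring all-reduce, the collective commonly used to sum gradients in data parallelism: each of N devices sends 2(N - 1)/N times the buffer size. The Python function below is a sketch under those assumptions (latency ignored, one ring, a single per-device bandwidth figure); the function name is hypothetical, and this is not the book's notation or its TPU-specific model.

```python
def ring_allreduce_seconds(num_bytes: float, num_devices: int,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only estimate of a ring all-reduce.

    A ring all-reduce is a reduce-scatter followed by an all-gather; each
    device sends 2 * (N - 1) / N * num_bytes in total. Per-hop latency and
    topology details (e.g. multi-dimensional TPU meshes) are ignored.
    """
    if num_devices < 1 or bandwidth_bytes_per_s <= 0:
        raise ValueError("need at least one device and positive bandwidth")
    traffic = 2.0 * (num_devices - 1) / num_devices * num_bytes
    return traffic / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical setup: 1B parameters of bf16 gradients (2 bytes each),
    # 8 data-parallel devices, 100 GB/s of usable bandwidth per device.
    grad_bytes = 1e9 * 2
    print(f"{ring_allreduce_seconds(grad_bytes, 8, 100e9) * 1e3:.1f} ms")
```

With these hypothetical numbers each device moves 3.5 GB, about 35 ms per step; if that approaches the step's compute time, communication becomes the bottleneck and a different parallelism scheme (or overlapping communication with compute) is worth considering.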

How To Scale Your Model @ GitHub.

ebook e-learning foss llm machine-learning open-source tpu

Added 10 months ago

Oumi

https://oumi.ai/

Open Universal Machine Intelligence. E2E Foundation Model Research Platform. Everything you need to build state-of-the-art foundation models, end-to-end.

Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

Oumi @ GitHub.

ai foss llm machine-learning open-source python

Added 10 months ago

🤗 Transformers

https://huggingface.co/docs/transformers/index

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch.

🤗 Transformers @ GitHub.

Related contents:

Running inference in web extensions @ dist://ed.

ai foss machine-learning open-source pytorch tensorflow transformer

Added 10 months ago

ONNX Runtime

https://onnxruntime.ai/

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc.

ONNX Runtime @ GitHub.

Related contents:

Running inference in web extensions @ dist://ed.

ai foss llm machine-learning open-source pytorch tensorflow

Added 10 months ago

vLLM

https://docs.vllm.ai/en/latest/

Easy, fast, and cheap LLM serving for everyone.

vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM @ GitHub.

ai apache2-licensed foss genai llm machine-learning open-source self-hosted

Added 10 months ago

Common Crawl

https://commoncrawl.org/

Open Repository of Web Crawl Data.

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.

Related contents:

S5E7 - Are we on the verge of an AI collapse? @ Underscore_'s acast :fr:.

crawler llm machine-learning non-profit rag scraping web-service

Added 10 months ago

FineWeb

https://huggingface.co/datasets/HuggingFaceFW/fineweb

15 trillion tokens of the finest data the 🌐 web has to offer.

The 🍷 FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from Common Crawl. The data processing pipeline is optimized for LLM performance and was run on the 🏭 datatrove library, our large-scale data processing library.

🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release of the full dataset under the ODC-By 1.0 license. However, by carefully adding additional filtering steps, we managed to push the performance of 🍷 FineWeb well above that of the original 🦅 RefinedWeb, and models trained on our dataset also outperform models trained on other commonly used high-quality web datasets (like C4, Dolma-v1.6, The Pile, SlimPajama, RedPajama2) on our aggregate group of benchmark tasks.
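
Deduplicating web data at this scale is usually done with MinHash, which estimates the Jaccard similarity of two documents' shingle sets from short signatures; the FineWeb write-up describes MinHash-based deduplication. The plain-Python sketch below uses word 5-grams and 64 salted BLAKE2b hashes, values chosen for illustration rather than FineWeb's settings, and the function names are hypothetical.

```python
import hashlib
from typing import List, Set


def shingles(text: str, n: int = 5) -> Set[str]:
    """Lowercased word n-grams; documents shorter than n become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(features: Set[str], num_hashes: int = 64) -> List[int]:
    """For each seeded 64-bit hash function, keep the minimum over all features."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")   # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little")
            for f in features))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of positions where the minima agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    sig_a = minhash_signature(shingles(doc_a))
    sig_b = minhash_signature(shingles(doc_b))
    return estimated_jaccard(sig_a, sig_b) >= threshold
```

Comparing every pair of signatures is quadratic, so production pipelines bucket signatures with locality-sensitive hashing and only compare documents that share a bucket.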

ai dataset llm machine-learning open-source

Added 10 months ago

Materia AI

https://www.trymateria.ai/

Partner of Accounting Leaders. Generative AI platform for intelligent accounting.

The preferred partner of accounting leaders.

Related contents:

#304.bin - 2024 review: The beginning of the revolution with Quentin Adam @ <ifttd>.

accounting business commercial finance genai machine-learning web-service

Added 10 months ago