Biapy's Bookmarks

https://github.com/experientiallabs/world-model-optimizer

Build continually improving models on your agent traces by distilling frontier open models.

wmo optimize turns collected agent traces into smaller open-source models using the Tinker API, with optional closed-loop simulation training. wmo serve exposes an endpoint that routes requests between frontier and smaller models; on RouterBench, it maintains frontier quality at 27% lower cost. Rerun the pipeline as new traces arrive to continually improve a model you own.
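The routing idea behind `wmo serve` can be pictured as a cascade. The sketch below is not wmo's actual policy or API. The `Router` class, the `(answer, confidence)` model interface, and the 0.8 threshold are all hypothetical. It assumes the small model reports a confidence score and escalates to the frontier model only when that score is too low.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model interface: prompt -> (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Router:
    small_model: Model
    frontier_model: Model
    threshold: float = 0.8  # hypothetical confidence cutoff
    small_calls: int = 0
    frontier_calls: int = 0

    def route(self, prompt: str) -> str:
        # Try the cheaper model first.
        answer, confidence = self.small_model(prompt)
        if confidence >= self.threshold:
            self.small_calls += 1
            return answer
        # Escalate to the frontier model when the small model is unsure.
        self.frontier_calls += 1
        answer, _ = self.frontier_model(prompt)
        return answer


# Toy stand-ins: the "small model" is only confident on short prompts.
def small(prompt: str) -> Tuple[str, float]:
    return f"small:{prompt}", 0.9 if len(prompt) < 20 else 0.3


def frontier(prompt: str) -> Tuple[str, float]:
    return f"frontier:{prompt}", 1.0


router = Router(small_model=small, frontier_model=frontier)
print(router.route("hi"))
print(router.route("explain the proof of the four color theorem"))
```

Cost savings in a cascade like this depend on how often the small model is confident enough to answer alone. Escalated requests pay for both calls.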

ai llm machine-learning open-source training

Added 10 hours ago

EvidenceForge

https://github.com/Cisco-Talos/EvidenceForge

Generate realistic synthetic security logs for cybersecurity threat hunting training and research.
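To illustrate what "synthetic security logs" means in practice, here is a toy generator. It is not EvidenceForge's method. The sshd-style line format, the user and host names, the 10% attack rate, and the attacker IP (from the 203.0.113.0/24 documentation range) are all assumptions. A seeded RNG keeps the dataset reproducible, and one injected pattern (failed logins from a single external IP) gives a hunter something to find.

```python
import random
from datetime import datetime, timedelta

USERS = ["alice", "bob", "carol", "svc_backup"]  # hypothetical accounts
HOSTS = ["web01", "db01", "fs01"]                # hypothetical hosts


def synthetic_auth_log(n: int, seed: int = 0, attacker_ip: str = "203.0.113.7") -> list[str]:
    """Generate n sshd-style lines; roughly 10% are failed logins from attacker_ip."""
    rng = random.Random(seed)  # seeded so the same seed yields the same dataset
    t = datetime(2024, 1, 1, 8, 0, 0)
    lines = []
    for _ in range(n):
        t += timedelta(seconds=rng.randint(1, 120))
        host = rng.choice(HOSTS)
        user = rng.choice(USERS)
        port = rng.randint(1024, 65535)
        if rng.random() < 0.1:
            # Injected "threat": repeated failures from one external IP.
            msg = f"Failed password for {user} from {attacker_ip} port {port} ssh2"
        else:
            # Benign traffic from internal 10.0.x.y addresses.
            ip = f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
            msg = f"Accepted password for {user} from {ip} port {port} ssh2"
        lines.append(f"{t:%b %d %H:%M:%S} {host} sshd[{rng.randint(1000, 9999)}]: {msg}")
    return lines


for line in synthetic_auth_log(5, seed=42):
    print(line)
```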

Related contents:

#80 - Record-breaking Patch Tuesday and compromised Tchap messaging @ Erreur 403 :fr:.

faker foss logs mit-licensed open-source security training

Added 1 month ago

GuppyLM

https://github.com/arman-bd/guppylm

A ~9M parameter LLM that talks like a small fish.

This project exists to show that training your own language model is not magic. No PhD required. No massive GPU cluster. One Colab notebook, 5 minutes, and you have a working LLM that you built from scratch — data generation, tokenizer, model architecture, training loop, and inference. If you can run a notebook, you can train a language model.
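The tokenizer is one of the from-scratch pieces listed above. The simplest version is character-level. This is an illustrative sketch, and GuppyLM's own tokenizer may work differently. The `CharTokenizer` class and the fish-themed corpus are hypothetical.

```python
class CharTokenizer:
    """Maps each distinct character of a training corpus to an integer id."""

    def __init__(self, corpus: str):
        self.chars = sorted(set(corpus))  # sorted for a stable id assignment
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self) -> int:
        return len(self.chars)

    def encode(self, text: str) -> list[int]:
        # Raises KeyError on characters that never appeared in the corpus.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("blub blub, i am a fish")
ids = tok.encode("i am a fish")
print(tok.vocab_size, ids, tok.decode(ids))
```

A character vocabulary keeps the embedding table tiny, which suits a model of a few million parameters. The trade-off is longer sequences than a subword tokenizer would produce.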

e-learning foss llm machine-learning mit-licensed open-source training

Added 3 months ago

autoresearch

https://github.com/karpathy/autoresearch

AI agents running research on single-GPU nanochat training automatically.

The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model.
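The keep-or-discard loop described above is a greedy hill climb. Here is a minimal sketch. `propose` stands in for the agent's code edit and `train_and_eval` for the 5-minute training run. Both are hypothetical callables, and "lower validation loss is better" is an assumption. The toy usage tunes a single learning rate against a made-up quadratic loss.

```python
import random
from typing import Callable, TypeVar

Config = TypeVar("Config")


def autoresearch_loop(
    baseline: Config,
    propose: Callable[[Config], Config],        # hypothetical: agent edits the setup
    train_and_eval: Callable[[Config], float],  # hypothetical: short run -> validation loss
    iterations: int,
) -> tuple[Config, float, list[tuple[int, float, bool]]]:
    best, best_loss = baseline, train_and_eval(baseline)
    log = []  # (step, loss, kept) for every experiment, like the morning log
    for step in range(iterations):
        candidate = propose(best)
        loss = train_and_eval(candidate)
        kept = loss < best_loss  # keep only strict improvements, otherwise discard
        if kept:
            best, best_loss = candidate, loss
        log.append((step, loss, kept))
    return best, best_loss, log


# Toy usage: "config" is a learning rate, and the fake loss is minimized at 0.01.
rng = random.Random(0)
best_lr, best_loss, log = autoresearch_loop(
    baseline=0.1,
    propose=lambda lr: lr * rng.choice([0.5, 2.0]),
    train_and_eval=lambda lr: (lr - 0.01) ** 2,
    iterations=20,
)
print(best_lr, best_loss, sum(kept for _, _, kept in log), "experiments kept")
```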

Related contents:

Autoresearch on an old research idea @ Yogesh Kumar.

foss llm mit-licensed openclaw open-source training

Added 4 months ago

Label Studio

https://labelstud.io/

Open Source Data Labeling.

A flexible data labeling platform for fine-tuning LLMs, preparing training data, and evaluating AI systems. Label Studio is an open-source, multi-type data labeling and annotation tool with a standardized output format. It lets you label audio, text, images, video, and time series through a simple, straightforward UI and export to various model formats. It can be used to prepare raw data or to improve existing training data for more accurate ML models.

Label Studio @ GitHub.

ai apache2-licensed computer-vision data-science foss llm machine-learning metadata ocr open-source training

Added 4 months ago

AReaL

https://inclusionai.github.io/AReaL/en/intro.html

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

AReaL is an open-source, fully asynchronous reinforcement learning training system for large reasoning and agentic models, developed by members of Tsinghua IIIS and the AReaL Team at Ant Group. Built upon the open-source project ReaLHF, AReaL follows open-source principles: we provide the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable; we hope you enjoy our project just as much as you'd enjoy real milk tea. Cheers!
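"Fully asynchronous" means rollout generation does not wait for training steps. As a result, some samples were produced by slightly older weights. The toy sketch below shows that general pattern with a staleness bound. It is not AReaL's API or algorithm. `AsyncRL`, `Rollout`, `max_staleness`, and the constant reward are all hypothetical, and the "gradient update" is just a version counter.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this sample
    reward: float        # placeholder reward from a hypothetical environment


class AsyncRL:
    """Toy loop: a generator thread and a trainer run concurrently; stale samples are dropped."""

    def __init__(self, max_staleness: int = 2, batch_size: int = 4):
        self.version = 0
        self.max_staleness = max_staleness
        self.batch_size = batch_size
        self.buffer: "queue.Queue[Rollout]" = queue.Queue(maxsize=64)
        self.stop = threading.Event()
        self.used_staleness: list[int] = []  # staleness of every sample actually trained on
        self.dropped = 0

    def generate(self) -> None:
        # Rollout worker: keeps producing with whatever weights are current.
        while not self.stop.is_set():
            try:
                self.buffer.put(Rollout(self.version, reward=1.0), timeout=0.1)
            except queue.Full:
                continue  # re-check the stop flag

    def train(self, steps: int) -> None:
        for _ in range(steps):
            batch: list[Rollout] = []
            while len(batch) < self.batch_size:
                sample = self.buffer.get()
                if self.version - sample.policy_version > self.max_staleness:
                    self.dropped += 1  # too off-policy: discard
                    continue
                batch.append(sample)
            self.used_staleness.extend(self.version - s.policy_version for s in batch)
            self.version += 1  # stand-in for a gradient update that publishes new weights

    def run(self, steps: int) -> None:
        worker = threading.Thread(target=self.generate, daemon=True)
        worker.start()
        self.train(steps)
        self.stop.set()
        worker.join()


rl = AsyncRL(max_staleness=2, batch_size=4)
rl.run(steps=10)
print(rl.version, rl.dropped, max(rl.used_staleness))
```

Bounding staleness trades a little wasted generation (dropped samples) for keeping updates close enough to on-policy.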

AReaL @ GitHub.

ai apache2-licensed foss llm open-source training

Added 4 months ago

Agent-lightning

https://microsoft.github.io/agent-lightning/stable/

Agent Lightning is the absolute trainer to light up AI agents.

agent-lightning is an open-source framework for training and optimizing AI agents—enabling reinforcement learning (RL), automatic prompt optimization, supervised fine-tuning, and more—without requiring substantial changes to existing agent code. It works with virtually any agent framework (e.g., LangChain, OpenAI Agents SDK, and AutoGen) and provides modular components to collect agent execution data and iteratively improve agent performance via a decoupled RL training loop.

Agent Lightning @ GitHub.

ai ai-agent foss microsoft mit-licensed open-source training

Added 7 months ago

Mellivora

https://github.com/Nakiami/mellivora?tab=readme-ov-file

Mellivora is a CTF engine written in PHP.

ctf foss open-source pentest php security training

Added 1 year ago

AgileFingers

https://agilefingers.com/fr

Touch typing is a method of typing that uses all your fingers without needing to look at the keyboard. It is a fast, efficient way of typing. AgileFingers is a free online trainer that teaches this technique through typing exercises broken down into lessons, texts, and games. A typing test also lets you measure your progress.

Related contents:

Learn to touch type or throw away your keyboard @ Code avec Maximilien's YouTube :fr:.

gamification keyboard touch-typing training web-service

Added 1 year ago

CSS Flexbox Playground

https://yoavsbg.github.io/css-flexbox-playground/

Interactive CSS Flexbox Learning Tool.

Experiment with different flex properties to understand how they affect layout. Adjust the controls to see the changes in real time and copy the generated CSS code.

CSS Flexbox Playground @ GitHub.

css e-learning flexbox foss open-source playground self-hosted training web-app web-service

Added 1 year ago

vulnerable-AD

https://github.com/safebuffer/vulnerable-AD

Create a vulnerable Active Directory that lets you test most Active Directory attacks in a local lab.

active-directory e-learning foss open-source pentest security training

Added 1 year ago

keybr.com

https://www.keybr.com/

This web application helps you learn touch typing, which means typing from muscle memory without using your eyes to find the keys. It can dramatically improve your typing speed and accuracy. The opposite is hunt-and-peck typing, a method in which you look at the keyboard instead of the screen and use only your index fingers.

keybr.com @ GitHub.

foss keyboard open-source self-hosted training web-app web-service

Added 1 year ago

Monkeytype

https://monkeytype.com/

A minimalistic, customizable typing test.

The most customizable typing website with a minimalistic design and a ton of features. Test yourself in various modes, track your progress and improve your speed.

Monkeytype is a minimalistic and customizable typing test. It features many test modes, an account system to save your typing speed history, and user-configurable features such as themes, sounds, a smooth caret, and more. Monkeytype attempts to emulate a natural typing experience during a typing test by unobtrusively presenting the text prompts and displaying typed characters in place, providing straightforward, real-time feedback on typos, speed, and accuracy.
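Speed and accuracy feedback of this kind usually rests on the convention that one "word" equals five characters. The sketch below applies that convention. Monkeytype's exact formulas may differ, for example in how it treats partially correct words. The `typing_stats` function is hypothetical, and it compares characters position by position, so it does not handle inserted or skipped characters.

```python
def typing_stats(target: str, typed: str, seconds: float) -> dict[str, float]:
    """Position-by-position comparison of typed text against the prompt."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    correct = sum(1 for t, p in zip(typed, target) if t == p)
    minutes = seconds / 60
    return {
        "wpm": (correct / 5) / minutes,         # only correct characters, 5 chars = 1 word
        "raw_wpm": (len(typed) / 5) / minutes,  # every typed character counts
        "accuracy": correct / len(typed) if typed else 0.0,
    }


# "brwon" swaps two letters, so 17 of 19 characters are correct.
print(typing_stats("the quick brown fox", "the quick brwon fox", seconds=6))
```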

Monkeytype @ GitHub.

foss keyboard open-source self-hosted training web-app web-service

Added 1 year ago

Ngram Type

https://ranelpadon.github.io/ngram-type/

A touch typing trainer that uses n-grams as its data source, with options to customize the auto-generated lessons and to set the minimum typing performance required to move on. It also has sound and color effects.
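The n-gram lesson idea can be sketched in three steps: count frequent letter n-grams, turn them into drill lines, and advance only when speed and accuracy thresholds are met. This is an illustration, not Ngram Type's implementation. Deriving n-grams from a sample text, and the default thresholds of 40 WPM and 95% accuracy, are assumptions. The function names are hypothetical.

```python
from collections import Counter


def top_ngrams(text: str, n: int = 2, k: int = 5) -> list[str]:
    """Return the k most frequent letter n-grams found inside words."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        counts.update(letters[i:i + n] for i in range(len(letters) - n + 1))
    return [gram for gram, _ in counts.most_common(k)]


def build_lesson(ngrams: list[str], repeat: int = 3) -> str:
    """Turn each n-gram into a drill such as 'th th th'."""
    return " ".join(" ".join([g] * repeat) for g in ngrams)


def passed(wpm: float, accuracy: float, min_wpm: float = 40, min_accuracy: float = 0.95) -> bool:
    """Advance to the next lesson only when both thresholds are met."""
    return wpm >= min_wpm and accuracy >= min_accuracy


grams = top_ngrams("the theory of the other thing is that there is nothing there", n=2, k=3)
print(build_lesson(grams))
print(passed(wpm=45, accuracy=0.97))
```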

Ngram Type @ GitHub.

keyboard open-source self-hosted touch-typing training web-app

Added 1 year ago

Meta Lingua

https://github.com/facebookresearch/lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Meta Lingua is a minimal and fast LLM training and inference library designed for research. It uses easy-to-modify PyTorch components so you can try new architectures, losses, data, and so on. We aim for this code to enable end-to-end training, inference, and evaluation, as well as to provide tools to better understand speed and stability. While Meta Lingua is still under development, we provide multiple apps that showcase how to use this codebase.

ai llm machine-learning open-source python pytorch training

Added 1 year ago

InstructLab

https://instructlab.ai/

A new community-based approach to build truly open-source LLMs.

InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.

ai llm machine-learning open-source rag training

Added 1 year ago

Vulhub

https://vulhub.org/

Docker Compose files for vulnerable environments.

Vulhub is an open-source collection of pre-built vulnerable Docker environments. No prior knowledge of Docker is required: run two simple commands and you have a vulnerable environment.

Vulhub @ GitHub.

docker e-learning open-source pentest security training

Added 2 years ago