Biapy's Bookmarks

📦 Repopack

https://github.com/yamadashy/repopack

📦 Repopack is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, and Gemini.
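Conceptually, packing a repository means walking its tree, skipping ignored or binary files, and concatenating the rest with a header per file. Below is a minimal Python sketch of that idea. The `pack_repository` helper, the skip list and the `File:` separator format are illustrative assumptions. They are not Repopack's actual output format or ignore logic.

```python
from pathlib import Path

# Hypothetical skip list; a real packer would also honour .gitignore and custom ignore patterns.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def looks_like_text(path: Path) -> bool:
    """Crude binary check: a NUL byte in the first 1 KiB suggests a binary file."""
    with path.open("rb") as fh:
        return b"\x00" not in fh.read(1024)


def pack_repository(root: str) -> str:
    """Concatenate every text file under `root` into one string, each preceded by a path header."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        # Skip anything located inside an ignored directory.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if not looks_like_text(path):
            continue
        content = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"================\nFile: {rel.as_posix()}\n================\n{content}")
    return "\n".join(parts)
```

Calling `pack_repository(".")` returns a single string that could be written to a file and pasted into an LLM chat.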

ai chatgpt claude development foss gemini git llm open-source rag

Added 1 year ago

Kotaemon

https://cinnamon.github.io/kotaemon/

An open-source RAG-based tool for chatting with your documents.

An open-source clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind.

Kotaemon @ GitHub.

ai foss llm machine-learning open-source rag self-hosted web-app

Added 1 year ago

AutoArena

https://www.kolena.com/autoarena/

Rank LLMs, RAG systems, and prompts using automated judge evaluation.

llm machine-learning rag ranking

Added 1 year ago

Trafilatura

https://trafilatura.readthedocs.io/en/latest/

A Python package & command-line tool to gather text on the Web.

Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats.

Trafilatura @ GitHub.

Related contents:

Feeding RAG/LLM systems with Trafilatura @ DevSecOps :fr:.

apache2-licensed foss markdown open-source python rag scraping

Added 2 years ago

MarkItDown

https://github.com/microsoft/markitdown

Python tool for converting files and office documents to Markdown. MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).

command-line exif foss llm microsoft-office ocr open-source parser pdf python rag

Added 10 months ago

LightRAG

https://github.com/HKUDS/LightRAG

Simple and Fast Retrieval-Augmented Generation.

The LightRAG Server is designed to provide Web UI and API support. The Web UI facilitates document indexing, knowledge graph exploration, and a simple RAG query interface. LightRAG Server also provides an Ollama-compatible interface, aiming to emulate LightRAG as an Ollama chat model. This allows AI chatbots such as Open WebUI to access LightRAG easily.

ai foss mit-licensed ollama open-source rag self-hosted web-app

Added 6 days ago

GraphRAG

https://microsoft.github.io/graphrag/

The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.

GraphRAG is a structured, hierarchical approach to Retrieval Augmented Generation (RAG), as opposed to naive semantic-search approaches using plain text snippets. The GraphRAG process involves extracting a knowledge graph out of raw text, building a community hierarchy, generating summaries for these communities, and then leveraging these structures when performing RAG-based tasks.

foss knowledge-graph llm machine-learning open-source rag

Added 1 year ago

Verba

https://github.com/weaviate/Verba

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate.

Welcome to Verba: The Golden RAGtriever, an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally with HuggingFace and Ollama or through LLM providers such as OpenAI, Cohere, and Google.

chatbot data-science llm machine-learning open-source rag

Added 1 year ago

Pathway

https://pathway.com/

Power Your AI with Live Data.

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Pathway @ GitHub.

ai bsl-licensed etl framework llm rag source-available

Added 10 months ago

React Native ExecuTorch

https://docs.swmansion.com/react-native-executorch/

Declarative way to run AI models in React Native on device, powered by ExecuTorch.

React Native ExecuTorch is a declarative way to run AI models in React Native on device, powered by ExecuTorch 🚀. It offers out-of-the-box support for many LLMs, computer vision models, and many more. Feel free to check them out on our HuggingFace page.

ExecuTorch is a novel framework created by Meta that enables running AI models on devices such as mobile phones or microcontrollers.

React Native ExecuTorch @ GitHub.

Related contents:

Introducing React Native RAG: Local & Offline Retrieval-Augmented Generation @ Software Mansion's Medium.

ai bsd3-licensed development executorch foss mit-licensed open-source rag react-native

Added 3 months ago

InstructLab

https://instructlab.ai/

A new community-based approach to build truly open-source LLMs.

InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.

ai llm machine-learning open-source rag training

Added 1 year ago

Milvus

https://milvus.io/

Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
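Embedding similarity search, the core operation a vector database like Milvus serves, amounts to ranking stored vectors by their similarity to a query vector. The sketch below does this by brute force with cosine similarity over toy, hand-written vectors. Milvus typically relies on approximate nearest-neighbour indexes (such as HNSW or IVF variants) to scale, and nothing here is its API. The `top_k` helper and the `store` contents are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exhaustively score every stored vector and return the k most similar (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model and are much larger.
store = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "stocks": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))
```

The exhaustive scan costs one similarity computation per stored vector for each query. This cost is what approximate indexes avoid on large collections.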

Milvus @ GitHub.

Related contents:

RAG Against The Machine @ Quoi de neuf les devs ? :fr:.

ai apache2-licensed database data-science foss llm machine-learning open-source rag self-hosted vector

Added 2 years ago

Morphik

https://www.morphik.ai/

Build Agents that Never Hallucinate. Deploy the most accurate RAG in the world in two lines of code.

The most accurate document search and store for building AI apps.

Morphik @ GitHub.

Related contents:

Don't bother parsing: Just use images for RAG @ Morphik.

ai ai-agent bsl-licensed computer-vision llm rag source-available

Added 3 months ago

Crawl4AI

https://crawl4ai.com/mkdocs/

Open-Source LLM-Friendly Web Crawler & Scraper.

Crawl4AI delivers blazing-fast, AI-ready web crawling tailored for large language models, AI agents, and data pipelines. Fully open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.

Crawl4AI @ GitHub.

ai crawler foss llm open-source rag scraping

Added 9 months ago

🌟 Awesome LLM Apps

https://github.com/Shubhamsaboo/awesome-llm-apps

Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and open-source models.

A curated collection of awesome LLM apps built with RAG and AI agents. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and even open-source models like LLaMA that you can run locally on your computer.

awesome-list foss llama llm machine-learning open-source rag

Added 10 months ago

Dot

https://dotapp.uk/

Text-To-Speech, RAG, and LLMs. All local!

Dot is a standalone, open-source application designed for seamless interaction with documents and files using local LLMs and Retrieval Augmented Generation (RAG). It is inspired by solutions like Nvidia's Chat with RTX, providing a user-friendly interface for those without a programming background. Using the Phi-3 LLM by default, Dot ensures accessibility and simplicity right out of the box.

Dot – The local AI app for interacting with your documents (RAG) @ Korben :fr:.

ai desktop llm machine-learning open-source rag

Added 1 year ago

DSPy (Declarative Self-improving Python)

https://dspy.ai/

The framework for programming—not prompting—language models

DSPy is a declarative framework for building modular AI software. It allows you to iterate fast on structured code, rather than brittle strings, and offers algorithms that compile AI programs into effective prompts and weights for your language models, whether you're building simple classifiers, sophisticated RAG pipelines, or Agent loops.

DSPy @ GitHub.

Related contents:

Building and Optimizing Multi-Agent RAG Systems with DSPy and GEPA @ Isaac Kargar's Medium.

ai development foss framework llm mit-licensed open-source python rag

Added 1 month ago

Khoj AI

https://khoj.dev/

Your AI Second Brain. Ask anything, understand documents, create new content.

Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

Khoj @ GitHub.

Related contents:

Khoj - A private AI assistant that supports you day to day @ Korben :fr:.

foss llm open-source rag self-hosted web-app

Added 9 months ago

Memvid

https://github.com/Olow304/memvid

Video-Based AI Memory 🧠📹.

Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.

Memvid revolutionizes AI memory management by encoding text data into videos, enabling lightning-fast semantic search across millions of text chunks with sub-second retrieval times. Unlike traditional vector databases that consume massive amounts of RAM and storage, Memvid compresses your knowledge base into compact video files while maintaining instant access to any piece of information.

ai foss llm mit-licensed open-source optimization python rag semantic

Added 5 months ago

Onyx

https://www.onyx.app/

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

Onyx (formerly Danswer) is the AI Assistant connected to your company's docs, apps, and people. Onyx provides a chat interface and plugs into any LLM of your choice. Onyx can be deployed anywhere and at any scale: on a laptop, on-premise, or in the cloud. Since you own the deployment, your user data and chats are fully in your own control. Onyx is dual-licensed, with most of it under the MIT license, and is designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring AI Assistants.

Onyx @ GitHub.

ai business chatbot llm rag source-available

Added 10 months ago

MinerU

https://mineru.readthedocs.io/en/latest/

MinerU is a tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant PDF.

MinerU @ GitHub.

converter json llm markdown open-source parser pdf rag

Added 11 months ago

Common Crawl

https://commoncrawl.org/

Open Repository of Web Crawl Data.

Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.

Related contents:

S5E7 - Are we on the verge of an AI collapse? @ Underscore_'s acast :fr:.

crawler llm machine-learning non-profit rag scraping web-service

Added 9 months ago

LEANN

https://github.com/yichuan-w/LEANN

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

LEANN is an innovative vector database that democratizes personal AI. Transform your laptop into a powerful RAG system that can index and search through millions of documents while using 97% less storage than traditional solutions without accuracy loss.

Related contents:

LEANN - The personal AI that crushes 97% of its competitors (in size) @ Korben :fr:.

ai foss mit-licensed open-source python rag vector-database vector-search

Added 1 month ago

OmniParse

https://omniparse.cognitivelab.in/

Convert Anything into Structured Actionable Data.

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks.

OmniParse is a platform that ingests and parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether you are working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured, and ready for AI applications such as RAG, fine-tuning, and more.

OmniParse @ GitHub.

data-science foss genai llm open-source parser rag

Added 10 months ago

OmniParse

https://docs.cognitivelab.in/

OmniParse is a platform that ingests/parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured and ready for AI applications, such as RAG, fine-tuning and more.

OmniParse @ GitHub.

ai foss genai llm open-source python rag web-app

Added 1 year ago

Mastra

https://mastra.ai/

The TypeScript AI framework.

Mastra is an opinionated TypeScript framework that helps you build AI applications and features quickly. It gives you the set of primitives you need: workflows, agents, RAG, integrations, syncs and evals. You can run Mastra on your local machine, or deploy to a serverless cloud.

Mastra @ GitHub.

ai framework llm machine-learning open-source rag typescript

Added 11 months ago

TiDB

https://tidb.ai/

AI Assistant. Knowledge Graph based RAG built with TiDB Serverless Vector Storage and LlamaIndex.

An open-source GraphRAG (Knowledge Graph) built on top of TiDB Vector, LlamaIndex and DSPy. pingcap/autoflow is a conversational knowledge base tool based on Graph RAG and built with TiDB Serverless Vector Storage.

autoflow @ GitHub.

foss graph knowledge-graph llm machine-learning open-source rag web-app

Added 11 months ago

Gitingest

https://gitingest.com/

Turn any Git repository into a simple text digest of its codebase. Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase.

This is useful for feeding a codebase into any LLM.
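The 'hub' to 'ingest' trick amounts to swapping the host of a GitHub URL and keeping the rest. Here is a minimal sketch, assuming only `github.com` URLs need handling and that the path, query and fragment pass through unchanged. The `to_gitingest_url` helper is hypothetical and not part of Gitingest.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping the owner/repo path intact."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    # Default to https if the input had no scheme component.
    return urlunsplit((parts.scheme or "https", "gitingest.com", parts.path, parts.query, parts.fragment))


print(to_gitingest_url("https://github.com/yamadashy/repopack"))
```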

Gitingest @ GitHub.

Related contents:

GitIngest - Turn your code into prompts for LLMs @ Korben :fr:.

browser-addon chrome firefox foss llm open-source rag

Added 8 months ago

RAG-Anything

https://github.com/HKUDS/RAG-Anything

All-in-One RAG Framework

Modern documents increasingly contain diverse multimodal content—text, images, tables, equations, charts, and multimedia—that traditional text-focused RAG systems cannot effectively process. RAG-Anything addresses this challenge as a comprehensive All-in-One Multimodal Document Processing RAG system built on LightRAG.

As a unified solution, RAG-Anything eliminates the need for multiple specialized tools. It provides seamless processing and querying across all content modalities within a single integrated framework. Unlike conventional RAG approaches that struggle with non-textual elements, our all-in-one system delivers comprehensive multimodal retrieval capabilities.

ai foss framework llm mit-licensed open-source rag

Added 6 days ago

Agentset

https://agentset.ai/

Build Frontier RAG Apps. The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.

Ground AI agents in your knowledge base, minimize hallucinations, and impress out of the box. Agentset is the open-source platform to build, evaluate, and ship production-ready RAG and agentic applications. It provides end-to-end tooling: ingestion, vector indexing, evaluation/benchmarks, chat playground, hosting, and a clean API with first-class developer experience.

Agentset @ GitHub.

Related contents:

Production RAG: what I learned from processing 5M+ documents @ Abdellatif Abdelfattah.

ai foss knowledge-base llm mit-licensed open-source rag self-hosted web-app

Added 1 week ago

Pinecone

https://www.pinecone.io/

The vector database to build knowledgeable AI.

The vector database for machine learning applications. Build vector-based personalization, ranking, and search systems that are accurate, fast, and scalable.

Related contents:

Building a Hybrid Search RAG System with Pinecone and LangChain @ Arpan Roy's Medium.

ai commercial rag vector-data vector-database web-service

Added 1 month ago

Repomix

https://repomix.com/

Pack your codebase into AI-friendly formats.

📦 Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, and Gemini.

Repomix @ GitHub.

ai code-generator command-line development foss llm open-source rag

Added 9 months ago

Exmeralda

https://exmeralda.chat/

Exmeralda helps you ask questions about Elixir libraries and get accurate, version-specific answers. Powered by Retrieval-Augmented Generation (RAG), it combines the latest AI with real documentation to deliver helpful, grounded responses.

Exmeralda @ GitHub.

apache2-licensed elixir foss llm open-source rag web-service

Added 6 months ago

Haystack

https://haystack.deepset.ai/

The Production-Ready Open Source AI Framework.

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Haystack @ GitHub.

ai ai-agent development foss framework genai llm open-source python rag

Added 9 months ago

Vanna.AI

https://vanna.ai/

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Personalized AI SQL Agent. Let Vanna.AI write your SQL for you

The fastest way to get actionable insights from your database just by asking questions.
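One way to do Text-to-SQL with RAG is to retrieve previously solved question/SQL pairs that resemble the new question and place them in the LLM prompt. The sketch below covers only that retrieval-and-prompt step. It uses a hypothetical in-memory `EXAMPLES` list and crude word-overlap scoring where a real system would likely use embeddings. It is not Vanna's API, and the actual LLM call is omitted.

```python
import re
from typing import List, Set, Tuple

# Hypothetical training examples: (natural-language question, SQL that answers it).
EXAMPLES: List[Tuple[str, str]] = [
    ("How many customers are there?", "SELECT COUNT(*) FROM customers;"),
    ("What are the top 5 products by revenue?",
     "SELECT product_id, SUM(amount) AS revenue FROM sales GROUP BY product_id ORDER BY revenue DESC LIMIT 5;"),
    ("List customers from France", "SELECT * FROM customers WHERE country = 'France';"),
]


def words(text: str) -> Set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k examples whose questions share the most words with the new question."""
    q = words(question)
    ranked = sorted(EXAMPLES, key=lambda ex: len(q & words(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble an LLM prompt: retrieved question/SQL pairs first, then the new question."""
    parts = ["Write a SQL query that answers the last question. Similar solved examples:"]
    for q, sql in retrieve(question, k):
        parts.append(f"Q: {q}\nSQL: {sql}")
    parts.append(f"Q: {question}\nSQL:")
    return "\n\n".join(parts)


print(build_prompt("How many customers are from France?"))
```

Grounding the prompt in similar, already-verified queries is the part RAG contributes. The model still writes the final SQL.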

Vanna @ GitHub.

ai database foss llm open-source rag self-hosted sql web-app

Added 1 year ago

Docling

https://ds4sd.github.io/docling/

Docling parses documents and exports them to the desired format with ease and speed. 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.

Docling @ GitHub.

Related contents:

Docling - Pour convertir vos documents sans prise de tête @ Korben :fr:.

asciidoc data-mining data-science docx foss html llm markdown open-source parser pdf pptx python rag

Added 11 months ago

Lobe Chat

https://chat-preview.lobehub.com/welcome

🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Azure / DeepSeek), knowledge base (file upload / knowledge management / RAG), multi-modals (Vision/TTS) and a plugin system. One-click FREE deployment of your private ChatGPT/Claude application.

claude foss gemini llm ollama openai open-source rag self-hosted web-app

Added 11 months ago

DataBridge

https://databridge.gitbook.io/databridge-docs

Multi-modal modular data ingestion and retrieval.

DataBridge is an open source library for natural language search and management of multi-modal data. Get started by installing databridge now!

DataBridge is a powerful document processing and retrieval system designed for building intelligent document-based applications. It provides a robust foundation for semantic search, document processing, and AI-powered document interactions.

data-science foss library llm nlp open-source rag

Added 9 months ago

PostgresML

https://postgresml.org/

Infra for RAG apps that work in prod. You know Postgres. Now you know machine learning.

Index, filter & rank vectors. Create embeddings. Generate real-time, fact-based outputs.

Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python, JavaScript and Rust, Korvus delivers high-performance, customizable search capabilities with minimal infrastructure concerns.

Korvus @ GitHub.

database javascript machine-learning open-source postgresql python rag rust sdk

Added 1 year ago

HelixDB

https://www.helix-db.com/

Native Graph-Vector Database.

HelixDB is a powerful, open-source, graph-vector database built in Rust for intelligent data storage for RAG and AI.

HelixDB @ GitHub.

agpl3-licensed ai database foss graph open-source rag vector

Added 5 months ago

yek

https://github.com/bodo-run/yek

A fast tool to read text-based files in a repository or directory, chunk them, and serialize them for LLM consumption.
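The chunk and serialize steps can be sketched as follows. The sketch assumes chunks are capped by character count, which may differ from yek's own sizing rules, and uses a hypothetical `>>>> path [i/n]` header per chunk. To stay self-contained, it takes already-read file contents as a `{path: content}` dict.

```python
from typing import Dict, List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into pieces of at most max_chars, preferring to break right after a newline."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last newline inside the window so lines are not cut in half, if possible.
            newline = text.rfind("\n", start, end)
            if newline != -1:
                end = newline + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def serialize(files: Dict[str, str], max_chars: int) -> List[str]:
    """Turn {path: content} into LLM-ready chunks, each prefixed with its source path and position."""
    out = []
    for path in sorted(files):
        pieces = chunk_text(files[path], max_chars)
        for i, piece in enumerate(pieces, start=1):
            out.append(f">>>> {path} [{i}/{len(pieces)}]\n{piece}")
    return out
```

For example, `serialize({"a.txt": "line1\nline2\nline3\n"}, max_chars=12)` yields two chunks, split at the second newline.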

command-line foss llm machine-learning open-source rag

Added 9 months ago

Documind

https://www.documind.xyz/

Extract structured data from PDFs. Stop wasting time extracting PDFs. Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.

Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.

Documind @ GitHub.

foss llm machine-learning open-source parser pdf rag

Added 11 months ago

Chonkie

https://github.com/bhavnicksm/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library: lightweight, lightning-fast, and ready to CHONK your texts.
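A common baseline among RAG chunking strategies is fixed-size windows that overlap, so context at a boundary appears in both neighbouring chunks. Here is a minimal sketch using whitespace-separated words as stand-in "tokens". The `overlapping_chunks` helper is hypothetical and not Chonkie's API, which uses real tokenizers and offers other strategies.

```python
from typing import List


def overlapping_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


print(overlapping_chunks("a b c d e f g", chunk_size=3, overlap=1))
```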

foss llm machine-learning open-source python rag

Added 11 months ago

Fast GraphRAG

https://github.com/circlemind-ai/fast-graphrag

RAG that intelligently adapts to your use case, data, and queries.

Streamlined and promptable Fast GraphRAG framework designed for interpretable, high-precision, agent-driven retrieval workflows.

foss llm machine-learning open-source rag

Added 11 months ago

DataFuel

https://www.datafuel.dev/

Turn websites into LLM-ready data.

DataFuel API scrapes entire websites and knowledge bases in a single query. Get clean, markdown-structured web data instantly for your RAG systems and AI models. No complex scraping code needed.

commercial llm rag scraping web-service

Added 10 months ago