Turn any Git repository into a simple text digest of its codebase.
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase.
This is useful for feeding a codebase into any LLM.
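The URL trick above is a plain string rewrite: replacing 'hub' with 'ingest' turns the host `github.com` into `gitingest.com` and keeps the rest of the URL. Below is a minimal Python sketch that assumes only the host needs to change. The function name `to_gitingest_url` and the `owner/repo` path are hypothetical placeholders and are not part of the tool itself.

```python
from urllib.parse import urlsplit, urlunsplit

def to_gitingest_url(github_url: str) -> str:
    """Rewrite a GitHub URL to its gitingest form (hypothetical helper)."""
    parts = urlsplit(github_url)
    # Accept only github.com URLs; anything else is rejected explicitly.
    if parts.hostname != "github.com":
        raise ValueError(f"not a GitHub URL: {github_url!r}")
    # Swap only the host ('hub' -> 'ingest'); path, query and fragment are kept.
    return urlunsplit(parts._replace(netloc="gitingest.com"))

print(to_gitingest_url("https://github.com/owner/repo"))
# prints https://gitingest.com/owner/repo
```

The sketch rewrites only the host instead of every occurrence of "hub" in the string. Owner or repository names that happen to contain "hub" therefore stay unchanged.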
Related contents:
The Production-Ready Open Source AI Framework.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Open Repository of Web Crawl Data.
Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Multi-modal modular data ingestion and retrieval.
DataBridge is an open source library for natural language search and management of multi-modal data. Get started by installing databridge now!
DataBridge is a powerful document processing and retrieval system designed for building intelligent document-based applications. It provides a robust foundation for semantic search, document processing, and AI-powered document interactions.
Pack your codebase into AI-friendly formats.
Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, and Gemini.
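The basic idea of packing a repository into a single file can be sketched like this: walk the tree, skip tooling directories and non-text files, then concatenate each remaining file under a header that shows its path. The following is a minimal Python illustration under stated assumptions. The `pack_repository` name, the suffix allow-list, the skipped directories, and the `=====` header format are all hypothetical. They do not reproduce Repomix's actual output format or filtering rules, and the sketch does not read ignore files.

```python
from pathlib import Path

# Hypothetical filters: which suffixes count as text and which directories to skip.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".toml", ".json", ".yaml", ".yml", ".js", ".ts"}
SKIP_DIRS = {".git", "node_modules", "__pycache__"}

def pack_repository(root: str) -> str:
    """Concatenate a repository's text files into one string, each under a path header."""
    root_path = Path(root)
    sections = []
    # Sort paths so repeated runs over the same tree produce identical output.
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip directories themselves and anything inside a skipped directory.
        if not path.is_file() or any(part in SKIP_DIRS for part in rel.parts):
            continue
        # Skip files whose suffix is not on the (assumed) text allow-list.
        if path.suffix not in TEXT_SUFFIXES:
            continue
        body = path.read_text(encoding="utf-8", errors="replace")
        sections.append(f"===== {rel.as_posix()} =====\n{body.rstrip()}\n")
    return "\n".join(sections)
```

Calling `print(pack_repository("."))` from a project root prints one block per text file. The output can then be pasted into a prompt. Deterministic ordering keeps the packed file stable across runs, which makes diffs between successive snapshots readable.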
Open-Source LLM-Friendly Web Crawler & Scraper.
Crawl4AI delivers blazing-fast, AI-ready web crawling tailored for large language models, AI agents, and data pipelines. Fully open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.
Your AI Second Brain.
Ask anything, understand documents, create new content.
Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (GPT, Claude, Gemini, Llama, Qwen, Mistral). Get started for free.
Build context-aware reasoning applications.
LangChain is a framework for developing applications powered by large language models (LLMs).
Power Your AI with Live Data.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Python tool for converting files and office documents to Markdown.
MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc.).
Convert Anything into Structured Actionable Data.
Ingest, parse, and optimize any data format, from documents to multimedia, for enhanced compatibility with GenAI frameworks.
OmniParse is a platform that ingests and parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether you are working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured, and ready for AI applications such as RAG, fine-tuning, and more.
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Onyx (formerly Danswer) is an AI assistant connected to your company's docs, apps, and people. Onyx provides a chat interface and plugs into any LLM of your choice. Onyx can be deployed anywhere and at any scale: on a laptop, on-premise, or in the cloud. Since you own the deployment, your user data and chats are fully under your control. Onyx is dual-licensed, with most of it under the MIT license, and is designed to be modular and easily extensible. The system is also production-ready out of the box, with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring AI assistants.
Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and open-source models.
A curated collection of awesome LLM apps built with RAG and AI agents. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and even open-source models like LLaMA that you can run locally on your computer.
The TypeScript AI framework.
Mastra is an opinionated TypeScript framework that helps you build AI applications and features quickly. It gives you the set of primitives you need: workflows, agents, RAG, integrations, syncs and evals. You can run Mastra on your local machine, or deploy it to a serverless cloud.
AI Assistant. Knowledge Graph based RAG built with TiDB Serverless Vector Storage and LlamaIndex.
An open-source GraphRAG (Knowledge Graph) built on top of TiDB Vector, LlamaIndex, and DSPy.
pingcap/autoflow is a conversational knowledge base tool based on Graph RAG and built with TiDB Serverless Vector Storage.
Extract structured data from PDFs.
Stop wasting time extracting PDFs.
Transform your PDF documents into structured data with Documind. Simple, powerful and open-source.
Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
MinerU is a tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please open an issue and attach the relevant PDF.