python
Better Project Templates.
Cookiecutter creates projects from cookiecutters (project templates), e.g. Python package projects from Python package templates.
Query your python lists.
Leopards is a way to query a list of dictionaries or objects as if you were filtering in a DBMS. You can get the dicts/objects matched by OR, AND, or NOT conditions, or any combination of them. According to the project's own comparison, it is much faster than Pandas.
A command line utility to display dependency tree of the installed Python packages.
pipdeptree is a command-line utility for displaying the installed Python packages in the form of a dependency tree. It works for packages installed globally on a machine as well as in a virtualenv. Since pip freeze shows all dependencies as a flat list, finding out which are the top-level packages and which packages they depend on requires some effort. It's also tedious to resolve conflicting dependencies that could have been installed because older versions of pip didn't have true dependency resolution. pipdeptree can help here by identifying conflicting dependencies installed in the environment.
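To make the "flat list versus tree" idea concrete, here is a minimal sketch in plain Python. It does not use pipdeptree's own code or API; the `installed` mapping and the helper names `top_level_packages` and `render_tree` are hypothetical, chosen only for illustration.

```python
def top_level_packages(requires):
    """Return packages that no other installed package depends on."""
    depended_on = {dep for deps in requires.values() for dep in deps}
    return sorted(name for name in requires if name not in depended_on)


def render_tree(requires, name, depth=0, seen=()):
    """Render one package and its dependencies as indented lines."""
    lines = ["  " * depth + name]
    if name in seen:  # guard against circular dependencies
        return lines
    for dep in sorted(requires.get(name, ())):
        lines.extend(render_tree(requires, dep, depth + 1, seen + (name,)))
    return lines


# Hypothetical environment: package name -> direct dependencies.
installed = {
    "flask": ["jinja2", "werkzeug"],
    "jinja2": ["markupsafe"],
    "werkzeug": ["markupsafe"],
    "markupsafe": [],
    "requests": [],
}

for package in top_level_packages(installed):
    print("\n".join(render_tree(installed, package)))
```

In a real environment the mapping could be built from installed package metadata rather than written by hand; the tool itself also reports version conflicts, which this sketch leaves out.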
Cloud Development Framework.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define cloud infrastructure in code and provision it through AWS CloudFormation.
It offers a high-level object-oriented abstraction to define AWS resources imperatively using the power of modern programming languages. Using the CDK’s library of infrastructure constructs, you can easily encapsulate AWS best practices in your infrastructure definition and share it without worrying about boilerplate logic.
The AI-native open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory! Chroma is the open-source AI application database. Batteries included.
Embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal. All in one place. Retrieval that just works. As it should be.
OSINT automation for hackers. A recursive internet scanner for hackers.
BEE·bot is a multipurpose scanner inspired by Spiderfoot, built to automate your Recon, Bug Bounties, and ASM!
Process Automation Solutions. Build Durable Workflows with Just a Few Lines of Code.
Developer first, open source, serverless workflow automation platform where you code the business logic and autokitteh takes care of the rest: API integration, scalability, reliability, durability, easy deployment, and monitoring.
ElectricEye is a multi-cloud, multi-SaaS Python CLI tool for Asset Management, Security Posture Management & Attack Surface Monitoring. It supports hundreds of services and evaluations to harden your CSP & SaaS environments, with controls mapped to over 20 industry, regulatory, and best-practice control frameworks.
A project providing a Graphic Walker Pane for use with HoloViz Panel.
A simple way to explore your data through a Tableau-like interface directly in your Panel data applications.
panel-graphic-walker brings the power of Graphic Walker to your data science workflow, seamlessly integrating interactive data exploration into notebooks and Panel applications. Effortlessly create dynamic visualizations, analyze datasets, and build dashboards—all within a Pythonic, intuitive interface.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python.
Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.
Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
Conversational Data Analysis.
PandasAI is a Python platform that makes it easy to ask questions of your data in natural language. It helps non-technical users interact with their data more naturally, and it helps technical users save time and effort when working with data.
PandasAI is a Python library that integrates generative artificial intelligence capabilities into pandas, making dataframes conversational. Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts.
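For readers new to RAG chunking, the following is a minimal sketch of one common strategy: fixed-size word chunks with overlap. It is plain Python and is not Chonkie's API; the function name `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text, chunk_size=5, overlap=1):
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g h i", chunk_size=4, overlap=1))
```

The overlap keeps a little shared context between neighbouring chunks, which tends to help retrieval when a relevant sentence straddles a chunk boundary.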
AcSecurity is a Python module designed to scan applications for common security vulnerabilities. It checks for hardcoded secrets, dependency vulnerabilities, and code quality issues.
A small Python library created to help developers protect their applications from Server Side Request Forgery (SSRF) attacks. It implements an asynchronous GET method called safehttpx.get(), which is a wrapper around httpx.AsyncClient.get() while performing DNS validation on the supplied URL using Google DNS.
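The core idea behind this kind of SSRF protection can be sketched with the standard library alone. This is not safehttpx's implementation: it uses the system resolver rather than Google DNS, and the helper names `is_public_address` and `url_targets_public_host` are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(address):
    """Return True only for globally routable IP addresses."""
    # Drop an IPv6 zone identifier such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return ip.is_global


def url_targets_public_host(url):
    """Resolve the URL's host and require every address to be public."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)


print(is_public_address("8.8.8.8"))          # public DNS server
print(is_public_address("169.254.169.254"))  # link-local metadata address
```

A production check would also need to make the request against the validated address, since resolving once and connecting later leaves room for DNS rebinding.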
Flask-Vault is a robust library that empowers Flask applications to securely store and manage sensitive credentials. It provides a set of CLI commands for storing secrets using AES-GCM symmetric encryption, ensuring that vital information like API keys and database credentials remain protected.
Flask-Vault provides several cli commands and Python functions to store secrets that you do not want to keep in the clear, using symmetric encryption with AES-GCM. These commands and functions allow you to safely read/write very important credentials such as API keys, database credentials, etc.
Security tool against dependency typosquatting attacks.
Twyn is a security tool that compares the name of your dependencies against a set of the most popular ones, in order to determine if there is any similarity between them, preventing you from using a potentially illegitimate one. In short, Twyn protects you against typosquatting attacks.
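The name-similarity check described above can be approximated with the standard library's `difflib`. This is only a sketch: Twyn's actual algorithm and reference list may differ, and the function name `find_typosquat_candidates`, the `trusted` list, and the 0.8 cutoff are assumptions made for illustration.

```python
import difflib


def find_typosquat_candidates(dependencies, trusted, cutoff=0.8):
    """Map each suspicious dependency to the trusted name it resembles."""
    trusted_names = sorted({name.lower() for name in trusted})
    suspicious = {}
    for dependency in dependencies:
        name = dependency.lower()
        if name in trusted_names:
            continue  # exact match with a trusted package
        matches = difflib.get_close_matches(name, trusted_names, n=1, cutoff=cutoff)
        if matches:
            suspicious[dependency] = matches[0]
    return suspicious


# Hypothetical inputs: a project's dependencies and a short trusted list.
trusted = ["requests", "numpy", "pandas"]
dependencies = ["requests", "reqeusts", "numpyy", "leftpad"]
print(find_typosquat_candidates(dependencies, trusted))
```

Names that are close to a trusted package without matching it exactly are flagged, while unrelated names such as `leftpad` pass through untouched.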
Build your Python web crawlers using Crawlee. It helps you build reliable Python web crawlers. Fast.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Docling parses documents and exports them to the desired format with ease and speed. 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.
Tiny status page generated by a Python script.
TinyStatus is a simple, customizable status page generator that allows you to monitor the status of various services and display them on a clean, responsive web page.
Open-Source Web Automation library with any LLM.
Let LLMs interact with websites through a simple interface.
Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.
AI Data Management at Scale - Curate, Enrich, and Version Datasets.
DataChain is a modern Pythonic data-frame library designed for artificial intelligence. It is made to organize your unstructured data into datasets and wrangle it at scale on your local machine. Datachain does not abstract or hide the AI models and API calls, but helps to integrate them into the postmodern data stack.
Datachain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them.
SoFE performs two primary functions: it monitors non-filler episodes in Sonarr and generates Plex collections.
SoFE (Sonarr Anime Filler Excluder) is a Python application that configures Sonarr to monitor only non-filler anime episodes sourced from Anime Filler List. It also creates separate Plex collections for non-filler and filler episodes, depending on the download status.
Open-source framework for building asynchronous web services that interact with event streams.
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
Zero shot pdf OCR with gpt-4o-mini.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense!
Effortless Python web applications with the power of reactive programming.
Shiny for Python is the best way to build fast, beautiful web applications in Python. You can build quickly with Shiny and create simple interactive visualizations and prototype applications in an afternoon. But unlike other frameworks targeted at data scientists, Shiny does not limit your app's growth. Shiny remains extensible enough to power large, mission-critical applications.
Build Python Data & AI web applications. Turns Data and AI algorithms into production-ready web applications in no time.
Taipy is designed for data scientists and machine learning engineers to build data & AI web applications.
From simple pilots to production-ready web applications in no time. No more compromise on performance, customization, and scalability.
Structured text generation and robust prompting for language models.
Outlines is a Python library that lets you use large language models in a simple and robust way (with structured generation). It is built by .txt, and is already used in production by many companies.
Penelope Shell Handler.
Penelope is a shell handler designed to be easy to use and intended to replace netcat when exploiting RCE vulnerabilities. It is compatible with Linux and macOS and requires Python 3.6 or higher. It is a standalone script that does not require any installation or external dependencies, and it is intended to remain this way.
The Data Processor for Agents.
Marly allows your agents to extract tables and text from your PDFs, PowerPoints, etc. in a structured format, making it easy for them to take subsequent actions (database call, API call, creating a chart, etc.).
On-device AI across mobile, embedded and edge for PyTorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
🧪 Correlate Semgrep scans with Python test coverage to prioritize SAST findings and get bug fix suggestions via a self-hosted LLM.
vulncov correlates Semgrep scans with Python test code coverage to identify which vulnerable code has been executed by unit tests, helping prioritize SAST findings and reduce false positives. It also leverages a self-hosted LLM to suggest bug fixes!
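The correlation step can be pictured as a join between finding locations and executed lines. The sketch below is not vulncov's code and does not parse real Semgrep or coverage output; the dictionary shapes and the function name `prioritize_findings` are hypothetical.

```python
def prioritize_findings(findings, executed_lines):
    """Split findings into those touched by tests and those never executed."""
    covered, uncovered = [], []
    for finding in findings:
        lines = executed_lines.get(finding["path"], set())
        span = range(finding["start_line"], finding["end_line"] + 1)
        if any(line in lines for line in span):
            covered.append(finding)
        else:
            uncovered.append(finding)
    return covered, uncovered


# Hypothetical SAST findings and per-file executed line numbers.
findings = [
    {"rule": "sql-injection", "path": "app.py", "start_line": 11, "end_line": 11},
    {"rule": "weak-hash", "path": "app.py", "start_line": 40, "end_line": 42},
    {"rule": "eval-use", "path": "utils.py", "start_line": 5, "end_line": 5},
]
executed_lines = {"app.py": {10, 11, 12}}

covered, uncovered = prioritize_findings(findings, executed_lines)
print([f["rule"] for f in covered])
print([f["rule"] for f in uncovered])
```

Findings in code that tests actually execute are confirmed reachable, so they are natural candidates to review first.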
This site offers a variety of exercises for learning algorithms and programming. The language used is Python.
The exercises were written, tested, corrected, and improved by computer science teachers from secondary and higher education.
No installation or registration is required: all programs run on your own computer, tablet, or phone.
Multi-vendor library to simplify Paramiko SSH connections to network devices.
Network automation of screen-scraping devices is primarily concerned with gathering output from show commands and making configuration changes.
Netmiko aims to accomplish both of these operations and to do it across a very broad set of platforms. It seeks to do this while abstracting away low-level state control (i.e. eliminate low-level regex pattern matching to the extent practical).
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Meta Lingua is a minimal and fast LLM training and inference library designed for research. Meta Lingua uses easy-to-modify PyTorch components in order to try new architectures, losses, data, etc. We aim for this code to enable end to end training, inference and evaluation as well as provide tools to better understand speed and stability. While Meta Lingua is currently under development, we provide you with multiple apps to showcase how to use this codebase.
All Algorithms implemented in Python.
Implementations are for learning purposes only. They may be less efficient than the implementations in the Python standard library. Use them at your discretion.
YouTube, Apple Podcast (and more) to readable Markdown.
yt2doc transcribes videos & audios online into readable Markdown documents.
Open-Source ML Monitoring and LLM Observability.
Open-source evaluation and observability for ML and LLM systems. Evaluate, test, and monitor AI-powered systems. From tabular data to LLMs. Built for data scientists, AI, and ML engineers.
Detect and extract tables to markdown and csv.
Tabled is a small library for detecting and extracting tables. It uses surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into markdown, csv, or html.
The Open-Source Tool Democratizing Multi-Cloud Security Testing by Arpan Sarkar.
Multi-Cloud Security Testing Tool to execute a comprehensive array of attack techniques across multiple surfaces via a simple web interface.
Halberd is a powerful, multi-cloud security testing tool. Born out of the need for a unified, easy-to-use tool, Halberd enables you to proactively assess your cloud defenses by executing a comprehensive array of attack techniques across Entra ID, M365, Azure, and AWS. With its intuitive web interface, you can simulate real-world attacks, generate valuable telemetry, and validate your security controls with ease & speed.
Gato, or GitHub Attack Toolkit, is an enumeration and attack tool that allows both blue teamers and offensive security practitioners to identify and exploit pipeline vulnerabilities within a GitHub organization's public and private repositories.
OmniParse is a platform that ingests/parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured and ready for AI applications, such as RAG, fine-tuning and more.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Forget expensive NVIDIA GPUs, unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, Linux, pretty much any device!
🐍 A toolkit for testing, tweaking and cracking JSON Web Tokens
Meet the NiceGUI. And let any browser be the frontend of your Python code. Create web-based user interfaces with Python. The nice way.
NiceGUI is an easy-to-use, Python-based UI framework, which shows up in your web browser. You can create buttons, dialogs, Markdown, 3D scenes, plots and much more.
PyTorch implementation of PerCo (Towards Image Compression with Perfect Realism at Ultra-Low Bitrates, ICLR 2024)
A Runtime Application Self Protection agent for Python applications and serverless functions. Relies on AI, syntax analysis, and underlying OS capabilities to seamlessly provide accurate protection from within, without updates.
PyRASP is a Runtime Application Self Protection package for Python-based Web Servers (Flask, FastAPI and Django) and Serverless Functions (AWS Lambda, Azure and Google Cloud Functions).
APM for Ruby, Elixir, Node.js & Python. No-brainer monitoring for smart developers. Application Monitoring for Ruby on Rails, Elixir, Node.js & Python.
Malware analysis tool. Cuckoo3 is a Python 3 open source automated malware analysis system.
Cuckoo3 is an open-source tool to test suspicious files or links in a controlled environment. It runs them in one or more sandboxed platform emulators and generates a report showing what the files or websites did during the test.
Data Framework for LLM Applications.
LlamaIndex (GPT Index) is a data framework for your LLM application. Building with LlamaIndex typically involves working with LlamaIndex core and a chosen set of integrations (or plugins). There are two ways to start building with LlamaIndex in Python:
The largest community building the future of LLM apps
LangChain’s flexible abstractions and AI-first toolkit make it the #1 choice for developers when building with GenAI. Join 1M+ builders standardizing their LLM app development in LangChain's Python and JavaScript frameworks.
Related contents:
- #307.src - Langchain: Faire de l'IA comme des Lego avec Maxime Thoonsen @ <ifttd>.
- Tour d'horizon des frameworks pour créer des applications basées sur les LLM @ Data-Crafting.io :fr:.
- #304.bin - Bilan 2024: Le début de la révolution avec Quentin Adam @ <ifttd>.
- Construire son RAG (Retrieval Augmented Generation) grâce à langchain: L’exemple de l’Helpdesk d’OCTO @ OCTO talks :fr:.
- CLI Chatbot with LangChain and OpenAI in Node.js @ rw;eruch.
- Meetup GenAI - Découverte de LangChain @ Flint's YouTube :fr:.
- Agents 2.0: From Shallow Loops to Deep Agents @ PHILSCHMID.
- Production RAG: what I learned from processing 5M+ documents @ Abdellatif Abdelfattah.
- Intégration de Google Drive avec langchain @ Octo talks! :fr:.
- Deux techniques pour ingérer des pages web pour le RAG : BeautifulSoup vs Docling @ lbke :fr:.
- Créer un RAG avec LangChain en 5 étapes @ lbke :fr:.
- RAG Against The Machine @ Quoi de neuf les devs ? :fr:.
- LangChain, LangGraph Flaws Expose Files, Secrets, Databases in Widely Used AI Frameworks @ The Hacker News.
Claude Engineer is an advanced interactive command-line interface (CLI) that harnesses the power of Anthropic's Claude 3 and Claude 3.5 models to assist with a wide range of software development tasks. This tool seamlessly combines the capabilities of state-of-the-art large language models with practical file system operations, web search functionality, intelligent code analysis, and execution capabilities.
Credentials gathering tool automating remote procdump and parse of lsass process.
Spraykatz is a tool without any pretension, able to retrieve credentials on Windows machines and large Active Directory environments.
It simply tries to procdump machines and parse the dumps remotely in order to avoid detection by antivirus software as much as possible.
SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.
SpiderFoot is an open source intelligence (OSINT) automation tool. It integrates with just about every data source available and utilises a range of methods for data analysis, making that data easy to navigate.
SpiderFoot has an embedded web-server for providing a clean and intuitive web-based interface but can also be used completely via the command-line. It's written in Python 3 and MIT-licensed.
ArcticDB is a DataFrame Database.
ArcticDB is a high-performance, serverless DataFrame database built for the modern Python data science ecosystem. It is designed to handle complex real-world data quickly, at a proven petabyte scale.
PyScript is an open source platform for Python in the browser.
PyScript is a framework that allows users to create rich Python applications in the browser using HTML's interface and the power of Pyodide, MicroPython and WASM, and modern web technologies.
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.
Integrated set of Django applications addressing authentication, registration, account management as well as 3rd party (social) account authentication.
A free, secure, well integrated, reusable authentication solution for the Django framework, covering all functionality related to local and social user accounts, multi-factor authentication, in various configurations, with flows that just work.
Distributed Task Queue
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.