python
Open Source Intelligence Interface for Deep Web Scraping.
Darkdump is a OSINT interface for carrying out deep web investgations written in python in which it allows users to enter a search query in which darkdump provides the ability to scrape .onion sites relating to that query to try to extract emails, metadata, keywords, images, social media etc. Darkdump retrieves sites via Ahmia.fi and scrapes those .onion addresses when connected via the tor network.
Related contents:
Build Reliable Backends 10x Faster, Scale to Millions with 1 Click.
DBOS is a serverless platform for building highly reliable applications. What takes days to build on AWS takes minutes on DBOS.
Related contents:
🍦 Never use print() to debug again.
Do you ever use print() or log() to debug your code? Of course you do. IceCream, or ic for short, makes print debugging a little sweeter.
🛡️ Windows Hello™ style facial authentication for Linux.
Howdy provides Windows Hello™ style authentication for Linux. Use your built-in IR emitters and camera in combination with facial recognition to prove who you are.
Using the central authentication system (PAM), this works everywhere you would otherwise need your password: Login, lock screen, sudo, su, etc.
High-Performance Klong array language in Python.
KlongPy is a Python adaptation of the Klong array language, known for its high-performance vectorized operations that leverage the power of NumPy. Embracing a "batteries included" philosophy, KlongPy combines built-in modules with Python's expansive ecosystem, facilitating rapid application development with Klong's succinct syntax.
The web framework for perfectionists with deadlines.
Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design.
Related contents:
Open-source Headless Browser API.
🚧 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you build automate the web without worrying about infrastructure.
Simple, unified interface to multiple Generative AI providers.
aisuite makes it easy for developers to use multiple LLM through a standardized interface. Using an interface similar to OpenAI's, aisuite makes it easy to interact with the most popular LLMs and compare the results. It is a thin wrapper around python client libraries, and allows creators to seamlessly swap out and test responses from different LLM providers without changing their code. Today, the library is primarily focussed on chat completions. We will expand it cover more use cases in near future.
Pyxel is a retro game engine for Python.
With simple specifications inspired by retro gaming consoles, such as displaying only 16 colors and supporting 4 sound channels, you can easily enjoy making pixel-art-style games.
Connexion is a modern Python web framework that makes spec-first and api-first development easy. You describe your API in an OpenAPI (or swagger) specification with as much detail as you want and Connexion will guarantee that it works as you specified.
DataFrames for the new era.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust using Apache Arrow Columnar Format as the memory model.
Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.
Lightweight modern Python library to add security headers (CSP, HSTS, etc.) to Django, Flask, FastAPI, and more. Secure defaults or fully customizable.
Seed-Farmer is an orchestration tool that works with AWS CodeSeeder and acts as an orchestration tool modeled after GitOps deployments. It has a CommandLine Interface based in Python, leverages modular code deployments defined by declarative manifests, and includes change detection and deployment optimization.
Better Project Templates.
Cookiecutter creates projects from cookiecutters (project templates), e.g. Python package projects from Python package templates.
Query your python lists.
Leopards is a way to query list of dictionaries or objects as if you are filtering in DBMS. You can get dicts/objects that are matched by OR, AND or NOT or all of them. As you can see in the comparison they are much faster than Pandas.
A command line utility to display dependency tree of the installed Python packages.
pipdeptree is a command line utility for displaying the installed python packages in form of a dependency tree. It works for packages installed globally on a machine as well as in a virtualenv. Since pip freeze shows all dependencies as a flat list, finding out which are the top level packages and which packages do they depend on requires some effort. It's also tedious to resolve conflicting dependencies that could have been installed because older version of pip didn't have true dependency resolution1. pipdeptree can help here by identifying conflicting dependencies installed in the environment.
Cloud Development Framework.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define cloud infrastructure in code and provision it through AWS CloudFormation.
It offers a high-level object-oriented abstraction to define AWS resources imperatively using the power of modern programming languages. Using the CDK’s library of infrastructure constructs, you can easily encapsulate AWS best practices in your infrastructure definition and share it without worrying about boilerplate logic.
the AI-native open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory! Chroma is the open-source AI application database. Batteries included.
Embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal. All in one place. Retrieval that just works. As it should be.
Related contents:
OSINT automation for hackers. A recursive internet scanner for hackers.
BEE·bot is a multipurpose scanner inspired by Spiderfoot, built to automate your Recon, Bug Bounties, and ASM!
Process Automation Solutions. Build Durable Workflows with Just a Few Lines of Code.
Developer first, open source, serverless workflow automation platform where you code the business logic and autokitteh takes care of the rest: API integration, scalability, reliability, durability, easy deployment, and monitoring.
ElectricEye is a multi-cloud, multi-SaaS Python CLI tool for Asset Management, Security Posture Management & Attack Surface Monitoring supporting 100s of services and evaluations to harden your CSP & SaaS environments with controls mapped to over 20 industry, regulatory, and best practice controls frameworks
A project providing a Graphic Walker Pane for use with HoloViz Panel.
A simple way to explore your data through a Tableau-like interface directly in your Panel data applications.
panel-graphic-walker brings the power of Graphic Walker to your data science workflow, seamlessly integrating interactive data exploration into notebooks and Panel applications. Effortlessly create dynamic visualizations, analyze datasets, and build dashboards—all within a Pythonic, intuitive interface.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python.
Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.
Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
Conversational Data Analysis.
PandasAI is a Python platform that makes it easy to ask questions to your data in natural language. It helps non-technical users to interact with their data in a more natural way, and it helps technical users to save time, and effort when working with data.
PandasAI is a Python library that integrates generative artificial intelligence capabilities into pandas, making dataframes conversational. Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library. The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts
AcSecurity is a Python module designed to scan applications for common security vulnerabilities. It checks for hardcoded secrets, dependency vulnerabilities, and code quality issues.
A small Python library created to help developers protect their applications from Server Side Request Forgery (SSRF) attacks. It implements an asynchronous GET method called safehttpx.get(), which is a wrapper around httpx.AsyncClient.get() while performing DNS validation on the supplied URL using Google DNS.
Flask-Vault is a robust library that empowers Flask applications to securely store and manage sensitive credentials. It provides a set of CLI commands for storing secrets using AES-GCM symmetric encryption, ensuring that vital information like API keys and database credentials remain protected.
Flask-Vault provides several cli commands and Python functions to store secrets that you do not want to keep in the clear, using symmetric encryption with AES-GCM. These commands and functions allow you to safely read/write very important credentials such as API keys, database credentials, etc.
Security tool against dependency typosquatting attacks.
Twyn is a security tool that compares the name of your dependencies against a set of the most popular ones, in order to determine if there is any similarity between them, preventing you from using a potentially illegitimate one. In short, Twyn protects you against typosquatting attacks.
Build your Python web crawlers using Crawlee. It helps you build reliable Python web crawlers. Fast.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Docling parses documents and exports them to the desired format with ease and speed. 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON.
Related contents:
Tiny status page generated by a Python script.
TinyStatus is a simple, customizable status page generator that allows you to monitor the status of various services and display them on a clean, responsive web page.
Open-Source Web Automation library with any LLM.
Let LLMs interact with websites through a simple interface.
Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.
AI Data Management at Scale - Curate, Enrich, and Version Datasets.
DataChain is a modern Pythonic data-frame library designed for artificial intelligence. It is made to organize your unstructured data into datasets and wrangle it at scale on your local machine. Datachain does not abstract or hide the AI models and API calls, but helps to integrate them into the postmodern data stack.
Datachain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them.
SoFE performs two primary functions: it monitors non-filler episodes in Sonarr and generates Plex collections.
SoFE (Sonarr Anime Filler Excluder) is a Python application that configures Sonarr to monitor only non-filler anime episodes sourced from Anime Filler List. It also creates separate Plex collections for non-filler and filler episodes, depending on the download status.
Open-source framework for building asynchronous web services that interact with event streams.
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
Zero shot pdf OCR with gpt-4o-mini.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense!
Effortless Python web applications with the power of reactive programming.
Shiny for Python is the best way to build fast, beautiful web applications in Python. You can build quickly with Shiny and create simple interactive visualizations and prototype applications in an afternoon. But unlike other frameworks targeted at data scientists, Shiny does not limit your app's growth. Shiny remains extensible enough to power large, mission-critical applications.
Build Python Data & AI web applications. Turns Data and AI algorithms into production-ready web applications in no time.
Taipy is designed for data scientists and machine learning engineers to build data & AI web applications.
From simple pilots to production-ready web applications in no time. No more compromise on performance, customization, and scalability.
Structured text generation and robust prompting for language models.
Outlines is a Python library that allows you to use Large Language Model in a simple and robust way (with structured generation). It is built by .txt, and is already used in production by many companies.
Related contents:
Penelope Shell Handler.
Penelope is a shell handler designed to be easy to use and intended to replace netcat when exploiting RCE vulnerabilities. It is compatible with Linux and macOS and requires Python 3.6 or higher. It is a standalone script that does not require any installation or external dependencies, and it is intended to remain this way.
The Data Processor for Agents.
Marly allows your agents to extract tables & text from your PDFs, Powerpoints, etc in a structured format making it easy for them to take subsequent actions (database call, API call, creating a chart etc).
On-device AI across mobile, embedded and edge for PyTorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
🧪 Correlate Semgrep scans with Python test coverage to prioritize SAST findings and get bug fix suggestions via a self-hosted LLM.
vulncov correlates Semgrep scans with Python test code coverage to identify which vulnerable code has been executed by unit tests, helping prioritize SAST findings and reduce false positives. It also leverages a self-hosted LLM to suggest bug fixes!
Ce site propose des exercices d'apprentissage de l'algorithmique et de la programmation par le biais d'exercices variés. Le langage utilisé est Python.
Les exercices proposés ont été écrits, testés, corrigés et améliorés par des professeurs d'informatique du secondaire et du supérieur.
Aucune installation, aucune inscription ne sont nécessaires : tous les programmes sont exécutés sur votre machine, tablette ou téléphone.
Multi-vendor library to simplify Paramiko SSH connections to network devices.
Network automation to screen-scraping devices is primarily concerned with gathering output from show commands and with making configuration changes.
Netmiko aims to accomplish both of these operations and to do it across a very broad set of platforms. It seeks to do this while abstracting away low-level state control (i.e. eliminate low-level regex pattern matching to the extent practical).
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Meta Lingua is a minimal and fast LLM training and inference library designed for research. Meta Lingua uses easy-to-modify PyTorch components in order to try new architectures, losses, data, etc. We aim for this code to enable end to end training, inference and evaluation as well as provide tools to better understand speed and stability. While Meta Lingua is currently under development, we provide you with multiple apps to showcase how to use this codebase.
All Algorithms implemented in Python.
Implementations are for learning purposes only. They may be less efficient than the implementations in the Python standard library. Use them at your discretion.
YouTube, Apple Podcast (and more) to readable Markdown.
yt2doc transcribes videos & audios online into readable Markdown documents.
Open-Source ML Monitoring and LLM Observability.
Open-source evaluation and observability for ML and LLM systems Evaluate, test, and monitor AI-powered systems. From tabular data to LLMs. Built for data scientists, AI, and ML engineers.
Detect and extract tables to markdown and csv.
Tabled is a small library for detecting and extracting tables. It uses surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into markdown, csv, or html.
The Open-Source Tool Democratizing Multi-Cloud Security Testing by Arpan Sarkar.
Multi-Cloud Security Testing Tool to execute a comprehensive array of attack techniques across multiple surfaces via a simple web interface.
Halberd is a powerful, multi-cloud security testing tool. Born out of the need for a unified, easy-to-use tool, Halberd enables you to proactively assess your cloud defenses by executing a comprehensive array of attack techniques across Entra ID, M365, Azure, and AWS. With its intuitive web interface, you can simulate real-world attacks, generate valuable telemetry, and validate your security controls with ease & speed.
Gato, or GitHub Attack Toolkit, is an enumeration and attack tool that allows both blue teamers and offensive security practitioners to identify and exploit pipeline vulnerabilities within a GitHub organization's public and private repositories.
OmniParse is a platform that ingests/parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured and ready for AI applications, such as RAG , fine-tuning and more.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Forget expensive NVIDIA GPUs, unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, Linux, pretty much any device!
Related contents:
🐍 A toolkit for testing, tweaking and cracking JSON Web Tokens
Meet the NiceGUI. And let any browser be the frontend of your Python code. Create web-based user interfaces with Python. The nice way.
NiceGUI is an easy-to-use, Python-based UI framework, which shows up in your web browser. You can create buttons, dialogs, Markdown, 3D scenes, plots and much more.
PyTorch implementation of PerCo (Towards Image Compression with Perfect Realism at Ultra-Low Bitrates, ICLR 2024)