AntiSquat leverages AI techniques such as natural language processing (NLP), large language models (ChatGPT) and more to empower detection of typosquatting and phishing domains.
Related contents:
OpenVoiceOS is an open-source platform for smart speakers and other voice-centric devices.
OpenVoiceOS is a community-driven, open-source voice AI platform for creating custom voice-controlled interfaces across devices with NLP, a customizable UI, and a focus on privacy and security.
Multi-modal modular data ingestion and retrieval.
DataBridge is an open source library for natural language search and management of multi-modal data. Get started by installing databridge now!
DataBridge is a powerful document processing and retrieval system designed for building intelligent document-based applications. It provides a robust foundation for semantic search, document processing, and AI-powered document interactions.
Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over relational data. It allows you to set up an API from your database that can answer questions in plain English.
Using Sequences of Life-events to Predict Human Lives.
We represent human lives in a way that shares structural similarity to language, and we exploit this similarity to adapt natural language processing techniques to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on a comprehensive registry dataset, which is available for Denmark across several years, and that includes information about life-events related to health, education, occupation, income, address and working hours, recorded with day-to-day resolution.
Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.
UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)".
General Architecture for Text Engineering.
GATE is an open source software toolkit capable of solving almost any text processing problem.
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Moses, the machine translation system.
Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest probability translation among the exponential number of choices.
text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).
library and tools for information extraction.
This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
PaddleNLP is an easy-to-use and powerful natural language processing development library. Aggregates high-quality pre-trained models in the industry and provides an out -of-the-box development experience. The model library covering multiple scenarios of NLP and industrial practice examples can meet the needs of developers for flexible customization .