Using Sequences of Life-events to Predict Human Lives.
We represent human lives in a way that shares structural similarity with language, and we exploit this similarity to adapt natural language processing techniques to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on a comprehensive registry dataset, available for Denmark across several years, that includes information about life events related to health, education, occupation, income, address and working hours, recorded with day-to-day resolution.
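To make the language analogy concrete, here is a minimal sketch that turns one person's life events into a "sentence" of tokens ordered by date, then maps tokens to integer ids the way a text model would. The `LifeEvent` fields, domain names, and token format are hypothetical illustrations, not the registry schema or the authors' vocabulary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LifeEvent:
    # Hypothetical event record: a calendar date, a life domain, and a value.
    day: date
    domain: str  # e.g. "education", "occupation", "income"
    value: str


def to_sentence(events: list[LifeEvent]) -> list[str]:
    """Order events chronologically and render each one as a single token."""
    ordered = sorted(events, key=lambda e: e.day)
    return [f"{e.domain.upper()}_{e.value}" for e in ordered]


def build_vocab(sentences: list[list[str]]) -> dict[str, int]:
    """Assign integer ids to tokens in order of first appearance; 0 is padding."""
    vocab = {"[PAD]": 0}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


life = [
    LifeEvent(date(2015, 9, 1), "occupation", "nurse"),
    LifeEvent(date(2010, 6, 20), "education", "bachelor"),
    LifeEvent(date(2016, 1, 31), "income", "q3"),
]
sentence = to_sentence(life)
vocab = build_vocab([sentence])
ids = [vocab[t] for t in sentence]
print(sentence)  # ['EDUCATION_bachelor', 'OCCUPATION_nurse', 'INCOME_q3']
print(ids)       # [1, 2, 3]
```

Once lives are id sequences like this, standard sequence models from NLP can be trained on them; how the original work encodes time and event attributes is not reproduced here.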
Unstructured Information Management (UIM) applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, or organizations, or relations, such as works-for or located-at.
UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)".
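The decomposition above can be sketched as a chain of components that each read a shared document object and add their own annotations for later components to use, loosely modeled on UIMA's idea of annotators writing to a common analysis structure. This is a toy Python illustration under that assumption: the class and function names are hypothetical and are not the UIMA API, and the rule inside each component is deliberately naive.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Annotation:
    kind: str    # e.g. "Language", "Token", "Sentence", "Entity"
    start: int   # character offsets into the document text
    end: int
    label: str = ""


@dataclass
class Document:
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def select(self, kind: str) -> list[Annotation]:
        return [a for a in self.annotations if a.kind == kind]


def identify_language(doc: Document) -> None:
    # Toy rule: call the text English if it contains common English function words.
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    lang = "en" if words & {"the", "and", "of", "in", "for"} else "unknown"
    doc.annotations.append(Annotation("Language", 0, len(doc.text), lang))


def segment_tokens(doc: Document) -> None:
    # Language-specific segmentation: only an English word/punctuation rule exists here.
    if doc.select("Language")[0].label != "en":
        return
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


def detect_sentences(doc: Document) -> None:
    # A sentence ends at a ".", "!" or "?" token, or at the last token.
    tokens = doc.select("Token")
    start = None
    for i, tok in enumerate(tokens):
        if start is None:
            start = tok.start
        if doc.text[tok.start:tok.end] in {".", "!", "?"} or i == len(tokens) - 1:
            doc.annotations.append(Annotation("Sentence", start, tok.end))
            start = None


# Hypothetical dictionary lookup standing in for a real entity recognizer.
GAZETTEER = {"Alice": "Person", "Copenhagen": "Place", "IBM": "Organization"}


def detect_entities(doc: Document) -> None:
    for tok in doc.select("Token"):
        label = GAZETTEER.get(doc.text[tok.start:tok.end])
        if label:
            doc.annotations.append(Annotation("Entity", tok.start, tok.end, label))


Component = Callable[[Document], None]
PIPELINE: list[Component] = [identify_language, segment_tokens, detect_sentences, detect_entities]


def run(text: str, pipeline: list[Component] = PIPELINE) -> Document:
    doc = Document(text)
    for component in pipeline:
        component(doc)  # each component adds annotations that later ones may read
    return doc


doc = run("Alice works for IBM. She lives in Copenhagen.")
print([(doc.text[a.start:a.end], a.label) for a in doc.select("Entity")])
# [('Alice', 'Person'), ('IBM', 'Organization'), ('Copenhagen', 'Place')]
print(len(doc.select("Sentence")))  # 2
```

Because every component only depends on the annotation types it reads, one stage (for example the entity detector) can be swapped for a better implementation without touching the others, which is the point of decomposing the application this way.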
General Architecture for Text Engineering.
GATE is an open-source software toolkit that can be applied to a wide range of text processing problems.
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
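The core idea of an embeddings database for semantic search is to store one vector per document and answer a query by ranking stored vectors by similarity to the query's vector. The sketch below shows only that indexing-and-ranking loop and does not use txtai's API; the `EmbeddingIndex` class is hypothetical, and a toy bag-of-words vector stands in for a learned embedding model, so it matches shared words (it will not link "fell" to "falling") rather than meaning.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Minimal in-memory index: stores (id, text, vector) rows, ranks by cosine similarity."""

    def __init__(self) -> None:
        self.rows: list[tuple[int, str, Counter]] = []

    def add(self, doc_id: int, text: str) -> None:
        self.rows.append((doc_id, text, embed(text)))

    def search(self, query: str, limit: int = 3) -> list[tuple[int, float]]:
        q = embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, _, vec in self.rows]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:limit]


index = EmbeddingIndex()
for i, text in enumerate([
    "Stock markets fell sharply on Monday",
    "The football team won the championship",
    "Investors worry about falling stock prices",
]):
    index.add(i, text)

print([doc_id for doc_id, _ in index.search("stock prices", limit=2)])  # [2, 0]
```

Swapping `embed` for a neural sentence encoder, and the linear scan for an approximate nearest-neighbor index, is what turns this loop into practical semantic search.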
Moses, the machine translation system.
Moses is a statistical machine translation system that lets you automatically train translation models for any language pair. All you need is a collection of translated texts (a parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest-probability translation among an exponential number of candidates.
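Why the search has to be clever: even translating word by word with no reordering, k options for each of n source words give k^n candidate outputs. The toy decoder below scores candidates by adding log translation probabilities and log bigram language-model probabilities, and avoids enumerating them by keeping only the best hypothesis per language-model state (hypothesis recombination) and pruning to a fixed beam. It is a simplified, monotone, word-based sketch with made-up probabilities; Moses itself performs phrase-based stack decoding with reordering and a weighted log-linear model.

```python
import math

# Toy translation table p(english | foreign); all numbers are hypothetical.
TM = {
    "das": {"the": 0.7, "that": 0.3},
    "haus": {"house": 0.8, "home": 0.2},
    "ist": {"is": 1.0},
    "klein": {"small": 0.6, "little": 0.4},
}

# Toy bigram language model p(word | previous word); "<s>" marks the sentence start.
LM = {
    ("<s>", "the"): 0.5, ("<s>", "that"): 0.2,
    ("the", "house"): 0.4, ("the", "home"): 0.05,
    ("that", "house"): 0.1, ("that", "home"): 0.1,
    ("house", "is"): 0.5, ("home", "is"): 0.3,
    ("is", "small"): 0.3, ("is", "little"): 0.1,
}
UNSEEN = 1e-4  # fallback probability for bigrams missing from the table


def lm_logprob(prev: str, word: str) -> float:
    return math.log(LM.get((prev, word), UNSEEN))


def decode(source: list[str], beam_size: int = 3) -> tuple[list[str], float]:
    # A hypothesis is (log score, output words); its last word is the LM state.
    beam = [(0.0, ["<s>"])]
    for f in source:  # assumes every source word is in TM
        # Hypothesis recombination: keep only the best hypothesis per LM state.
        best_by_state: dict[str, tuple[float, list[str]]] = {}
        for score, words in beam:
            for e, p in TM[f].items():
                new_score = score + math.log(p) + lm_logprob(words[-1], e)
                if e not in best_by_state or new_score > best_by_state[e][0]:
                    best_by_state[e] = (new_score, words + [e])
        # Beam pruning: keep only the top-scoring hypotheses.
        beam = sorted(best_by_state.values(), key=lambda h: h[0], reverse=True)[:beam_size]
    score, words = beam[0]
    return words[1:], score


translation, score = decode(["das", "haus", "ist", "klein"])
print(" ".join(translation))  # the house is small
```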
text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).
Library and tools for information extraction.
This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection, as well as tools for training custom extractors and relation detectors.
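A binary relation detector decides, for a pair of entity mentions in a sentence, whether a given relation holds between them. The toy sketch that follows makes the input and output shape concrete using fixed trigger phrases between the two mentions; the rule table, relation names, and `detect_relations` function are hypothetical and are not this project's API, whose relation detectors can be trained from data rather than written as rules.

```python
from itertools import combinations

Mention = tuple[int, int, str]  # (start token, end token exclusive, entity type)

# Hypothetical rules: allowed argument types and trigger phrases between the mentions.
RULES = {
    "works-for": ({"Person"}, {"Organization"}, {("works", "for"), ("employed", "by")}),
    "located-at": ({"Person", "Organization"}, {"Place"}, {("lives", "in"), ("based", "in")}),
}


def detect_relations(tokens: list[str], mentions: list[Mention]) -> list[tuple[str, str, str]]:
    found = []
    # Candidate binary relations: every pair of mentions, taken in sentence order.
    for (s1, e1, t1), (s2, e2, t2) in combinations(sorted(mentions), 2):
        between = tuple(tokens[e1:s2])
        for relation, (arg1_types, arg2_types, triggers) in RULES.items():
            if t1 in arg1_types and t2 in arg2_types and between in triggers:
                found.append((relation, " ".join(tokens[s1:e1]), " ".join(tokens[s2:e2])))
    return found


print(detect_relations(["Alice", "works", "for", "IBM"], [(0, 1, "Person"), (3, 4, "Organization")]))
# [('works-for', 'Alice', 'IBM')]
print(detect_relations(["Bob", "lives", "in", "Aarhus"], [(0, 1, "Person"), (3, 4, "Place")]))
# [('located-at', 'Bob', 'Aarhus')]
```

A trained detector keeps the same interface (a sentence plus two mentions in, a yes/no decision per relation out) but replaces the exact trigger match with a learned classifier over features of the pair.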
PaddleNLP is an easy-to-use and powerful natural language processing development library. It aggregates high-quality pre-trained models from industry and provides an out-of-the-box development experience. Its model library, which covers multiple NLP scenarios and includes industrial practice examples, meets developers' needs for flexible customization.