MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. Compared to well-known commercial products, MinerU is still young. If you encounter any issues or if the results are not as expected, please submit an issue and attach the relevant PDF.
Seamless multi-master sync, that scales from Big Data to Mobile, with an Intuitive HTTP/JSON API and designed for Reliability.
CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. CouchDB works well with modern web and mobile apps. You can distribute your data efficiently using CouchDB’s incremental replication. CouchDB supports master-master setups with automatic conflict detection.
TypeSchema is a JSON specification to describe data models.
TypeSchema is a JSON format to describe data models in a language neutral format. A TypeSchema specification can be easily transformed into code for almost any programming language. This helps to reuse core data models in different environments.
A Swiss-army tool for scraping and extracting data from online assets, made for hackers.
Pipet is a command-line web scraper. It supports 3 modes of operation: HTML parsing, JSON parsing, and client-side JavaScript evaluation. It relies heavily on existing tools like curl, and it uses Unix pipes to extend its built-in capabilities.
GeoJSON is a format for encoding a variety of geographic data structures.
GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. Geometric objects with additional properties are Feature objects. Sets of features are contained by FeatureCollection objects.
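As a minimal sketch of the objects described above, the following Python snippet builds two Feature objects and wraps them in a FeatureCollection. The helper name `make_feature`, the coordinates, and the property values are illustrative assumptions, not part of the GeoJSON format itself.

```python
import json


def make_feature(geometry_type, coordinates, properties):
    """Wrap a geometry and its properties in a GeoJSON Feature object.

    `make_feature` is a hypothetical helper for this sketch.
    """
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# GeoJSON positions are ordered [longitude, latitude]; values here are made up.
point = make_feature("Point", [102.0, 0.5], {"name": "sample point"})
line = make_feature(
    "LineString", [[102.0, 0.0], [103.0, 1.0]], {"name": "sample line"}
)

# A set of features is contained by a FeatureCollection object.
collection = {"type": "FeatureCollection", "features": [point, line]}

print(json.dumps(collection, indent=2))
```

The result is ordinary JSON, so any JSON parser can read it; only the `type` values and the nesting give it geographic meaning.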
A specification for building APIs in JSON.
Documentation for the application/vnd.api+json media type, a specification for APIs that use JSON.
If you’ve ever argued with your team about the way your JSON responses should be formatted, JSON:API can help you stop the bikeshedding and focus on what matters: your application.
By following shared conventions, you can increase productivity and take advantage of generalized tooling and best practices. Clients built around JSON:API can use its features for efficiently caching responses, sometimes eliminating network requests entirely.
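To make the shared conventions concrete, here is a small Python sketch of a single-resource JSON:API document paired with the application/vnd.api+json media type. The `resource_document` helper, the `articles` type, and the attribute values are assumptions made up for illustration.

```python
import json

# Media type named in the JSON:API documentation.
JSON_API_MEDIA_TYPE = "application/vnd.api+json"


def resource_document(resource_type, resource_id, attributes):
    """Build a top-level document holding one resource object.

    `resource_document` is a hypothetical helper for this sketch.
    JSON:API requires the `id` member to be a string.
    """
    return {
        "data": {
            "type": resource_type,
            "id": str(resource_id),
            "attributes": attributes,
        }
    }


# Example resource; the type and attributes are invented.
document = resource_document("articles", 1, {"title": "Hello, JSON:API"})

headers = {"Content-Type": JSON_API_MEDIA_TYPE}
body = json.dumps(document)

print(headers)
print(body)
```

Because every resource carries a `type` and an `id`, a client can key its cache on that pair regardless of which endpoint returned the resource.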
The open-source database for the realtime web.
RethinkDB pushes JSON to your apps in realtime.
RethinkDB is the first open-source scalable database built for realtime applications. It exposes a new database access model, in which the developer can tell the database to continuously push updated query results to applications without polling for changes. RethinkDB allows developers to build scalable realtime apps in a fraction of the time with less effort.
While JSON is probably the most popular format for exchanging data, JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.
JSON for Humans.
JSON5 is an extension to the popular JSON file format that aims to be easier to write and maintain by hand (e.g. for config files). It is not intended to be used for machine-to-machine communication. (Keep using JSON or other file formats for that. 🙂)
Become a leader in email innovation. JMAP is the developer-friendly, open API standard for modern mail clients and applications to manage email faster.
It’s official! JMAP has been published by the Internet Engineering Task Force (IETF).
The ldap2json script allows you to extract the whole LDAP content of a Windows domain into a JSON file.
Content Management for your Codebase.
A new tool that makes Markdown, JSON and YAML content in your codebase editable by humans. Live edit content on GitHub or your local file system, without disrupting your existing code and workflows.
Valinor takes care of the construction and validation of raw inputs (JSON, plain arrays, etc.) into objects, ensuring a perfectly valid state. It allows the objects to be used without having to worry about their integrity during the whole application lifecycle.
The validation system will detect any incorrect value and help the developers by providing precise and human-readable error messages. The mapper can handle native PHP types as well as other advanced types supported by PHPStan and Psalm like shaped arrays, generics, integer ranges and more.
One toolchain for your web project. Format, lint, and more in a fraction of a second.
Biome is a performant toolchain for web projects. It aims to provide developer tools to maintain the health of those projects.
Biome is a fast formatter for JavaScript, TypeScript, JSX, and JSON that scores 96% compatibility with Prettier.
Biome is a performant linter for JavaScript, TypeScript, and JSX that features more than 170 rules from ESLint, TypeScript ESLint, and other sources. It outputs detailed and contextualized diagnostics that help you to improve your code and become a better programmer!
Protocol Buffers are language-neutral, platform-neutral extensible mechanisms for serializing structured data.
Your Data Pipeline, Simplified. GlareDB: An analytics DBMS for distributed data.
Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.
GlareDB is designed to query your data wherever it lives using SQL that you already know.
JSON for Linking Data.
Data is messy and disconnected. JSON-LD organizes and connects it, creating a better Web.
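As a rough sketch of how JSON-LD connects data, the Python snippet below builds a small document whose `@context` maps plain keys onto a shared vocabulary and whose `@id` values let separate records point at each other. The example.com identifiers and the person names are hypothetical placeholders.

```python
import json

# A JSON-LD document is plain JSON plus a few "@"-prefixed keywords.
person = {
    "@context": "https://schema.org",  # vocabulary that gives the keys meaning
    "@type": "Person",
    "@id": "https://example.com/people/jane",  # hypothetical identifier
    "name": "Jane Doe",
    # A link to another node by its identifier instead of an embedded copy.
    "knows": {"@id": "https://example.com/people/john"},
}

serialized = json.dumps(person, indent=2)
print(serialized)
```

Consumers that ignore the `@` keywords still see ordinary JSON, while JSON-LD-aware tools can follow the `@id` links between documents.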