NVIDIA® TensorRT™ is an ecosystem of APIs for high-performance deep learning inference. TensorRT includes an inference runtime and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes TensorRT, TensorRT-LLM, TensorRT Model Optimizer, and TensorRT Cloud.
Unused RAM is wasted RAM, so why not put some of that VRAM in your graphics card to work?
vramfs is a utility that uses the FUSE library to create a file system in VRAM. The idea is pretty much the same as a ramdisk, except that it uses the video RAM of a discrete graphics card to store files. It is not intented for serious use, but it does actually work fairly well, especially since consumer GPUs with 4GB or more VRAM are now available.
On the developer's system, the continuous read performance is ~2.4 GB/s and write performance 2.0 GB/s, which is about 1/3 of what is achievable with a ramdisk. That is already decent enough for a device not designed for large data transfers to the host, but future development should aim to get closer to the PCI-e bandwidth limits. See the benchmarks section for more info.