
How to build function calling and JSON mode for open-source and fine-tuned LLMs

Blog post from Baseten

Post Details

Company: Baseten
Date Published:
Author: Bryce Dubayah, Philip Kiely
Word Count: 1,339
Language: English
Hacker News Points: 1
Summary

Baseten has announced support for function calling and structured output for LLMs deployed with its TensorRT-LLM Engine Builder, adding model-server-level support for two key features. Function calling lets users pass a set of defined tools to an LLM as part of the request body, while structured output enforces an output schema defined as part of the LLM input. Both features are built into Baseten's customized version of NVIDIA's Triton Inference Server and use logit biasing to ensure that only valid tokens are generated during LLM inference. The implementation has minimal latency impact after the first call with a given schema completes, allowing efficient use of these new features.
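
As a sketch of what passing tools in the request body might look like, the snippet below sends an OpenAI-style tools array to a deployed model. The endpoint URL, the exact payload keys, and the get_weather tool are illustrative assumptions, not Baseten's documented API.

```python
# Hedged sketch of a function-calling request. The model URL, payload
# shape, and tool definition are hypothetical placeholders.
import requests

API_KEY = "YOUR_BASETEN_API_KEY"  # placeholder
MODEL_URL = "https://model-xxxxxx.api.baseten.co/production/predict"  # hypothetical

payload = {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    # Tools are declared as JSON schemas in the request body; the model
    # decides whether to emit a call to one of them.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(
    MODEL_URL,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json=payload,
)
print(resp.json())
```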
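
Structured output works the same way from the caller's side: the desired schema travels with the input. Below is a sketch assuming an OpenAI-style response_format field; the key names and endpoint are again hypothetical.

```python
# Hedged sketch of a structured-output (JSON mode) request: the output
# schema is supplied as part of the LLM input. The "response_format"
# shape is an assumption modeled on common JSON-schema APIs.
import requests

API_KEY = "YOUR_BASETEN_API_KEY"  # placeholder
MODEL_URL = "https://model-xxxxxx.api.baseten.co/production/predict"  # hypothetical

payload = {
    "messages": [
        {"role": "user", "content": "Extract the person described: 'Ada, 36, London'."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "age", "city"],
            },
        },
    },
}

resp = requests.post(
    MODEL_URL,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json=payload,
)
print(resp.json())  # output is constrained to parse against the schema
```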
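
Under the hood, logit biasing means forcing the logits of schema-invalid tokens to negative infinity before sampling, so only valid tokens can ever be chosen. The toy example below illustrates the general technique; it is not the actual Triton implementation.

```python
# Minimal illustration of logit biasing for constrained decoding: tokens
# that would violate the schema get their logits forced to -inf so they
# can never be sampled.
import numpy as np

def constrain_logits(logits: np.ndarray, valid_token_ids: set[int]) -> np.ndarray:
    """Mask out every token id not currently allowed by the schema."""
    masked = np.full_like(logits, -np.inf)
    ids = list(valid_token_ids)
    masked[ids] = logits[ids]
    return masked

# Toy vocabulary of 10 tokens; suppose the schema currently allows only
# tokens 2 ('{'), 5 ('"'), and 7 (whitespace).
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
allowed = {2, 5, 7}

masked = constrain_logits(logits, allowed)
probs = np.exp(masked - masked.max())
probs /= probs.sum()  # softmax over the constrained distribution
next_token = int(np.argmax(probs))  # greedy pick
assert next_token in allowed
```

A plausible reading of the latency claim, under the assumption that the set of valid tokens for each schema state is precomputed the first time a schema is seen, is that this mask can be cached and reused, which is why only the first call with a given schema pays a noticeable cost.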