
How to build function calling and JSON mode for open-source and fine-tuned LLMs

Blog post from Baseten

Post Details

Company: Baseten
Date Published:
Author: Bryce Dubayah, Philip Kiely
Word Count: 1,339
Language: English
Hacker News Points: 1
Summary

Baseten has announced support for function calling and structured output for LLMs deployed with its TensorRT-LLM Engine Builder, adding model-server-level support for two key features. Function calling lets users pass a set of defined tools to an LLM as part of the request body, while structured output enforces an output schema defined as part of the LLM input. Both features are built into Baseten's customized version of NVIDIA's Triton Inference Server and use logit biasing to ensure that only valid tokens are generated during LLM inference. The implementation has minimal latency impact after the first call with a given schema completes, allowing efficient use of these new features.
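
As a sketch of what passing tools in the request body might look like, the snippet below sends an OpenAI-style tools array to a deployed model. The endpoint URL, the exact payload keys, and the get_weather tool are illustrative assumptions, not Baseten's documented API.

```python
# Hedged sketch of a function-calling request. The model URL, payload
# shape, and tool definition are hypothetical placeholders.
import requests

API_KEY = "YOUR_BASETEN_API_KEY"  # placeholder
MODEL_URL = "https://model-xxxxxx.api.baseten.co/production/predict"  # hypothetical

payload = {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    # Tools are declared as JSON schemas in the request body; the model
    # decides whether to emit a call to one of them.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(
    MODEL_URL,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json=payload,
)
print(resp.json())
```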
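
Structured output works the same way from the caller's side: the desired schema travels with the input. Below is a sketch assuming an OpenAI-style response_format field; the key names and endpoint are again hypothetical.

```python
# Hedged sketch of a structured-output (JSON mode) request: the output
# schema is supplied as part of the LLM input. The "response_format"
# shape is an assumption modeled on common JSON-schema APIs.
import requests

API_KEY = "YOUR_BASETEN_API_KEY"  # placeholder
MODEL_URL = "https://model-xxxxxx.api.baseten.co/production/predict"  # hypothetical

payload = {
    "messages": [
        {"role": "user", "content": "Extract the person described: 'Ada, 36, London'."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "age", "city"],
            },
        },
    },
}

resp = requests.post(
    MODEL_URL,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json=payload,
)
print(resp.json())  # output is constrained to parse against the schema
```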
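
Under the hood, logit biasing means forcing the logits of schema-invalid tokens to negative infinity before sampling, so only valid tokens can ever be chosen. The toy example below illustrates the general technique; it is not the actual Triton implementation.

```python
# Minimal illustration of logit biasing for constrained decoding: tokens
# that would violate the schema get their logits forced to -inf so they
# can never be sampled.
import numpy as np

def constrain_logits(logits: np.ndarray, valid_token_ids: set[int]) -> np.ndarray:
    """Mask out every token id not currently allowed by the schema."""
    masked = np.full_like(logits, -np.inf)
    ids = list(valid_token_ids)
    masked[ids] = logits[ids]
    return masked

# Toy vocabulary of 10 tokens; suppose the schema currently allows only
# tokens 2 ('{'), 5 ('"'), and 7 (whitespace).
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
allowed = {2, 5, 7}

masked = constrain_logits(logits, allowed)
probs = np.exp(masked - masked.max())
probs /= probs.sum()  # softmax over the constrained distribution
next_token = int(np.argmax(probs))  # greedy pick
assert next_token in allowed
```

A plausible reading of the latency claim, under the assumption that the set of valid tokens for each schema state is precomputed the first time a schema is seen, is that this mask can be cached and reused, which is why only the first call with a given schema pays a noticeable cost.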