Company
Date Published
Author
-
Word count
400
Language
-
Hacker News points
None

Summary

Performance tests were run on the NVIDIA DGX Spark with the latest firmware and Ollama version 0.12.6 to evaluate how it handles a range of models: OpenAI's gpt-oss, plus gemma3, llama3.1, and deepseek-r1. Each test asked the model to summarize "A Tale of Two Cities" with output capped at 500 tokens, caching disabled, and temperature set to zero, and each test was repeated ten times. Token processing speeds varied with model size and quantization level. The gpt-oss models were run through Ollama, which keeps the attention layers in BF16 as OpenAI intended. The DGX Spark firmware can be updated through the DGX Dashboard or the CLI, a process that also requires an Ubuntu distribution upgrade, and Ollama can then be installed to run models. OpenAI's Codex can also be installed and pointed at Ollama, and the DGX Spark can run large models such as gpt-oss-120b thanks to its large unified memory capacity.
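The test procedure described above can be sketched as a small script against Ollama's local REST API. This is only an illustration under stated assumptions, not the harness actually used for the tests: it assumes Ollama is listening on its default address (`http://localhost:11434`), and the helper names `tokens_per_second`, `run_once`, `summarize`, and `benchmark` are hypothetical. It sets temperature to zero and caps output at 500 tokens through the `options` field, then derives speeds from the `prompt_eval_*` and `eval_*` timing fields that Ollama returns in nanoseconds. It does not reproduce the "caching disabled" condition, since the summary does not say how that was done.

```python
import json
import statistics
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(token_count, duration_ns):
    """Convert a token count and a nanosecond duration into tokens/second."""
    if duration_ns <= 0:
        return 0.0
    return token_count / (duration_ns / 1e9)


def run_once(model, prompt):
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Temperature zero and a 500-token output cap, as in the tests.
        "options": {"temperature": 0, "num_predict": 500},
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(responses):
    """Average prompt-processing and generation speeds over several runs."""
    prefill = [
        tokens_per_second(r.get("prompt_eval_count", 0),
                          r.get("prompt_eval_duration", 0))
        for r in responses
    ]
    decode = [
        tokens_per_second(r.get("eval_count", 0), r.get("eval_duration", 0))
        for r in responses
    ]
    return {
        "prefill_tps": statistics.mean(prefill),
        "decode_tps": statistics.mean(decode),
    }


def benchmark(model, prompt, runs=10):
    """Repeat the same request `runs` times and summarize the speeds."""
    return summarize([run_once(model, prompt) for _ in range(runs)])
```

With a model already pulled, a call such as `benchmark("gpt-oss:20b", prompt_text)` would return the mean prompt-processing and generation speeds over ten runs. The model tag and the `prompt_text` variable are placeholders here.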