FireLLaVA is the first commercially permissive open-source multimodal model based on the LLaVA framework, released under the Llama 2 Community License. It can process and analyze data from multiple sources, such as text and images, enabling a more comprehensive understanding of its input. FireLLaVA is derived from LLaVA, a Vision-Language Model (VLM) that combines the Vicuna language model with OpenAI's CLIP-ViT vision encoder. The model was developed by Fireworks.ai, which recreated LLaVA using only open-source models for data generation and training, thereby avoiding the licensing restrictions of the original LLaVA, which was trained on GPT-4-generated data. The new version performs comparably to the original and even surpasses it on some benchmarks. FireLLaVA is available through Hugging Face, a fast API, and a playground, and it enables the development of vision-capable applications through APIs compatible with OpenAI Vision models.
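Because the API follows the OpenAI Vision message format, a standard OpenAI client can in principle be pointed at the Fireworks endpoint. The following is a minimal sketch of such a call; the base URL, model identifier, and image URL are illustrative assumptions rather than confirmed values from the source.

```python
# Minimal sketch: querying FireLLaVA through an OpenAI-compatible
# chat-completions endpoint. Endpoint and model id are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_FIREWORKS_API_KEY",                  # placeholder key
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/firellava-13b",   # assumed model id
    messages=[
        {
            "role": "user",
            # OpenAI Vision-style content: a text part plus an image part.
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},  # illustrative URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Because the message schema matches OpenAI's Vision models, existing OpenAI-based applications would need little more than a changed base URL and model name to try FireLLaVA.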