Interest in fine-tuning large language models (LLMs) has surged, driven by advances in open-source models like LLaMA-2 and the introduction of fine-tuning support for OpenAI's newer models. Fine-tuned models can perform specialized tasks more effectively than generalist models, offering benefits such as cost savings, privacy, and improved performance in specific applications.

The guide explores fine-tuning using LangSmith for dataset management and evaluation, highlighting its role in data collection, cleaning, and model-evaluation workflows. It demonstrates how LLaMA2-7b-chat and GPT-3.5-turbo were fine-tuned to extract knowledge-graph triples, and walks through practical challenges such as GPU memory constraints, which motivate parameter-efficient fine-tuning methods like LoRA and qLoRA.

Evaluation with LangSmith shows that fine-tuned models can outperform larger generalist models on specific tasks, but few-shot prompting with models like GPT-4 can still yield superior results without any fine-tuning. The guide therefore recommends trying few-shot prompting and retrieval-augmented generation (RAG) before committing to the more resource-intensive fine-tuning process, and stresses that a clear task definition and a high-quality dataset are the main drivers of good fine-tuning outcomes.
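The memory savings behind LoRA come from freezing the full weight matrix and training only a low-rank update. A minimal pure-Python sketch of the parameter arithmetic (the 4096 hidden size matches a single LLaMA2-7b projection layer, but the rank of 8 is a common illustrative choice, not necessarily the guide's exact setting):

```python
# Illustrative parameter arithmetic for LoRA (Low-Rank Adaptation).
# Instead of updating a frozen weight matrix W (d_out x d_in) directly,
# LoRA trains a factored update B @ A, where B is d_out x r and A is
# r x d_in, with rank r << min(d_out, d_in).

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted layer: the entries of
    B (d_out x rank) plus the entries of A (rank x d_in)."""
    return d_out * rank + rank * d_in

d = 4096  # hidden size of one LLaMA2-7b projection (illustrative)
r = 8     # LoRA rank (assumed value for this sketch)

full = d * d                        # params if W itself were tuned
lora = lora_trainable_params(d, d, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# full: 16,777,216  lora: 65,536  ratio: 0.3906%
```

qLoRA pushes this further by also quantizing the frozen base weights (typically to 4 bits), so the dominant memory cost shrinks along with the trainable-parameter count.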