Research on conversational visual modeling demonstrates that large language models (LLMs) such as GPT-4 can convert natural-language descriptions into professional-grade diagrams expressed in PlantUML or Graphviz syntax, offering a direct path from idea to visual representation. This approach lets stakeholders co-create and refine system diagrams through conversation, bypassing the syntax of conventional modeling tools. The prototype integrates a multimodal framework with real-time rendering, so each conversational turn produces an immediately visible diagram that can be refined iteratively. It is built on a unified, LLM-agnostic architecture that accommodates different models and can be adapted to specific organizational needs. The research finds that GPT-4 handles complex relationships more reliably than models such as Llama-2, but stresses that automated validation and human review remain necessary to catch errors. The approach lays the groundwork for collaborative design tools that could extend to more specialized domains and to broader multimodal integration in visual modeling.
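The summarized work does not include its prompt or pipeline code, so the following is only a minimal Python sketch of the loop it describes: send a natural-language description to an LLM, extract the PlantUML source from the reply, apply a basic automated validation check, and render the result. The `call_llm` stub, the prompt wording, and the file names are assumptions for illustration rather than the authors' implementation, and rendering assumes the `plantuml` CLI is available on the PATH.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical LLM client; in an LLM-agnostic design this is the only piece
# that changes when swapping GPT-4, Llama-2, or another backend.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this up to the chat-completion backend of your choice.")

# A reply is only accepted if it contains a well-formed @startuml ... @enduml block.
PLANTUML_BLOCK = re.compile(r"@startuml.*?@enduml", re.DOTALL)

def description_to_diagram(description: str, out_dir: str = ".") -> Path:
    """Ask the model for PlantUML, validate the block, and render it to an image."""
    prompt = (
        "Convert the following system description into a PlantUML class diagram. "
        "Return only the diagram source between @startuml and @enduml.\n\n"
        f"{description}"
    )
    reply = call_llm(prompt)

    # Automated validation step: reject replies without a PlantUML block
    # and flag them for human review instead of rendering garbage.
    match = PLANTUML_BLOCK.search(reply)
    if match is None:
        raise ValueError("Model reply contained no @startuml/@enduml block; needs human review.")

    puml_path = Path(out_dir) / "diagram.puml"
    puml_path.write_text(match.group(0), encoding="utf-8")

    # Render with the PlantUML CLI (assumes `plantuml` is installed).
    subprocess.run(["plantuml", str(puml_path)], check=True)
    return puml_path.with_suffix(".png")
```

In a conversational setting, the same function would be called again after each user correction ("add a payment service", "make the association one-to-many"), with the prior diagram source included in the prompt, which is one plausible way to realize the iterative refinement loop the study describes.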