This exploration of combining LLMs with devtools to fix test failures in pull requests shows the potential of bringing AI into development workflows. The experiment tested several LLMs, including gpt-o1-preview, to see whether they could identify and fix the issues behind failing PRs given the test failures and logs. gpt-o1-preview showed some success, particularly when given additional context about the failure's cause. Replay recordings were central to the program analysis: they provided detailed insight into the application's behavior and made it possible to compare DOM structures between passing and failing tests. That comparison surfaced the missing attributes responsible for the failures, demonstrating that pairing LLMs with deep analysis tools can meaningfully improve problem-solving. Even with a detailed analysis in hand, the LLM still has to synthesize that information into a working patch, which is why the combination of AI and devtools proves more potent than either alone. The next step is to refine this approach against real-world test failures, and the team is looking to collaborate with developers facing similar challenges.
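To make the DOM-comparison step concrete, here is a minimal sketch of the idea: diffing element attributes between a passing and a failing run's DOM snapshots to surface what went missing. The snapshot shape and helper names below are hypothetical illustrations, not the actual Replay API.

```typescript
// Sketch: given serialized DOM snapshots from a passing and a failing test run,
// report attributes present on an element in the passing run but missing in the
// failing run. Snapshot shape is assumed for illustration.

interface ElementSnapshot {
  selector: string;                   // e.g. "button[data-testid=submit]"
  attributes: Record<string, string>; // attribute name -> value
}

type DomSnapshot = ElementSnapshot[];

interface MissingAttribute {
  selector: string;
  attribute: string;
  expectedValue: string;
}

function findMissingAttributes(passing: DomSnapshot, failing: DomSnapshot): MissingAttribute[] {
  const failingBySelector = new Map(failing.map((el) => [el.selector, el]));
  const missing: MissingAttribute[] = [];

  for (const el of passing) {
    const counterpart = failingBySelector.get(el.selector);
    if (!counterpart) continue; // element absent entirely: a different class of failure
    for (const [name, value] of Object.entries(el.attributes)) {
      if (!(name in counterpart.attributes)) {
        missing.push({ selector: el.selector, attribute: name, expectedValue: value });
      }
    }
  }
  return missing;
}

// Example: the failing run's button lost an attribute the test depends on.
const passingRun: DomSnapshot = [
  { selector: "button[data-testid=submit]", attributes: { "data-testid": "submit", "aria-label": "Submit form" } },
];
const failingRun: DomSnapshot = [
  { selector: "button[data-testid=submit]", attributes: { "data-testid": "submit" } },
];

console.log(findMissingAttributes(passingRun, failingRun));
// -> [{ selector: "button[data-testid=submit]", attribute: "aria-label", expectedValue: "Submit form" }]
```

A structured diff like this, alongside the failing test's logs, is the kind of additional context that helps the model move from "the test failed" to a plausible patch.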