Vectara's web crawler sample application is designed to handle challenging data-ingestion scenarios in which no upstream semi-structured data or rendered tags are available. The crawler offers four modes of link discovery: single URL, sitemap, RSS feed, and recursive crawl. Each mode has its strengths and limitations; the recursive mode in particular demands care around rendering timeouts, link uniqueness, memory usage, and the discovery of hidden content. Once a link is found, the crawler renders it to PDF using either Chrome or Qt WebKit, selected via the `--pdf-driver` parameter, a choice that affects both rendering accuracy and security. Finally, the rendered PDFs are submitted to Vectara's file upload API for processing, yielding content that indexes well for search.
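The link-uniqueness issue in recursive mode can be illustrated with a small sketch: trivially different URLs (mixed-case hosts, default ports, `#fragment` anchors) should collapse to one canonical form before being added to the crawl frontier, or the crawler revisits the same page and memory use grows. This is an illustrative approach, not the sample crawler's actual normalization logic; the `normalize` and `should_crawl` helpers below are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different links dedupe to one entry.

    Illustrative only; the real crawler's normalization may differ.
    """
    parts = urlsplit(url)
    host = parts.hostname.lower() if parts.hostname else ""
    # Drop default ports and fragments, which don't change the fetched page.
    netloc = host if parts.port in (None, 80, 443) else f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))

seen = set()

def should_crawl(url: str) -> bool:
    """Return True only the first time a canonical URL is encountered."""
    key = normalize(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With this in place, `should_crawl("http://example.com/page")` returns `True` the first time, while a later `should_crawl("http://EXAMPLE.com:80/page#section")` returns `False`, since both normalize to the same canonical URL.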