Semantic search over a codebase is harder than semantic search over books, because code differs from natural language. The usual approach is to split the corpus into units, generate a semantic vector embedding for each unit, and compare those vectors to find similar pieces of text. This works well for book search but less well for codebase search. The main issue is that natural language and code are not semantically similar, so vector embeddings struggle to capture what a piece of code means. Even with simple queries, the results were unsatisfactory: the query was more similar to a description of the code than to the code itself. Chunking the codebase into smaller units, such as individual functions rather than whole files, can improve retrieval quality, but any noise added to these chunks significantly reduces their semantic similarity to the query.
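
The pipeline described above (chunk, embed, compare) can be sketched in a few lines. This is a minimal illustration, not the method actually used: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `SOURCE`, `chunk_functions`, and `search` are names invented for the example. Function-level chunking is done here with Python's standard `ast` module, which is one possible choice among several.

```python
import ast
import math
import re
from collections import Counter

# A tiny "codebase" used only for illustration.
SOURCE = '''
def load_config(path):
    """Read a JSON configuration file from disk."""
    import json
    with open(path) as f:
        return json.load(f)

def send_email(recipient, body):
    """Send an email message to a recipient."""
    print("sending", recipient, body)
'''


def chunk_functions(source):
    """Split source text into one chunk per top-level function."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def toy_embed(text):
    """Assumption: word counts stand in for a learned embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks):
    """Rank chunks by similarity to the query, best match first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(text)), name) for name, text in chunks.items()]
    return sorted(scored, reverse=True)


chunks = chunk_functions(SOURCE)
query = "read configuration file"
ranking = search(query, chunks)

# Appending unrelated text to a chunk lowers its similarity to the query.
clean = chunks["load_config"]
noisy = clean + "\n# TODO refactor legacy billing module"
clean_score = cosine(toy_embed(query), toy_embed(clean))
noisy_score = cosine(toy_embed(query), toy_embed(noisy))
```

With this toy embedding the match depends entirely on words shared between the query and the docstring, which mirrors the observation that a query tends to sit closer to a description than to the code itself. A real embedding model would behave differently in detail, so the scores here are only indicative.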