LangChain is a framework for building LLM-powered applications. Prior to 1.1.14, the RecursiveUrlLoader class in @langchain/community is a web crawler that recursively follows links from a starting URL. Its preventOutside option (enabled by default) is intended to restrict crawling to the same site as the base URL. The implementation used String.startsWith() to compare URLs, which does not perform semantic URL validation. An attacker who controls content on a crawled page could include links to domains that share a string prefix with the target, causing the crawler to follow links to attacker-controlled or internal infrastructure. Additionally, the crawler performed no validation against private or reserved IP addresses. A crawled page could include links targeting cloud metadata services, localhost, or RFC 1918 addresses, and the crawler would fetch them without restriction. This vulnerability is fixed in 1.1.14.
The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.
| Name | Vendor | Start Version | End Version |
|---|---|---|---|
| Langchain_community | Langchain | * | 1.1.14 (excluding) |