Consider a crawler configured to start at https://example.com with prevent_outside=True.
An attacker in control of the contents of https://example.com could place a malicious HTML file there with links like https://example.completely.different/my_file.html, and the crawler would proceed to download that file as well, even though prevent_outside=True.
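To illustrate why a link such as the one above might slip past an "outside" filter, here is a minimal sketch. It assumes, without confirmation from the advisory text, that the filter compared URLs as plain string prefixes. The function names `naive_is_inside` and `same_host_is_inside` are hypothetical and are not part of the library's API; the second function shows one possible stricter comparison, not necessarily the fix that was merged.

```python
from urllib.parse import urlparse


def naive_is_inside(base_url: str, link: str) -> bool:
    # Hypothetical prefix check (an assumption about how such a bug can arise):
    # any link whose text starts with the base URL string counts as "inside".
    return link.startswith(base_url)


def same_host_is_inside(base_url: str, link: str) -> bool:
    # Stricter sketch: parse both URLs and require the scheme and the
    # network location (host plus optional port) to match exactly.
    base, target = urlparse(base_url), urlparse(link)
    return (target.scheme, target.netloc.lower()) == (base.scheme, base.netloc.lower())


base = "https://example.com"
evil = "https://example.completely.different/my_file.html"

print(naive_is_inside(base, evil))      # True: the string prefix matches
print(same_host_is_inside(base, evil))  # False: the hosts differ
```

The attacker's hostname begins with the same characters as the trusted one, so a comparison that never parses the URL cannot tell the two hosts apart.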
Resolved in https://github.com/langchain-ai/langchain/pull/15559
The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.