Scrapy-splash is a library that provides Scrapy and JavaScript integration. In affected versions, users who rely on `HttpAuthMiddleware` (i.e. the `http_user` and `http_pass` spider attributes) for Splash authentication expose their credentials to the target of every non-Splash request. This includes `robots.txt` requests sent by Scrapy when the `ROBOTSTXT_OBEY` setting is set to `True`. Upgrade to scrapy-splash 0.8.0 and use the new `SPLASH_USER` and `SPLASH_PASS` settings instead to set your Splash authentication credentials safely. If you cannot upgrade, set your Splash request credentials on a per-request basis, using the `splash_headers` request parameter, instead of defining them globally using `HttpAuthMiddleware`. Alternatively, make sure all your requests go through Splash, which includes disabling the robots.txt middleware.
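The per-request workaround described above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper name `splash_auth_headers` and the placeholder credentials are assumptions, and the header is built with the standard library so the snippet runs without Scrapy installed. It assumes Splash is protected by HTTP Basic authentication.

```python
import base64


def splash_auth_headers(user: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header intended for Splash only.

    Hypothetical helper for illustration; it is not part of scrapy-splash.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials (assumption); load real ones from a secrets store.
headers = splash_auth_headers("user", "userpass")

# In a spider, the dict would be attached to each Splash request, e.g.:
#   yield SplashRequest(url, self.parse, splash_headers=headers)
# and the http_user / http_pass spider attributes would be left unset,
# so HttpAuthMiddleware never adds the credentials to non-Splash requests.
#
# On scrapy-splash 0.8.0 or later, the advisory recommends the settings
# SPLASH_USER and SPLASH_PASS instead of this per-request approach.
```

Because the header travels in `splash_headers`, it is sent to the Splash endpoint rather than to the crawled site, which is the property the advisory asks for.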
The product exposes sensitive information to an actor that is not explicitly authorized to have access to that information.
Name | Vendor | Start Version | End Version |
---|---|---|---|
Scrapy-splash | Zyte | * | 0.8.0 (excluding) |
There are many different kinds of mistakes that introduce information exposures. The severity of the error can range widely, depending on the context in which the product operates, the type of sensitive information that is revealed, and the benefits it may provide to an attacker. Some kinds of sensitive information include:
Information might be sensitive to different parties, each of which may have their own expectations for whether the information should be protected. These parties include:
Information exposures can occur in different ways:
It is common practice to describe any loss of confidentiality as an “information exposure,” but this can lead to overuse of CWE-200 in CWE mapping. From the CWE perspective, loss of confidentiality is a technical impact that can arise from dozens of different weaknesses, such as insecure file permissions or out-of-bounds read. CWE-200 and its lower-level descendants are intended to cover the mistakes that occur in behaviors that explicitly manage, store, transfer, or cleanse sensitive information.