The OpenRefine fork of the MIT Simile Butterfly server is a modular web application framework. The Butterfly framework uses the java.net.URL class to refer to (what are expected to be) local resource files, like images or templates. This works: opening a connection to these URLs opens the local file. However, prior to version 1.2.6, if a file:/ URL is directly given where a relative path (resource name) is expected, this is also accepted in some code paths; the app then fetches the file, from a remote machine if indicated, and uses it as if it was a trusted part of the apps codebase. This leads to multiple weaknesses and potential weaknesses. An attacker that has network access to the application could use it to gain access to files, either on the the servers filesystem (path traversal) or shared by nearby machines (server-side request forgery with e.g. SMB). An attacker that can lead or redirect a user to a crafted URL belonging to the app could cause arbitrary attacker-controlled JavaScript to be loaded in the victims browser (cross-site scripting). If an app is written in such a way that an attacker can influence the resource name used for a template, that attacker could cause the app to fetch and execute an attacker-controlled template (remote code execution). Version 1.2.6 contains a patch.
Weakness
The product uses external input to construct a pathname that should be within a restricted directory, but it does not properly neutralize absolute path sequences such as “/abs/path” that can resolve to a location that is outside of that directory.
Affected Software
| Name |
Vendor |
Start Version |
End Version |
| Butterfly |
Openrefine |
* |
1.2.6 (including) |
| Openrefine-butterfly |
Ubuntu |
oracular |
* |
| Openrefine-butterfly |
Ubuntu |
upstream |
* |
Potential Mitigations
- Assume all input is malicious. Use an “accept known good” input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.
- When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, “boat” may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as “red” or “blue.”
- Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code’s environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
- When validating filenames, use stringent allowlists that limit the character set to be used. If feasible, only allow a single “.” character in the filename to avoid weaknesses such as CWE-23, and exclude directory separators such as “/” to avoid CWE-36. Use a list of allowable file extensions, which will help to avoid CWE-434.
- Do not rely exclusively on a filtering mechanism that removes potentially dangerous characters. This is equivalent to a denylist, which may be incomplete (CWE-184). For example, filtering “/” is insufficient protection if the filesystem also supports the use of “" as a directory separator. Another possible error could occur when the filtering is applied in a way that still produces dangerous data (CWE-182). For example, if “../” sequences are removed from the “…/…//” string in a sequential fashion, two instances of “../” would be removed from the original string, but the remaining characters would still form the “../” string.
References