LangChain through 0.1.10 allows ../ directory traversal by an actor who is able to control the final part of the path parameter in a load_chain call. This bypasses the intended behavior of loading configurations only from the hwchase17/langchain-hub GitHub repository. The outcome can be disclosure of an API key for a large language model online service, or remote code execution. (A patch is available as of release 0.1.29 of langchain-core.)
Weakness
The product uses external input to construct a pathname that should be within a restricted directory, but it does not properly neutralize ‘dir....\filename’ (multiple internal backslash dot dot) sequences that can resolve to a location that is outside of that directory.
Extended Description
This allows attackers to traverse the file system to access files or directories that are outside of the restricted directory.
The ‘dir....\filename’ manipulation is useful for bypassing some path traversal protection schemes. Sometimes a program only removes one “.." sequence, so multiple “.." can bypass that check. Alternately, this manipulation could be used to bypass a check for “.." at the beginning of the pathname, moving up more than one directory level.
Potential Mitigations
- Assume all input is malicious. Use an “accept known good” input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.
- When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, “boat” may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as “red” or “blue.”
- Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code’s environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
- When validating filenames, use stringent allowlists that limit the character set to be used. If feasible, only allow a single “.” character in the filename to avoid weaknesses such as CWE-23, and exclude directory separators such as “/” to avoid CWE-36. Use a list of allowable file extensions, which will help to avoid CWE-434.
- Do not rely exclusively on a filtering mechanism that removes potentially dangerous characters. This is equivalent to a denylist, which may be incomplete (CWE-184). For example, filtering “/” is insufficient protection if the filesystem also supports the use of “" as a directory separator. Another possible error could occur when the filtering is applied in a way that still produces dangerous data (CWE-182). For example, if “../” sequences are removed from the “…/…//” string in a sequential fashion, two instances of “../” would be removed from the original string, but the remaining characters would still form the “../” string.
References