Deserialization of Untrusted Data, Improper Input Validation vulnerability in Apache UIMA Java SDK, Apache UIMA Java SDK, Apache UIMA Java SDK, Apache UIMA Java SDK.This issue affects Apache UIMA Java SDK: before 3.5.0.
Users are recommended to upgrade to version 3.5.0, which fixes the issue.
There are several locations in the code where serialized Java objects are deserialized without verifying the data. This affects in particular:
- the deserialization of a Java-serialized CAS, but also other binary CAS formats that include TSI information using the CasIOUtils class;
- the CAS Editor Eclipse plugin which uses the the CasIOUtils class to load data;
- the deserialization of a Java-serialized CAS of the Vinci Analysis Engine service which can receive using Java-serialized CAS objects over network connections;
- the CasAnnotationViewerApplet and the CasTreeViewerApplet;
- the checkpointing feature of the CPE module.
Note that the UIMA framework by default does not start any remotely accessible services (i.e. Vinci) that would be vulnerable to this issue. A user or developer would need to make an active choice to start such a service. However, users or developers may use the CasIOUtils in their own applications and services to parse serialized CAS data. They are affected by this issue unless they ensure that the data passed to CasIOUtils is not a serialized Java object.
When using Vinci or using CasIOUtils in own services/applications, the unrestricted deserialization of Java-serialized CAS files may allow arbitrary (remote) code execution.
As a remedy, it is possible to set up a global or context-specific ObjectInputFilter (cf. https://openjdk.org/jeps/290 and https://openjdk.org/jeps/415 ) if running UIMA on a Java version that supports it.
Note that Java 1.8 does not support the ObjectInputFilter, so there is no remedy when running on this out-of-support platform. An upgrade to a recent Java version is strongly recommended if you need to secure an UIMA version that is affected by this issue.
To mitigate the issue on a Java 9+ platform, you can configure a filter pattern through the jdk.serialFilter system property using a semicolon as a separator:
To allow deserializing Java-serialized binary CASes, add the classes:
- org.apache.uima.cas.impl.CASCompleteSerializer
- org.apache.uima.cas.impl.CASMgrSerializer
- org.apache.uima.cas.impl.CASSerializer
- java.lang.String
To allow deserializing CPE Checkpoint data, add the following classes (and any custom classes your application uses to store its checkpoints):
- org.apache.uima.collection.impl.cpm.CheckpointData
- org.apache.uima.util.ProcessTrace
- org.apache.uima.util.impl.ProcessTrace_impl
- org.apache.uima.collection.base_cpm.SynchPoint
Make sure to use !* as the final component to the filter pattern to disallow deserialization of any classes not listed in the pattern.
Apache UIMA 3.5.0 uses tightly scoped ObjectInputFilters when reading Java-serialized data depending on the type of data being expected. Configuring a global filter is not necessary with this version.
Weakness
The product receives input or data, but it does
not validate or incorrectly validates that the input has the
properties that are required to process the data safely and
correctly.
Affected Software
Name |
Vendor |
Start Version |
End Version |
Uimaj |
Apache |
* |
3.5.0 (excluding) |
Extended Description
Input validation is a frequently-used technique
for checking potentially dangerous inputs in order to
ensure that the inputs are safe for processing within the
code, or when communicating with other components.
Input can consist of:
Data can be simple or structured. Structured data
can be composed of many nested layers, composed of
combinations of metadata and raw data, with other simple or
structured data.
Many properties of raw data or metadata may need
to be validated upon entry into the code, such
as:
Implied or derived properties of data must often
be calculated or inferred by the code itself. Errors in
deriving properties may be considered a contributing factor
to improper input validation.
Potential Mitigations
- Assume all input is malicious. Use an “accept known good” input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.
- When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, “boat” may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as “red” or “blue.”
- Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code’s environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
- For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server.
- Even though client-side checks provide minimal benefits with respect to server-side security, they are still useful. First, they can support intrusion detection. If the server receives input that should have been rejected by the client, then it may be an indication of an attack. Second, client-side error-checking can provide helpful feedback to the user about the expectations for valid input. Third, there may be a reduction in server-side processing time for accidental input errors, although this is typically a small savings.
- Inputs should be decoded and canonicalized to the application’s current internal representation before being validated (CWE-180, CWE-181). Make sure that your application does not inadvertently decode the same input twice (CWE-174). Such errors could be used to bypass allowlist schemes by introducing dangerous inputs after they have been checked. Use libraries such as the OWASP ESAPI Canonicalization control.
- Consider performing repeated canonicalization until your input does not change any more. This will avoid double-decoding and similar scenarios, but it might inadvertently modify inputs that are allowed to contain properly-encoded dangerous content.
References