Prior to Spark 2.3.3, in certain situations Spark would write user data to local disk unencrypted, even if spark.io.encryption.enabled=true. The affected cases are cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem); use of parallelize in SparkR; use of broadcast and parallelize in PySpark; and use of Python UDFs.
The product stores sensitive information in cleartext within a resource that might be accessible to another control sphere.
Name | Vendor | Start Version | End Version |
---|---|---|---|
Spark | Apache | 1.0.2 (including) | 1.6.3 (including) |
Spark | Apache | 2.0.0 (including) | 2.0.2 (including) |
Spark | Apache | 2.1.0 (including) | 2.1.3 (including) |
Spark | Apache | 2.2.0 (including) | 2.2.2 (including) |
Spark | Apache | 2.3.0 (including) | 2.3.3 (excluding) |
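
As a rough illustration of how the version information above might be applied, the following sketch checks whether a given Spark version string falls in the affected range and whether a configuration asks for I/O encryption that the version may not fully honor. It is a simplification under stated assumptions: everything from 1.0.2 up to, but not including, 2.3.3 is treated as affected (following the description's "prior to Spark 2.3.3"), and the helper names `parse_version`, `is_affected`, and `encryption_gap` are hypothetical, not part of any Spark API.

```python
import re

# Bounds taken from the advisory text: the table starts at 1.0.2 and the
# description says the problem exists prior to 2.3.3. Treating the whole
# span as one contiguous range is an assumption made for simplicity.
FIRST_AFFECTED = (1, 0, 2)
FIRST_FIXED = (2, 3, 3)


def parse_version(version):
    """Turn a string such as '2.3.1' into a tuple of ints such as (2, 3, 1)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True if the version lies in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


def encryption_gap(version, conf):
    """Return True when I/O encryption is requested on an affected version.

    `conf` is a plain dict of Spark property names to values; only the
    spark.io.encryption.enabled key named in the advisory is inspected.
    """
    requested = str(conf.get("spark.io.encryption.enabled", "false")).lower() == "true"
    return requested and is_affected(version)
```

A deployment audit could call `encryption_gap(version, conf)` for each cluster and flag those that return `True` as candidates for an upgrade to 2.3.3 or later.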