In the Linux kernel, the following vulnerability has been resolved:
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
In commit 34488399fa08 (mm/madvise: add file and shmem support to MADV_COLLAPSE) we make the following change to find_pmd_or_thp_or_none():
- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;
This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include:
A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address.
In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or its a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.
However, the commit introduces a logical hole; namely, that weve allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values arent checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd.
We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch.
The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86s pmd_present() also checks the _PAGE_PSE , riscvs version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit.
Covering all 4 cases for x86 (all checks done on the same pmd value):
pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We arent racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger?
The only relevant path is:
madvise_collapse() -> collapse_pte_mapped_thp()
Where we might just incorrectly report back success, when really the memory isnt pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here.
pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock – or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated.
!pmd_present() && pmd_trans_huge() Not possible.
!pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present
Ive checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesnt necessarily hold in general – but that doesnt matter since !pmd_present() always takes failure path).
Also, add a comment above find_pmd_or_thp_or_none() —truncated—
Name | Vendor | Start Version | End Version |
---|---|---|---|
Linux-allwinner-5.19 | Ubuntu | jammy | * |
Linux-allwinner-5.19 | Ubuntu | upstream | * |
Linux-aws-5.0 | Ubuntu | esm-infra/bionic | * |
Linux-aws-5.0 | Ubuntu | upstream | * |
Linux-aws-5.11 | Ubuntu | focal | * |
Linux-aws-5.11 | Ubuntu | upstream | * |
Linux-aws-5.13 | Ubuntu | focal | * |
Linux-aws-5.13 | Ubuntu | upstream | * |
Linux-aws-5.19 | Ubuntu | jammy | * |
Linux-aws-5.19 | Ubuntu | upstream | * |
Linux-aws-5.3 | Ubuntu | esm-infra/bionic | * |
Linux-aws-5.3 | Ubuntu | upstream | * |
Linux-aws-5.8 | Ubuntu | focal | * |
Linux-aws-5.8 | Ubuntu | upstream | * |
Linux-aws-6.2 | Ubuntu | jammy | * |
Linux-aws-6.2 | Ubuntu | upstream | * |
Linux-aws-6.5 | Ubuntu | jammy | * |
Linux-aws-6.5 | Ubuntu | upstream | * |
Linux-azure | Ubuntu | esm-infra/bionic | * |
Linux-azure-5.11 | Ubuntu | focal | * |
Linux-azure-5.11 | Ubuntu | upstream | * |
Linux-azure-5.13 | Ubuntu | focal | * |
Linux-azure-5.13 | Ubuntu | upstream | * |
Linux-azure-5.19 | Ubuntu | jammy | * |
Linux-azure-5.19 | Ubuntu | upstream | * |
Linux-azure-5.3 | Ubuntu | esm-infra/bionic | * |
Linux-azure-5.3 | Ubuntu | upstream | * |
Linux-azure-5.8 | Ubuntu | focal | * |
Linux-azure-5.8 | Ubuntu | upstream | * |
Linux-azure-6.2 | Ubuntu | jammy | * |
Linux-azure-6.2 | Ubuntu | upstream | * |
Linux-azure-6.5 | Ubuntu | jammy | * |
Linux-azure-6.5 | Ubuntu | upstream | * |
Linux-azure-edge | Ubuntu | esm-infra/bionic | * |
Linux-azure-edge | Ubuntu | upstream | * |
Linux-azure-fde | Ubuntu | focal | * |
Linux-azure-fde-5.19 | Ubuntu | jammy | * |
Linux-azure-fde-5.19 | Ubuntu | upstream | * |
Linux-azure-fde-6.2 | Ubuntu | jammy | * |
Linux-azure-fde-6.2 | Ubuntu | upstream | * |
Linux-gcp | Ubuntu | esm-infra/bionic | * |
Linux-gcp-5.11 | Ubuntu | focal | * |
Linux-gcp-5.11 | Ubuntu | upstream | * |
Linux-gcp-5.13 | Ubuntu | focal | * |
Linux-gcp-5.13 | Ubuntu | upstream | * |
Linux-gcp-5.19 | Ubuntu | jammy | * |
Linux-gcp-5.19 | Ubuntu | upstream | * |
Linux-gcp-5.3 | Ubuntu | esm-infra/bionic | * |
Linux-gcp-5.3 | Ubuntu | upstream | * |
Linux-gcp-5.8 | Ubuntu | focal | * |
Linux-gcp-5.8 | Ubuntu | upstream | * |
Linux-gcp-6.2 | Ubuntu | jammy | * |
Linux-gcp-6.2 | Ubuntu | upstream | * |
Linux-gcp-6.5 | Ubuntu | jammy | * |
Linux-gcp-6.5 | Ubuntu | upstream | * |
Linux-gke | Ubuntu | focal | * |
Linux-gke-4.15 | Ubuntu | esm-infra/bionic | * |
Linux-gke-4.15 | Ubuntu | upstream | * |
Linux-gke-5.15 | Ubuntu | focal | * |
Linux-gke-5.15 | Ubuntu | upstream | * |
Linux-gke-5.4 | Ubuntu | esm-infra/bionic | * |
Linux-gke-5.4 | Ubuntu | upstream | * |
Linux-gkeop | Ubuntu | focal | * |
Linux-gkeop-5.15 | Ubuntu | focal | * |
Linux-gkeop-5.4 | Ubuntu | esm-infra/bionic | * |
Linux-gkeop-5.4 | Ubuntu | upstream | * |
Linux-hwe | Ubuntu | esm-infra/bionic | * |
Linux-hwe-5.11 | Ubuntu | focal | * |
Linux-hwe-5.11 | Ubuntu | upstream | * |
Linux-hwe-5.13 | Ubuntu | focal | * |
Linux-hwe-5.13 | Ubuntu | upstream | * |
Linux-hwe-5.19 | Ubuntu | jammy | * |
Linux-hwe-5.19 | Ubuntu | upstream | * |
Linux-hwe-5.8 | Ubuntu | focal | * |
Linux-hwe-5.8 | Ubuntu | upstream | * |
Linux-hwe-6.2 | Ubuntu | jammy | * |
Linux-hwe-6.2 | Ubuntu | upstream | * |
Linux-hwe-6.5 | Ubuntu | jammy | * |
Linux-hwe-6.5 | Ubuntu | upstream | * |
Linux-hwe-edge | Ubuntu | esm-infra/bionic | * |
Linux-hwe-edge | Ubuntu | esm-infra/xenial | * |
Linux-hwe-edge | Ubuntu | upstream | * |
Linux-intel-5.13 | Ubuntu | focal | * |
Linux-intel-5.13 | Ubuntu | upstream | * |
Linux-lowlatency-hwe-5.19 | Ubuntu | jammy | * |
Linux-lowlatency-hwe-5.19 | Ubuntu | upstream | * |
Linux-lowlatency-hwe-6.2 | Ubuntu | jammy | * |
Linux-lowlatency-hwe-6.2 | Ubuntu | upstream | * |
Linux-lowlatency-hwe-6.5 | Ubuntu | jammy | * |
Linux-lowlatency-hwe-6.5 | Ubuntu | upstream | * |
Linux-nvidia-6.2 | Ubuntu | jammy | * |
Linux-nvidia-6.2 | Ubuntu | upstream | * |
Linux-nvidia-6.5 | Ubuntu | jammy | * |
Linux-nvidia-6.5 | Ubuntu | upstream | * |
Linux-oem | Ubuntu | esm-infra/bionic | * |
Linux-oem | Ubuntu | upstream | * |
Linux-oem-5.10 | Ubuntu | focal | * |
Linux-oem-5.10 | Ubuntu | upstream | * |
Linux-oem-5.13 | Ubuntu | focal | * |
Linux-oem-5.13 | Ubuntu | upstream | * |
Linux-oem-5.14 | Ubuntu | focal | * |
Linux-oem-5.14 | Ubuntu | upstream | * |
Linux-oem-5.17 | Ubuntu | jammy | * |
Linux-oem-5.17 | Ubuntu | upstream | * |
Linux-oem-5.6 | Ubuntu | focal | * |
Linux-oem-5.6 | Ubuntu | upstream | * |
Linux-oem-6.0 | Ubuntu | jammy | * |
Linux-oem-6.0 | Ubuntu | upstream | * |
Linux-oem-6.1 | Ubuntu | jammy | * |
Linux-oem-6.1 | Ubuntu | upstream | * |
Linux-oem-6.5 | Ubuntu | jammy | * |
Linux-oem-6.5 | Ubuntu | upstream | * |
Linux-oracle-5.0 | Ubuntu | esm-infra/bionic | * |
Linux-oracle-5.0 | Ubuntu | upstream | * |
Linux-oracle-5.11 | Ubuntu | focal | * |
Linux-oracle-5.11 | Ubuntu | upstream | * |
Linux-oracle-5.13 | Ubuntu | focal | * |
Linux-oracle-5.13 | Ubuntu | upstream | * |
Linux-oracle-5.3 | Ubuntu | esm-infra/bionic | * |
Linux-oracle-5.3 | Ubuntu | upstream | * |
Linux-oracle-5.8 | Ubuntu | focal | * |
Linux-oracle-5.8 | Ubuntu | upstream | * |
Linux-oracle-6.5 | Ubuntu | jammy | * |
Linux-oracle-6.5 | Ubuntu | upstream | * |
Linux-raspi2 | Ubuntu | focal | * |
Linux-raspi2 | Ubuntu | upstream | * |
Linux-realtime | Ubuntu | jammy | * |
Linux-realtime | Ubuntu | noble | * |
Linux-riscv | Ubuntu | focal | * |
Linux-riscv | Ubuntu | jammy | * |
Linux-riscv-5.11 | Ubuntu | focal | * |
Linux-riscv-5.11 | Ubuntu | upstream | * |
Linux-riscv-5.19 | Ubuntu | jammy | * |
Linux-riscv-5.19 | Ubuntu | upstream | * |
Linux-riscv-5.8 | Ubuntu | focal | * |
Linux-riscv-5.8 | Ubuntu | upstream | * |
Linux-riscv-6.5 | Ubuntu | jammy | * |
Linux-riscv-6.5 | Ubuntu | upstream | * |
Linux-starfive-5.19 | Ubuntu | jammy | * |
Linux-starfive-5.19 | Ubuntu | upstream | * |
Linux-starfive-6.2 | Ubuntu | jammy | * |
Linux-starfive-6.2 | Ubuntu | upstream | * |
Linux-starfive-6.5 | Ubuntu | jammy | * |
Linux-starfive-6.5 | Ubuntu | upstream | * |