Message-ID: <aMk8QhkumtEoPVTh@willie-the-truck>
Date: Tue, 16 Sep 2025 11:30:26 +0100
From: Will Deacon <will@...nel.org>
To: Dev Jain <dev.jain@....com>
Cc: catalin.marinas@....com, anshuman.khandual@....com,
quic_zhenhuah@...cinc.com, ryan.roberts@....com,
kevin.brodsky@....com, yangyicong@...ilicon.com, joey.gouly@....com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
david@...hat.com, mark.rutland@....com, urezki@...il.com,
jthoughton@...gle.com
Subject: Re: [RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump

Hi Dev,

On Wed, Jul 23, 2025 at 09:48:27PM +0530, Dev Jain wrote:
> @@ -1301,16 +1319,76 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
> }
>
> table = pmd_offset(pudp, addr);
> +
> + /*
> + * Our objective is to prevent ptdump from reading a PMD table which has
> + * been freed. Assume that ptdump_walk_pgd() (call this thread T1)
> + * executes completely on CPU1 and pud_free_pmd_page() (call this thread
> + * T2) executes completely on CPU2. Let the region sandwiched by the
> + * mmap_write_lock/unlock in T1 be called CS (the critical section).
> + *
> + * Claim: The CS of T1 will never operate on a freed PMD table.
> + *
> + * Proof:
> + *
> + * Case 1: The static branch is visible to T2.
> + *
> + * Case 1 (a): T1 acquires the lock before T2 can.
> + * T2 will block until T1 drops the lock, so pmd_free() will only be
> + * executed after T1 exits CS.
> + *
> + * Case 1 (b): T2 acquires the lock before T1 can.
> + * The sequence of barriers issued in __flush_tlb_kernel_pgtable()
> + * ensures that an empty PUD (via pud_clear()) is visible to T1 before
> + * T1 can enter CS, therefore it is impossible for the CS to get hold
> + * of the address of the isolated PMD table.
> + *
> + * Case 2: The static branch is not visible to T2.
> + *
> + * Since static_branch_enable() (via dmb(ish)) and mmap_write_lock()
> + * have acquire semantics, it is guaranteed that the static branch
> + * will be visible to all CPUs before T1 can enter CS. The static
> + * branch not being visible to T2 therefore guarantees that T1 has
> + * not yet entered CS .... (i)
> + * The sequence of barriers via __flush_tlb_kernel_pgtable() in T2
> + * implies that if the invisibility of the static branch has been
> + * observed by T2 (i.e. static_branch_unlikely() is observed as false),
> + * then all CPUs will have observed an empty PUD ... (ii)
> + * Combining (i) and (ii), we conclude that T1 observes an empty PUD
> + * before entering CS => it is impossible for the CS to get hold of
> + * the address of the isolated PMD table. Q.E.D.
> + *
> + * We have proven that the claim is true on the assumption that
> + * there is no context switch for T1 and T2. Note that the reasoning
> + * of the proof uses barriers operating on the inner shareable domain,
> + * which means that they will affect all CPUs, and also a context switch
> + * will insert extra barriers into the code paths => the claim will
> + * stand true even if we drop the assumption.
> + *
> + * It is also worth reasoning whether something can go wrong via
> + * pud_free_pmd_page() -> __pmd_free_pte_page(), since the latter
> + * will be called locklessly on this code path.
> + *
> + * For Case 1 (a), T2 will block until CS is finished, so we are safe.
> + * For Case 1 (b) and Case 2, the PMD table will be isolated before
> + * T1 can enter CS, therefore it is safe for T2 to operate on the
> + * PMD table locklessly.
> + */

Although I can see that you put a lot of effort into this comment, I
think we should just remove it. Instead, we should have a litmus test
in the commit message and probably just some small comments here to
explain e.g. why the mmap_read_lock() critical section is empty.

I'm currently trying to put together a litmus test with James (cc'd) so
maybe we can help you out with that part.
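
For the sake of discussion, I'd expect the interesting case to boil down
to a store-buffering shape along these lines. This is very much a sketch:
the variable names are made up, and smp_mb() is only standing in for the
dmb(ish) in static_branch_enable() and the barriers issued by
__flush_tlb_kernel_pgtable():

C arm64-ptdump-pud-free

(*
 * P0 models the ptdump walker enabling the static key and then walking;
 * P1 models pud_free_pmd_page() clearing the PUD and then checking the
 * key. pud==1 means the PUD still points at the PMD table.
 *)

{
        pud=1;
}

P0(int *key, int *pud)
{
        int r0;

        WRITE_ONCE(*key, 1);    /* static_branch_enable() */
        smp_mb();               /* stand-in for the key/lock ordering */
        r0 = READ_ONCE(*pud);   /* walker loads the PUD inside the CS */
}

P1(int *key, int *pud)
{
        int r1;

        WRITE_ONCE(*pud, 0);    /* pud_clear() */
        smp_mb();               /* stand-in for __flush_tlb_kernel_pgtable() */
        r1 = READ_ONCE(*key);   /* static_branch_unlikely() */
}

exists (0:r0=1 /\ 1:r1=0)

The "exists" clause is the bad outcome (the walker still sees the old PMD
table while the freeing path has skipped the lock), which should be
unreachable if the ordering on both sides really is strong enough -- that
is exactly the sort of thing the test would let herd7 check.
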
In the meantime...
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 421a5de806c6..65335c7ba482 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -283,6 +283,13 @@ void note_page_flush(struct ptdump_state *pt_st)
> note_page(pt_st, 0, -1, pte_val(pte_zero));
> }
>
> +static void arm64_ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
> +{
> + static_branch_enable(&arm64_ptdump_lock_key);
> + ptdump_walk_pgd(st, mm, NULL);
> + static_branch_disable(&arm64_ptdump_lock_key);
> +}

What serialises the toggling of the static key here? For example, I can't
see what prevents a kernel page-table dump running concurrently with a
check_wx() and the key ending up in the wrong state.
Either we need an additional lock or perhaps using
static_branch_{inc,dec}() would work instead? I haven't thought too hard
about that but it looks like we need _something_.
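
FWIW, the inc/dec version would just be something like this (untested,
purely to illustrate the shape):

static void arm64_ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
{
        /*
         * The refcounted inc/dec keeps the key enabled for as long as
         * at least one walker is active, so a concurrent check_wx()
         * and ptdump run can't flip the key off under each other.
         */
        static_branch_inc(&arm64_ptdump_lock_key);
        ptdump_walk_pgd(st, mm, NULL);
        static_branch_dec(&arm64_ptdump_lock_key);
}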

Will