lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc201f93-7743-4d9d-a62e-3c8ea22e2cfd@arm.com>
Date: Fri, 4 Jul 2025 17:12:13 +0530
From: Dev Jain <dev.jain@....com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: will@...nel.org, anshuman.khandual@....com, quic_zhenhuah@...cinc.com,
 ryan.roberts@....com, kevin.brodsky@....com, yangyicong@...ilicon.com,
 joey.gouly@....com, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, david@...hat.com
Subject: Re: [PATCH v4] arm64: Enable vmalloc-huge with ptdump


On 04/07/25 4:52 pm, Catalin Marinas wrote:
> On Thu, Jun 26, 2025 at 10:55:24AM +0530, Dev Jain wrote:
>> @@ -1301,16 +1314,39 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
>>   	}
>>   
>>   	table = pmd_offset(pudp, addr);
>> +	/*
>> +	 * Isolate the PMD table; in case of race with ptdump, this helps
>> +	 * us to avoid taking the lock in __pmd_free_pte_page().
>> +	 *
>> +	 * Static key logic:
>> +	 *
>> +	 * Case 1: If ptdump does static_branch_enable(), and after that we
>> +	 * execute the if block, then this patches in the read lock, ptdump has
>> +	 * the write lock patched in, therefore ptdump will never read from
>> +	 * a potentially freed PMD table.
>> +	 *
>> +	 * Case 2: If the if block starts executing before ptdump's
>> +	 * static_branch_enable(), then no locking synchronization
>> +	 * will be done. However, pud_clear() + the dsb() in
>> +	 * __flush_tlb_kernel_pgtable will ensure that ptdump observes an
> Statements like "observes" really need to be relative, not absolute.
> Like in "observes an empty PUD before/after ...".
>
>> +	 * empty PUD. Thus, it will never walk over a potentially freed
>> +	 * PMD table.
>> +	 */
>> +	pud_clear(pudp);
>> +	__flush_tlb_kernel_pgtable(addr);
>> +	if (static_branch_unlikely(&ptdump_lock_key)) {
>> +		mmap_read_lock(&init_mm);
>> +		mmap_read_unlock(&init_mm);
>> +	}
> This needs a formal model ;).
>
> static_branch_enable() is called before the mmap_write_lock(), so even
> if the above observes the new branch, it may do the read_unlock() before
> the ptdump_walk_pgd() attempted the write lock. So your case 1
> description is not entirely correct.

Thanks, I see I escaped by saying "we have read and write locks patched
in so it will all work out" this surely needs a better explanation.

>
> I don't get case 2. You want to ensure pud_clear() is observed by the
> ptdump code before the pmd_free(). The DSB in the TLB flushing code
> ensures some ordering between the pud_clear() and presumably something
> that the ptdump code can observe as well. Would that be the mmap
> semaphore? However, the read_lock would only be attempted if this code
> is seeing the static branch update, which is not guaranteed. I don't
> think it even matters since the lock may be released anyway before the
> write_lock in ptdump.
>
> For example, you do a pud_clear() above, skip the whole static branch.
> The ptdump comes along on another CPU but does not observe the
> pud_clear() since there's no synchronisation. It goes ahead and
> dereferences it while this CPU does a pmd_free().

The objective is: ptdump must not dereference a freed pagetable.
So for your example, if the static branch is not observed by
pud_free_pmd_page, this means that ptdump will take the write lock
after the execution of flush_tlb_kernel_pagetable completes (for if ptdump takes
the write lock before execution of flush_tlb_kernel_pagetable completes, we have
executed static_branch_enable(), contradiction). Since that contains a
dsb(ishst), that means that the store is observed by all observers in the inner shareable domain,
which is (I think?) all CPUs. Therefore before taking the write lock, ptdump
will observe an empty pud, so we cannot have an invalid dereference.
What is wrong with my reasoning here?

>
> And I can't get my head around memory ordering, it doesn't look sound.
> static_branch_enable() I don't think has acquire semantics, at least not
> in relation to the actual branch update. The static_branch_unlikely()
> test, again, not sure what guarantees it has (I don't think it has any
> in relation to memory loads). Maybe you have worked it all out and is
> fine but it needs a better explanation and ideally some simple formal
> model. Show it's correct with a global variable first and then we can
> optimise with static branches.

What do you suggest? As in what do you mean by showing its correct with
a global variable first...and, for the formal model thingy, do you
want mathematical rigor similar to what you explain in [1] :P, because unfortunately
(and quite embarrassingly) I didn't have formal methods in college : )

[1] https://www.youtube.com/watch?v=qeUHlXOBXmg


>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ