Message-ID: <20ab00da-e85d-4f18-b482-bb406275693c@arm.com>
Date: Thu, 31 Jul 2025 10:00:15 +0530
From: Dev Jain <dev.jain@....com>
To: Ryan Roberts <ryan.roberts@....com>,
Catalin Marinas <catalin.marinas@....com>
Cc: will@...nel.org, anshuman.khandual@....com, quic_zhenhuah@...cinc.com,
kevin.brodsky@....com, yangyicong@...ilicon.com, joey.gouly@....com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
david@...hat.com, mark.rutland@....com, urezki@...il.com
Subject: Re: [RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump
On 30/07/25 11:59 pm, Ryan Roberts wrote:
> On 30/07/2025 18:00, Catalin Marinas wrote:
>> On Wed, Jul 23, 2025 at 09:48:27PM +0530, Dev Jain wrote:
> [...]
>
>>> + * mmap_write_lock/unlock in T1 be called CS (the critical section).
>>> + *
>>> + * Claim: The CS of T1 will never operate on a freed PMD table.
>>> + *
>>> + * Proof:
>>> + *
>>> + * Case 1: The static branch is visible to T2.
>>> + *
>>> + * Case 1 (a): T1 acquires the lock before T2 can.
>>> + * T2 will block until T1 drops the lock, so pmd_free() will only be
>>> + * executed after T1 exits CS.
>> This assumes that there is some ordering between unlock and pmd_free()
>> (e.g. some poisoning of the old page). The unlock only gives us release
>> semantics, not acquire. It just happens that we have an atomic
>> dec-and-test down the __free_pages() path but I'm not convinced we
>> should rely on it unless free_pages() has clear semantics on ordering
>> related to prior memory writes.
> I can understand how pmd_free() could be re-ordered before the unlock, but
> surely it can't be reordered before the lock? I need to go unlearn everything I
> thought I understood about locking if that's the case...
>
You are correct; what Catalin is saying is that my reasoning has a hole.
There is no obvious ordering between unlock and free(). However,
T1's mmap_write_unlock() will happen before T2's mmap_read_lock() returns ... (i)
T2's mmap_read_lock() will happen before T2's pmd_free() ... (ii)
which, by transitivity, lets us conclude that T1's mmap_write_unlock() will happen before T2's pmd_free().
A more rigorous way to write this would be (for Case 1):
T2                                                            T1

pmd_clear()                                                   5. cmpxchg(enable static branch)
1. while (!cmpxchg(check if lock not taken in write mode));   6. smp_mb()
2. smp_mb();                                                  7. while (!cmpxchg(check if lock not taken))
3. smp_mb();                                                  8. smp_mb()
4. cmpxchg(release lock)                                         CS instructions
pmd_free()                                                    9. smp_mb()
                                                              10. cmpxchg(release lock)

where: (1,2) = mmap_read_lock(), (3,4) = mmap_read_unlock(), (5,6) = static_branch_enable(),
       (7,8) = mmap_write_lock(), (9,10) = mmap_write_unlock().
For Case 1(a), 7 succeeds before 1 can. So 1 will spin until 10 completes, meaning 10 is
observed before 1, and 1 is observed before pmd_free() due to the barriers; the conclusion then follows by transitivity.
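
To make the ordering argument concrete, here is a minimal userspace sketch of Case 1(a), not kernel
code: C11 atomics and pthreads stand in for the mmap rwsem and smp_mb(), the reader/writer
distinction is dropped in favour of a single exclusive lock, and all names (lock_word, pmd_table,
t2, ...) are invented for illustration. main() plays T1 and is arranged to win the lock before T2
can, which is exactly the Case 1(a) precondition.

/*
 * Userspace model only, NOT kernel code: the cmpxchg + seq_cst fence pairs
 * below play the roles of steps (1,2)/(7,8) (lock) and (3,4)/(9,10) (unlock)
 * in the table above. All names are made up for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static atomic_int lock_word;            /* 0 = free, 1 = held */
static _Atomic int *pmd_table;          /* stands in for the PMD table page */

static void lock(void)                  /* steps (1,2) / (7,8) */
{
	int expected;

	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak(&lock_word, &expected, 1));
	atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
}

static void unlock(void)                /* steps (3,4) / (9,10) */
{
	atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
	atomic_store(&lock_word, 0);
}

/* T2: pmd_clear(); mmap_read_lock(); mmap_read_unlock(); pmd_free() */
static void *t2(void *arg)
{
	(void)arg;
	atomic_store(pmd_table, 0);     /* pmd_clear() */
	lock();                         /* 1,2: spins until T1 releases in 10 */
	unlock();                       /* 3,4 */
	free((void *)pmd_table);        /* pmd_free(): ordered after T1's CS */
	return NULL;
}

int main(void)
{
	pthread_t t2_thread;

	pmd_table = malloc(sizeof(*pmd_table));
	atomic_init(pmd_table, 1);

	/* T1, Case 1(a): this thread wins the lock before T2 can. */
	lock();                                           /* 7,8 */
	pthread_create(&t2_thread, NULL, t2, NULL);
	printf("CS sees %d\n", atomic_load(pmd_table));   /* never a freed page */
	unlock();                                         /* 9,10 */

	pthread_join(t2_thread, NULL);
	return 0;
}

Build with something like cc -std=c11 -pthread. Because T2's cmpxchg in lock() can only succeed
after T1's store in step 10, and is followed by a full fence, the free() cannot be observed before
T1's CS completes, which is the claim for Case 1(a).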