linux-kernel - Re: Memory hotplug locking issue: Useless (?) zone span seqlock

Message-ID: <a5cde237-0dcf-4e85-b763-7a38e9f9c563@redhat.com>
Date: Thu, 8 May 2025 12:45:01 +0200
From: David Hildenbrand <david@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Oscar Salvador <osalvador@...e.de>
Cc: Michal Hocko <mhocko@...e.com>,
 Anshuman Khandual <anshuman.khandual@....com>,
 Vlastimil Babka <vbabka@...e.cz>, Pavel Tatashin
 <pasha.tatashin@...een.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
 linux-kernel <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>
Subject: Re: Memory hotplug locking issue: Useless (?) zone span seqlock

On 07.03.25 21:22, Mathieu Desnoyers wrote:
> I'm currently perfecting my understanding of the mm code and reviewing
> pieces of it as I go, and stumbled on this:
> 
> commit 27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking")
> 
> This commit removes all users of zone_span_writelock(), thus making
> the inline useless, but leaves the now useless
> zone_span_seqbegin()/zone_span_seqretry() in place within
> page_outside_zone_boundaries().
> 
> So I'm confused. What's going on ?
> 
> And if this commit got things very wrong when removing the
> seqlock, I wonder if there are cases where its partial
> pgdat_resize_lock() removal can be an issue as well.

I stumbled over that myself recently as well. I think I mentioned in the 
past that we should just store

start_pfn + end_pfn

instead of

start_pfn + nr_pages


Then, resizing could happen concurrently with readers (and we could 
atomically read start_pfn / end_pfn, each on its own).

Right now, when adjusting start_pfn, we always also have to adjust 
nr_pages. A concurrent reader calculating end_pfn manually could pair a 
new start_pfn with a stale nr_pages (or the other way around) and end up 
with a bogus end_pfn.
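
To make the problem concrete, here is a minimal userspace sketch, not kernel code. The struct and function names (span_len, span_end, span_len_move_start(), span_end_move_start(), torn_end_pfn()) are made up for illustration and are not the actual struct zone fields or helpers. It only models what an unlocked reader computes if it observes the new start_pfn together with the old nr_pages:

```c
#include <assert.h>

/* Hypothetical stand-ins for the two representations; not struct zone. */
struct span_len { unsigned long start_pfn, nr_pages; };
struct span_end { unsigned long start_pfn, end_pfn; };

/* start_pfn + nr_pages: moving the start while keeping the end fixed
 * means two fields have to change together. */
struct span_len span_len_move_start(struct span_len s, unsigned long new_start)
{
    unsigned long end = s.start_pfn + s.nr_pages;

    assert(new_start <= end);
    s.start_pfn = new_start;
    s.nr_pages = end - new_start;
    return s;
}

/* start_pfn + end_pfn: the same resize changes only one field. */
struct span_end span_end_move_start(struct span_end s, unsigned long new_start)
{
    assert(new_start <= s.end_pfn);
    s.start_pfn = new_start;
    return s;
}

/* Models an unlocked reader that saw the new start_pfn but the old
 * nr_pages and then computed end_pfn by hand. */
unsigned long torn_end_pfn(struct span_len old_span, struct span_len new_span)
{
    return new_span.start_pfn + old_span.nr_pages;
}
```

For example, growing a span from [0x1000, 0x2000) down to [0x800, 0x2000) leaves the real end at 0x2000 both before and after, but the torn read computes 0x800 + 0x1000 = 0x1800. With start_pfn + end_pfn, end_pfn is not written at all for this resize, so a reader sees either the old or the new start_pfn together with a valid end_pfn.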

That said, I am not aware of issues in that area, but it all 
looks like only a partial cleanup to me.

-- 
Cheers,

David / dhildenb