linux-kernel - Re: [PATCH v1] mm/gup: remove (VM_)BUG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e3db61eb-6dda-44f7-94a7-97f43c19093e@redhat.com>
Date: Wed, 11 Jun 2025 11:32:07 +0200
From: David Hildenbrand <david@...hat.com>
To: John Hubbard <jhubbard@...dia.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Jason Gunthorpe <jgg@...pe.ca>
Cc: Michal Hocko <mhocko@...e.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
 <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v1] mm/gup: remove (VM_)BUG_ONs

On 07.06.25 20:00, John Hubbard wrote:
> On 6/7/25 6:53 AM, Lorenzo Stoakes wrote:
>> On Sat, Jun 07, 2025 at 10:42:14AM -0300, Jason Gunthorpe wrote:
>>> On Fri, Jun 06, 2025 at 08:03:15PM +0100, Lorenzo Stoakes wrote:
>>>> On Fri, Jun 06, 2025 at 07:46:52PM +0100, Lorenzo Stoakes wrote:
>>>>> On Fri, Jun 06, 2025 at 03:42:12PM -0300, Jason Gunthorpe wrote:
>>>>>> On Fri, Jun 06, 2025 at 08:23:25PM +0200, David Hildenbrand wrote:
> ...
>>>> And I guess we'd not want a new interface for this like WARN_ON_ONCE_STORED()
>>>> because that'd be... weird and how would anyone think to use that and nearly all
>>>> cases wouldn't.
>>>
>>> No! Delete WARN_ON_ONCE and say the new global mechanism is good
>>> enough to capture the first WARN_ON, everyone always uses it always
>>> and then nobody needs to think about this anymore when writing new
>>> code.
> Interesting. This would cause WARN_ON*() to act similarly to lockdep, in
> that the first instance is the only one that works.
> 
> That works, but if-and-only-if the system is kept completely WARN-free.
> However, I have a mitigation for that, below.
> 
>>
>> Well that is simpler :)
>>
>> I have encountered situations where I've had more than one and needed 2nd+
>> but it is rare as you say.
>>
>> My late night incoherent babbling yesterday was perhaps because I
>> misunderstood David/John as to what they encountered in the past... maybe
>> they can clarify...
> 
> I've debugged lots of production systems, often these were large HPC
> clusters and supercomputers. I've seen:
> 
> a) Long up-times, with (of course!) relatively small dmesg buffer sizes,
> so that early logs are long gone. This means that WARN_ON_ONCE() is
> quite often gone (overwritten). This is common.
> 
> The worst part is that if you go to reproduce a problem, you don't
> see the next warning in the logs!! This is devastating, especially if
> the site makes it hard to ask for a system reboot. (If you have
> ~20,000 nodes in the cluster, a reboot is not a small affair.)
> 
> b) WARN_ON() loops that fill up the dmesg buffer immediately, which
> of course also overwrites anything leading up to the first WARN_ON.
> 
> However, (b) is rare on production kernels--it's more commonly
> self-inflicted, if I make a mistake and write WARN_ON() when I should
> have written WARN_ON_ONCE(), in a custom build to explore a problem.
> 
> c) Other printk-based noise in a loop, occasionally this also wraps
> the dmesg buffer. Especially if a customer has done a bit of their
> own "special" logging code.
> 
> 
>>
>> I do find myself grepping dmesg to find the first warning and it's
>> _annoying_ to do so, so this would be handy.
>>
>> But I'm not sure it'd justify getting rid of WARN_ON_ONCE() when you are in
>> a loop or something and now your dmesg is going to go to hell, still useful
>> not to do that, esp. if you know there's no value to further warnings
> 
> In all cases, the ideal outcome is a dmesg buffer that includes (let's
> fantasize for a moment):
> 
> 1) the earliest dmesg output (showing any oddness with system
> configuration, and what hardware was brought online, etc), plus
> 
> 2) the first problem that is logged, plus
> 
> 3) ...sometimes two or even three things happen in order to get to the
> problem, so (again, fantasizing!) really the first *three* warnings
> would be good to have.
> 
> So this is just a 3-element array of global warnings--not another
> circular buffer, but a little larger than a TAINT or lockdep type
> of capture.

Can't we do something very simple to get started: remove VM_WARN_ON_ONCE 
and implement VM_WARN_ON as something that uses a single global variable 
to print up to *magic value* 10 warnings. Then we'll simply stop 
printing any further warnings.

Races when updating that global variable? Who cars if we end up printing 11.

That is, have a global limit over all of the VM_WARN.

Later, we can do more advanced stuff, such as storing them in some other 
buffer to persist them etc.

-- 
Cheers,

David / dhildenb