linux-kernel - Re: [PATCH] mm: slub: Panic if the object corruption is checked.

Message-ID: <3661f67f-8c52-4e7b-80b6-9b3cc63b41bd@suse.cz>
Date: Tue, 21 Jan 2025 11:27:19 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Hyesoo Yu <hyesoo.yu@...sung.com>, Matthew Wilcox <willy@...radead.org>
Cc: janghyuck.kim@...sung.com, Andrew Morton <akpm@...ux-foundation.org>,
 Jonathan Corbet <corbet@....net>, Christoph Lameter <cl@...ux.com>,
 Pekka Enberg <penberg@...nel.org>, David Rientjes <rientjes@...gle.com>,
 Joonsoo Kim <iamjoonsoo.kim@....com>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: slub: Panic if the object corruption is checked.

On 1/21/25 1:40 AM, Hyesoo Yu wrote:
> On Mon, Jan 20, 2025 at 03:36:08PM +0000, Matthew Wilcox wrote:
>> On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote:
>>> If a slab object is corrupted or an error occurs in its internal
>>> value, continuing after restoration may cause other side effects.
>>> At this point, it is difficult to debug because the problem occurred
>>> in the past. A flag has been added that can cause a panic when there
>>> is a problem with the object.
>>>
>>> Signed-off-by: Hyesoo Yu <hyesoo.yu@...sung.com>
>>> Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493
>>
>> Linux does not use Change IDs.  Please omit these from future patches.
>>
>> Panicking is a very unfriendly approach.  I think a better approach would
>> be to freeze the slab where corruption is detected.  That is, no future
>> objects are allocated from that slab, and attempts to free objects from
>> that slab become no-ops.  I don't think that should be hard to implement.

Freezing of the slab is already done in some cases when corruption is
detected: all objects are marked as used, and further attempts to free
objects on the slab are discarded. Perhaps not in all cases, which could
be improved.

> Thank you for your response. That was my mistake. I will remove the Change-Id.
> 
> I agree that freezing is better than recovery or panic for the system's stability.
> However, what I want from the patch is not just to keep the system running stably.
> I need to trigger a panic immediately so that I can investigate the SLUB state.

IMHO it's a valid goal to panic more quickly when debugging, and
enabling slub_debug means debugging is in progress (as opposed to normal
production, where we try to avoid panics).
But making it possible to reuse the general panic_on_warn mechanism
(which can also be expected to be enabled when debugging) is indeed
preferable to introducing a new slab-specific flag.

> I would like to analyze the corrupted data at that moment to check for issues
> such as cache problems, user errors, system clock frequency and similar problems,
> rather than silently carrying on.
> 
> However, I agree that panicking is not a friendly approach.
> I will modify it to report the problem using warn() and then rely on
> panic_on_warn to trigger the panic.
> 
> Thanks,
> Regards.