Message-ID: <cabc6307-b5d1-488c-920d-0c66cedffb2e@linuxfoundation.org>
Date: Mon, 20 Oct 2025 14:36:17 -0600
From: Shuah Khan <skhan@...uxfoundation.org>
To: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@...il.com>,
akpm@...ux-foundation.org
Cc: linux-kernel@...r.kernel.org, david.hunter.linux@...il.com,
linux-kernel-mentees@...ts.linuxfoundation.org, khalid@...nel.org,
Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH] lib: cpu_rmap.c Refactor allocation size calculation in
kzalloc()
On 10/18/25 10:52, Mehdi Ben Hadj Khelifa wrote:
> On 10/10/25 6:00 PM, Shuah Khan wrote:
>> On 10/9/25 09:16, Mehdi Ben Hadj Khelifa wrote:
>>> On 10/7/25 11:23 PM, Shuah Khan wrote:
>>>
>>>>
>>>> How did you find this problem and how did you test this change?
>>
>> Bummer - you trimmed the code entirely from the thread. Next time
>> leave it in for context for the discussion.
>>
> Ah, I saw in other LKML threads that some people do delete the code, so I thought it was okay. Will do next time.
>
>>> For the first part of your question, after simply referring to
>>> deprecated documentation[1] which states the following:
>>
>> Looks like you forgot to add the link to the deprecated documentation[1].
>> It sounds like this is a potential problem without a reproducer.
>> These types of problems made to a critical piece of code require
>> substantial testing.
>>
>
> Ack, this is the doc that I was referencing: https://docs.kernel.org/process/deprecated.html
> I'm not sure what exactly is required for substantial testing. My guess was to do normal testing as I mentioned, add some fault injection to test the change in the failure case, and compare dmesg outputs. I have also run the selftests for the net subsystem since my last mail, with no sign of regression. Any suggestions on what testing for this case should look like, instead of or on top of what I did?
>
>>> 'For other calculations, please compose the use of the size_mul(),
>>> size_add(), and size_sub() helpers'
>>> Which is about dynamic calculations made inside of kzalloc() and kmalloc(). Specifically, the quoted part is talking about calculations which can't simply be divided into two parameters (the number of elements and the size per element), and about cases where we can't use struct_size() either. After that it was a matter of finding code where that could be a problem, which is the case for the changed code.
>>>
>>> For the second part, as with any patch, I make a copy of all dmesg warning, error, and critical messages, then I compile, install, and boot the new kernel, then check if there is any change or regression in dmesg.
>>
>> This is a basic boot test which isn't sufficient in this case.
>>
>>> For this particular change, since it doesn't have any selftests because it's in a utility library (in my case cpu_rmap is used in the networking subsystem), I did some fault injection with a custom module to test whether, in case of overflow, it fails safely by reporting the issue in dmesg, which is caught by the __alloc_frozen_pages_noprof() function in mm/page_alloc.c, and also returns NULL for rmap instead of wrapping to a smaller size.
Why not write a test for this then?
>>
>> Custom module testing doesn't test this change in a wider scope
>> which is necessary when you are making changes such as these
>> without a reproducer and a way to reproduce. How do you know
>> this change doesn't introduce regressions?
>>
> My custom module testing specifically tested the change in the failure case, which is what the change is for in the first place. The change, which the documentation deems simple since we are just wrapping calculations in helpers instead of using operators, is just to safeguard calculations that are made inside of kzalloc() so that no unwanted behavior is produced, i.e. in case of overflow. As I mentioned above, I tested for regressions by running the selftests for the net subsystem, which showed no regressions, on top of the fault injection mentioned.
> I would like to have more guidance as to what I could do to have more robust testing in this case.
>
>> thanks,
So as you say this is a potential overflow, can you explain what
are the cases where you would run into this?
thanks,
-- Shuah
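
For readers following the overflow discussion above: the snippet below is a
minimal userspace sketch, not the kernel code and not the patch under review
(the code was trimmed from this thread). It assumes an allocation size of the
form offset + count * elem_size and compares the open-coded arithmetic with
saturating helpers modeled on the kernel's size_mul() and size_add(), which
return SIZE_MAX on overflow. The names sat_size_mul(), sat_size_add(),
open_coded_size(), saturating_size() and self_check() are hypothetical, and
__builtin_mul_overflow()/__builtin_add_overflow() are GCC/Clang builtins, not
kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for size_mul(): saturate, don't wrap. */
static inline size_t sat_size_mul(size_t a, size_t b)
{
	size_t r;

	return __builtin_mul_overflow(a, b, &r) ? SIZE_MAX : r;
}

/* Hypothetical userspace stand-in for size_add(): saturate, don't wrap. */
static inline size_t sat_size_add(size_t a, size_t b)
{
	size_t r;

	return __builtin_add_overflow(a, b, &r) ? SIZE_MAX : r;
}

/* Open-coded form: unsigned arithmetic wraps silently on overflow. */
static inline size_t open_coded_size(size_t offset, size_t count, size_t elem)
{
	return offset + count * elem;
}

/* Same calculation composed from the saturating helpers. */
static inline size_t saturating_size(size_t offset, size_t count, size_t elem)
{
	return sat_size_add(offset, sat_size_mul(count, elem));
}

/* Illustrative inputs only; they are not taken from cpu_rmap.c. */
static inline void self_check(void)
{
	/* Small counts: both forms agree. */
	assert(open_coded_size(64, 1000, 16) == 16064);
	assert(saturating_size(64, 1000, 16) == 16064);

	/* Huge count: the open-coded form wraps to a tiny size ... */
	assert(open_coded_size(64, SIZE_MAX / 8, 16) == 48);
	/* ... while the saturating form pins the result at SIZE_MAX. */
	assert(saturating_size(64, SIZE_MAX / 8, 16) == SIZE_MAX);
}
```

A request of SIZE_MAX bytes is one an allocator can refuse outright, whereas
the wrapped value of 48 bytes looks like a valid small request. Whether any
caller can actually pass a count large enough to trigger the wrap is a
separate question that this sketch does not answer.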