[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1667aa56-e49d-495b-a8ea-8835b2ea7341@linux.dev>
Date: Sat, 1 Mar 2025 11:49:43 +0800
From: Wen Yang <wen.yang@...ux.dev>
To: Joel Granados <joel.granados@...nel.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Luis Chamberlain <mcgrof@...nel.org>, Kees Cook <keescook@...omium.org>,
Christian Brauner <brauner@...nel.org>, Dave Young <dyoung@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] sysctl: simplify the min/max boundary check
On 2025/2/27 22:07, Joel Granados wrote:
> On Mon, Jan 27, 2025 at 11:51:51AM -0600, Eric W. Biederman wrote:
>> Joel Granados <joel.granados@...nel.org> writes:
>>
>>> On Thu, Jan 23, 2025 at 12:30:25PM -0600, Eric W. Biederman wrote:
>>>> "Eric W. Biederman" <ebiederm@...ssion.com> writes:
>>>>
>>>>> Joel Granados <joel.granados@...nel.org> writes:
>>>>>
>>>>>> On Sun, Jan 19, 2025 at 10:59:21PM +0800, Wen Yang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2025/1/16 17:37, Joel Granados wrote:
>>>>>>>> On Sun, Jan 05, 2025 at 11:28:53PM +0800, Wen Yang wrote:
>>>>>>>>> do_proc_dointvec_conv() used the default range of int type, while
>>>>>>>>> do_proc_dointvec_minmax_conv() additionally used "int * {min, max}" of
>>>>>>>>> struct do_proc_dointvec_minmax_conv_param, which are actually passed
>>>>>>>>> in table->extra{1,2} pointers.
>>> ...
>>>>>> (if any). And this is why:
>>>>>> 1. The long and the void* are most likely (depending on arch?) the same
>>>>>> size.
>>>>>> 2. In [1] it is mentioned that, we would still need an extra (void*) to
>>>>>> address the sysctl tables that are *NOT* using extra{1,2} as min max.
>>>>>> This means that we need a bigger ctl_table (long extra1, long extra2
>>>>>> and void* extra). We will need *more* memory?
>>>>>>
>>>>>> I would like to be proven wrong. So this is my proposal: Instead of
>>>>>> trying to do an incremental change, I suggest you remove the sysctl_vals
>>>>>> shared const array and measure how much memory you actually save. You
>>>>>> can use the ./scripts/bloat-o-meter in the linux kernel source and
>>>>>> follow something similar to what we did in [2] to measure how much
>>>>>> memory we are actually talking about.
>>>>>>
>>>>>> Once you get a hard number, then we can move forward on the memory
>>>>>> saving front.
>>>
>>> Hey Eric.
>>>
>>> Thx for the clarification. Much appreciated.
>>>>>
>>>>> When I originally suggested this my motivation had nothing to do with memory
>>> That makes a *lot* of sense :).
>>>
>>>>> The sysctl_vals memory array is type unsafe and has actively
>>> Here I understand that they are unsafe because of Integer promotion
>>> issues exacerbated by the void* variables extra{1,2}. Please correct me
>>> If I missed the point.
>>
>> Not precisely. It is because the (void *) pointers are silently cast to
>> either (int *) or (long *) pointers. So for example passing SYSCTL_ZERO
>> to proc_do_ulongvec_minmax results in reading sysctl_vals[0] and
>> sysctl_vals[1] and to get the long value. Since sysctl_vals[1] is 1
>> a 0 is not accepted because 0 is below the minimum.
>>
>> The minimum value that is accepted depends on which architecture you are
>> on. On x86_64 and other little endian architectures the minimum value
>> accepted is 0x0000000100000000. On big endian architectures like mips64
>> the minimum value accepted winds up being 0x0000000000000001. Or do I
>> have that backwards?
>>
>> It doesn't matter because neither case is what the programmer expected.
>> Further it means that keeping the current proc_do_ulongvec_minmax and
>> proc_do_int_minmax methods that it is impossible to define any of the
>> SYSCTL_XXX macros except SYSCTL_ZERO that will work with both methods.
>>
>>> There is also the fact that you can just do a `extra1 = &sysctl_vals[15]`
>>> and the compiler will not bark at you. At least It let me do that on my
>>> side.
>>
>> All of which in the simplest for has me think the SYSCTL_XXX cleanups
>> were a step in the wrong direction.
>>
>>>>> lead to real world bugs. AKA longs and int confusion. One example is
>>>>> that SYSCTL_ZERO does not properly work as a minimum to
>>>>> proc_do_ulongvec_minmax.
>>> That is a great example.
>>>
>>>>>
>>>>> Frankly those SYSCTL_XXX macros that use sysctl_vals are just plain
>>>>> scary to work with.
>>> I share your feeling :)
>>>
>>>>>
>>>>> So I suggested please making everything simpler by putting unsigned long
>>>>> min and max in to struct ctl_table and then getting rid of extra1 and
>>>>> extra2. As extra1 and extra2 are almost exclusively used to implement
>>>>> min and max.
>>> Explicitly specifying the type will help reduce the "unsefeness" but
>>> with all the ways that there are of using these pointers, I think we
>>> need to think bigger and maybe try to find a more typesafe way to
>>> represent all the interactions.
>>>
>>> It has always struck me as strange the arbitrariness of having 2 extra
>>> pointers. Why not just one?
>>
>> Which would be the void *data pointer.
>>
>>> At the end it is a pointer and can point to
>>> a struct that holds min, max... I do not have the answer yet, but I
>>> think what you propose here is part of a bigger refactoring needed in
>>> ctl_table structure. Would like to hear your thought on it if you have
>>> any.
>>
>> One of the things that happens and that is worth acknowledging is there
>> is code that wraps proc_doulongvec_minmax and proc_dointvec_minmax.
>> Having the minmax information separate from the data pointer makes that
>> wrapping easier.
>>
>> Further the min/max information is typically separate from other kinds
>> of data. So even when not wrapped it is nice just to take a quick
>> glance and see what the minimums and maximums are.
>>
>> My original suggest was that we change struct ctl_table from:
>>
>>> /* A sysctl table is an array of struct ctl_table: */
>>> struct ctl_table {
>>> const char *procname; /* Text ID for /proc/sys */
>>> void *data;
>>> int maxlen;
>>> umode_t mode;
>>> proc_handler *proc_handler; /* Callback for text formatting */
>>> struct ctl_table_poll *poll;
>>> void *extra1;
>>> void *extra2;
>>> } __randomize_layout;
>>
>> to:
>>
>>> /* A sysctl table is an array of struct ctl_table: */
>>> struct ctl_table {
>>> const char *procname; /* Text ID for /proc/sys */
>>> void *data;
>>> int maxlen;
>>> umode_t mode;
>>> proc_handler *proc_handler; /* Callback for text formatting */
>>> struct ctl_table_poll *poll;
>>> unsigned long min;
>>> unsigned long max;
>>> } __randomize_layout;
>>
>> That is just replace extra1 and extra2 with min and max members. The
>> members don't have any reason to be pointers. Without being pointers
>> the min/max functions can just use long values to cap either ints or
>> longs, and there is no room for error. The integer promotion rules
>> will ensure that even negative values can be stored in unsigned long
>> min and max values successfully. Plus it is all bog standard C
>> so there is nothing special to learn.
>>
>> There are a bunch of fiddly little details to transition from where we
>> are today. The most straightforward way I can see of making the
>> transition is to add the min and max members. Come up with replacements
>> for proc_doulongvec_minmax and proc_dointvec_minmax that read the new
>> min and max members. Update all of the users. Update the few users
>> that use extra1 or extra2 for something besides min and max. Then
>> remove extra1 and extra2. At the end it is simpler and requires the
>> same or a little less space.
>>
>> That was and remains my suggestion.
> Thx for all this. Been putting this off for a month now, but will slowly
> come back to it. I'll use your and Wen's series to try to come up with
> something that look good to me.
>
Thanks.
The following is the latest series, which has been tested for about 20 days:
https://lore.kernel.org/all/cover.1739115369.git.wen.yang@linux.dev/
We look forward to your comments and suggestions.
--
Best wishes,
Wen
Powered by blists - more mailing lists