[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875xm0gn60.fsf@email.froward.int.ebiederm.org>
Date: Mon, 27 Jan 2025 11:51:51 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Joel Granados <joel.granados@...nel.org>
Cc: Wen Yang <wen.yang@...ux.dev>, Luis Chamberlain <mcgrof@...nel.org>,
Kees Cook <keescook@...omium.org>, Christian Brauner
<brauner@...nel.org>, Dave Young <dyoung@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] sysctl: simplify the min/max boundary check
Joel Granados <joel.granados@...nel.org> writes:
> On Thu, Jan 23, 2025 at 12:30:25PM -0600, Eric W. Biederman wrote:
>> "Eric W. Biederman" <ebiederm@...ssion.com> writes:
>>
>> > Joel Granados <joel.granados@...nel.org> writes:
>> >
>> >> On Sun, Jan 19, 2025 at 10:59:21PM +0800, Wen Yang wrote:
>> >>>
>> >>>
>> >>> On 2025/1/16 17:37, Joel Granados wrote:
>> >>> > On Sun, Jan 05, 2025 at 11:28:53PM +0800, Wen Yang wrote:
>> >>> > > do_proc_dointvec_conv() used the default range of int type, while
>> >>> > > do_proc_dointvec_minmax_conv() additionally used "int * {min, max}" of
>> >>> > > struct do_proc_dointvec_minmax_conv_param, which are actually passed
>> >>> > > in table->extra{1,2} pointers.
> ...
>> >> (if any). And this is why:
>> >> 1. The long and the void* are most likely (depending on arch?) the same
>> >> size.
>> >> 2. In [1] it is mentioned that, we would still need an extra (void*) to
>> >> address the sysctl tables that are *NOT* using extra{1,2} as min max.
>> >> This means that we need a bigger ctl_table (long extra1, long extra2
>> >> and void* extra). We will need *more* memory?
>> >>
>> >> I would like to be proven wrong. So this is my proposal: Instead of
>> >> trying to do an incremental change, I suggest you remove the sysctl_vals
>> >> shared const array and measure how much memory you actually save. You
>> >> can use the ./scripts/bloat-o-meter in the linux kernel source and
>> >> follow something similar to what we did in [2] to measure how much
>> >> memory we are actually talking about.
>> >>
>> >> Once you get a hard number, then we can move forward on the memory
>> >> saving front.
>
> Hey Eric.
>
> Thx for the clarification. Much appreciated.
>> >
>> > When I originally suggested this my motivation had nothing to do with memory
> That makes a *lot* of sense :).
>
>> > The sysctl_vals memory array is type unsafe and has actively
> Here I understand that they are unsafe because of Integer promotion
> issues exacerbated by the void* variables extra{1,2}. Please correct me
> If I missed the point.
Not precisely. It is because the (void *) pointers are silently cast to
either (int *) or (long *) pointers. So for example passing SYSCTL_ZERO
to proc_do_ulongvec_minmax results in reading sysctl_vals[0] and
sysctl_vals[1] and to get the long value. Since sysctl_vals[1] is 1
a 0 is not accepted because 0 is below the minimum.
The minimum value that is accepted depends on which architecture you are
on. On x86_64 and other little endian architectures the minimum value
accepted is 0x0000000100000000. On big endian architectures like mips64
the minimum value accepted winds up being 0x0000000000000001. Or do I
have that backwards?
It doesn't matter because neither case is what the programmer expected.
Further it means that keeping the current proc_do_ulongvec_minmax and
proc_do_int_minmax methods that it is impossible to define any of the
SYSCTL_XXX macros except SYSCTL_ZERO that will work with both methods.
> There is also the fact that you can just do a `extra1 = &sysctl_vals[15]`
> and the compiler will not bark at you. At least It let me do that on my
> side.
All of which in the simplest for has me think the SYSCTL_XXX cleanups
were a step in the wrong direction.
>> > lead to real world bugs. AKA longs and int confusion. One example is
>> > that SYSCTL_ZERO does not properly work as a minimum to
>> > proc_do_ulongvec_minmax.
> That is a great example.
>
>> >
>> > Frankly those SYSCTL_XXX macros that use sysctl_vals are just plain
>> > scary to work with.
> I share your feeling :)
>
>> >
>> > So I suggested please making everything simpler by putting unsigned long
>> > min and max in to struct ctl_table and then getting rid of extra1 and
>> > extra2. As extra1 and extra2 are almost exclusively used to implement
>> > min and max.
> Explicitly specifying the type will help reduce the "unsefeness" but
> with all the ways that there are of using these pointers, I think we
> need to think bigger and maybe try to find a more typesafe way to
> represent all the interactions.
>
> It has always struck me as strange the arbitrariness of having 2 extra
> pointers. Why not just one?
Which would be the void *data pointer.
> At the end it is a pointer and can point to
> a struct that holds min, max... I do not have the answer yet, but I
> think what you propose here is part of a bigger refactoring needed in
> ctl_table structure. Would like to hear your thought on it if you have
> any.
One of the things that happens and that is worth acknowledging is there
is code that wraps proc_doulongvec_minmax and proc_dointvec_minmax.
Having the minmax information separate from the data pointer makes that
wrapping easier.
Further the min/max information is typically separate from other kinds
of data. So even when not wrapped it is nice just to take a quick
glance and see what the minimums and maximums are.
My original suggest was that we change struct ctl_table from:
> /* A sysctl table is an array of struct ctl_table: */
> struct ctl_table {
> const char *procname; /* Text ID for /proc/sys */
> void *data;
> int maxlen;
> umode_t mode;
> proc_handler *proc_handler; /* Callback for text formatting */
> struct ctl_table_poll *poll;
> void *extra1;
> void *extra2;
> } __randomize_layout;
to:
> /* A sysctl table is an array of struct ctl_table: */
> struct ctl_table {
> const char *procname; /* Text ID for /proc/sys */
> void *data;
> int maxlen;
> umode_t mode;
> proc_handler *proc_handler; /* Callback for text formatting */
> struct ctl_table_poll *poll;
> unsigned long min;
> unsigned long max;
> } __randomize_layout;
That is just replace extra1 and extra2 with min and max members. The
members don't have any reason to be pointers. Without being pointers
the min/max functions can just use long values to cap either ints or
longs, and there is no room for error. The integer promotion rules
will ensure that even negative values can be stored in unsigned long
min and max values successfully. Plus it is all bog standard C
so there is nothing special to learn.
There are a bunch of fiddly little details to transition from where we
are today. The most straightforward way I can see of making the
transition is to add the min and max members. Come up with replacements
for proc_doulongvec_minmax and proc_dointvec_minmax that read the new
min and max members. Update all of the users. Update the few users
that use extra1 or extra2 for something besides min and max. Then
remove extra1 and extra2. At the end it is simpler and requires the
same or a little less space.
That was and remains my suggestion.
Eric
Powered by blists - more mailing lists