Message-ID: <YtATLNvojuvOOmys@lorien.usersys.redhat.com>
Date:   Thu, 14 Jul 2022 08:59:24 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Tian Tao <tiantao6@...ilicon.com>
Subject: Re: [PATCH v3] drivers/base/node.c: fix userspace break from using
 bin_attributes for cpumap and cpulist

On Thu, Jul 14, 2022 at 12:23:01PM +1200 Barry Song wrote:
> On Thu, Jul 14, 2022 at 6:38 AM Phil Auld <pauld@...hat.com> wrote:
> >
> > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > This breaks userspace code that retrieves the size before reading the file. Rather
> > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960
> > (8192 * 5). In order to get near that you'd need a system with every other CPU on one
> > node or something similar, e.g. (0,2,4,... 1024,1026...). To simplify the math and
> > leave room for much larger NR_CPUS in the future, we are using NR_CPUS * 7.
> > We also set it to a min of PAGE_SIZE to retain the older behavior for smaller NR_CPUS.
> > The cpumap file wants to be something like NR_CPUS/4 + NR_CPUS/32, for the ","s so for
> > simplicity we are using NR_CPUS/2.
> >
> > On an 80 cpu 4-node system (NR_CPUS == 8192)
> >
> > before:
> >
> > -r--r--r--. 1 root root 0 Jul 12 14:08 /sys/devices/system/node/node0/cpulist
> > -r--r--r--. 1 root root 0 Jul 11 17:25 /sys/devices/system/node/node0/cpumap
> >
> > after:
> >
> > -r--r--r--. 1 root root 57344 Jul 13 11:32 /sys/devices/system/node/node0/cpulist
> > -r--r--r--. 1 root root  4096 Jul 13 11:31 /sys/devices/system/node/node0/cpumap
> >
> > NR_CPUS = 16384
> > -r--r--r--. 1 root root 114688 Jul 13 14:03 /sys/devices/system/node/node0/cpulist
> > -r--r--r--. 1 root root   8192 Jul 13 14:02 /sys/devices/system/node/node0/cpumap
> >
> > Fixes: 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
> > Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > Cc: "Rafael J. Wysocki" <rafael@...nel.org>
> > Signed-off-by: Phil Auld <pauld@...hat.com>
> > ---
> >  drivers/base/node.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index 0ac6376ef7a1..89c932a1d8ca 100644
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -45,7 +45,11 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj,
> >         return n;
> >  }
> >
> > -static BIN_ATTR_RO(cpumap, 0);
> > +/* Report a valid max size for this file to avoid breaking userspace. We use NR_CPUS/2 as
> > + * a simplification of NR_CPUS/8 + NR_CPUS/32.  Use PAGE_SIZE as a minimum for smaller
> > + * configurations.
> > + */
> > +static BIN_ATTR_RO(cpumap, (((NR_CPUS >> 1) > PAGE_SIZE) ? NR_CPUS >> 1 : PAGE_SIZE));
> 
> the code should be fine. but the comment seems to be wrong?
> 
> /$ cat /sys/devices/system/node/node0/cpumap
> 00000000,00000000,00000000,000000ff
> 
> 4 cpus need one byte in hex, 32 cpus need a comma.
> for 32 cpus, we need 9 bytes in total.
> 
> Based on your comment, you get 32/8+32/32=5.
> should be NR_CPUS/4 ?
>

Yes, sorry. Meant /4 as in the commit message.  I'll fix that.
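
For reference, the /4 arithmetic can be checked with a small sketch. This is an
illustration only, not part of the patch: the helper names cpumap_text_len and
cpumap_attr_size are hypothetical, and NR_CPUS and PAGE_SIZE are passed in as plain
parameters. It assumes the mask is printed as one hex digit per 4 cpus, a comma
between 32-cpu groups, and a trailing newline.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to print a hex mask covering nr_cpus bits, assuming one hex
 * digit per 4 cpus, one comma between each 32-cpu group and a final newline.
 * Hypothetical helper, for illustration only. */
size_t cpumap_text_len(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	size_t digits = (nr_cpus + 3) / 4;   /* ceil(nr_cpus / 4) */
	size_t groups = (nr_cpus + 31) / 32; /* ceil(nr_cpus / 32) */
	return digits + (groups - 1) + 1;    /* digits, commas, newline */
}

/* Stand-in for the size expression in the patch:
 * ((NR_CPUS >> 1) > PAGE_SIZE) ? NR_CPUS >> 1 : PAGE_SIZE
 * with both constants turned into parameters (an assumption made here so
 * that several configurations can be tried). */
size_t cpumap_attr_size(size_t nr_cpus, size_t page_size)
{
	size_t half = nr_cpus >> 1;
	return half > page_size ? half : page_size;
}
```

With these, cpumap_text_len(32) is 9 while 32/8 + 32/32 is only 5, and
cpumap_attr_size(8192, 4096) stays at PAGE_SIZE because 8192/2 is not larger than 4096.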


> >
> >  static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> >                                    struct bin_attribute *attr, char *buf,
> > @@ -66,7 +70,15 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj,
> >         return n;
> >  }
> >
> > -static BIN_ATTR_RO(cpulist, 0);
> > +/* Report a valid maximum size for this file since 0 breaks userspace, which
> > + * may use the size from fstat to allocate a read buffer.
> > + * The value 7 is a hardcoded version of ceil(log10(NR_CPUS)) + 1 for future values
> > + * of NR_CPUS that may be up to 2 orders of magnitude larger than 8192.
> > + * In a worst case system every other cpu is on one of two nodes. This leads to
> > + * a file like "0,2,4,6,8...1024,...8190,...". Use PAGE_SIZE as a minimum for smaller
> > + * NR_CPUS.
> > +*/
> > +static BIN_ATTR_RO(cpulist, (((NR_CPUS * 7) > PAGE_SIZE) ? NR_CPUS * 7 : PAGE_SIZE));
> >
> 
> It seems to be very sufficient. At least, my poor math tells me 7
> bytes can describe cpu id like
> "100000," and up to "999999,"
> but it is still hard for me to understand the comments :-)
>

I picked 7 based on Greg saying there might be systems with 2 orders of magnitude more
than 8192 cpus. Personally I think lock contention and percpu data will start to be
a problem before that. I couldn't get x86 to build with more than NR_CPUS=16k. But it
allows for future expansion.
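
To make the 7 more concrete, here is a rough sketch of the every-other-cpu case from
the comment. It is an illustration under stated assumptions, not kernel code: the
helper names are hypothetical, NR_CPUS and PAGE_SIZE are plain parameters, and the
list is assumed to be printed as decimal ids separated by commas with a trailing
newline and no ranges to collapse.

```c
#include <assert.h>
#include <stddef.h>

/* Number of decimal digits needed to print n. */
size_t decimal_digits(size_t n)
{
	size_t d = 1;

	while (n >= 10) {
		n /= 10;
		d++;
	}
	return d;
}

/* Length of a list like "0,2,4,...\n" that names every other cpu below
 * nr_cpus. Hypothetical helper, for illustration only. */
size_t cpulist_every_other_len(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	size_t len = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu += 2)
		len += decimal_digits(cpu) + 1; /* digits plus ',' or final '\n' */
	return len;
}

/* Stand-in for the size expression in the patch:
 * ((NR_CPUS * 7) > PAGE_SIZE) ? NR_CPUS * 7 : PAGE_SIZE */
size_t cpulist_attr_size(size_t nr_cpus, size_t page_size)
{
	size_t worst = nr_cpus * 7;
	return worst > page_size ? worst : page_size;
}
```

For NR_CPUS == 8192 and a 4096-byte page, the every-other-cpu list comes to 19925
bytes under these assumptions, against a reported size of 57344. A 6-digit id plus
its separator is exactly 7 bytes, which is where the headroom for larger NR_CPUS
comes from.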

What would you like the comment to say that makes more sense to you? Should I put
some of those really large cpuids in the worst case example? Take that out completely?


> btw, we have a lot of other places which might need this, such as
> drivers/base/topology.c
> 
> so perhaps we can move them to some common place,
> 
> #define cpu_bitmap_bytes  (((NR_CPUS >> 1) > PAGE_SIZE) ? NR_CPUS >> 1
> : PAGE_SIZE)
> #define cpu_list_bytes  (((NR_CPUS * 7) > PAGE_SIZE) ? NR_CPUS * 7 : PAGE_SIZE)
> 
> is include/linux/cpumask.h a good place for it?

My main concern is the files that are breaking actual userspace code. But yes, those
other files have the same 0 size.

It seems somewhat specific to drivers/base. Maybe there's a less global place, closer
to drivers/base, to put them. I can look into that and do it this way if it will help
get this fixed.


Cheers,
Phil

> 
> >  /**
> >   * struct node_access_nodes - Access class device to hold user visible
> > --
> > 2.31.1
> >
> 
> Thanks
> Barry
> 
