[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ys7EKngGeFP0eiXt@lorien.usersys.redhat.com>
Date: Wed, 13 Jul 2022 09:10:02 -0400
From: Phil Auld <pauld@...hat.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: linux-kernel@...r.kernel.org,
"Rafael J . Wysocki" <rafael@...nel.org>,
Tian Tao <tiantao6@...ilicon.com>,
Barry Song <song.bao.hua@...ilicon.com>
Subject: Re: [PATCH] drivers/base/node.c: fix userspace break from using
bin_attributes for cpumap and cpulist
On Wed, Jul 13, 2022 at 07:48:00AM -0400 Phil Auld wrote:
> Hi Greg,
>
> On Wed, Jul 13, 2022 at 08:06:02AM +0200 Greg Kroah-Hartman wrote:
> > On Tue, Jul 12, 2022 at 05:43:01PM -0400, Phil Auld wrote:
> > > Using bin_attributes with a 0 size causes fstat and friends to return that 0 size.
> > > This breaks userspace code that retrieves the size before reading the file. Rather
> > > than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size
> > > limitation of cpumap ABI") let's put in a size value at compile time. Use direct
> > > comparison and a worst-case maximum to ensure compile time constants. For cpulist the
> > > max is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1) which for 8192 is 40960.
> > > In order to get near that you'd need a system with every other CPU on one node or
> > > something similar. e.g. (0,2,4,... 1024,1026...). We set it to a min of PAGE_SIZE
> > > to retain the older behavior. For cpumap, PAGE_SIZE is plenty big.
> >
> > Does userspace care about that size, or can we just put any value in
> > there and it will be ok? How about just returning to the original
> > PAGE_SIZE value to keep things looking identical, will userspace not
> > read more than that size from the file then?
> >
>
> I'll go look. But I think the point of pre-reading the size with fstat is to allocate
> a buffer to read into. So that may be a problem.
>
>From what I understand the app does use the size from fstat to allocate a buffer.
It also does a lseek to the end and back. This is actaully where it breaks when it gets a 0.
This is before:
8747 10:20:32.584460 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=4096, ...}) = 0 <0.000006>
8747 10:20:32.584502 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=4096, ...}) = 0 <0.000005>
8747 10:20:32.584537 lseek(6</sys/devices/system/node/node0/cpulist>, 4096, SEEK_SET) = 4096 <0.000005>
8747 10:20:32.584560 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000005>
8747 10:20:32.584585 read(6</sys/devices/system/node/node0/cpulist>, "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46\n", 4096) = 67 <0.000008>
8747 10:20:32.584617 read(6</sys/devices/system/node/node0/cpulist>, "", 4096) = 0 <0.000005>
(I'll note here their system does seem to have a worst case cpu to node list too)
And with the bin_attr change:
8871 09:27:41.034279 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=0, ...}) = 0 <0.000009>
8871 09:27:41.034330 fstat(6</sys/devices/system/node/node0/cpulist>, {..., st_size=0, ...}) = 0 <0.000010>
8871 09:27:41.034377 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000008>
8871 09:27:41.034410 lseek(6</sys/devices/system/node/node0/cpulist>, 0, SEEK_SET) = 0 <0.000008>
And here it spins in a loop.
> That said, I believe in this case it's the cpulist file which given the use of ranges
> is very unlikely to actually get that big.
>
> > > On an 80 cpu 4-node sytem (NR_CPUS == 8192)
> >
> > We have systems running Linux with many more cpus than that, and your
> > company knows this :)
>
> The 80 cpus here don't matter and we only build with NR_CPUS = 8192 :)
>
> But yes, I realize now that the cpumap part I posted is broken for larger
> NR_CPUS. I originally had it as NR_CPUS, but as I said in my reply to Barry,
> it wants to be ~= NR_CPUS/4 + NR_CPUS/32. I'll change that.
>
> I think we should decide on a max for each and use that.
>
What values of NR_CPUS are people actually using when they build kernels? Barry mentioned cpuid 100000 for
example. I'm not sure if that was real or just illustrating the need for more characters.
I've modified my patch to use NR_CPUS/2 for the cpumap which should be plenty. And to use NR_CPUS*6 for
the cpulist, which covers up to 99999 cpus safely.
Cheers,
Phil
> Cheers,
> Phil
>
> >
> > thanks,
> >
> > greg k-h
> >
>
> --
--
Powered by blists - more mailing lists