lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 1 Mar 2019 16:27:54 -0500
From:   Barret Rhoden <brho@...gle.com>
To:     Dennis Zhou <dennis@...nel.org>
Cc:     Eial Czerwacki <eial@...lemp.com>, tj@...nel.org, cl@...ux.com,
        linux-kernel@...r.kernel.org, Shai Fultheim <shai@...lemp.com>,
        Oren Twaig <oren@...lemp.com>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>
Subject: Re: [PATCH] percpu/module resevation: change resevation size iff
 X86_VSMP is set

Hi -

On 03/01/2019 03:34 PM, Dennis Zhou wrote:
> Hi Barret,
> 
> On Fri, Mar 01, 2019 at 01:30:15PM -0500, Barret Rhoden wrote:
>> Hi -
>>
>> On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
>>>
>>
>> Your main issue was that you only sent this patch to LKML, but not the
>> maintainers of the file.  If you don't, your patch might get lost.  To get
>> the appropriate people and lists, run:
>>
>> 	scripts/get_maintainer.pl YOUR_PATCH.patch.
>>
>> For this patch, you'll get this:
>>
>> Dennis Zhou <dennis@...nel.org> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> Tejun Heo <tj@...nel.org> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> Christoph Lameter <cl@...ux.com> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> linux-kernel@...r.kernel.org (open list)
>>
>> I added the three maintainers to this email.
>>
>> I have a few minor comments below.
>>
>>> [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is
>> set
>>
>> You misspelled 'reservation'.  Also, I'd just say: "percpu: increase module
>> reservation size if X86_VSMP is set".  ('change' -> 'increase'), only says
>> 'reservation' once.)
>>
>>> as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
>>
>> I think you can add a tag for this right above your Signed-off-by tags.
>> e.g.:
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339
>>
>>> by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
>>> causing the struct size to exceed the size ok 8KB.
>>                                              ^of
>>
>> Which struct are you talking about?  I have one in mind, but others might
>> not know from reading the commit message.
>>
>> I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511. In
>> that case, it was because modules (drm and amdkfd) were using DEFINE_SRCU,
>> which does a DEFINE_PER_CPU on struct srcu_data, and that used
>> ____cacheline_internodealigned_in_smp.
>>
>>>
>>> in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
>>                                 ^increase
>>
>>>
>>> the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
>>                 ^calculated
>>
>>> for f in `find -name *.ko`; do echo $f; readelf -S $f  |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n  |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>>
>> Not sure how useful the one-liner is, versus a description of what you're
>> doing.  i.e. "the size of all module percpu data sections, or something."
>>
>> Also, how close was that calculated value to 64K?  If more modules start
>> using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.
>>
>> Thanks,
>> Barret
>>
>>> Signed-off-by: Eial Czerwacki <eial@...lemp.com>
>>> Signed-off-by: Shai Fultheim <shai@...lemp.com>
>>> Signed-off-by: Oren Twaig <oren@...lemp.com>
>>> ---
>>>    include/linux/percpu.h | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
>>> index 70b7123..6b79693 100644
>>> --- a/include/linux/percpu.h
>>> +++ b/include/linux/percpu.h
>>> @@ -14,7 +14,11 @@
>>>    /* enough to cover all DEFINE_PER_CPUs in modules */
>>>    #ifdef CONFIG_MODULES
>>> +#ifdef X86_VSMP
>>> +#define PERCPU_MODULE_RESERVE		(1 << 16)
>>> +#else
>>>    #define PERCPU_MODULE_RESERVE		(8 << 10)
>>> +#endif
>>>    #else
>>>    #define PERCPU_MODULE_RESERVE		0
>>>    #endif
>>>
>>
> 
> Thanks for sending this to me.
> 
> I must say, I really do not want to expand the reserved region. In most
> cases, it can easily end up unused and thus wasted memory as it is hard
> allocated on boot. This is done because code gen assumes static
> variables are close to the program counter. This would not be true with
> dynamic allocations which being at the end of the vmalloc area
> (Summarized from Tejun's account in [1]).
> 
> Another note on the reserved region. It starts at the end of the static
> region which means it generally isn't page aligned. So while an 8kb
> allocation would fit, a 4kb alignment more than likely would fail.
> Something as large as 8kb should probably be dynamically allocated as
> well.
 >
> I read through the bugzilla report and it seems that the culprits are:
>    drivers/gpu/drm/amd/amdkfd/kfd_process.c:DEFINE_SRCU(kfd_processes_srcu);
>    drivers/gpu/drm/drm_drv.c:DEFINE_STATIC_SRCU(drm_unplug_srcu);
> 
> Is there a reason we cannot dynamically initialize these structs? I've
> cced Paul McKenney because we saw an issue with ipmi in December [1].

I looked at the AMD driver, and it looks like they could dynamically 
initialize it.  It would require a little extra plumbing.  I imagine the 
DRM one is the same way.

To catch this in the future, should we disallow DEFINE_SRCU in modules 
or something?  Otherwise, this will pop up again the next time someone 
uses DEFINE_SRCU in a module and builds with CONFIG_X86_VSMP.

That might be a little much, and it still won't be sufficient to catch 
all cases.  This will also come up any time a module has a static 
per-cpu data structure that uses __cacheline_aligned_in_smp, so it's not 
limited to SRCU either.

I'm not familiar with VSMP - how bad is it to use L1 cache alignment 
instead of 4K page alignment?  Maybe some structures can use the smaller 
alignment?  Or maybe have VSMP require SRCU-using modules to be built-in?

Thanks,

Barret

Powered by blists - more mailing lists