lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 21 Jul 2021 10:44:03 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Mike Galbraith <efault@....de>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     linux-rt-users@...r.kernel.org,
        Mel Gorman <mgorman@...hsingularity.net>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [rfc/patch] mm/slub: restore/expand unfreeze_partials() local
 exclusion scope

On 7/21/21 6:56 AM, Mike Galbraith wrote:
> On Tue, 2021-07-20 at 13:26 +0200, Mike Galbraith wrote:
>> On Tue, 2021-07-20 at 10:56 +0200, Vlastimil Babka wrote:
>> > > crash> bt -sx
>> > > PID: 18761  TASK: ffff88812fff0000  CPU: 0   COMMAND: "hackbench"
>> > >  #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f
>> > >  #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72
>> > >  #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10
>> > >  #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3
>> > >  #4 [ffff88818f8ffab8] exc_general_protection+0x195 at
>> > > ffffffff8179fdb5
>> > >  #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at
>> > > ffffffff81800a0e
>> > >     [exception RIP: __unfreeze_partials+156]
>> >
>> > Hm going back to this report...
>> > So could it be that it was stillput_cpu_partial() preempting
>> > __slab_alloc() messing the partial list, but for some reason the
>> > put_cpu_partial() side crashed this time?
>>
>> Thinking this bug is toast, I emptied the trash bin, so no can peek.
> 
> I made fireworks while waiting for bike riding time, boom #10 was
> finally the right flavor, but...
> 
> crash> bt -sx
> PID: 32     TASK: ffff888100a56000  CPU: 3   COMMAND: "rcuc/3"
>  #0 [ffff888100aa7a90] machine_kexec+0x14f at ffffffff81051c8f
>  #1 [ffff888100aa7ad8] __crash_kexec+0xd2 at ffffffff81120612
>  #2 [ffff888100aa7b98] crash_kexec+0x30 at ffffffff811213b0
>  #3 [ffff888100aa7ba8] oops_end+0xd3 at ffffffff810267e3
>  #4 [ffff888100aa7bc8] exc_general_protection+0x195 at ffffffff817a2cc5
>  #5 [ffff888100aa7c60] asm_exc_general_protection+0x1e at ffffffff81800a0e
>     [exception RIP: __unfreeze_partials+149]
>     RIP: ffffffff8124a295  RSP: ffff888100aa7d10  RFLAGS: 00010202
>     RAX: 0000000000190016  RBX: 0000000000190016  RCX: 000000017fffffff
>     RDX: 00000001ffffffff  RSI: 0000000000000023  RDI: ffffffff81e58b10
>     RBP: ffff888100aa7da0   R8: 0000000000000000   R9: 0000000000190018
>     R10: ffff888100aa7db8  R11: 000000000002d9e4  R12: ffff888100190500
>     R13: ffff88810018c980  R14: 00000001ffffffff  R15: ffffea0004571588
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff888100aa7db0] put_cpu_partial+0x8e at ffffffff8124a56e
>  #7 [ffff888100aa7dd0] kmem_cache_free+0x3a8 at ffffffff8124d238
>  #8 [ffff888100aa7e08] rcu_do_batch+0x186 at ffffffff810eb246
>  #9 [ffff888100aa7e70] rcu_core+0x25f at ffffffff810eeb2f
> #10 [ffff888100aa7eb0] rcu_cpu_kthread+0x94 at ffffffff810eed24
> #11 [ffff888100aa7ee0] smpboot_thread_fn+0x249 at ffffffff8109e559
> #12 [ffff888100aa7f18] kthread+0x1ac at ffffffff810984dc
> #13 [ffff888100aa7f50] ret_from_fork+0x1f at ffffffff81001b1f
> crash> runq
> ...
> CPU 3 RUNQUEUE: ffff88840ece9980
>   CURRENT: PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   RT PRIO_ARRAY: ffff88840ece9bc0
>      [ 94] PID: 32     TASK: ffff888100a56000  COMMAND: "rcuc/3"
>   CFS RB_ROOT: ffff88840ece9a40
>      [120] PID: 33     TASK: ffff888100a51000  COMMAND: "ksoftirqd/3"
> ...
> crash> bt -sx 33
> PID: 33     TASK: ffff888100a51000  CPU: 3   COMMAND: "ksoftirqd/3"
>  #0 [ffff888100aabdf0] __schedule+0x2d7 at ffffffff817ad3a7
>  #1 [ffff888100aabec8] schedule+0x3b at ffffffff817ae4eb
>  #2 [ffff888100aabee0] smpboot_thread_fn+0x18c at ffffffff8109e49c
>  #3 [ffff888100aabf18] kthread+0x1ac at ffffffff810984dc
>  #4 [ffff888100aabf50] ret_from_fork+0x1f at ffffffff81001b1f
> crash>

So this doesn't look like our put_cpu_partial() preempted a __slab_alloc() on
the same cpu, right? There might have been __slab_alloc() in irq handler
preempting us, but we won't see that anymore. I don't immediately see the root
cause and this scenario should be possible on !RT too where we however didn't
see these explosions.

BTW did my ugly patch work?

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ