Date:   Mon, 29 Jun 2020 15:01:05 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Zenghui Yu <yuzenghui@...wei.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        wanghaibin.wang@...wei.com, kuhn.chenqun@...wei.com,
        wangjingyi11@...wei.com
Subject: Re: [BUG] irqchip/gic-v4.1: sleeping function called from invalid context

Hi Zenghui,

On 2020-06-29 10:39, Zenghui Yu wrote:
> Hi All,
> 
> Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
> box, I get the following kernel splat:
> 
> [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
> [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
> [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
> [    0.053770] Call trace:
> [    0.053774]  dump_backtrace+0x0/0x218
> [    0.053775]  show_stack+0x2c/0x38
> [    0.053777]  dump_stack+0xc4/0x10c
> [    0.053779]  ___might_sleep+0xfc/0x140
> [    0.053780]  __might_sleep+0x58/0x90
> [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
> [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
> [    0.053785]  its_cpu_init+0x6f4/0xe40
> [    0.053786]  gic_starting_cpu+0x24/0x38
> [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
> [    0.053789]  notify_cpu_starting+0xcc/0xd8
> [    0.053790]  secondary_start_kernel+0x148/0x200
> 
> # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
> its_cpu_init+0x6f4/0xe40:
> allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
> (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
> (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

Let me guess: a system with more than a single CommonLPIAff group?

> I've tried to replace GFP_KERNEL flag with GFP_ATOMIC to allocate memory
> in this atomic context, and the splat disappears. But after a quick look
> at [*], it seems not a good idea to allocate memory within the CPU
> hotplug notifier. I really don't know much about it, please have a look.
> 
> [*]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f

The allocation of the cpumask is pretty benign, and could either be
allocated upfront for all RDs (and freed on detecting that we share
the same CommonLPIAff group) or made atomic.
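
To illustrate the first option (allocate upfront, free once the group turns out to be shared), here is a minimal userspace sketch in C. It is not the driver code: struct model_rd, model_prealloc_masks() and model_rd_starting() are hypothetical names, calloc()/free() stand in for the kernel allocator, and a single unsigned long stands in for a cpumask. The point is only that the atomic path never allocates; it shares an existing mask or keeps its own.

```c
#include <stdlib.h>

#define MODEL_NR_RDS 4

/* Hypothetical per-redistributor state; not the driver's real structure. */
struct model_rd {
	int group;           /* stand-in for the CommonLPIAff group id */
	int started;         /* set once the CPU behind this RD has come up */
	unsigned long *mask; /* stand-in for the per-group cpumask */
};

static struct model_rd model_rds[MODEL_NR_RDS];

/* Sleepable context: allocate one mask per RD before any CPU starts. */
static int model_prealloc_masks(void)
{
	for (int i = 0; i < MODEL_NR_RDS; i++) {
		model_rds[i].mask = calloc(1, sizeof(unsigned long));
		if (!model_rds[i].mask)
			return -1;
	}
	return 0;
}

/*
 * Atomic context: no allocation here. If an already-started RD belongs
 * to the same group, drop our preallocated mask and share the other one.
 * Returns 1 when an existing mask was shared, 0 otherwise.
 */
static int model_rd_starting(int cpu)
{
	struct model_rd *rd = &model_rds[cpu];
	int shared = 0;

	for (int i = 0; i < MODEL_NR_RDS; i++) {
		struct model_rd *other = &model_rds[i];

		if (i == cpu || !other->started || other->group != rd->group)
			continue;

		free(rd->mask);
		rd->mask = other->mask;
		shared = 1;
		break;
	}

	*rd->mask |= 1UL << cpu;
	rd->started = 1;
	return shared;
}
```

The cost of this approach in the sketch is one small allocation per RD that is thrown away for every RD sharing a group, which is tolerable for something the size of a cpumask.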

The much bigger issue is the alloc_pages call just after. Allocating this
upfront probably is the wrong thing to do, as you are likely to allocate
way too much memory, even if you free it quickly afterwards.

At this stage, I'd rather we turn this into an atomic allocation. A notifier
is just another atomic context, and if this fails at such an early stage,
then the CPU is unlikely to continue booting...
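
As a rough illustration of why switching the flag silences the splat, here is a small userspace model in C. Everything in it is an assumption made for the sketch: the MODEL_GFP_* flags, model_alloc() and model_cpu_starting() are hypothetical stand-ins, not kernel APIs, and calloc() replaces the real allocator. It only models the check that a sleeping allocation must not be issued while in atomic context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel's GFP flags. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* True while the modelled CPU runs its starting callback. */
static bool model_in_atomic;
/* Number of sleeping allocations attempted in atomic context. */
static int model_splat_count;

/* May "sleep" when called with MODEL_GFP_KERNEL, never with ATOMIC. */
static void *model_alloc(size_t size, enum model_gfp gfp)
{
	assert(size > 0);

	if (gfp == MODEL_GFP_KERNEL && model_in_atomic) {
		model_splat_count++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context\n");
	}
	return calloc(1, size);
}

/*
 * Models the CPU starting callback, which runs in atomic context.
 * Returns 0 on success and -1 if the allocation fails; at that point
 * the modelled CPU cannot finish coming up.
 */
static int model_cpu_starting(enum model_gfp gfp)
{
	void *table;
	int ret = 0;

	model_in_atomic = true;
	table = model_alloc(4096, gfp);
	if (!table)
		ret = -1;
	free(table);
	model_in_atomic = false;

	return ret;
}
```

Calling model_cpu_starting(MODEL_GFP_KERNEL) bumps model_splat_count, while model_cpu_starting(MODEL_GFP_ATOMIC) leaves it untouched. In both cases the caller still has to handle a failed allocation.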

Would you like to write a patch for this? Given that you have tested
something, it probably already exists. Or do you want me to do it?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
