Message-ID: <YgvTZHwKkGcUgWrL@alley>
Date:   Tue, 15 Feb 2022 17:23:00 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>
Cc:     Guanghui Feng <guanghuifeng@...ux.alibaba.com>,
        rostedt@...dmis.org, john.ogness@...utronix.de,
        keescook@...omium.org, anton@...msg.org, ccross@...roid.com,
        tony.luck@...el.com, linux-kernel@...r.kernel.org,
        baolin.wang@...ux.alibaba.com, yaohongbo@...ux.alibaba.com,
        zhangliguang@...ux.alibaba.com, zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH] printk: fix softlockup/rcu stall warning without setting
 CONFIG_PREEMPTION

On Mon 2022-02-14 12:06:53, Sergey Senozhatsky wrote:
> On (22/02/12 21:27), Guanghui Feng wrote:
> >     console_unlock+0x220/0x420
> >     vprintk_emit+0x17c/0x1ac
> >     vprintk_default+0x3c/0x44
> >     vprintk+0x38/0x70
> >     printk+0x64/0x88
> >     dump_task.part.0+0xc4/0xe0
> >     dump_task+0x70/0x74
> >     dump_tasks+0x78/0x90
> >     dump_global_header+0xcc/0xe8
> >     oom_kill_process+0x258/0x274
> >     out_of_memory.part.0+0xb0/0x33c
> >     out_of_memory+0x4c/0xa0
> >     __alloc_pages_may_oom+0x11c/0x1a0
> >     __alloc_pages_slowpath.constprop.0+0x4c0/0x75c
> >     __alloc_pages_nodemask+0x2b4/0x310
> >     alloc_pages_current+0x8c/0x140
> >     get_zeroed_page+0x20/0x50
> >     __pud_alloc+0x40/0x190
> >     copy_pud_range+0x264/0x280
> >     copy_page_range+0xe8/0x204
> >     dup_mmap+0x334/0x434
> >     dup_mm+0x64/0x11c
> >     copy_process+0x5e0/0x11a0
> >     kernel_clone+0x94/0x364
> >     __do_sys_clone+0x54/0x80
> >     __arm64_sys_clone+0x24/0x30
> >     el0_svc_common.constprop.0+0x7c/0x210
> >     do_el0_svc+0x74/0x90
> >     el0_svc+0x24/0x60
> >     el0_sync_handler+0xa8/0xb0
> >     el0_sync+0x140/0x180
> 
> [..]
> 
> > @@ -2716,7 +2716,11 @@ void console_unlock(void)
> >  		if (handover)
> >  			return;
> >
> > +#ifndef CONFIG_PREEMPTION
> > +		if (do_cond_resched || need_resched())
> > +#else
> >  		if (do_cond_resched)
> > +#endif
> >  			cond_resched();
> 
> console_unlock() can be called from atomic context, so it must not schedule
> when it is not allowed to. That's what console_may_schedule indicates.

Yup. It is even confirmed by the kernel test robot.
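Sergey's objection can be made concrete with a small userspace model of the decision logic. Everything in it is hypothetical and is not kernel code. `struct console_model`, `after_record_patched()` and `after_record_gated()` are illustrative names. `may_schedule` stands in for console_may_schedule, which is assumed here to be what do_cond_resched carries in the quoted hunk. `pending_resched` stands in for need_resched(), and `model_cond_resched()` stands in for cond_resched().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code:
 *   may_schedule    ~ console_may_schedule (assumed to feed do_cond_resched)
 *   pending_resched ~ need_resched()
 */
struct console_model {
	bool may_schedule;
	bool pending_resched;
	int resched_calls;     /* simulated cond_resched() invocations */
	int atomic_violations; /* reschedules attempted while not allowed */
};

/* Stand-in for cond_resched(): records whether scheduling was legal. */
static void model_cond_resched(struct console_model *m)
{
	m->resched_calls++;
	if (!m->may_schedule)
		m->atomic_violations++;
}

/* Condition proposed by the patch for !CONFIG_PREEMPTION builds. */
static void after_record_patched(struct console_model *m)
{
	assert(m != 0);
	if (m->may_schedule || m->pending_resched)
		model_cond_resched(m);
}

/* Existing condition: only the caller-provided permission counts. */
static void after_record_gated(struct console_model *m)
{
	assert(m != 0);
	if (m->may_schedule)
		model_cond_resched(m);
}
```

With `may_schedule = false` and a pending reschedule, the patched condition still reaches the cond_resched() stand-in and records an atomic-context violation, while the gated condition does nothing. That mirrors why the extra need_resched() term is unsafe for callers in atomic context.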

An alternative workaround is to disable RCU stall reports in
console_unlock(); see the discussion at
https://lore.kernel.org/r/20211111195904.618253-2-wander@redhat.com

I personally do not like these approaches because they hide the problem.
The stall is real. The right solution is to remove the stall by
offloading the console handling into a preemptible context (kthreads).
The latest version of this approach is discussed at
https://lore.kernel.org/r/20220207194323.273637-1-john.ogness@linutronix.de
I prefer to go this way because it solves the root of the problem.
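The general idea of offloading, where the printk() caller only stores the record and a separate thread does the slow console output, can be sketched in userspace with POSIX threads. This is a rough analogue under stated assumptions, not code from John Ogness's series. `struct log_model`, `model_printk()`, `printer_thread()`, `log_start()`, `log_stop()` and the fixed-size ring are all hypothetical, and a `FILE *` sink stands in for a slow console.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS 64
#define MSG_LEN 128

/* Hypothetical log buffer drained by a dedicated printer thread. */
struct log_model {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	char msgs[LOG_SLOTS][MSG_LEN];
	unsigned long head;    /* next slot to write */
	unsigned long tail;    /* next slot to print */
	unsigned long printed; /* records emitted by the printer thread */
	unsigned long dropped; /* records lost because the ring was full */
	bool stop;
	FILE *sink;            /* stands in for a slow console; may be NULL */
};

/* Caller side: store the record and wake the printer; never prints. */
static void model_printk(struct log_model *lb, const char *msg)
{
	pthread_mutex_lock(&lb->lock);
	if (lb->head - lb->tail < LOG_SLOTS) {
		snprintf(lb->msgs[lb->head % LOG_SLOTS], MSG_LEN, "%s", msg);
		lb->head++;
		pthread_cond_signal(&lb->wake);
	} else {
		lb->dropped++;
	}
	pthread_mutex_unlock(&lb->lock);
}

/* Printer side: drains records one at a time in its own thread. */
static void *printer_thread(void *arg)
{
	struct log_model *lb = arg;
	char line[MSG_LEN];

	pthread_mutex_lock(&lb->lock);
	for (;;) {
		while (lb->tail == lb->head && !lb->stop)
			pthread_cond_wait(&lb->wake, &lb->lock);
		if (lb->tail == lb->head && lb->stop)
			break; /* backlog drained and shutdown requested */
		memcpy(line, lb->msgs[lb->tail % LOG_SLOTS], MSG_LEN);
		lb->tail++;
		/* The slow output runs without holding the lock. */
		pthread_mutex_unlock(&lb->lock);
		if (lb->sink)
			fprintf(lb->sink, "%s\n", line);
		pthread_mutex_lock(&lb->lock);
		lb->printed++;
	}
	pthread_mutex_unlock(&lb->lock);
	return NULL;
}

/* Initialize the buffer and start the printer; returns 0 on success. */
static int log_start(struct log_model *lb, FILE *sink, pthread_t *tid)
{
	memset(lb, 0, sizeof(*lb));
	pthread_mutex_init(&lb->lock, NULL);
	pthread_cond_init(&lb->wake, NULL);
	lb->sink = sink;
	return pthread_create(tid, NULL, printer_thread, lb);
}

/* Ask the printer to finish the backlog, then wait for it to exit. */
static void log_stop(struct log_model *lb, pthread_t tid)
{
	pthread_mutex_lock(&lb->lock);
	lb->stop = true;
	pthread_cond_signal(&lb->wake);
	pthread_mutex_unlock(&lb->lock);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&lb->wake);
	pthread_mutex_destroy(&lb->lock);
}
```

Build with `-pthread`. The caller's cost is bounded by a short critical section, so a burst such as the OOM task dump in the backtrace above no longer keeps the caller busy with console output. In the kernel the equivalent thread can be preempted between records; this userspace model only illustrates the split between storing and printing.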

Best Regards,
Petr
