linux-kernel - Re: [BUG] workqueues and printk not playing nice since next-20240130

Message-ID: <a8386e9d-39f6-4bd5-8329-30550fb2745a@paulmck-laptop>
Date: Mon, 5 Feb 2024 11:41:10 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Tejun Heo <tj@...nel.org>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Petr Mladek <pmladek@...e.com>,
	Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	John Ogness <john.ogness@...utronix.de>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [BUG] workqueues and printk not playing nice since next-20240130

On Mon, Feb 05, 2024 at 07:46:48AM -1000, Tejun Heo wrote:
> On Mon, Feb 05, 2024 at 09:45:53AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 05, 2024 at 10:25:15PM +0900, Sergey Senozhatsky wrote:
> > > On (24/02/05 14:07), Petr Mladek wrote:
> > > > > Good point, if it does recur, I could try it on bare metal.
> > > > 
> > > > Please let me, John, and Sergey know if anyone sees this again. I do not
> > > > feel comfortable when there is a problem which might make consoles calm.
> > > 
> > > Agreed.
> > > 
> > > > Bisection identified this commit:
> > > > 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
> > > 
> > > That commit triggered early boot use-after-free (per kasan) on
> > > my system, which probably could derail some things.
> > 
> > And enabling KASAN on next-20240130 got me that same KASAN report and
> > also suppressed the misbehavior, which is not surprising given that
> > KASAN quarantines free memory for some time.  Plus enabling KASAN
> > on recent -next does not trigger that KASAN report.
> > 
> > So my guess is that we can attribute my oddball test failures to
> > that use after free.  But I will of course continue testing.
> 
> Can someone paste the KASAN report?

Here you go!

							Thanx, Paul

------------------------------------------------------------------------

[    0.316453] ==================================================================
[    0.317646] BUG: KASAN: use-after-free in wq_update_node_max_active+0x123/0x810
[    0.318851] Read of size 8 at addr ffff88802109d788 by task swapper/0/0
[    0.319937] 
[    0.320195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240130 #7935
[    0.321453] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[    0.323299] Call Trace:
[    0.323700]  <TASK>
[    0.324043]  dump_stack_lvl+0x37/0x50
[    0.324653]  print_report+0xcb/0x620
[    0.325249]  ? wq_update_node_max_active+0x123/0x810
[    0.326066]  kasan_report+0xaf/0xe0
[    0.326639]  ? wq_update_node_max_active+0x123/0x810
[    0.327455]  kasan_check_range+0x39/0x1c0
[    0.328119]  wq_update_node_max_active+0x123/0x810
[    0.328903]  ? __pfx_mutex_lock+0x10/0x10
[    0.329567]  apply_wqattrs_commit+0x4e4/0xb80
[    0.330289]  ? __pfx_mutex_lock+0x10/0x10
[    0.330946]  apply_workqueue_attrs_locked+0x9e/0x110
[    0.331764]  alloc_workqueue+0xf76/0x18d0
[    0.332432]  ? __pfx_alloc_workqueue+0x10/0x10
[    0.333189]  ? kasan_unpoison+0x27/0x60
[    0.333818]  ? kasan_unpoison+0x27/0x60
[    0.334455]  ? __kasan_slab_alloc+0x30/0x70
[    0.335147]  ? __pfx_mutex_unlock+0x10/0x10
[    0.335831]  ? idr_alloc_u32+0x291/0x2c0
[    0.336479]  ? mutex_unlock+0x7e/0xd0
[    0.337085]  workqueue_init_early+0x69a/0xe70
[    0.337800]  ? __pfx_workqueue_init_early+0x10/0x10
[    0.338605]  ? kmem_cache_create_usercopy+0xcc/0x230
[    0.339421]  start_kernel+0x141/0x380
[    0.340023]  x86_64_start_reservations+0x18/0x30
[    0.340788]  x86_64_start_kernel+0xcf/0xe0
[    0.341465]  secondary_startup_64_no_verify+0x16d/0x17b
[    0.342334]  </TASK>
[    0.342703] 
[    0.342954] The buggy address belongs to the physical page:
[    0.343899] page:00000000a19a7ad3 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2109d
[    0.345471] flags: 0x100000000000000(node=0|zone=1)
[    0.346297] page_type: 0xffffffff()
[    0.346882] raw: 0100000000000000 ffffea0000842748 ffffea0000842748 0000000000000000
[    0.348184] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    0.349518] page dumped because: kasan: bad access detected
[    0.350457] 
[    0.350706] Memory state around the buggy address:
[    0.351532]  ffff88802109d680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.352748]  ffff88802109d700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.353968] >ffff88802109d780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.355221]                       ^
[    0.355808]  ffff88802109d800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.357161]  ffff88802109d880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.358439] ==================================================================
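
For anyone reading the "Memory state around the buggy address" dump above, the following is a minimal sketch of how the reported address maps onto that dump. It assumes generic KASAN, where one shadow byte describes 8 bytes of memory, so each printed row of 16 shadow bytes spans 128 bytes. The direct-map base used to derive the pfn is also an assumption (the usual x86_64 default without randomization), and the names `locate`, `KASAN_GRANULE`, and `DIRECT_MAP_BASE` are hypothetical, chosen only for this illustration.

```python
# Sketch only: relates the faulting address from the report to the
# shadow-memory row, the shadow byte under the caret, and the pfn.

KASAN_GRANULE = 8       # bytes covered by one shadow byte (generic KASAN; assumption)
BYTES_PER_ROW = 16      # shadow bytes printed per row in the report
PAGE_SHIFT = 12         # 4 KiB pages
DIRECT_MAP_BASE = 0xffff888000000000  # assumed x86_64 direct-map base

def locate(addr):
    """Return (row start address, shadow byte index in that row, pfn)."""
    row_span = KASAN_GRANULE * BYTES_PER_ROW        # 128 bytes per printed row
    row_start = addr - (addr % row_span)
    shadow_index = (addr - row_start) // KASAN_GRANULE
    pfn = (addr - DIRECT_MAP_BASE) >> PAGE_SHIFT
    return row_start, shadow_index, pfn

row_start, shadow_index, pfn = locate(0xffff88802109d788)
print(hex(row_start), shadow_index, hex(pfn))
# prints: 0xffff88802109d780 1 0x2109d
```

Under those assumptions the result agrees with the report: the row marked ">" starts at ffff88802109d780, the caret sits under the second shadow byte, and the pfn matches the pfn:0x2109d printed for the page. The ff shadow values are what generic KASAN uses for a freed page, which is consistent with the use-after-free classification and the page's refcount of 0.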