lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m2msg4u1lv.fsf@posteo.net>
Date: Mon, 06 Jan 2025 02:18:04 +0000
From: Charalampos Mitrodimas <charmitro@...teo.net>
To: Koichiro Den <koichiro.den@...onical.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,  linux-mm@...ck.org,
  akpm@...ux-foundation.org,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()

Koichiro Den <koichiro.den@...onical.com> writes:

> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
>> On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
>> > Even after mm/vmstat:online teardown, shepherd may still queue work for
>> > the dying cpu until the cpu is removed from online mask. While it's
>> > quite rare, this means that after unbind_workers() unbinds a per-cpu
>> > kworker, it potentially runs vmstat_update for the dying CPU on an
>> > irrelevant cpu before entering atomic AP states.
>> > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
>> > backtrace.
>> >
>> >   BUG: using smp_processor_id() in preemptible [00000000] code: \
>> >                                                kworker/7:3/1702
>> >   caller is refresh_cpu_vm_stats+0x235/0x5f0
>> >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
>> >   Tainted: [N]=TEST
>> >   Workqueue: mm_percpu_wq vmstat_update
>> >   Call Trace:
>> >    <TASK>
>> >    dump_stack_lvl+0x8d/0xb0
>> >    check_preemption_disabled+0xce/0xe0
>> >    refresh_cpu_vm_stats+0x235/0x5f0
>> >    vmstat_update+0x17/0xa0
>> >    process_one_work+0x869/0x1aa0
>> >    worker_thread+0x5e5/0x1100
>> >    kthread+0x29e/0x380
>> >    ret_from_fork+0x2d/0x70
>> >    ret_from_fork_asm+0x1a/0x30
>> >    </TASK>
>> >
>> > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
>> > symmetrically enable it on startup.
>> >
>> > Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
>>
>> Hi,
>>
>> I observed a warning in my qemu and real hardware, which I bisected to this commit:
>>
>> [    0.087733] ------------[ cut here ]------------
>> [    0.087733] workqueue: work disable count underflowed
>> [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

I also encountered this in my QEMU Debian installation.

>>
>> This is:
>>
>> static void work_offqd_enable(struct work_offq_data *offqd)
>> {
>> 	if (likely(offqd->disable > 0))
>> 		offqd->disable--;
>> 	else
>> 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
>> }
>>
>> So (based on this code) presumably an enable is only required if previously
>> disabled, and this code is being called on startup unconditionally without
>> the work having been disabled previously? I'm not hugely familiar with
>> delayed workqueue implementation details.
>>
>> [    0.087733] Modules linked in:
>> [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
>> [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
>> [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
>> [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
>> [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
>> [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
>> [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
>> [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
>> [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
>> [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
>> [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
>> [    0.087733] PKRU: 55555554
>> [    0.087733] Call Trace:
>> [    0.087733]  <TASK>
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  ? __warn.cold+0x93/0xf2
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  ? report_bug+0xff/0x140
>> [    0.087733]  ? handle_bug+0x54/0x90
>> [    0.087733]  ? exc_invalid_op+0x17/0x70
>> [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
>> [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  vmstat_cpu_online+0x5c/0x70
>> [    0.087733]  cpuhp_invoke_callback+0x133/0x440
>> [    0.087733]  cpuhp_thread_fun+0x95/0x150
>> [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
>> [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
>> [    0.087735]  kthread+0xc8/0xf0
>> [    0.087737]  ? __pfx_kthread+0x10/0x10
>> [    0.087738]  ret_from_fork+0x2c/0x50
>> [    0.087739]  ? __pfx_kthread+0x10/0x10
>> [    0.087740]  ret_from_fork_asm+0x1a/0x30
>> [    0.087742]  </TASK>
>> [    0.087742] ---[ end trace 0000000000000000 ]---
>>
>>
>> > ---
>> > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
>> > ---
>> >  mm/vmstat.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > index 4d016314a56c..0889b75cef14 100644
>> > --- a/mm/vmstat.c
>> > +++ b/mm/vmstat.c
>> > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
>> >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>> >  		node_set_state(cpu_to_node(cpu), N_CPU);
>> >  	}
>> > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
>>
>> Probably needs to be 'if disabled' here, as this is invoked on normal
>> startup when the work won't have been disabled?
>>
>> Had a brief look at code and couldn't see how that could be done
>> however... and one would need to be careful about races... Tricky!
>>
>> >
>> >  	return 0;
>> >  }
>> >
>> >  static int vmstat_cpu_down_prep(unsigned int cpu)
>> >  {
>> > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> >  	return 0;
>> >  }
>> >
>> > --
>> > 2.43.0
>> >
>> >
>>
>> Let me know if you need any more details, .config etc.
>>
>> I noticed this warning on a real box too (in both cases running akpm's
>> mm-unstable branch), FWIW.
>
> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it.. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Can't think of a better solution myself but this fixes the issue. Thanks!


>
> Thanks.
>
> -Koichiro

C. Mitrodimas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ