linux-kernel - Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down

Message-ID: <CAMuHMdWvW7hsUq68xuX-YNApk06zMMzRsHMCDCLcrsTiEUkuDg@mail.gmail.com>
Date: Mon, 6 Jan 2025 11:18:08 +0100
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: Koichiro Den <koichiro.den@...onical.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org, 
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, 
	Linux-Renesas <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()

Hi Koichiro,

On Sat, Jan 4, 2025 at 5:00 AM Koichiro Den <koichiro.den@...onical.com> wrote:
> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > the dying cpu until the cpu is removed from online mask. While it's
> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > irrelevant cpu before entering atomic AP states.
> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > backtrace.
> > >
> > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> > >                                                kworker/7:3/1702
> > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > >   Tainted: [N]=TEST
> > >   Workqueue: mm_percpu_wq vmstat_update
> > >   Call Trace:
> > >    <TASK>
> > >    dump_stack_lvl+0x8d/0xb0
> > >    check_preemption_disabled+0xce/0xe0
> > >    refresh_cpu_vm_stats+0x235/0x5f0
> > >    vmstat_update+0x17/0xa0
> > >    process_one_work+0x869/0x1aa0
> > >    worker_thread+0x5e5/0x1100
> > >    kthread+0x29e/0x380
> > >    ret_from_fork+0x2d/0x70
> > >    ret_from_fork_asm+0x1a/0x30
> > >    </TASK>
> > >
> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > symmetrically enable it on startup.
> > >
> > > Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
> >
> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> >
> > [    0.087733] ------------[ cut here ]------------
> > [    0.087733] workqueue: work disable count underflowed
> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

I am seeing the same on arm32 (R-Car M2-W) and arm64 (R-Car H3 ES2.0).

> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it.. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Thank you, that fixes the warnings for me!
Tested-by: Geert Uytterhoeven <geert+renesas@...der.be>
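
For readers following the thread, the disable/enable accounting behind the
"work disable count underflowed" warning can be modelled in user space. This
is only a sketch under stated assumptions, not kernel code: struct model_work,
model_disable(), model_enable() and run_hotplug_sequence() are hypothetical
names made up for illustration, and the model assumes that an enable without a
matching disable leaves the depth unchanged and merely reports the underflow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a work item's disable depth. */
struct model_work {
	int disable_depth;	/* 0 means the work is enabled */
};

/* Models a disable call: every call increments the depth. */
static void model_disable(struct model_work *w)
{
	w->disable_depth++;
}

/*
 * Models an enable call. Returns false and reports an underflow when there
 * was no matching disable, which is the situation the warning flags.
 */
static bool model_enable(struct model_work *w)
{
	if (w->disable_depth == 0) {
		fprintf(stderr, "model: work disable count underflowed\n");
		return false;
	}
	w->disable_depth--;
	return true;
}

/*
 * Replays init -> CPU online -> CPU down -> CPU online for a single CPU.
 * pre_disable selects whether the init step leaves the work disabled, as
 * the follow-up diff in this thread does. Returns the number of underflow
 * reports seen during the sequence.
 */
static int run_hotplug_sequence(bool pre_disable)
{
	struct model_work w = { .disable_depth = 0 };
	int underflows = 0;

	if (pre_disable)
		model_disable(&w);	/* init: stay disabled until online */

	if (!model_enable(&w))		/* first online: enable */
		underflows++;
	model_disable(&w);		/* down_prep: disable */
	if (!model_enable(&w))		/* online again: enable */
		underflows++;

	return underflows;
}
```

In this model, run_hotplug_sequence(false) reports one underflow, on the very
first online step, while run_hotplug_sequence(true) reports none, because the
disable at init time balances the first enable.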

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds