Message-ID: <CAMuHMdWvW7hsUq68xuX-YNApk06zMMzRsHMCDCLcrsTiEUkuDg@mail.gmail.com>
Date: Mon, 6 Jan 2025 11:18:08 +0100
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: Koichiro Den <koichiro.den@...onical.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-mm@...ck.org,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Linux-Renesas <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()

Hi Koichiro,

On Sat, Jan 4, 2025 at 5:00 AM Koichiro Den <koichiro.den@...onical.com> wrote:
> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > the dying cpu until the cpu is removed from online mask. While it's
> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > irrelevant cpu before entering atomic AP states.
> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > backtrace.
> > >
> > > BUG: using smp_processor_id() in preemptible [00000000] code: \
> > > kworker/7:3/1702
> > > caller is refresh_cpu_vm_stats+0x235/0x5f0
> > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > > Tainted: [N]=TEST
> > > Workqueue: mm_percpu_wq vmstat_update
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x8d/0xb0
> > > check_preemption_disabled+0xce/0xe0
> > > refresh_cpu_vm_stats+0x235/0x5f0
> > > vmstat_update+0x17/0xa0
> > > process_one_work+0x869/0x1aa0
> > > worker_thread+0x5e5/0x1100
> > > kthread+0x29e/0x380
> > > ret_from_fork+0x2d/0x70
> > > ret_from_fork_asm+0x1a/0x30
> > > </TASK>
> > >
> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > symmetrically enable it on startup.
> > >
> > > Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
> >
> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> >
> > [ 0.087733] ------------[ cut here ]------------
> > [ 0.087733] workqueue: work disable count underflowed
> > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

I am seeing the same on arm32 (R-Car M2-W) and arm64 (R-Car H3 ES2.0).

> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 0889b75cef14..19ceed5d34bf 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>  {
>         int cpu;
>
> -       for_each_possible_cpu(cpu)
> +       for_each_possible_cpu(cpu) {
>                 INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                         vmstat_update);
>
> +               /* will be enabled on vmstat_cpu_online */
> +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +       }
> +
>         schedule_delayed_work(&shepherd,
>                 round_jiffies_relative(sysctl_stat_interval));
>  }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Thank you, that fixes the warnings for me!
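
For anyone following along, the imbalance behind the warning can be modelled
in userspace. This is only a toy sketch, not kernel code: struct toy_work,
toy_disable(), toy_enable() and toy_boot() are hypothetical names, and the
only semantics assumed are the ones discussed in this thread, namely that the
online callback enables the work, the down_prep callback disables it, and an
enable without a prior disable counts as an underflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a work item's disable depth; NOT the kernel's struct. */
struct toy_work {
	int disable_depth;	/* 0 means the work may be queued */
	int underflows;		/* number of unbalanced enables seen */
};

static void toy_init(struct toy_work *w)
{
	w->disable_depth = 0;
	w->underflows = 0;
}

static void toy_disable(struct toy_work *w)
{
	w->disable_depth++;
}

/* Returns true only when this call brings the depth back to zero. */
static bool toy_enable(struct toy_work *w)
{
	if (w->disable_depth == 0) {
		/* Loosely mirrors the "disable count underflowed" warning. */
		w->underflows++;
		fprintf(stderr, "toy: work disable count underflowed\n");
		return false;
	}
	return --w->disable_depth == 0;
}

/*
 * Model one CPU: init, first online, down_prep, online again.
 * Returns how many underflows were recorded along the way.
 */
int toy_boot(bool disable_at_init)
{
	struct toy_work w;

	toy_init(&w);
	if (disable_at_init)
		toy_disable(&w);	/* what the proposed diff adds */

	toy_enable(&w);			/* first online callback */
	toy_disable(&w);		/* down_prep callback */
	toy_enable(&w);			/* online callback after re-plug */

	assert(w.disable_depth == 0);	/* balanced at the end either way */
	return w.underflows;
}
```

In this model, toy_boot(false) records one underflow on the very first
enable, while toy_boot(true) records none, which matches the behaviour I see
with and without the diff above.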

Tested-by: Geert Uytterhoeven <geert+renesas@...der.be>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds