Message-ID: <72e7d782-85f2-b499-8614-9e3498106569@i-love.sakura.ne.jp>
Date:   Sun, 3 Feb 2019 10:21:06 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To:     Guenter Roeck <linux@...ck-us.net>,
        Chris Metcalf <chris.d.metcalf@...il.com>,
        Rusty Russell <rusty@...tcorp.com.au>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tejun Heo <tj@...nel.org>, linux-mm <linux-mm@...ck.org>
Subject: Re: linux-next: tracebacks in workqueue.c/__flush_work()

(Adding Chris Metcalf and Rusty Russell.)

If NR_CPUS == 1 due to CONFIG_SMP=n, the for_each_cpu(cpu, &has_work) loop does not
evaluate the "struct cpumask has_work" that was modified by
cpumask_set_cpu(cpu, &has_work) in the preceding for_each_online_cpu() loop.
Guenter Roeck hit a problem that arises from the combination of the three
commits listed below.

  Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
  expects that has_work is evaluated by for_each_cpu().

  Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
  assumes that for_each_cpu() does not need to evaluate has_work.

  Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
  expects that has_work is evaluated by for_each_cpu().

What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1?
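
To illustrate the question, here is a minimal user-space sketch, not kernel
code. The macro and the cpumask helpers are simplified stand-ins that only
borrow the kernel's names, and they assume that the NR_CPUS == 1 variant of
for_each_cpu() visits CPU 0 once and discards its mask argument, as described
above. flushes_unguarded() and flushes_guarded() are hypothetical helpers that
count how often flush_work() would be reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model; these are NOT the kernel definitions. */
#define NR_CPUS 1

struct cpumask { unsigned long bits; };

static void cpumask_clear(struct cpumask *m)
{
	m->bits = 0;
}

static void cpumask_set_cpu(unsigned int cpu, struct cpumask *m)
{
	m->bits |= 1UL << cpu;
}

static bool cpumask_test_cpu(unsigned int cpu, const struct cpumask *m)
{
	return (m->bits >> cpu) & 1UL;
}

/* Assumed UP behaviour: iterate over CPU 0 once, never read the mask. */
#define for_each_cpu(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Hypothetical helper: loop as it stands, no explicit mask check. */
static int flushes_unguarded(bool cpu0_has_work)
{
	struct cpumask has_work;
	unsigned int cpu;
	int flushed = 0;

	cpumask_clear(&has_work);
	if (cpu0_has_work)
		cpumask_set_cpu(0, &has_work);

	for_each_cpu(cpu, &has_work)
		flushed++;		/* stands in for flush_work() */

	return flushed;
}

/* Hypothetical helper: same loop with an explicit mask check added. */
static int flushes_guarded(bool cpu0_has_work)
{
	struct cpumask has_work;
	unsigned int cpu;
	int flushed = 0;

	cpumask_clear(&has_work);
	if (cpu0_has_work)
		cpumask_set_cpu(0, &has_work);

	for_each_cpu(cpu, &has_work)
		if (NR_CPUS > 1 || cpumask_test_cpu(cpu, &has_work))
			flushed++;	/* stands in for flush_work() */

	return flushed;
}
```

In this model, flushes_unguarded(false) returns 1 even though the mask is
empty, while flushes_guarded(false) returns 0 and flushes_guarded(true)
returns 1.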

 mm/swap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/swap.c b/mm/swap.c
index 4929bc1..5f07734 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -698,7 +698,8 @@ void lru_add_drain_all(void)
 	}
 
 	for_each_cpu(cpu, &has_work)
-		flush_work(&per_cpu(lru_add_drain_work, cpu));
+		if (NR_CPUS > 1 || cpumask_test_cpu(cpu, &has_work))
+			flush_work(&per_cpu(lru_add_drain_work, cpu));
 
 	mutex_unlock(&lock);
 }

On 2019/02/03 7:20, Guenter Roeck wrote:
> Commit "workqueue: Try to catch flush_work() without INIT_WORK()" added
> a warning if flush_work() is called without a worker function.
> 
> This results in the following tracebacks, typically observed during
> system shutdown.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 101 at kernel/workqueue.c:3018 __flush_work+0x2a4/0x2e0
> Modules linked in:
> CPU: 0 PID: 101 Comm: umount Not tainted 5.0.0-rc4-next-20190201 #1
>        fffffc0007dcbd18 0000000000000000 fffffc00003338a0 fffffc00003517d4
>        fffffc00003517d4 fffffc0000e56c98 fffffc0000e56c98 fffffc0000ebc1d8
>        fffffc0000ec0bd8 ffffffffa8024010 0000000000000bca 0000000000000000
>        fffffc00003d3ea4 fffffc0000e56c98 fffffc0000e56c60 fffffc0000ebc1d8
>        fffffc0000ec0bd8 0000000000000000 0000000000000001 0000000000000000
>        fffffc000782d520 0000000000000000 fffffc000044ef50 fffffc0007c4b540
> Trace:
> [<fffffc00003338a0>] __warn+0x160/0x190
> [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
> [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
> [<fffffc00003d3ea4>] lru_add_drain_all+0xe4/0x190
> [<fffffc000044ef50>] shrink_dcache_sb+0x70/0xb0
> [<fffffc0000478dc4>] invalidate_bh_lru+0x44/0x80
> [<fffffc00003a94fc>] on_each_cpu_cond+0x5c/0x90
> [<fffffc0000478d80>] invalidate_bh_lru+0x0/0x80
> [<fffffc000047fe7c>] invalidate_bdev+0x3c/0x70
> [<fffffc0000432ca8>] reconfigure_super+0x178/0x2c0
> [<fffffc000045ee64>] ksys_umount+0x664/0x680
> [<fffffc000045ee9c>] sys_umount+0x1c/0x30
> [<fffffc00003115d4>] entSys+0xa4/0xc0
> [<fffffc00003115d4>] entSys+0xa4/0xc0
> 
> ---[ end trace 613cea34708701f1 ]---
> 
> The problem is seen with several (but not all) architectures. Affected
> architectures/platforms are:
>     alpha
>     arm:versatilepb
>     m68k
>     mips, mips64 (boot from IDE drive or MMC, SMP disabled)
>     parisc (nosmp builds)
>     sparc, sparc64 (nosmp builds)
> 
> There may be others; several of my tests fail with build failures.
> 
> If/when it is seen, the problem is persistent.
> 
> Common denominator seems to be that SMP is disabled. It does appear that
> for_each_cpu() ignores the mask for nosmp builds, but I don't really
> understand why.
> 
> Guenter
>