linux-kernel - Re: [syzbot] WARNING in c

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <23488f06-c4b4-8bd8-b0bc-85914ba4d1c6@I-love.SAKURA.ne.jp>
Date:   Sun, 16 Oct 2022 09:24:57 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Yury Norov <yury.norov@...il.com>, Borislav Petkov <bp@...en8.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     syzbot <syzbot+d0fd2bf0dd6da72496dd@...kaller.appspotmail.com>,
        syzkaller-bugs@...glegroups.com,
        Andrew Jones <ajones@...tanamicro.com>, netdev@...r.kernel.org,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Menglong Dong <imagedong@...cent.com>,
        Kuniyuki Iwashima <kuniyu@...zon.com>,
        Petr Machata <petrm@...dia.com>,
        Guo Ren <guoren@...ux.alibaba.com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        andriy.shevchenko@...ux.intel.com, linux@...musvillemoes.dk,
        caraitto@...gle.com, willemb@...gle.com, jonolson@...gle.com,
        amritha.nambiar@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [syzbot] WARNING in c_start

On 2022/10/16 5:44, Yury Norov wrote:
> Add people from other threads discussing this.
> 
> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>> That's an invalid command line. The correct syntax is:
>>>
>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> The fix is not in Linus' tree yet.
>>
>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>> other architectures have the same problem. And fixing all callers will
>>> not be in time for this merge window.
>>
>> Why won't there be time? That's why the -rcs are for.
>>
>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>
>> So no, we will take Andrew's fixes for all arches in time for 6.1.
> 
> Summarizing things:
> 
> 1. cpumask_check() was introduced to make sure that the cpu number
> passed into cpumask API belongs to a valid range. But the check is
> broken for a very long time. And because of that there are a lot of
> places where cpumask API is used wrongly.
> 
> 2. Underlying bitmap functions handle that correctly - when user
> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
> everything is working smoothly.
> 
> 3. I fixed all warnings that I was aware at the time of submitting the
> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
> proposed fixes for c_start() in arch code.
> 
> 4. The code paths mentioned above are all known to me that violate
> cpumask_check() rules. (Did I miss something?)
> 
> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
> me about this problem with c_start(). But I don't like the idea to revert
> cpumask_check() fix. This way we'll never clean that mess. 
> 
> If for some reason those warnings are unacceptable for -rcs (and like
> Boris, I don't understand why), than instead of reverting commits, I'd
> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
> fix their code. I can send a patch shortly, if we'll decide going this way.
> 
> How people would even realize that they're doing something wrong if
> they will not get warned about it?

I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
Just printing messages (without "BUG:"/"WARNING:" string which also breaks
syzkaller) like below diff is sufficient for people to realize that they're
doing something wrong.

Again, please do revert "cpumask: fix checking valid cpu range" immediately.

 include/linux/cpumask.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2aa0aa26b45..31af2cc5f0c2 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
 	return cpu;
 }
 
+/*
+ * We want to avoid passing -1 as a valid cpu argument.
+ * But we should not crash the kernel until all in-tree callers are fixed.
+ */
+static __always_inline void report_negative_cpuid(void)
+{
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
+	DO_ONCE_LITE(dump_stack);
+#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
 /**
  * cpumask_first - get the first cpu in a cpumask
  * @srcp: the cpumask pointer
@@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
 }
 
@@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
 }
 
@@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
 		nr_cpumask_bits, n + 1);
 }
@@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
 	cpumask_check(start);
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 
 	/*
 	 * Return the first available CPU when wrapping, or when starting before cpu0,
-- 
2.18.4

> 
> Thanks,
> Yury