netdev - Re: [syzbot] WARNING in c

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f3097cb-834d-4fcc-9d96-999ddeb4045e@infradead.org>
Date:   Sat, 15 Oct 2022 17:28:35 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Yury Norov <yury.norov@...il.com>,
        Borislav Petkov <bp@...en8.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     syzbot <syzbot+d0fd2bf0dd6da72496dd@...kaller.appspotmail.com>,
        syzkaller-bugs@...glegroups.com,
        Andrew Jones <ajones@...tanamicro.com>, netdev@...r.kernel.org,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Menglong Dong <imagedong@...cent.com>,
        Kuniyuki Iwashima <kuniyu@...zon.com>,
        Petr Machata <petrm@...dia.com>,
        Guo Ren <guoren@...ux.alibaba.com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        andriy.shevchenko@...ux.intel.com, linux@...musvillemoes.dk,
        caraitto@...gle.com, willemb@...gle.com, jonolson@...gle.com,
        amritha.nambiar@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [syzbot] WARNING in c_start

Hi--

On 10/15/22 17:24, Tetsuo Handa wrote:
> On 2022/10/16 5:44, Yury Norov wrote:
>> Add people from other threads discussing this.
>>
>> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>>> That's an invalid command line. The correct syntax is:
>>>>
>>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> The fix is not in Linus' tree yet.
>>>
>>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>>> other architectures have the same problem. And fixing all callers will
>>>> not be in time for this merge window.
>>>
>>> Why won't there be time? That's why the -rcs are for.
>>>
>>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>>
>>> So no, we will take Andrew's fixes for all arches in time for 6.1.
>>
>> Summarizing things:
>>
>> 1. cpumask_check() was introduced to make sure that the cpu number
>> passed into cpumask API belongs to a valid range. But the check is
>> broken for a very long time. And because of that there are a lot of
>> places where cpumask API is used wrongly.
>>
>> 2. Underlying bitmap functions handle that correctly - when user
>> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
>> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
>> everything is working smoothly.
>>
>> 3. I fixed all warnings that I was aware at the time of submitting the
>> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
>> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
>> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
>> proposed fixes for c_start() in arch code.
>>
>> 4. The code paths mentioned above are all known to me that violate
>> cpumask_check() rules. (Did I miss something?)
>>
>> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
>> me about this problem with c_start(). But I don't like the idea to revert
>> cpumask_check() fix. This way we'll never clean that mess. 
>>
>> If for some reason those warnings are unacceptable for -rcs (and like
>> Boris, I don't understand why), than instead of reverting commits, I'd
>> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
>> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
>> fix their code. I can send a patch shortly, if we'll decide going this way.
>>
>> How people would even realize that they're doing something wrong if
>> they will not get warned about it?
> 
> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
> syzkaller) like below diff is sufficient for people to realize that they're
> doing something wrong.
> 
> Again, please do revert "cpumask: fix checking valid cpu range" immediately.
> 
>  include/linux/cpumask.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index c2aa0aa26b45..31af2cc5f0c2 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
>  	return cpu;
>  }
>  
> +/*
> + * We want to avoid passing -1 as a valid cpu argument.
> + * But we should not crash the kernel until all in-tree callers are fixed.
> + */

Why not say that any negative cpu argument is invalid?
Or is it OK to pass -2 as the cpu arg?

> +static __always_inline void report_negative_cpuid(void)
> +{
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
> +	DO_ONCE_LITE(dump_stack);
> +#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
> +}
> +
>  /**
>   * cpumask_first - get the first cpu in a cpumask
>   * @srcp: the cpumask pointer
> @@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
>  }
>  
> @@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
>  }
>  
> @@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
>  		nr_cpumask_bits, n + 1);
>  }
> @@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
>  	cpumask_check(start);
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  
>  	/*
>  	 * Return the first available CPU when wrapping, or when starting before cpu0,

-- 
~Randy