Message-ID: <Yu9R1NgpU5C5bcXi@xsang-OptiPlex-9020>
Date:   Sun, 7 Aug 2022 13:47:00 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Will Deacon <will@...nel.org>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Valentin Schneider <Valentin.Schneider@....com>,
        Quentin Perret <qperret@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, <lkp@...ts.01.org>,
        <lkp@...el.com>, <aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>
Subject: Re: [sched]  9ae606bc74:
 WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]

hi Will,

On Fri, Jul 29, 2022 at 01:18:49PM +0800, Oliver Sang wrote:
> hi Will,
> 
> On Mon, Jul 25, 2022 at 10:20:58AM +0100, Will Deacon wrote:
> > On Mon, Jul 25, 2022 at 04:12:57PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed the following commit (built with clang-15):
> > > 
> > > commit: 9ae606bc74dd0e58d4de894e3c5cbb9d45599267 ("sched: Introduce task_cpu_possible_mask() to limit fallback rq selection")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > in testcase: rcutorture
> > > version: 
> > > with following parameters:
> > > 
> > > 	runtime: 300s
> > > 	test: cpuhotplug
> > > 	torture_type: trivial
> > > 
> > > test-description: rcutorture is rcutorture kernel module load/unload test.
> > > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > > 
> > > 
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > > 
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > > 
> > > 
> > > +-------------------------------------------------------------------------+------------+------------+
> > > |                                                                         | 304000390f | 9ae606bc74 |
> > > +-------------------------------------------------------------------------+------------+------------+
> > > | WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture] | 120        | 120        |
> > > | RIP:synchronize_rcu_trivial[rcutorture]                                 | 120        | 120        |
> > > | WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity            | 120        | 120        |
> > > | RIP:rcutorture_sched_setaffinity                                        | 120        | 120        |
> > > | WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] | 0          | 36         |
> > > | RIP:rcu_torture_stats_print[rcutorture]                                 | 0          | 36         |
> > > +-------------------------------------------------------------------------+------------+------------+
> > > 
> > > 
> > > please be noted, since 9ae606bc74 is kind of old, we also tested on a latest
> > > mainline commit:
> > > commit 515f71412bb73ebd7f41f90e1684fc80b8730789
> > > Merge: 301c8949322fe cf5029d5dd7cb
> > > Author: Linus Torvalds <torvalds@...ux-foundation.org>
> > > Date:   Sat Jul 23 10:22:26 2022 -0700
> > > 
> > > and confirmed the
> > >    WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
> > > still exists.
> > 
> > I'm not convinced by the bisection -- that commit shouldn't have any effect
> > on x86.


We recently updated our clang to version 16 and reran this case. The issue
can also be reproduced on the parent commit (304000390f), though at a much
lower rate than on this commit (3 of 300 runs versus 112 of 300).

304000390f88d049 9ae606bc74dd0e58d4de894e3c5
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          3:300         36%         112:300   dmesg.RIP:rcu_torture_stats_print[rcutorture]
        300:300         -0%         299:300   dmesg.RIP:rcutorture_sched_setaffinity
        300:300         -0%         299:300   dmesg.RIP:synchronize_rcu_trivial[rcutorture]
          3:300         36%         112:300   dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
        300:300         -0%         299:300   dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
        300:300         -0%         299:300   dmesg.WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity
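
For reference, the %reproduction column above appears to be the difference
between the two failure rates in percentage points. That reading is an
assumption inferred from the numbers in the table, not something the report
states. A minimal Python sketch under that assumption (the function names are
hypothetical, not part of any LKP tool):

```python
def failure_rate_pct(cell):
    """Convert a 'fail:runs' cell such as '112:300' into a failure rate in percent."""
    fail, runs = cell.split(":")
    return 100.0 * int(fail) / int(runs)


def reproduction_delta_pct(parent_cell, commit_cell):
    """Assumed meaning of %reproduction: commit rate minus parent rate, in points."""
    return failure_rate_pct(commit_cell) - failure_rate_pct(parent_cell)


if __name__ == "__main__":
    # (parent 'fail:runs', commit 'fail:runs', symptom) taken from the table above
    rows = [
        ("3:300", "112:300", "rcu_torture_stats_print"),
        ("300:300", "299:300", "rcutorture_sched_setaffinity"),
        ("300:300", "299:300", "synchronize_rcu_trivial"),
    ]
    for parent, commit, symptom in rows:
        delta = reproduction_delta_pct(parent, commit)
        print(f"{parent:>8} {delta:+6.1f}% {commit:>8}   {symptom}")
```

Under this reading the rcu_torture_stats_print warning fires in about 1% of
parent runs and about 37% of runs on the commit, which rounds to the 36% shown,
and the other rows come out slightly below zero, matching the -0% entries.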


We also checked the dmesg logs and confirmed that both kernels show the same
Call Trace and a similar context when the issue reproduces. Since the parent
commit is affected as well, the report against this commit is a false positive.

Sorry for any inconvenience this caused.

> 
> Thanks a lot for your information!
> we will do some further tests to see if below part could impact x86.
> will update you next week. thanks
> 
> @@ -3124,9 +3124,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
> 
>                 /* Look for allowed, online CPU in same node. */
>                 for_each_cpu(dest_cpu, nodemask) {
> -                       if (!cpu_active(dest_cpu))
> -                               continue;
> -                       if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
> +                       if (is_cpu_allowed(p, dest_cpu))
>                                 return dest_cpu;
>                 }
>         }
> 
> 
> > 
> > Will
