Message-ID: <YyGQUu9S+ISHaNFi@xsang-OptiPlex-9020>
Date:   Wed, 14 Sep 2022 16:26:58 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Shinichiro Kawasaki <shinichiro.kawasaki@....com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "lkp@...ts.01.org" <lkp@...ts.01.org>,
        "lkp@...el.com" <lkp@...el.com>,
        Damien Le Moal <Damien.LeMoal@....com>
Subject: Re: [cpuidle,intel_idle]  32d4fd5751:
 WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit

Hi Shin'ichiro Kawasaki and Peter Zijlstra,

On Thu, Jun 23, 2022 at 11:23:59AM +0000, Shinichiro Kawasaki wrote:
> On Jun 13, 2022 / 00:00, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-11):
> > 
> > commit: 32d4fd5751eadbe1823a37eb38df85ec5c8e6207 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > in testcase: kernel-selftests
> > version: kernel-selftests-x86_64-cef46213-1_20220609
> > with following parameters:
> > 
> > 	group: resctrl
> > 	ucode: 0x500320a
> > 
> > test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> > test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> > 
> > 
> > on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > 
> > 
> > [   29.104402][    T0] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:864 rcu_eqs_exit+0x4b/0xc0
> > [   29.104417][    T0]
> > [   29.104418][    T0] =============================
> > [   29.104419][    T0] WARNING: suspicious RCU usage
> > [   29.104421][    T0] 5.19.0-rc1-00001-g32d4fd5751ea #1 Not tainted
> > [   29.104424][    T0] -----------------------------
> 
> FYI, I observe this WARNING on my test servers for fstests, with kernel
> v5.19-rc3. It was observed at system boot, and was also observed repeatedly
> during fstests run. I reverted the commit 32d4fd5751ea then the WARNING
> disappeared. The WARNING was observed on systems with 20 threads CPU, but
> not observed on systems with 8 threads CPU.
> 
> Looking in the commit, I'm not sure how it is related to the RCU warning.
> If any further action on my system would help, please let me know.

We recently ran further tests and confirmed that the issue exists on this
commit but not on its parent, on the same test machine:
  88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory

=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase:
  gcc-11/resctrl/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/kernel-selftests

commit:
  v5.19-rc1
  32d4fd5751eadbe1823a37eb38df85ec5c8e6207

       v5.19-rc1 32d4fd5751eadbe1823a37eb38d
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :20         100%          20:20    dmesg.RIP:rcu_eqs_exit   <------
           :20          95%          19:20    dmesg.RIP:sched_clock_tick
           :20          90%          18:20    dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
           :20          90%          18:20    dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
           :20         100%          20:20    dmesg.WARNING:suspicious_RCU_usage
           :20         100%          20:20    dmesg.boot_failures
           :20           5%           1:20    dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
           :20           5%           1:20    dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
           :20          95%          19:20    dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
           :20         100%          20:20    dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage
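
For readers unfamiliar with the table layout, here is a minimal sketch of how
the %reproduction column can be read. It assumes that an empty fail count
(":20") means 0 failures out of 20 runs, and that %reproduction is the
difference in failure rate between the two columns. The helper names
(parse_fail_runs, reproduction_pct) are made up for illustration and are not
part of the test robot's tooling.

```python
def parse_fail_runs(cell):
    """Split a 'fail:runs' cell; an empty fail count (as in ':20') is read as 0."""
    fails, runs = cell.strip().split(":")
    return (int(fails) if fails else 0, int(runs))


def reproduction_pct(parent_cell, commit_cell):
    """Failure rate on the commit minus failure rate on the parent, in percent.

    Assumption: this is how the %reproduction column is derived.
    """
    parent_fails, parent_runs = parse_fail_runs(parent_cell)
    commit_fails, commit_runs = parse_fail_runs(commit_cell)
    return round(100 * commit_fails / commit_runs
                 - 100 * parent_fails / parent_runs)


# A few rows copied from the table above: (parent cell, commit cell).
rows = {
    "dmesg.RIP:rcu_eqs_exit": (":20", "20:20"),
    "dmesg.RIP:sched_clock_tick": (":20", "19:20"),
    "dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit": (":20", "18:20"),
}

for name, (parent_cell, commit_cell) in rows.items():
    print(f"{reproduction_pct(parent_cell, commit_cell):4d}%  {name}")
```

Under these assumptions, the three sample rows give the same 100%, 95% and 90%
values that appear in the table.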


As Shin'ichiro Kawasaki mentioned, the issue does not seem to reproduce on
systems with a small number of CPU threads, so we also tested on a VM that has
only 2 threads:
  qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

We confirmed that the issue cannot be reproduced there.

We don't have the relevant knowledge to analyze this further. If extra data or
testing is needed, we can help.

> 
> -- 
> Shin'ichiro Kawasaki
