[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0c2d1f6f-c7e8-c6bb-4588-e2846d1d1447@redhat.com>
Date: Sun, 18 Dec 2022 22:31:23 -0500
From: Waiman Long <longman@...hat.com>
To: kernel test robot <yujie.liu@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: [linus:master] [sched] 8f9ea86fdf:
WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity
On 12/18/22 06:37, kernel test robot wrote:
> Please be noted that we reported
> [tip:sched/core] [sched] 8f9ea86fdf: WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity
> at
> https://lore.kernel.org/all/202211072243.f6b8c13b-yujie.liu@intel.com/
> when this commit was on
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> since it has been merged to mainline now and the issue still exists in
> our tests, so we report again FYI
>
>
> Greeting,
>
> FYI, we noticed WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity due to commit (built with gcc-11):
>
> commit: 8f9ea86fdf99b81458cc21fc1c591fcd4a0fa1f4 ("sched: Always preserve the user requested cpumask")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master 459c73db4069c27c1d4a0e20d055b837396364b8]
>
> in testcase: rcutorture
> version:
> with following parameters:
>
> runtime: 300s
> test: cpuhotplug
> torture_type: trivial
>
> test-description: rcutorture is rcutorture kernel module load/unload test.
> test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> [ 146.157862][ T487] ------------[ cut here ]------------
> [ 146.158315][ T487] rcutorture_sched_setaffinity: sched_setaffinity() returned -22
> [ 146.158938][ T487] WARNING: CPU: 0 PID: 487 at kernel/rcu/update.c:495 rcutorture_sched_setaffinity (kernel/rcu/update.c:495 (discriminator 3))
> [ 146.159676][ T487] Modules linked in: rcutorture torture aesni_intel input_leds autofs4 cmdlinepart
> [ 146.160249][ T487] CPU: 0 PID: 487 Comm: rcu_torture_fak Tainted: G W T 6.1.0-rc2-00013-g8f9ea86fdf99 #1
> [ 146.161051][ T487] EIP: rcutorture_sched_setaffinity (kernel/rcu/update.c:495 (discriminator 3))
> [ 146.161517][ T487] Code: d2 c3 8d 74 26 00 90 80 3d 75 ee 28 c2 00 75 ee 50 68 68 7a 9c c1 68 98 f8 cc c1 89 45 fc c6 05 75 ee 28 c2 01 e8 04 d8 84 00 <0f> 0b 8b 45 fc 83 c4 0c c9 31 d2 c3 8d b4 26 00 00 00 00 66 90 55
> All code
> ========
> 0: d2 c3 rol %cl,%bl
> 2: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
> 6: 90 nop
> 7: 80 3d 75 ee 28 c2 00 cmpb $0x0,-0x3dd7118b(%rip) # 0xffffffffc228ee83
> e: 75 ee jne 0xfffffffffffffffe
> 10: 50 push %rax
> 11: 68 68 7a 9c c1 pushq $0xffffffffc19c7a68
> 16: 68 98 f8 cc c1 pushq $0xffffffffc1ccf898
> 1b: 89 45 fc mov %eax,-0x4(%rbp)
> 1e: c6 05 75 ee 28 c2 01 movb $0x1,-0x3dd7118b(%rip) # 0xffffffffc228ee9a
> 25: e8 04 d8 84 00 callq 0x84d82e
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 8b 45 fc mov -0x4(%rbp),%eax
> 2f: 83 c4 0c add $0xc,%esp
> 32: c9 leaveq
> 33: 31 d2 xor %edx,%edx
> 35: c3 retq
> 36: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
> 3d: 66 90 xchg %ax,%ax
> 3f: 55 push %rbp
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 8b 45 fc mov -0x4(%rbp),%eax
> 5: 83 c4 0c add $0xc,%esp
> 8: c9 leaveq
> 9: 31 d2 xor %edx,%edx
> b: c3 retq
> c: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
> 13: 66 90 xchg %ax,%ax
> 15: 55 push %rbp
> [ 146.162968][ T487] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00000000
> [ 146.163487][ T487] ESI: c38d3480 EDI: 00000002 EBP: edbcff54 ESP: edbcff44
> [ 146.164076][ T487] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [ 146.164654][ T487] CR0: 80050033 CR2: 08082ff0 CR3: 2e0f6000 CR4: 00040690
> [ 146.165171][ T487] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 146.165645][ T487] DR6: fffe0ff0 DR7: 00000400
> [ 146.165958][ T487] Call Trace:
> [ 146.166198][ T487] synchronize_rcu_trivial (kernel/rcu/rcutorture.c:798) rcutorture
> [ 146.166679][ T487] rcu_torture_fakewriter (kernel/rcu/rcutorture.c:1648) rcutorture
> [ 146.167155][ T487] kthread (kernel/kthread.c:376)
> [ 146.167456][ T487] ? rcu_read_unlock (kernel/rcu/rcutorture.c:1567) rcutorture
> [ 146.167908][ T487] ? kthread_complete_and_exit (kernel/kthread.c:331)
> [ 146.168340][ T487] ret_from_fork (arch/x86/entry/entry_32.S:770)
> [ 146.168693][ T487] irq event stamp: 36141
> [ 146.169013][ T487] hardirqs last enabled at (36149): __up_console_sem (kernel/printk/printk.c:261 (discriminator 1))
> [ 146.169707][ T487] hardirqs last disabled at (36156): __up_console_sem (kernel/printk/printk.c:259 (discriminator 1))
> [ 146.170403][ T487] softirqs last enabled at (36050): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
> [ 146.171012][ T487] softirqs last disabled at (36041): do_softirq_own_stack (arch/x86/kernel/irq_32.c:60 arch/x86/kernel/irq_32.c:150)
> [ 146.171669][ T487] ---[ end trace 0000000000000000 ]---
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@...el.com>
> | Link: https://lore.kernel.org/oe-lkp/202212181906.c1c31e8e-yujie.liu@intel.com
>
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-6.1.0-rc2-00013-g8f9ea86fdf99 .config
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> cd <mod-install-dir>
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
That warning is expected. The sched_setaffinity() syscall is changed in
a way that an error condition that was previously ignored (i.e.
sched_setaffinity() fails to set cpu affinity to at least one of the
CPUs in the mask) will now return an error code. I don't consider that a
breakage of user API. Let me know if the assumption is incorrect.
Thanks,
Longman
Powered by blists - more mailing lists