Message-ID: <c2888d14-e01d-4d28-866e-edeac0532d2a@redhat.com>
Date: Thu, 7 Aug 2025 09:08:37 -0400
From: Waiman Long <llong@...hat.com>
To: Naresh Kamboju <naresh.kamboju@...aro.org>,
 Cgroups <cgroups@...r.kernel.org>,
 "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>, lkft-triage@...ts.linaro.org,
 Linux Regressions <regressions@...ts.linux.dev>
Cc: Michal Koutný <mkoutny@...e.com>,
 Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
 Dan Carpenter <dan.carpenter@...aro.org>,
 Anders Roxell <anders.roxell@...aro.org>, Arnd Bergmann <arnd@...db.de>,
 kamalesh.babulal@...cle.com
Subject: Re: next-20250805: ampere: WARNING: kernel/cgroup/cpuset.c:1352 at
 remote_partition_disable

On 8/7/25 4:27 AM, Naresh Kamboju wrote:
> Regressions noticed intermittently on AmpereOne while running selftest
> cgroup testing
> with Linux next-20250805 and earlier seen on next-20250722 tag also.
>
> Regression Analysis:
> - New regression? Yes
> - Reproducibility? Intermittent
>
> First seen on the next-20250722 and after next-20250805.
>
> Test regression: next-20250805 ampere WARNING kernel cgroup cpuset.c
> at remote_partition_disable
>
> Reported-by: Linux Kernel Functional Testing <lkft@...aro.org>
>
> ## Test log
> selftests: cgroup: test_cpuset_prs.sh
> Running state transition test ...
> Running test 0 ...
> Running test 1 ...
> Running test 2 ...
> Running test 3 ...
> Running test 4 ...
> Running test 5 ...
> Running test 6 ...
> Running test 7 ...
> Running test 8 ...
> Running test 9 ...
> Running test 10 ...
> Running test 11 ...
> Running test 12 ...
> Running test 13 ...
> Running test 14 ...
> Running test 15 ...
> Running test 16 ...
> Running test 17 ...
> Running test 18 ...
> Running test 19 ...
> [  137.504549] psci: CPU2 killed (polled 0 ms)
> [  137.747094] Detected PIPT I-cache on CPU2
> [  137.747214] GICv3: CPU2: found redistributor 3500 region 0:0x0000400201cc0000
> [  137.747312] CPU2: Booted secondary processor 0x0000003500 [0xc00fac40]
>
> <>
>
> Running test 63 ...
> Running test 64 ...
> Running test 66 ...
> [  174.929535] psci: CPU3 killed (polled 0 ms)
> [  175.263087] Detected PIPT I-cache on CPU3
> [  175.263203] GICv3: CPU3: found redistributor 3501 region 0:0x0000400201d00000
> [  175.263300] CPU3: Booted secondary processor 0x0000003501 [0xc00fac40]
> [  175.434129] workqueue: Interrupted when creating a worker thread
> "kworker/u1028:0"
> ** replaying previous printk message **
> [  175.434129] workqueue: Interrupted when creating a worker thread
> "kworker/u1028:0"
> [  175.440230] ------------[ cut here ]------------
> [  175.440234] WARNING: kernel/cgroup/cpuset.c:1352 at
> remote_partition_disable+0x120/0x160, CPU#170: rmdir/33763
> [  175.467456] Modules linked in: cdc_ether usbnet sm3_ce sha3_ce nvme
> nvme_core xhci_pci_renesas arm_cspmu_module ipmi_devintf arm_spe_pmu
> ipmi_msghandler arm_cmn cppc_cpufreq fuse drm backlight
> [  175.484676] CPU: 170 UID: 0 PID: 33763 Comm: rmdir Not tainted
> 6.16.0-next-20250805 #1 PREEMPT
> [  175.493365] Hardware name: Inspur NF5280R7/Mitchell MB, BIOS
> 04.04.00004001 2025-02-04 22:23:30 02/04/2025
> [  175.503178] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> not ok 12 selftests: cgroup: test_cpuset_prs.sh TIMEOUT 45 seconds
> [  175.510130] pc : remote_partition_disable
> (kernel/cgroup/cpuset.c:1352 (discriminator 1)
> kernel/cgroup/cpuset.c:1342 (discriminator 1)
> kernel/cgroup/cpuset.c:1514 (discriminator 1))

The warning is caused by workqueue_unbound_exclude_cpumask() returning 
an error, which should not normally happen. There is a "workqueue: 
Interrupted when creating a worker thread" message right before it, which 
may point to a problem in the workqueue code leading to this error. That 
particular message is printed when kthread_create_on_node() fails to 
create the requested worker kthread.
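
As a side note, the register dump quoted further down shows
"x0 : 00000000fffffff4". The following is only a minimal sketch for decoding
such a value. It assumes that x0 still holds the return value checked by the
warning at that point, which is not confirmed anywhere in this thread. The
helper name reg_to_ret() is made up for illustration.

```python
import errno


def reg_to_ret(reg: int) -> int:
    """Interpret the low 32 bits of a register value as a signed C int."""
    low = reg & 0xFFFFFFFF
    # Values with bit 31 set are negative in two's complement.
    return low - (1 << 32) if low & 0x80000000 else low


# x0 as printed in the quoted register dump (assumed to be the return value).
x0 = 0x00000000FFFFFFF4
ret = reg_to_ret(x0)

# errno.errorcode maps positive errno numbers to their symbolic names.
name = errno.errorcode.get(-ret, "unknown") if ret < 0 else "success"
print(f"x0 = {x0:#x} -> ret = {ret} ({name})")
```

On Linux, errno 12 is ENOMEM, so under the assumption above the failing call
would have returned -ENOMEM.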

The test itself uses the hotplug code rather heavily to offline/online 
CPUs in order to test the cpuset code related to hotplug. I don't know if 
that is part of the problem or not. Anyway, there haven't been any big 
changes in the cpuset code recently. I think the real bug may lie in other 
kernel areas used by the cpuset code.
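
For readers unfamiliar with CPU hotplug, one offline/online cycle through
the standard sysfs interface can be sketched as below. This is an
illustration only, not the selftest's own code: the helper names and the
choice of CPU 2 are assumptions, and the default is a dry run that only
prints the equivalent shell commands.

```python
from pathlib import Path

# Standard sysfs location of the per-CPU hotplug control files.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def hotplug_cycle(cpu: int) -> list[tuple[Path, str]]:
    """Return the (file, value) writes for one offline/online cycle."""
    online = SYSFS_CPU / f"cpu{cpu}" / "online"
    return [(online, "0"), (online, "1")]


def apply(writes: list[tuple[Path, str]], dry_run: bool = True) -> None:
    """Print the writes, or perform them when dry_run is False."""
    for path, value in writes:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            # Needs root and a kernel with CPU hotplug support.
            path.write_text(value)


# Hypothetical example: cycle CPU 2 without touching the system.
apply(hotplug_cycle(2))
```

Each such cycle produces "psci: CPUn killed" and "Booted secondary
processor" console messages like those in the quoted log.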

Cheers,
Longman


> [  175.518032] lr : remote_partition_disable
> (kernel/cgroup/cpuset.c:1352 (discriminator 1)
> kernel/cgroup/cpuset.c:1514 (discriminator 1))
> [  175.525849] sp : ffff8000c853bb90
> [  175.529585] x29: ffff8000c853bb90 x28: ffff00017badc800 x27: 0000000000000000
> timeout set to 45
> [  175.536713] x26: 0000000000000000 x25: ffff00014c422540 x24: ffffb1c71020b000
> [  175.545489] x23: ffff000113769c00 x22: 0000000000000001 x21: ffffb1c71020b5c0
> [  175.552615] x20: ffff8000c853bbd0 x19: ffff000113769a00 x18: 00000000ffffffff
> selftests: cgroup: test_cpuset_v1_hp.sh
> [  175.559910] x17: 31752f72656b726f x16: 776b222064616572 x15: 68742072656b726f
> [  175.569900] x14: 0000000000000004 x13: ffffb1c70fb4f160 x12: 0000000000000000
> cpuset v1 mount point not found!
> [  175.577888] x11: 000002f6b9bf58c3 x10: 0000000000000023 x9 : ffffb1c70d6bdff8
> Test SKIPPED
> ok 13 selftests: cgroup: test_cpuset_v1_hp.sh #SKIP
> [  175.587877] x8 : ffff8000c853bad0 x7 : 0000000000000000 x6 : 0000000000000001
> [  175.597864] x5 : ffffb1c70e87a488 x4 : fffffdffc40a88e0 x3 : 000000000080007d
> [  175.607849] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffff4
> [  175.615578] Call trace:
> [  175.618013] remote_partition_disable (kernel/cgroup/cpuset.c:1352
> (discriminator 1) kernel/cgroup/cpuset.c:1342 (discriminator 1)
> kernel/cgroup/cpuset.c:1514 (discriminator 1)) (P)
> [  175.623057] update_prstate (include/linux/spinlock.h:376
> kernel/cgroup/cpuset.c:2963)
> [  175.626799] cpuset_css_killed (kernel/cgroup/cpuset.c:3598)
> [  175.630713] kill_css.part.0 (kernel/cgroup/cgroup.c:5968)
> [  175.634464] cgroup_destroy_locked (kernel/cgroup/cgroup.c:6058
> (discriminator 4))
> [  175.638810] cgroup_rmdir (kernel/cgroup/cgroup.c:6102)
> [  175.642376] kernfs_iop_rmdir (fs/kernfs/dir.c:1286)
> [  175.646203] vfs_rmdir (fs/namei.c:4461 fs/namei.c:4438)
> [  175.649515] do_rmdir (fs/namei.c:4516 (discriminator 1))
> [  175.652823] __arm64_sys_unlinkat (fs/namei.c:4690 (discriminator 2)
> fs/namei.c:4684 (discriminator 2) fs/namei.c:4684 (discriminator 2))
> [  175.656998] invoke_syscall (arch/arm64/include/asm/current.h:19
> arch/arm64/kernel/syscall.c:54)
> [  175.660738] el0_svc_common.constprop.0
> (include/linux/thread_info.h:135 (discriminator 2)
> arch/arm64/kernel/syscall.c:140 (discriminator 2))
> [  175.665431] do_el0_svc (arch/arm64/kernel/syscall.c:152)
> [  175.668735] el0_svc (arch/arm64/include/asm/irqflags.h:82
> (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator
> 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1)
> arch/arm64/kernel/entry-common.c:169 (discriminator 1)
> arch/arm64/kernel/entry-common.c:182 (discriminator 1)
> arch/arm64/kernel/entry-common.c:880 (discriminator 1))
> [  175.671877] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:899)
> [  175.676052] el0t_64_sync (arch/arm64/kernel/entry.S:596)
> [  175.679705] ---[ end trace 0000000000000000 ]---
>
>
> ## Source
> * Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> * Git sha: afec768a6a8fe7fb02a08ffce5f2f556f51d4b52
> * Git describe: next-20250805
> * Architectures: arm64
> * Toolchains: gcc-13
> * Kconfigs: defconfig+selftests/*/configs
>
> ## Build
> * Test log 1: https://qa-reports.linaro.org/api/testruns/29220998/log_file/
> * Test log 2: https://qa-reports.linaro.org/api/testruns/29395866/log_file/
> * LAVA log: https://lkft-staging.validation.linaro.org/scheduler/job/187100#L6621
> * Test history:
> https://regressions.linaro.org/lkft/linux-next-master-ampere/next-20250805/log-parser-test/exception-warning-kernelcgroupcpuset-at-remote_partition_disable/history/
> * Test plan: https://tuxapi.tuxsuite.com/v1/groups/ampere/projects/ci/tests/30rj0dIdTXUiGfYMA7suavpa77r
> * Build link: https://storage.tuxsuite.com/public/ampere/ci/builds/30rj0OYSDUMeT0cyTDioTe5XVOI/
> * Kernel config:
> https://storage.tuxsuite.com/public/ampere/ci/builds/30rj0OYSDUMeT0cyTDioTe5XVOI/config
>
> --
> Linaro LKFT
> https://lkft.linaro.org
>

