Message-ID: <20251122004358.GB2682494@ax162>
Date: Fri, 21 Nov 2025 17:43:58 -0700
From: Nathan Chancellor <nathan@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Gabriele Monaco <gmonaco@...hat.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Jeanson <mjeanson@...icios.com>,
Jens Axboe <axboe@...nel.dk>,
"Paul E. McKenney" <paulmck@...nel.org>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Florian Weimer <fweimer@...hat.com>,
Tim Chen <tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V5 20/20] sched/mmcid: Switch over to the new mechanism
Hi Thomas,
On Wed, Nov 19, 2025 at 06:27:22PM +0100, Thomas Gleixner wrote:
> Now that all pieces are in place, change the implementations of
> sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict
> ownership scheme and switch context_switch() over to use the new
> mm_cid_schedin() functionality.
>
> The common case is that there is no mode change required, which makes
> fork() and exit() just update the user count and the constraints.
>
> In case that a new user would exceed the CID space limit the fork() context
> handles the transition to per CPU mode with mm::mm_cid::mutex held. exit()
> handles the transition back to per task mode when the user count drops
> below the switch back threshold. fork() might also be forced to handle a
> deferred switch back to per task mode, when an affinity change increased the
> number of allowed CPUs enough.
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Our CI started seeing a hang in QEMU after getting to userspace when
booting with a single CPU (our default logic when using TCG instead of
KVM) that I bisected to this change as commit 2635fb0f0973
("sched/mmcid: Switch over to the new mechanism") in -next.
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/19557323049/job/56003155890
$ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- clean defconfig bzImage
$ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/x86_64-rootfs.cpio.zst | zstd -d >rootfs.cpio
$ qemu-system-x86_64 \
-display none \
-nodefaults \
-M q35 \
-d unimp,guest_errors \
-append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
-kernel arch/x86/boot/bzImage \
-initrd rootfs.cpio \
-cpu host \
-enable-kvm \
-m 512m \
-smp 1 \
-serial mon:stdio
[ 0.000000] Linux version 6.18.0-rc4-00066-g2635fb0f0973 (nathan@...62) (x86_64-linux-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP PREEMPT_DYNAMIC Fri Nov 21 17:30:21 MST 2025
...
[ 0.946720] Freeing unused kernel image (initmem) memory: 2812K
[ 0.947864] Write protecting the kernel read-only data: 28672k
[ 0.949257] Freeing unused kernel image (text/rodata gap) memory: 1540K
[ 0.950579] Freeing unused kernel image (rodata/data gap) memory: 760K
[ 0.991764] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 0.993000] Run /init as init process
<hangs>
If I change '-smp 1' to '-smp 2', userspace runs properly. At the parent
change, this issue does not exist, but it is obviously possible that this
change exposes a bug from earlier in the series; I did not test that.
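For anyone scripting this check, here is a minimal sketch of how the symptom above could be detected from a captured serial log. It assumes the log was saved to a file (for example by running the QEMU command under coreutils `timeout` with stdout redirected) and that a hang shows up as no further non-empty lines after "Run /init as init process". The function name `boot_status` and its three labels are hypothetical; they are not part of boot-utils or the CI.

```shell
#!/bin/sh
# Sketch only: boot_status and its output labels are hypothetical names.
# Usage: boot_status <serial-log-file>
# Prints one of:
#   no-init    the kernel never reached "Run /init as init process"
#   hung       that line is the last non-empty line in the log
#   userspace  at least one non-empty line follows it
boot_status() {
    awk '
        # Remember that init was started and reset the line counter.
        /Run \/init as init process/ { seen = 1; n = 0; next }
        # Count non-empty lines that appear after init was started.
        seen && NF { n++ }
        END {
            if (!seen)       print "no-init"
            else if (n == 0) print "hung"
            else             print "userspace"
        }
    ' "$1"
}
```

On a log that ends like the '-smp 1' excerpt above, this prints "hung". Whether a working boot prints anything after the init line depends on the rootfs, so the "userspace" case is an assumption to verify against a known-good log first.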
Cheers,
Nathan
# bad: [d724c6f85e80a23ed46b7ebc6e38b527c09d64f5] Add linux-next specific files for 20251121
# good: [fd95357fd8c6778ac7dea6c57a19b8b182b6e91f] Merge tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
git bisect start 'd724c6f85e80a23ed46b7ebc6e38b527c09d64f5' 'fd95357fd8c6778ac7dea6c57a19b8b182b6e91f'
# good: [d8d0238c7ea639aa8c9f8c76d089e684e283ea87] Merge branch 'libcrypto-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
git bisect good d8d0238c7ea639aa8c9f8c76d089e684e283ea87
# good: [84abc3686f608efd4914a5b4d651307e32ea4000] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
git bisect good 84abc3686f608efd4914a5b4d651307e32ea4000
# bad: [c709a6758d17f1f2bccadb18cc82bccecdf216ff] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
git bisect bad c709a6758d17f1f2bccadb18cc82bccecdf216ff
# skip: [8d00858e9326dafe5e27072e3cee5c23b60e7281] Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect skip 8d00858e9326dafe5e27072e3cee5c23b60e7281
# good: [8be00d1ba3a1305469f6747d0285c4496a6855ee] KVM: arm64: GICv3: Completely disable trapping on vcpu exit
git bisect good 8be00d1ba3a1305469f6747d0285c4496a6855ee
# good: [4daaead949b331c271f6df4162f39589088fe27b] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 4daaead949b331c271f6df4162f39589088fe27b
# bad: [b1a2fa6d34e27225c2b3f79840f198227af1f53d] Merge branch into tip/master: 'x86/sev'
git bisect bad b1a2fa6d34e27225c2b3f79840f198227af1f53d
# bad: [468f14a4c5c9def982ae9a74344fa6f905a9b725] Merge branch into tip/master: 'locking/core'
git bisect bad 468f14a4c5c9def982ae9a74344fa6f905a9b725
# bad: [705d7cad382be9b3b52b89b220e77b17cf24322a] Merge branch 'sched/core' into core/rseq, to resolve conflict
git bisect bad 705d7cad382be9b3b52b89b220e77b17cf24322a
# good: [472931e757fb3dfad1f78ce6f5abd821155433b2] sched/mmcid: Convert mm CID mask to a bitmap
git bisect good 472931e757fb3dfad1f78ce6f5abd821155433b2
# good: [7f829bde94b1c97b1804fa5860e066ea49dbfca3] sched/core: Optimize core cookie matching check
git bisect good 7f829bde94b1c97b1804fa5860e066ea49dbfca3
# good: [d206fbad9328ddb68ebabd7cf7413392acd38081] sched/fair: Revert max_newidle_lb_cost bump
git bisect good d206fbad9328ddb68ebabd7cf7413392acd38081
# good: [340af997d25dab0f05c4de8399d656b112592a93] sched/mmcid: Provide CID ownership mode fixup functions
git bisect good 340af997d25dab0f05c4de8399d656b112592a93
# bad: [2635fb0f0973c57c45f03708d52e827ec99ac78e] sched/mmcid: Switch over to the new mechanism
git bisect bad 2635fb0f0973c57c45f03708d52e827ec99ac78e
# good: [2644779ec144d3e8cce5fed9623b47e70b3e0422] irqwork: Move data struct to a types header
git bisect good 2644779ec144d3e8cce5fed9623b47e70b3e0422
# good: [cba5e581161e379037a94f5a75d1a61bd1ccce3b] sched/mmcid: Implement deferred mode change
git bisect good cba5e581161e379037a94f5a75d1a61bd1ccce3b
# first bad commit: [2635fb0f0973c57c45f03708d52e827ec99ac78e] sched/mmcid: Switch over to the new mechanism
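
A bisection like the one logged above can also be driven by `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The sketch below only shows that mapping. The helper name `bisect_exit_code` and the result labels it accepts are hypothetical, and the build and boot steps are deliberately left out.

```shell
#!/bin/sh
# Sketch only: bisect_exit_code and the labels it accepts are hypothetical.
# Maps a boot result label to the exit code "git bisect run" expects.
bisect_exit_code() {
    case "$1" in
        userspace) echo 0 ;;    # booted to userspace: mark commit good
        hung)      echo 1 ;;    # hung after starting init: mark commit bad
        *)         echo 125 ;;  # anything else (e.g. build failure): skip
    esac
}
```

A wrapper script would build the kernel, boot it, classify the result, and then `exit "$(bisect_exit_code "$result")"`. Skipping on unknown results keeps unrelated build or boot breakage from being blamed on the wrong commit.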