Message-ID: <877dvg4ud4.fsf@oldenburg2.str.redhat.com>
Date: Mon, 06 Jul 2020 15:59:35 +0200
From: Florian Weimer <fweimer@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Carlos O'Donell <carlos@...hat.com>,
Joseph Myers <joseph@...esourcery.com>,
Szabolcs Nagy <szabolcs.nagy@....com>,
libc-alpha@...rceware.org, Thomas Gleixner <tglx@...utronix.de>,
Ben Maurer <bmaurer@...com>,
Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Boqun Feng <boqun.feng@...il.com>,
Will Deacon <will.deacon@....com>,
Paul Turner <pjt@...gle.com>, linux-kernel@...r.kernel.org,
linux-api@...r.kernel.org
Subject: Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)
* Mathieu Desnoyers:
> When available, use the cpu_id field from __rseq_abi on Linux to
> implement sched_getcpu(). Fall-back on the vgetcpu vDSO if
> unavailable.
I've pushed this to glibc master, but unfortunately it looks like this
exposes a kernel bug related to affinity mask changes.
After building and testing glibc, this
  for x in {1..2000} ; do posix/tst-affinity-static & done
produces some “error:” lines for me:
  error: Unexpected CPU 2, expected 0
  error: Unexpected CPU 2, expected 0
  error: Unexpected CPU 2, expected 0
  error: Unexpected CPU 2, expected 0
  error: Unexpected CPU 138, expected 0
  error: Unexpected CPU 138, expected 0
  error: Unexpected CPU 138, expected 0
  error: Unexpected CPU 138, expected 0
“expected 0” is a result of how the test has been written: it bails out
on the first failure, which happens with CPU ID 0.
Smaller systems can use a smaller count than 2000 to reproduce this. It
also happens sporadically when running the glibc test suite itself
(which is why it took further testing to reveal this issue).
I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
4.18.0-193.el8 (all x86_64).
As to the cause, I'd guess that the exit path in the sched_setaffinity
system call fails to update the rseq area, so that userspace can observe
the outdated CPU ID there.
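For anyone who wants to poke at this outside the glibc tree, here is a
rough sketch of the kind of check involved. It is not the actual
tst-affinity source; the helper name count_cpu_mismatches and its
rounds parameter are made up for illustration, and the error message
only imitates the format shown above. It pins the calling thread to
each CPU in its initial affinity mask in turn and compares the result
of sched_getcpu against the CPU it was just pinned to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper, not taken from glibc.  Pins the calling thread
   to each CPU of its initial affinity mask and checks that
   sched_getcpu reports that CPU afterwards.  Returns the number of
   mismatches seen, or -1 if the initial mask cannot be read.  */
static int
count_cpu_mismatches (int rounds)
{
  cpu_set_t initial;
  int mismatches = 0;

  assert (rounds > 0);
  if (sched_getaffinity (0, sizeof (initial), &initial) != 0)
    return -1;

  for (int round = 0; round < rounds; ++round)
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
      {
        if (!CPU_ISSET (cpu, &initial))
          continue;

        cpu_set_t one;
        CPU_ZERO (&one);
        CPU_SET (cpu, &one);
        if (sched_setaffinity (0, sizeof (one), &one) != 0)
          continue;  /* Could not pin to this CPU; skip it.  */

        /* After a successful sched_setaffinity to a single CPU, the
           thread should be running on exactly that CPU.  */
        int current = sched_getcpu ();
        if (current != cpu)
          {
            printf ("error: Unexpected CPU %d, expected %d\n",
                    current, cpu);
            ++mismatches;
          }
      }

  /* Restore the original mask before returning.  */
  sched_setaffinity (0, sizeof (initial), &initial);
  return mismatches;
}
```

Calling count_cpu_mismatches (100) from main and running many copies
of the program in parallel, as in the shell loop above, is probably
the closest approximation. A single instance on an idle machine may
well report zero mismatches even on an affected kernel.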
Thanks,
Florian