Message-ID: <877dvg4ud4.fsf@oldenburg2.str.redhat.com>
Date:   Mon, 06 Jul 2020 15:59:35 +0200
From:   Florian Weimer <fweimer@...hat.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Carlos O'Donell <carlos@...hat.com>,
        Joseph Myers <joseph@...esourcery.com>,
        Szabolcs Nagy <szabolcs.nagy@....com>,
        libc-alpha@...rceware.org, Thomas Gleixner <tglx@...utronix.de>,
        Ben Maurer <bmaurer@...com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Will Deacon <will.deacon@....com>,
        Paul Turner <pjt@...gle.com>, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org
Subject: Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)
* Mathieu Desnoyers:
> When available, use the cpu_id field from __rseq_abi on Linux to
> implement sched_getcpu().  Fall-back on the vgetcpu vDSO if
> unavailable.
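
For readers following along, the fast-path/fall-back logic described in the quoted text can be sketched roughly as below. This is an illustration under assumptions, not the glibc code: struct rseq_area_sketch, getcpu_slow_path and getcpu_sketch are hypothetical names, the struct only mirrors the first two fields of the kernel's rseq area (with cpu_id read as a signed value, so that "not registered" shows up as a negative number), and the raw getcpu system call stands in for the vDSO fall-back.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for the start of the per-thread rseq area.
   The kernel keeps cpu_id up to date while rseq is registered; a
   negative value means rseq is not in use for this thread.  */
struct rseq_area_sketch
{
  uint32_t cpu_id_start;
  int32_t cpu_id;
};

/* Slow path: ask the kernel directly (stand-in for the vDSO call).
   Returns the CPU number, or -1 on failure.  */
static int
getcpu_slow_path (void)
{
  unsigned int cpu;
  if (syscall (SYS_getcpu, &cpu, NULL, NULL) != 0)
    return -1;
  return (int) cpu;
}

/* Fast path: trust the rseq area when it holds a valid CPU number,
   otherwise fall back to the slow path.  */
static int
getcpu_sketch (const volatile struct rseq_area_sketch *area)
{
  int32_t cpu_id = area->cpu_id;
  return cpu_id >= 0 ? cpu_id : getcpu_slow_path ();
}
```

The point of the sketch is that the fast path returns whatever the area currently contains, so its result is only as fresh as the kernel's last update of that field.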
I've pushed this to glibc master, but unfortunately it looks like this
exposes a kernel bug related to affinity mask changes.
After building and testing glibc, this
  for x in {1..2000} ; do posix/tst-affinity-static  & done
produces some “error:” lines for me:
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
“expected 0” is an artifact of how the test is written: it bails out on
the first failure, which happens with CPU ID 0.
Smaller systems can use a smaller count than 2000 to reproduce this.  It
also happens sporadically when running the glibc test suite itself
(which is why it took further testing to reveal this issue).
I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
4.18.0-193.el8 (all x86_64).
As to the cause, I'd guess that the exit path in the sched_setaffinity
system call fails to update the rseq area, so that userspace can observe
the outdated CPU ID there.
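
A minimal sketch of the kind of check that would expose such a stale value follows. It is not the tst-affinity source; count_stale_cpu_ids is a hypothetical helper that pins the calling thread to each CPU of its initial affinity mask in turn and compares what sched_getcpu() reports (possibly read from the rseq area) with both the CPU it pinned to and the raw getcpu system call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_getcpu, CPU_* macros */
#endif
#include <sched.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the number of CPUs for which sched_getcpu() disagreed with
   the CPU the thread was just pinned to, or -1 if the initial
   affinity mask could not be read.  */
static int
count_stale_cpu_ids (void)
{
  cpu_set_t initial, one;
  int mismatches = 0;

  if (sched_getaffinity (0, sizeof (initial), &initial) != 0)
    return -1;

  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
      if (!CPU_ISSET (cpu, &initial))
        continue;
      CPU_ZERO (&one);
      CPU_SET (cpu, &one);
      if (sched_setaffinity (0, sizeof (one), &one) != 0)
        continue;               /* CPU not usable here; skip it.  */

      /* After a successful sched_setaffinity the thread must be
         running on CPU `cpu`.  */
      int from_libc = sched_getcpu ();
      unsigned int from_kernel = 0;
      int have_kernel = syscall (SYS_getcpu, &from_kernel, NULL, NULL) == 0;

      if (from_libc != cpu)
        {
          if (have_kernel)
            printf ("error: Unexpected CPU %d, expected %d (getcpu says %u)\n",
                    from_libc, cpu, from_kernel);
          else
            printf ("error: Unexpected CPU %d, expected %d\n",
                    from_libc, cpu);
          ++mismatches;
        }
    }

  /* Restore the original mask before returning.  */
  sched_setaffinity (0, sizeof (initial), &initial);
  return mismatches;
}
```

A single run of such a check on an idle machine may well report no mismatches; the failures above only showed up with many instances started concurrently.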
Thanks,
Florian