linux-kernel - Re: [PATCH 2/3] Linux: Use rseq in sched

Message-ID: <877dvg4ud4.fsf@oldenburg2.str.redhat.com>
Date:   Mon, 06 Jul 2020 15:59:35 +0200
From:   Florian Weimer <fweimer@...hat.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Carlos O'Donell <carlos@...hat.com>,
        Joseph Myers <joseph@...esourcery.com>,
        Szabolcs Nagy <szabolcs.nagy@....com>,
        libc-alpha@...rceware.org, Thomas Gleixner <tglx@...utronix.de>,
        Ben Maurer <bmaurer@...com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Will Deacon <will.deacon@....com>,
        Paul Turner <pjt@...gle.com>, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org
Subject: Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

* Mathieu Desnoyers:

> When available, use the cpu_id field from __rseq_abi on Linux to
> implement sched_getcpu().  Fall-back on the vgetcpu vDSO if
> unavailable.

I've pushed this to glibc master, but unfortunately it looks like this
exposes a kernel bug related to affinity mask changes.

After building and testing glibc, this

  for x in {1..2000} ; do posix/tst-affinity-static  & done

produces some “error:” lines for me:

error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0

“expected 0” is a result of how the test has been written: it bails out
on the first failure, which happens with CPU ID 0.

Smaller systems can use a smaller count than 2000 to reproduce this.  It
also happens sporadically when running the glibc test suite itself
(which is why it took further testing to reveal this issue).

I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
4.18.0-193.el8 (all x86_64).

As to the cause, I'd guess that the exit path in the sched_setaffinity
system call fails to update the rseq area, so that userspace can observe
the outdated CPU ID there.
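
For illustration, the kind of check that trips here can be sketched as
below.  This is not the actual tst-affinity code, only a minimal
approximation: the function name count_cpu_mismatches is made up for
this example, and unlike the real test it counts every mismatch
instead of bailing out on the first one.  It pins the calling thread
to each CPU from its initial affinity mask in turn and compares
sched_getcpu() against the CPU just selected.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper, not part of glibc or its test suite.
   Pin the calling thread to each CPU in its initial affinity mask in
   turn and compare sched_getcpu () with the CPU just selected.
   Returns the number of mismatches, or -1 if the initial mask cannot
   be read (for example on systems with more than CPU_SETSIZE CPUs).  */
int
count_cpu_mismatches (void)
{
  cpu_set_t initial;
  if (sched_getaffinity (0, sizeof (initial), &initial) != 0)
    return -1;

  int mismatches = 0;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
      if (!CPU_ISSET (cpu, &initial))
        continue;

      cpu_set_t one;
      CPU_ZERO (&one);
      CPU_SET (cpu, &one);
      if (sched_setaffinity (0, sizeof (one), &one) != 0)
        continue;               /* CPU not usable right now; skip it.  */

      /* After a successful sched_setaffinity, the thread should be
         running on the only CPU left in its mask.  */
      int seen = sched_getcpu ();
      if (seen != cpu)
        {
          printf ("error: Unexpected CPU %d, expected %d\n", seen, cpu);
          ++mismatches;
        }
    }

  /* Restore the original affinity mask before returning.  */
  sched_setaffinity (0, sizeof (initial), &initial);
  return mismatches;
}
```

Calling count_cpu_mismatches () from main and running many instances
of the resulting program in parallel, as in the shell loop above, is
the assumed way to use it; whether a single run shows the problem
depends on timing and on the kernel in use.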

Thanks,
Florian