Message-ID: <874io5andc.ffs@tglx>
Date: Wed, 28 Jan 2026 14:16:31 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>
Cc: David Matlack <dmatlack@...gle.com>, Marco Elver <elver@...gle.com>,
Peter Zijlstra <peterz@...radead.org>, LKML
<linux-kernel@...r.kernel.org>, Michael Jeanson <mjeanson@...icios.com>,
Jens Axboe <axboe@...nel.dk>, "Paul E. McKenney" <paulmck@...nel.org>, X86
ML <x86@...nel.org>, Sean Christopherson <seanjc@...gle.com>, Wei Liu
<wei.liu@...nel.org>
Subject: Re: SIGSEGVs after 39a167560a61 ("rseq: Optimize event setting")
On Wed, Jan 28 2026 at 12:40, Dmitry Vyukov wrote:
> On Wed, 28 Jan 2026 at 12:28, Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
>> I suspect that tcmalloc's aliasing of the rseq cpu_id_start field
>> with its own data structure, corrupting its content, and expecting the
>> kernel to update it on every preemption does not work anymore, because
>> the kernel only updates it when the cpu_id actually changes.
>
> I can't recall now why tcmalloc would need updates on every
> preemption... Do you know why?
From the tcmalloc docs:
"To understand that the cached pointer is not valid anymore when a
thread is rescheduled to another CPU, we overlap the top 4 bytes of
the cached address with `__rseq_abi.cpu_id_start`. When a thread is
rescheduled the kernel overwrites `cpu_id_start` with the current
CPU number, which gives us the signal that the cached address is not
valid anymore."
That's still the case as the kernel updates the CPU number when the task
is migrated to a different CPU. What it no longer does is update the
CPU number when the task is preempted and resumes on the same CPU,
because that's just a massive waste of CPU cycles.
Now the interesting part of that documentation:
"To distinguish the high part of the cached address from the CPU
number, we set the top bit in the cached address, real CPU numbers
(`<2^31`) do not have this bit set.
With these arrangements, slabs address calculation on
allocation/deallocation fast paths reduces to load and check of the
cached address:
```
slabs = __rseq_abi[-4];
if ((slabs & (1 << 63)) == 0) goto slowpath;
slabs &= ~(1 << 63);
```"
which means the code fiddles with rseq_abi::cpu_id_start. That never
worked on an RSEQ_DEBUG-enabled kernel because the kernel detects that
user space fiddled with data which is defined as USER_RO in the ABI and
kills the offender.
I have not looked at the code, but I suspect that this is also abused,
undocumented, as a lightweight preemption indicator, and that indication
is no longer given because the kernel only updates the field when the
CPU number changes.
As this never used the ABI correctly (enable CONFIG_RSEQ_DEBUG on a
pre-6.19 kernel and watch the show, or ask David) this is a pure tcmalloc
problem and not a regression. And no, we are not catering to that abuse
at the cost of the performance regressions which have been observed after
glibc enabled rseq.
Thanks,
tglx