lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8a984527-5775-479c-8f16-f9feb38064c0@efficios.com>
Date: Mon, 10 Mar 2025 13:26:01 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 aruna.ramakrishna@...cle.com, elver@...gle.com,
 "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
 linux-kernel@...r.kernel.org, Michael Jeanson <mjeanson@...icios.com>
Subject: Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys

On 2025-03-10 12:31, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
>>
>> On 2025-03-10 10:43, Dmitry Vyukov wrote:
>>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
>>> <mathieu.desnoyers@...icios.com> wrote:
>>>>
>>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
>>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
>>>>> <mathieu.desnoyers@...icios.com> wrote:
>>>>>>
>>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>>>>>> works and does not lead to process kills.
>>>>>>>
>>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
>>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
>>>>>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>>>>>> Cc: "Paul E. McKenney" <paulmck@...nel.org>
>>>>>>> Cc: Boqun Feng <boqun.feng@...il.com>
>>>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>>>>>> Cc: Ingo Molnar <mingo@...hat.com>
>>>>>>> Cc: Borislav Petkov <bp@...en8.de>
>>>>>>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>>>>>>> Cc: "H. Peter Anvin" <hpa@...or.com>
>>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
>>>>>>> Cc: x86@...nel.org
>>>>>>> Cc: linux-kernel@...r.kernel.org
>>>>>>> Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
>>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>>>
>>>>>>> ---
>>>>>>> Changes in v5:
>>>>>>>      - Use static for variables/functions
>>>>>>>      - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>>>>>
>>>>>>> Changes in v4:
>>>>>>>      - Added Fixes tag
>>>>>>>
>>>>>>> Changes in v3:
>>>>>>>      - added Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
>>>>>>>      - rework the test to work when only pkey 0 is supported for rseq
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>>      - change test to install protected rseq_cs instead of rseq
>>>>>>> ---
>>>>>>>      tools/testing/selftests/rseq/Makefile    |  2 +-
>>>>>>>      tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>>>>>      tools/testing/selftests/rseq/rseq.h      |  1 +
>>>>>>>      3 files changed, 100 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>>>>>> index 5a3432fceb586..9111d25fea3af 100644
>>>>>>> --- a/tools/testing/selftests/rseq/Makefile
>>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
>>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>>>>>
>>>>>>>      TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>>>>>                  param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>>>>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>>>>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>>>>>
>>>>>>>      TEST_GEN_PROGS_EXTENDED = librseq.so
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000..cc4dd98190942
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> @@ -0,0 +1,98 @@
>>>>>>> +// SPDX-License-Identifier: LGPL-2.1
>>>>>>> +/*
>>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>>>>>> + */
>>>>>>> +
>>>>>>> +#define _GNU_SOURCE
>>>>>>> +#include <err.h>
>>>>>>> +#include <errno.h>
>>>>>>> +#include <stdio.h>
>>>>>>> +#include <stdlib.h>
>>>>>>> +#include <string.h>
>>>>>>> +#include <sys/mman.h>
>>>>>>> +#include <sys/syscall.h>
>>>>>>> +#include <ucontext.h>
>>>>>>> +#include <unistd.h>
>>>>>>> +
>>>>>>> +#include "rseq.h"
>>>>>>> +#include "rseq-abi.h"
>>>>>>> +
>>>>>>> +static int pkey;
>>>>>>> +static ucontext_t ucp0, ucp1;
>>>>>>> +
>>>>>>> +static void coroutine(void)
>>>>>>> +{
>>>>>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>>>>>> +     /*
>>>>>>> +      * When we disable access to pkey 0, globals and TLS become
>>>>>>> +      * inaccessible too, so we need to tread carefully.
>>>>>>> +      * Pkey is global so we need to copy it onto the stack.
>>>>>>> +      */
>>>>>>> +     int pk = RSEQ_READ_ONCE(pkey);
>>>>>>> +     struct timespec ts;
>>>>>>> +
>>>>>>> +     orig_pk0 = pkey_get(0);
>>>>>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>>>>>> +             err(1, "pkey_set failed");
>>>>>>> +     old_pk0 = pkey_get(0);
>>>>>>> +     old_pk1 = pkey_get(pk);
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Prevent compiler from initializing it by loading a 16-global.
>>>>>>> +      */
>>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>>>>>> +     /*
>>>>>>> +      * If the kernel misbehaves, context switches in the following loop
>>>>>>> +      * will terminate the process with SIGSEGV.
>>>>>>> +      * Trigger preemption w/o accessing TLS.
>>>>>>> +      * Note that glibc's usleep touches errno always.
>>>>>>> +      */
>>>>>>> +     for (i = 0; i < 10; i++)
>>>>>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>>>>>> +
>>>>>>> +     pk0 = pkey_get(0);
>>>>>>> +     pk1 = pkey_get(pk);
>>>>>>> +     if (pkey_set(0, orig_pk0))
>>>>>>> +             err(1, "pkey_set failed");
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Ensure that the kernel has restored the previous value of pkeys
>>>>>>> +      * register after changing them.
>>>>>>> +      */
>>>>>>> +     if (old_pk0 != pk0)
>>>>>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>>>>>> +     if (old_pk1 != pk1)
>>>>>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>>>>>> +
>>>>>>> +     swapcontext(&ucp1, &ucp0);
>>>>>>> +     abort();
>>>>>>> +}
>>>>>>> +
>>>>>>> +int main(int argc, char **argv)
>>>>>>> +{
>>>>>>> +     pkey = pkey_alloc(0, 0);
>>>>>>> +     if (pkey == -1) {
>>>>>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>>>>>> +                     strerror(errno));
>>>>>>> +             return 0;
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     if (rseq_register_current_thread())
>>>>>>> +             err(1, "rseq_register_current_thread failed");
>>>>>>> +
>>>>>>> +     if (getcontext(&ucp1))
>>>>>>> +             err(1, "getcontext failed");
>>>>>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
>>>>>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>>>>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>>>>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>>>>>> +             err(1, "mmap failed");
>>>>>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>>>>>> +                     PROT_READ | PROT_WRITE, pkey))
>>>>>>> +             err(1, "pkey_mprotect failed");
>>>>>>> +     makecontext(&ucp1, coroutine, 0);
>>>>>>> +     if (swapcontext(&ucp0, &ucp1))
>>>>>>> +             err(1, "swapcontext failed");
>>>>>>
>>>>>> Should the rseq register be paired with a rseq unregister here ?
>>>>>
>>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
>>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
>>>>> signals.
>>>>
>>>> The rseq register/unregister is only for the case where libc does not
>>>> support rseq. But if you want to use rseq_register_current_thread(),
>>>> then you'll want to pair it with unregister.
>>>
>>> Why? Isn't it better to have tests more realitic?
>>
>> If you use rseq.c rseq_register_current_thread without
>> rseq_unregister_current_thread, then you rely on implicit
>> unregistration done by the kernel at thread exit.
>>
>> Then you need to ensure your userspace runtime keep the TLS
>> that holds the rseq area allocated for the entire execution
>> of the thread until it exits in the kernel. Note that
>> disabling signals is not sufficient to prevent the kernel
>> from accessing the rseq area.
>>
>> GNU libc gets away with automatic unregistration at
>> thread exit because it completely controls the lifetime
>> of the registered rseq area, keeping it allocated for as
>> long as the thread is executing.
>>
>> So in order to minimize the dependency on specific libc
>> behavior in the kernel sefltests, the selftests rseq.h
>> requires explicit registration *and* unregistration.
> 
> If libc registers rseq (!rseq_ownership), then
> rseq_unregister_current_thread is a no-op. And if libc has not
> registered rseq, then we are not relying on any libc behavior. I don't
> see how calling rseq_unregister_current_thread helps to rely less on
> libc. Am I missing something?

When libc does not support rseq (either because of the
glibc tunable, or having an old glibc, or running another
libc), rseq.c in the selftests registers an initial-exec
TLS rseq area. This TLS rseq area's lifetime is handled
by the libc. I don't want to depend on implementation-specific
libc behavior, unless they are documented and part of the
ABI.

In your case the area won't be re-used because the program
exits, but I'd rather use the same approach everywhere,
which is to unregister explicitly.

Note that in the librseq project, we are currently planning
to remove the explicit thread registration/unregistration
from the API, and will exclusively depend on having rseq support
provided by the libc. It simplifies the implementation, the API,
and will make it OK to link a dlopen'd .so against librseq.
There has been no official release of librseq yet, so now is
a good time to simplify the API and aim at what is becoming
the primary use-case.

Thanks,

Mathieu

> 
> 
> 
> 
>> Thanks,
>>
>> Mathieu
>>
>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>>> +     return 0;
>>>>>>> +}
>>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>>>>>> index ba424ce80a719..65da4a727c550 100644
>>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
>>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>>>>>> @@ -8,6 +8,7 @@
>>>>>>>      #ifndef RSEQ_H
>>>>>>>      #define RSEQ_H
>>>>>>>
>>>>>>> +#include <assert.h>
>>>>>>>      #include <stdint.h>
>>>>>>>      #include <stdbool.h>
>>>>>>>      #include <pthread.h>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> https://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ