[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACT4Y+Y1BGYv+Vn_vNd50BM=5b8NKchqV6OTVhRj1_P7x-abbA@mail.gmail.com>
Date: Wed, 21 May 2025 10:57:05 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
aruna.ramakrishna@...cle.com, elver@...gle.com,
"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org,
Michael Jeanson <mjeanson@...icios.com>
Subject: Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
On Mon, 10 Mar 2025 at 18:26, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 2025-03-10 12:31, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> > <mathieu.desnoyers@...icios.com> wrote:
> >>
> >> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> >>> <mathieu.desnoyers@...icios.com> wrote:
> >>>>
> >>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>>>> <mathieu.desnoyers@...icios.com> wrote:
> >>>>>>
> >>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>>>> works and does not lead to process kills.
> >>>>>>>
> >>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> >>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> >>>>>>> Cc: Peter Zijlstra <peterz@...radead.org>
> >>>>>>> Cc: "Paul E. McKenney" <paulmck@...nel.org>
> >>>>>>> Cc: Boqun Feng <boqun.feng@...il.com>
> >>>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
> >>>>>>> Cc: Ingo Molnar <mingo@...hat.com>
> >>>>>>> Cc: Borislav Petkov <bp@...en8.de>
> >>>>>>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> >>>>>>> Cc: "H. Peter Anvin" <hpa@...or.com>
> >>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
> >>>>>>> Cc: x86@...nel.org
> >>>>>>> Cc: linux-kernel@...r.kernel.org
> >>>>>>> Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
> >>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>>>
> >>>>>>> ---
> >>>>>>> Changes in v5:
> >>>>>>> - Use static for variables/functions
> >>>>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>>>
> >>>>>>> Changes in v4:
> >>>>>>> - Added Fixes tag
> >>>>>>>
> >>>>>>> Changes in v3:
> >>>>>>> - added Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
> >>>>>>> - rework the test to work when only pkey 0 is supported for rseq
> >>>>>>>
> >>>>>>> Changes in v2:
> >>>>>>> - change test to install protected rseq_cs instead of rseq
> >>>>>>> ---
> >>>>>>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>>>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>>>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>>>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>>>
> >>>>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>>>
> >>>>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> new file mode 100644
> >>>>>>> index 0000000000000..cc4dd98190942
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> @@ -0,0 +1,98 @@
> >>>>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>>>> +/*
> >>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>>>> + */
> >>>>>>> +
> >>>>>>> +#define _GNU_SOURCE
> >>>>>>> +#include <err.h>
> >>>>>>> +#include <errno.h>
> >>>>>>> +#include <stdio.h>
> >>>>>>> +#include <stdlib.h>
> >>>>>>> +#include <string.h>
> >>>>>>> +#include <sys/mman.h>
> >>>>>>> +#include <sys/syscall.h>
> >>>>>>> +#include <ucontext.h>
> >>>>>>> +#include <unistd.h>
> >>>>>>> +
> >>>>>>> +#include "rseq.h"
> >>>>>>> +#include "rseq-abi.h"
> >>>>>>> +
> >>>>>>> +static int pkey;
> >>>>>>> +static ucontext_t ucp0, ucp1;
> >>>>>>> +
> >>>>>>> +static void coroutine(void)
> >>>>>>> +{
> >>>>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>>>> + /*
> >>>>>>> + * When we disable access to pkey 0, globals and TLS become
> >>>>>>> + * inaccessible too, so we need to tread carefully.
> >>>>>>> + * Pkey is global so we need to copy it onto the stack.
> >>>>>>> + */
> >>>>>>> + int pk = RSEQ_READ_ONCE(pkey);
> >>>>>>> + struct timespec ts;
> >>>>>>> +
> >>>>>>> + orig_pk0 = pkey_get(0);
> >>>>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>>>> + err(1, "pkey_set failed");
> >>>>>>> + old_pk0 = pkey_get(0);
> >>>>>>> + old_pk1 = pkey_get(pk);
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * Prevent compiler from initializing it by loading a 16-global.
> >>>>>>> + */
> >>>>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>>>> + /*
> >>>>>>> + * If the kernel misbehaves, context switches in the following loop
> >>>>>>> + * will terminate the process with SIGSEGV.
> >>>>>>> + * Trigger preemption w/o accessing TLS.
> >>>>>>> + * Note that glibc's usleep touches errno always.
> >>>>>>> + */
> >>>>>>> + for (i = 0; i < 10; i++)
> >>>>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>>>> +
> >>>>>>> + pk0 = pkey_get(0);
> >>>>>>> + pk1 = pkey_get(pk);
> >>>>>>> + if (pkey_set(0, orig_pk0))
> >>>>>>> + err(1, "pkey_set failed");
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * Ensure that the kernel has restored the previous value of pkeys
> >>>>>>> + * register after changing them.
> >>>>>>> + */
> >>>>>>> + if (old_pk0 != pk0)
> >>>>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>>>> + if (old_pk1 != pk1)
> >>>>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>>>> +
> >>>>>>> + swapcontext(&ucp1, &ucp0);
> >>>>>>> + abort();
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +int main(int argc, char **argv)
> >>>>>>> +{
> >>>>>>> + pkey = pkey_alloc(0, 0);
> >>>>>>> + if (pkey == -1) {
> >>>>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>>>> + strerror(errno));
> >>>>>>> + return 0;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + if (rseq_register_current_thread())
> >>>>>>> + err(1, "rseq_register_current_thread failed");
> >>>>>>> +
> >>>>>>> + if (getcontext(&ucp1))
> >>>>>>> + err(1, "getcontext failed");
> >>>>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>>>> + err(1, "mmap failed");
> >>>>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>>>> + PROT_READ | PROT_WRITE, pkey))
> >>>>>>> + err(1, "pkey_mprotect failed");
> >>>>>>> + makecontext(&ucp1, coroutine, 0);
> >>>>>>> + if (swapcontext(&ucp0, &ucp1))
> >>>>>>> + err(1, "swapcontext failed");
> >>>>>>
> >>>>>> Should the rseq register be paired with a rseq unregister here ?
> >>>>>
> >>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>>>> signals.
> >>>>
> >>>> The rseq register/unregister is only for the case where libc does not
> >>>> support rseq. But if you want to use rseq_register_current_thread(),
> >>>> then you'll want to pair it with unregister.
> >>>
> >>> Why? Isn't it better to have tests more realitic?
> >>
> >> If you use rseq.c rseq_register_current_thread without
> >> rseq_unregister_current_thread, then you rely on implicit
> >> unregistration done by the kernel at thread exit.
> >>
> >> Then you need to ensure your userspace runtime keep the TLS
> >> that holds the rseq area allocated for the entire execution
> >> of the thread until it exits in the kernel. Note that
> >> disabling signals is not sufficient to prevent the kernel
> >> from accessing the rseq area.
> >>
> >> GNU libc gets away with automatic unregistration at
> >> thread exit because it completely controls the lifetime
> >> of the registered rseq area, keeping it allocated for as
> >> long as the thread is executing.
> >>
> >> So in order to minimize the dependency on specific libc
> >> behavior in the kernel sefltests, the selftests rseq.h
> >> requires explicit registration *and* unregistration.
> >
> > If libc registers rseq (!rseq_ownership), then
> > rseq_unregister_current_thread is a no-op. And if libc has not
> > registered rseq, then we are not relying on any libc behavior. I don't
> > see how calling rseq_unregister_current_thread helps to rely less on
> > libc. Am I missing something?
>
> When libc does not support rseq (either because of the
> glibc tunable, or having an old glibc, or running another
> libc), rseq.c in the selftests registers an initial-exec
> TLS rseq area. This TLS rseq area's lifetime is handled
> by the libc. I don't want to depend on implementation-specific
> libc behavior, unless they are documented and part of the
> ABI.
>
> In your case the area won't be re-used because the program
> exits, but I'd rather use the same approach everywhere,
> which is to unregister explicitly.
>
> Note that in the librseq project, we are currently planning
> to remove the explicit thread registration/unregistration
> from the API, and will exclusively depend on having rseq support
> provided by the libc. It simplifies the implementation, the API,
> and will make it OK to link a dlopen'd .so against librseq.
> There has been no official release of librseq yet, so now is
> a good time to simplify the API and aim at what is becoming
> the primary use-case.
Hi,
I was distracted by other things for a while.
But now resent rebased series v7 with this fix.
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>
> >>>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mathieu
> >>>>>>
> >>>>>>> + return 0;
> >>>>>>> +}
> >>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> @@ -8,6 +8,7 @@
> >>>>>>> #ifndef RSEQ_H
> >>>>>>> #define RSEQ_H
> >>>>>>>
> >>>>>>> +#include <assert.h>
> >>>>>>> #include <stdint.h>
> >>>>>>> #include <stdbool.h>
> >>>>>>> #include <pthread.h>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Mathieu Desnoyers
> >>>>>> EfficiOS Inc.
> >>>>>> https://www.efficios.com
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
Powered by blists - more mailing lists