lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPNVh5dhoSmoxXabnDg5w1CnwVz0J_nS09xQbQ4cJu-UNa26Lg@mail.gmail.com>
Date:   Fri, 10 Sep 2021 11:00:56 -0700
From:   Peter Oskolkov <posk@...gle.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Florian Weimer <fweimer@...hat.com>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Paul Turner <pjt@...gle.com>,
        Jann Horn <jannh@...gle.com>, Peter Oskolkov <posk@...k.io>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RESEND RFC PATCH 0/3] Provide fast access to thread specific data

On Fri, Sep 10, 2021 at 10:55 AM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> ----- On Sep 10, 2021, at 1:48 PM, Peter Oskolkov posk@...gle.com wrote:
>
> > On Fri, Sep 10, 2021 at 10:33 AM Mathieu Desnoyers
> > <mathieu.desnoyers@...icios.com> wrote:
> >>
> >> ----- On Sep 10, 2021, at 12:37 PM, Florian Weimer fweimer@...hat.com wrote:
> >>
> >> > * Peter Oskolkov:
> >> >
> >> >> In short, due to the need to read/write to the userspace from
> >> >> non-sleepable contexts in the kernel it seems that we need to have some
> >> >> form of per task/thread kernel/userspace shared memory that is pinned,
> >> >> similar to what your sys_task_getshared does.
> >> >
> >> > In glibc, we'd also like to have this for PID and TID.  Eventually,
> >> > rt_sigprocmask without kernel roundtrip in most cases would be very nice
> >> > as well.  For performance and simplicity in userspace, it would be best
> >> > if the memory region could be at the same offset from the TCB for all
> >> > threads.
> >> >
> >> > For KTLS, the idea was that the auxiliary vector would contain size and
> >> > alignment of the KTLS.  Userspace would reserve that memory, register it
> >> > with the kernel like rseq (or the robust list pointers), and pass its
> >> > address to the vDSO functions that need them.  The last part ensures
> >> > that the vDSO functions do not need non-global data to determine the
> >> > offset from the TCB.  Registration is still needed for the caches.
> >> >
> >> > I think previous discussions (in the KTLS and rseq context) did not have
> >> > the pinning constraint.
> >>
> >> If this data is per-thread, and read from user-space, why is it relevant
> >> to update this data from non-sleepable kernel context rather than update it as
> >> needed on return-to-userspace ? When returning to userspace, sleeping due to a
> >> page fault is entirely acceptable. This is what we currently do for rseq.
> >>
> >> In short, the data could be accessible from the task struct. Flags in the
> >> task struct can let return-to-userspace know that it has outdated ktls
> >> data. So before returning to userspace, the kernel can copy the relevant data
> >> from the task struct to the shared memory area, without requiring any pinning.
> >>
> >> What am I missing ?
> >
> > I can't speak about other use cases, but in the context of userspace
> > scheduling, the information that a task has blocked in the kernel and
> > is going to be removed from its runqueue cannot wait to be delivered
> > to the userspace until the task wakes up, as the userspace scheduler
> > needs to know of the even when it happened so that it can schedule
> > another task in place of the blocked one. See the discussion here:
> >
> > https://lore.kernel.org/lkml/CAG48ez0mgCXpXnqAUsa0TcFBPjrid-74Gj=xG8HZqj2n+OPoKw@mail.gmail.com/
>
> OK, just to confirm my understanding, so the use-case here is per-thread
> state which can be read by other threads (in this case the userspace scheduler) ?

Yes, exactly! And sometimes these other threads have to read/write the
state while they are themselves in preempt_disabled regions in the
kernel. There could be a way to do that asynchronously (e.g. via
workpools), but this will add latency and complexity.

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ