lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEE+ybkTs4U7h-Js818k1QEqpVfHwAHSTXaEwHs3g37LwOsjLQ@mail.gmail.com>
Date:   Wed, 26 Feb 2020 12:27:36 -0500
From:   Chris Kennelly <ckennelly@...gle.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     "Joel Fernandes, Google" <joel@...lfernandes.org>,
        Paul Turner <pjt@...gle.com>,
        Florian Weimer <fweimer@...hat.com>,
        "Carlos O'Donell" <codonell@...hat.com>,
        libc-alpha <libc-alpha@...rceware.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        paulmck <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
        Brian Geffon <bgeffon@...gle.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc

On Wed, Feb 26, 2020 at 12:01 PM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> ----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@...gle.com wrote:
>
> > On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes <joel@...lfernandes.org> wrote:
> >>
> >> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers
> >> <mathieu.desnoyers@...icios.com> wrote:
> >> >
> >> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google
> >> > joel@...lfernandes.org wrote:
> >> >
> >> > [...]
> >> > >>
> >> > >> 3) Use the  __rseq_abi TLS cpu_id field to know whether Rseq has been
> >> > >> registered.
> >> > >>
> >> > >> - Current protocol in the most recent glibc integration patch set.
> >> > >> - Not supported yet by Linux kernel rseq selftests,
> >> > >> - Not supported yet by tcmalloc,
> >> > >>
> >> > >> Use the per-thread state to figure out whether each thread need to register
> >> > >> Rseq individually.
> >> > >>
> >> > >> Works for integration between a library which exists for the entire lifetime
> >> > >> of the executable (e.g. glibc) and other libraries. However, it does not
> >> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
> >> > >> having a library like glibc handling the registration present.
> >> > >
> >> > > Mathieu, could you share more details about why during dlopen/close
> >> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was
> >> > > registered?
> >> >
> >> > Sure,
> >> >
> >> > A library which is only loaded and never closed during the execution of the
> >> > program can let the kernel implicitly unregister rseq at thread exit. For
> >> > the dlopen/dlclose use-case, we need to be able to explicitly unregister
> >> > each thread's __rseq_abi which sit in a library which is going to be
> >> > dlclose'd.
> >>
> >> Mathieu, Thanks a lot for the explanation, it makes complete sense. It
> >> sounds from Chris's reply that tcmalloc already checks
> >> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it
> >> seems to already handle things properly - CMIIW.
> >
> > I'll make a note about this, since we can probably benefit from some
> > more comments about the assumptions/invariants the fastpath uses.
>
> I suspect the integration with glibc and with dlopen'd/dlclose'd libraries will not
> behave correctly with the current tcmalloc implementation.
>
> Based on the tcmalloc code-base, InitFastPerCpu is only called from IsFast. As long
> as this is the only expected caller, having IsFast comparing the RseqCpuId detects
> whether glibc (or some other library) has already registered rseq for the current
> thread.
>
> However, if the application chooses to invoke InitFastPerCpu() directly, things become
> expected, because it invokes:
>
>   absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu);
>
> which AFAIU invokes InitPerCpu once after execution of the current program. Which
> does:
>
> static bool InitThreadPerCpu() {
>   if (__rseq_refcount++ > 0) {
>     return true;
>   }
>
>   auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0,
>                      PERCPU_RSEQ_SIGNATURE);
>   if (ret == 0) {
>     return true;
>   } else {
>     __rseq_refcount--;
>   }
>
>   return false;
> }
>
> static void InitPerCpu() {
>   // Based on the results of successfully initializing the first thread, mark
>   // init_status to initialize all subsequent threads.
>   if (InitThreadPerCpu()) {
>     init_status = kFastMode;
>   }
> }
>
> In a scenario where glibc has already registered Rseq, the __rseq_refcount will
> be incremented, the __NR_rseq syscall will fail with -1, errno=EBUSY, so the refcount
> will be immediately decremented and it will return false. Therefore, "init_status" will
> never be set fo kFastMode, leaving it in kSlowMode for the entire lifetime of this
> program. That being said, even though this state can come as a surprise, it seems to
> be entirely bypassed by the fast-paths IsFast() and IsFastNoInit(), so maybe it won't
> have any observable side-effects other than leaving init_status in a state that does not
> match reality.

I agree that this could potentially violate inviarants, but
InitFastPerCpu is not intended to be called by the application.

> In the other use-case where tcmalloc co-exist with a dlopened/dlclosed library, but glibc
> does not provide Rseq registration, we run into issues as well if the dlopened library
> registers rseq first for a given thread. The IsFastNoInit() expects that if Rseq has been
> observed as registered in the past for a thread, it stays registered. However, if a
> dlclosed library unregisters Rseq, we need to be prepared to re-register it. So either
> tcmalloc needs to express its use of Rseq by incrementing __rseq_refcount even when Rseq
> is registered (this would hurt the fast-path however, and I would hate to have to do this),
> or tcmalloc needs to be able to handle the fact that Rseq may be unregistered by a dlclosed
> library which was the actual owner of the Rseq registration.

We have a bit of an opportunity to figure out whether this is the
first time--from TCMalloc's perspective--a thread is doing per-CPU and
bump the __rseq_count accordingly.  I think this could be done off of
the fast path.

Chris

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ