linux-kernel - Re: [PATCH 3/3] membarrier: Propagate SYNC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrVXG0A2NwiPY31G3uQYvVzbwFM80hFbVLWi8tb-_+k1dQ@mail.gmail.com>
Date:   Tue, 1 Dec 2020 10:12:34 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>, x86 <x86@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Arnd Bergmann <arnd@...db.de>,
        Anton Blanchard <anton@...abs.org>
Subject: Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions more carefully

On Tue, Dec 1, 2020 at 6:28 AM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> ----- On Dec 1, 2020, at 5:16 AM, Peter Zijlstra peterz@...radead.org wrote:
>
> > On Mon, Nov 30, 2020 at 09:50:35AM -0800, Andy Lutomirski wrote:
> >> membarrier() carefully propagates SYNC_CORE and RSEQ actions to all
> >> other CPUs, but there are two issues.
> >>
> >>  - membarrier() does not sync_core() or rseq_preempt() the calling
> >>    CPU.  Aside from the logic being mind-bending, this also means
> >>    that it may not be safe to modify user code through an alias,
> >>    call membarrier(), and then jump to a different executable alias
> >>    of the same code.
> >
> > I always understood this to be on purpose. The calling CPU can fix up
> > itself just fine. The pain point is fixing up the other CPUs, and that's
> > where membarrier() helps.
>
> Indeed, as documented in the man page:
>
>        MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16)
>               In  addition  to  providing  the  memory ordering guarantees de‐
>               scribed in MEMBARRIER_CMD_PRIVATE_EXPEDITED,  upon  return  from
>               system call the calling thread has a guarantee that all its run‐
>               ning thread siblings have executed a core  serializing  instruc‐
>               tion.   This  guarantee is provided only for threads in the same
>               process as the calling thread.
>
> membarrier sync core guarantees a core serializing instruction on the siblings,
> not on the caller thread. This has been done on purpose given that the caller
> thread can always issue its core serializing instruction from user-space on
> its own.
>
> >
> > That said, I don't mind including self, these aren't fast calls by any
> > means.
>
> I don't mind including self either, but this would require documentation
> updates, including man pages, to state that starting from kernel Y this
> is the guaranteed behavior. It's then tricky for user-space to query what
> the behavior is unless we introduce a new membarrier command for it. So this
> could introduce issues if software written for the newer kernels runs on older
> kernels.

For rseq at least, if we do this now we don't have this issue -- I
don't think any released kernel has the rseq mode.

>
> >
> >>  - membarrier() does not explicitly sync_core() remote CPUs either;
> >>    instead, it relies on the assumption that an IPI will result in a
> >>    core sync.  On x86, I think this may be true in practice, but
> >>    it's not architecturally reliable.  In particular, the SDM and
> >>    APM do not appear to guarantee that interrupt delivery is
> >>    serializing.
> >
> > Right, I don't think we rely on that, we do rely on interrupt delivery
> > providing order though -- as per the previous email.
> >
> >>    On a preemptible kernel, IPI return can schedule,
> >>    thereby switching to another task in the same mm that was
> >>    sleeping in a syscall.  The new task could then SYSRET back to
> >>    usermode without ever executing IRET.
> >
> > This; I think we all overlooked this scenario.
>
> Indeed, this is an issue which needs to be fixed.
>
> >
> >> This patch simplifies the code to treat the calling CPU just like
> >> all other CPUs, and explicitly sync_core() on all target CPUs.  This
> >> eliminates the need for the smp_mb() at the end of the function
> >> except in the special case of a targeted remote membarrier().  This
> >> patch updates that code and the comments accordingly.
>
> I am not confident that removing the smp_mb at the end of membarrier is
> an appropriate change, nor that it simplifies the model.

Ah, but I didn't remove it.  I carefully made sure that every possible
path through the function does an smp_mb() or stronger after all the
cpu_rq reads.  ipi_func(), on_each_cpu(), and the explicit smp_mb()
cover the three cases.

That being said, if you prefer, I can make the change to skip the
calling CPU, in which case I'll leave the smp_mb() at the end alone.