linux-kernel - Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1595324577.x3bf55tpgu.astroid@bobo.none>
Date:   Tue, 21 Jul 2020 20:04:27 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Anton Blanchard <anton@...abs.org>, Arnd Bergmann <arnd@...db.de>,
        Jens Axboe <axboe@...nel.dk>,
        linux-arch <linux-arch@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>, x86 <x86@...nel.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than
 membarrier_mm_sync_core_before_usermode

Excerpts from Mathieu Desnoyers's message of July 21, 2020 2:46 am:
> ----- On Jul 19, 2020, at 11:03 PM, Nicholas Piggin npiggin@...il.com wrote:
> 
>> Excerpts from Mathieu Desnoyers's message of July 17, 2020 11:42 pm:
>>> ----- On Jul 16, 2020, at 7:26 PM, Nicholas Piggin npiggin@...il.com wrote:
>>> [...]
>>>> 
>>>> membarrier does replace barrier instructions on remote CPUs, which do
>>>> order accesses performed by the kernel on the user address space. So
>>>> membarrier should too I guess.
>>>> 
>>>> Normal process context accesses like read(2) will do so because they
>>>> don't get filtered out from IPIs, but kernel threads using the mm may
>>>> not.
>>> 
>>> But it should not be an issue, because membarrier's ordering is only with
>>> respect
>>> to submit and completion of io_uring requests, which are performed through
>>> system calls from the context of user-space threads, which are called from the
>>> right mm.
>> 
>> Is that true? Can io completions be written into an address space via a
>> kernel thread? I don't know the io_uring code well but it looks like
>> that's asynchonously using the user mm context.
> 
> Indeed, the io completion appears to be signaled asynchronously between kernel
> and user-space.

Yep, many other places do similar with use_mm.

[snip]

> So as far as membarrier memory ordering dependencies are concerned, it relies
> on the store-release/load-acquire dependency chain in the completion queue to
> order against anything that was done prior to the completed requests.
> 
> What is in-flight while the requests are being serviced provides no memory
> ordering guarantee whatsoever.

Yeah you're probably right in this case I think. Quite likely most kernel 
tasks that asynchronously write to user memory would at least have some 
kind of producer-consumer barriers.

But is that restriction of all async modifications documented and enforced
anywhere?

>> How about other memory accesses via kthread_use_mm? Presumably there is
>> still ordering requirement there for membarrier,
> 
> Please provide an example case with memory accesses via kthread_use_mm where
> ordering matters to support your concern.

I think the concern Andy raised with io_uring was less a specific 
problem he saw and more a general concern that we have these memory 
accesses which are not synchronized with membarrier.

>> so I really think
>> it's a fragile interface with no real way for the user to know how
>> kernel threads may use its mm for any particular reason, so membarrier
>> should synchronize all possible kernel users as well.
> 
> I strongly doubt so, but perhaps something should be clarified in the documentation
> if you have that feeling.

I'd rather go the other way and say if you have reasoning or numbers for 
why PF_KTHREAD is an important optimisation above rq->curr == rq->idle
then we could think about keeping this subtlety with appropriate 
documentation added, otherwise we can just kill it and remove all doubt.

That being said, the x86 sync core gap that I imagined could be fixed 
by changing to rq->curr == rq->idle test does not actually exist because
the global membarrier does not have a sync core option. So fixing the
exit_lazy_tlb points that this series does *should* fix that. So
PF_KTHREAD may be less problematic than I thought from implementation
point of view, only semantics.

Thanks,
Nick