Date:   Mon, 28 Jan 2019 13:56:24 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net,
        daniel@...earbox.net, jakub.kicinski@...ronome.com,
        netdev@...r.kernel.org, kernel-team@...com, mingo@...hat.com,
        will.deacon@....com, Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        jannh@...gle.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock

On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > On Fri, Jan 25, 2019 at 11:23:12AM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 24, 2019 at 03:58:59PM -0800, Alexei Starovoitov wrote:
> > > > On Thu, Jan 24, 2019 at 07:01:09PM +0100, Peter Zijlstra wrote:
> > > 
> > > > > And this would again be the moment where I go pester you about the BPF
> > > > > memory model :-)
> > > > 
> > > > hehe :)
> > > > How do you propose to define it in a way that applies to all archs
> > > > and yet doesn't penalize x86?
> > > > "Assume the minimum execution ordering model" the way the kernel does
> > > > is unfortunately not usable, since bpf doesn't have the luxury
> > > > of using nice #defines that turn into nops on x86.
> > > 
> > > Why not? Surely the JIT can fix it up? That is, suppose you were to have
> > > smp_rmb() as an eBPF instruction, then the JIT (which knows what
> > > architecture it is targeting) can simply avoid emitting instructions for
> > > it.
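
With a C compiler standing in for the JIT, the per-arch cost of such a
barrier instruction is easy to see. The function below is only a sketch
of the idea, not code from the patch set: an acquire fence compiles to
zero instructions on x86-64 and to a single "dmb ishld" on arm64, so a
JIT that sees an smp_rmb eBPF instruction could make the same decision
at code-gen time.

#include <stdatomic.h>

/* Stand-in for the lowering of a hypothetical smp_rmb eBPF
 * instruction: nothing is emitted on strongly ordered x86,
 * a single dmb ishld on weakly ordered arm64. */
void bpf_smp_rmb_lowering(void)
{
        atomic_thread_fence(memory_order_acquire);
}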
> > 
> > I'm all for adding new instructions that solve real use cases.
> > imo bpf_spin_lock() is the first step in helping with concurrency.
> > At the Plumbers conference we agreed to add new sync_fetch_and_add()
> > and xchg() instructions. That's a second step.
> > smp_rmb/wmb/mb should be added as well.
> > JITs will patch them depending on the architecture.
> > 
> > What I want to avoid is defining the whole execution ordering model upfront.
> > We cannot say that the BPF ISA is weakly ordered like alpha.
> > Most bpf progs are written and run on x86. We shouldn't
> > twist bpf developers' arms by artificially relaxing the memory model.
> > The BPF memory model is equal to the memory model of the underlying architecture.
> > What we can do is make bpf progs a bit more portable with
> > smp_rmb instructions, but we must not force weak ordering on the developer.
> 
> Well, I agree with only introducing bits you actually need, and my
> smp_rmb() example might have been poorly chosen; smp_load_acquire() /
> smp_store_release() might have been a far more useful example.
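
For concreteness, the classic message-passing pattern is exactly what
that pair buys you. In plain C11 atomics (a sketch only; a BPF encoding
of acquire/release would presumably look analogous):

#include <stdatomic.h>

static int payload;
static atomic_int ready;

/* Producer: write the payload, then release-store the flag. */
void produce(void)
{
        payload = 42;
        atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: acquire-load the flag; when it reads 1, the payload
 * store is guaranteed to be visible.  On x86 both operations are
 * plain movs; on arm64 they become stlr/ldar, far cheaper than a
 * full dmb. */
int consume(void)
{
        if (atomic_load_explicit(&ready, memory_order_acquire))
                return payload;
        return -1;
}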
> 
> But I disagree with the last part; we have to pick a model now,
> otherwise you'll paint yourself into a corner.
> 
> Also, Alpha isn't very relevant these days; however, ARM64 does seem to
> be gaining a lot of attention, and that is very much a weak architecture.
> Adding strongly ordered assumptions to BPF now will penalize them in
> the long run.

arm64 is gaining attention, and so is RISC-V.
The BPF JIT for arm64 is very solid, while the BPF JIT for RISC-V is still being worked on.
BPF is not picking sides in CPU HW and ISA battles.
The memory model is a CPU HW design decision; the BPF ISA cannot dictate HW design.
We're not saying today that BPF is strongly ordered.
BPF loads/stores behave differently on x86 vs arm64.
We can add new instructions, but we cannot 'define' how loads/stores behave
from a memory model perspective.
For example, take the atomicity of single-byte loads and stores.
Not all archs make them atomic, but we cannot tell bpf programmers
to always assume non-atomic byte loads.
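
To make that concrete: on a machine without byte stores (think pre-BWX
Alpha), a one-byte store has to be lowered to a word-sized
read-modify-write, and two seemingly independent byte stores can then
corrupt each other. A sketch of that lowering (little-endian layout
assumed; names here are illustrative):

struct flags {
        char a;          /* written by CPU 0 */
        char b;          /* written by CPU 1 */
        char pad[2];     /* pad to word size for this sketch */
};

/* What "f->a = 1" must become on an ISA without byte stores: */
void store_a_without_byte_insns(struct flags *f)
{
        unsigned int *word = (unsigned int *)f;
        unsigned int v = *word;         /* load the containing word */
        v = (v & ~0xffu) | 1;           /* splice in the new byte   */
        *word = v;                      /* store the whole word     */
}
/* If CPU 1 updates f->b between the load and the store, that update
 * is silently lost -- which is why "always assume non-atomic byte
 * stores" is not something you can ask programmers to live with. */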

> > > Similarly; could something like this not also help with the spinlock
> > > thing? Use that generic test-and-set thing for the interpreter, but
> > > provide a better implementation in the JIT?
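
For reference, the generic fallback in question would be something like
the following test-and-set lock, in C11 atomics (a sketch of what an
interpreter could fall back on, not the actual kernel implementation):

#include <stdatomic.h>

/* Portable fallback lock for the interpreter: correct everywhere,
 * optimal nowhere.  A JIT is free to emit something smarter. */
static inline void tas_lock(atomic_flag *l)
{
        while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
                ;       /* spin until the holder clears the flag */
}

static inline void tas_unlock(atomic_flag *l)
{
        atomic_flag_clear_explicit(l, memory_order_release);
}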
> > 
> > Maybe eventually. We can add a cmpxchg insn, but the verifier still
> > doesn't support loops. We've made a lot of progress in bounded loop research
> > over the last 2 years, but loops in bpf are still a year away.
> > We considered adding 'bpf_spin_lock' as a new instruction instead of a helper call,
> > but that approach has a lot of negatives, so we went with the helper.
> 
> Ah, but the loop won't be in the BPF program itself. The BPF program
> would only have had the BPF_SPIN_LOCK instruction; the JIT then emits
> code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> out-of-line versions of them).

As I said, we considered exactly that, and that approach has a lot of downsides
compared with the helper approach.
Pretty much every time a new feature is added we evaluate whether it
should be a new instruction or a new helper. 99% of the time we go with a new helper.

> There isn't anything that mandates the JIT uses the exact same locking
> routines the interpreter does, is there?

Sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants.
For example, the bpf_map_lookup_elem() call is _inlined_ by the verifier for certain
map types; JITs don't even need to do anything. It looks like a function call from
the bpf prog's point of view, but in JITed code it is a sequence of native instructions.
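
For reference, this is roughly how the helper is used from a program
(a sketch in the style of the selftests; the map layout, map name and
section names here are illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct val {
        struct bpf_spin_lock lock;
        long cnt;
};

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct val);
} counters SEC(".maps");

SEC("cgroup_skb/egress")
int count_packets(struct __sk_buff *skb)
{
        int key = 0;
        struct val *v = bpf_map_lookup_elem(&counters, &key);

        if (!v)
                return 1;
        /* A helper call from the program's point of view; after
         * verification and JIT it may stay a call or be inlined. */
        bpf_spin_lock(&v->lock);
        v->cnt++;
        bpf_spin_unlock(&v->lock);
        return 1;
}

char _license[] SEC("license") = "GPL";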

Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
takes too much time; then we can inline the fast path of queued_spin_lock
directly into the bpf prog and save the function call cost.
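
i.e. emit something like the following inline and only call out of line
on contention (a sketch of the shape, not the actual queued_spin_lock
internals):

#include <stdatomic.h>

void queued_spin_lock_slowpath(atomic_int *lock);  /* stays out of line */

/* Fast path the JIT could paste straight into the prog image:
 * one try-cmpxchg for the uncontended case, a real call otherwise. */
static inline void inlined_spin_lock(atomic_int *lock)
{
        int expected = 0;

        if (atomic_compare_exchange_strong_explicit(
                        lock, &expected, 1,
                        memory_order_acquire, memory_order_relaxed))
                return;                          /* got the lock */
        queued_spin_lock_slowpath(lock);         /* contended path */
}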
