Message-ID: <20190131184749.ic7pwxlxvpd2k7hn@ast-mbp.dhcp.thefacebook.com>
Date: Thu, 31 Jan 2019 10:47:50 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc: Will Deacon <will.deacon@....com>,
Peter Zijlstra <peterz@...radead.org>,
Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net,
daniel@...earbox.net, jakub.kicinski@...ronome.com,
netdev@...r.kernel.org, kernel-team@...com, mingo@...hat.com,
jannh@...gle.com
Subject: Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce
bpf_spin_lock
On Thu, Jan 31, 2019 at 06:01:56AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 30, 2019 at 02:57:43PM -0800, Alexei Starovoitov wrote:
> > On Wed, Jan 30, 2019 at 01:05:36PM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 30, 2019 at 11:51:14AM -0800, Alexei Starovoitov wrote:
> > > > On Wed, Jan 30, 2019 at 10:36:18AM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> > > > > > Hi Alexei,
> > > > > >
> > > > > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > > > > > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > > > > > > What I want to avoid is defining the whole execution ordering model upfront.
> > > > > > > > > We cannot say that the BPF ISA is weakly ordered like Alpha.
> > > > > > > > > Most bpf progs are written for and run on x86. We shouldn't
> > > > > > > > > twist bpf developers' arms by artificially relaxing the memory model.
> > > > > > > > > The BPF memory model is equal to the memory model of the underlying architecture.
> > > > > > > > > What we can do is make bpf progs a bit more portable with
> > > > > > > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > > > > > >
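[ For concreteness, the kind of pairing an smp_rmb() would be used for looks
  roughly like the kernel-style sketch below; the 'data' and 'ready' names are
  made up for illustration. ]

    #include <linux/compiler.h>     /* READ_ONCE(), WRITE_ONCE() */
    #include <asm/barrier.h>        /* smp_wmb(), smp_rmb() */

    static int data;
    static int ready;

    /* writer side: publish data, then set the flag */
    static void publish(int v)
    {
            data = v;
            smp_wmb();              /* order data store before flag store */
            WRITE_ONCE(ready, 1);
    }

    /* reader side: check the flag, then read data */
    static int consume(int *out)
    {
            if (!READ_ONCE(ready))
                    return -1;
            smp_rmb();              /* order flag load before data load */
            *out = data;
            return 0;
    }
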
> > > > > > > > Well, I agree with only introducing bits you actually need, and my
> > > > > > > > smp_rmb() example might have been poorly chosen; smp_load_acquire() /
> > > > > > > > smp_store_release() might have been a far more useful example.
> > > > > > > >
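[ The acquire/release form of the same message-passing pattern, again as a
  rough kernel-style sketch with made-up names: ]

    #include <asm/barrier.h>  /* smp_store_release(), smp_load_acquire() */

    static int data;
    static int ready;

    /* writer: the release store orders the data store before the flag store */
    static void publish(int v)
    {
            data = v;
            smp_store_release(&ready, 1);
    }

    /* reader: the acquire load orders the later data load after the flag load */
    static int consume(int *out)
    {
            if (!smp_load_acquire(&ready))
                    return -1;
            *out = data;
            return 0;
    }
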
> > > > > > > > But I disagree with the last part; we have to pick a model now;
> > > > > > > > otherwise you'll pain yourself into a corner.
> > > > > > > >
> > > > > > > > Also, Alpha isn't very relevant these days; however, ARM64 does seem to
> > > > > > > > be gaining a lot of attention and that is very much a weak architecture.
> > > > > > > > Adding strongly ordered assumptions to BPF now will penalize them in
> > > > > > > > the long run.
> > > > > > >
> > > > > > > arm64 is gaining attention just like RISC-V is.
> > > > > > > The BPF JIT for arm64 is very solid, while the BPF JIT for RISC-V is being worked on.
> > > > > > > BPF is not picking sides in CPU HW and ISA battles.
> > > > > >
> > > > > > It's not about picking a side, it's about providing an abstraction of the
> > > > > > various CPU architectures out there so that the programmer doesn't need to
> > > > > > worry about where their program may run. Hell, even if you just said "eBPF
> > > > > > follows x86 semantics" that would be better than saying nothing (and then we
> > > > > > could have a discussion about whether x86 semantics are really what you
> > > > > > want).
> > > > >
> > > > > To reinforce this point, the Linux-kernel memory model (tools/memory-model)
> > > > > is that abstraction for the Linux kernel. Why not just use that for BPF?
> > > >
> > > > I already answered this earlier in the thread.
> > > > tldr: not going to sacrifice performance.
> > >
> > > Understood.
> > >
> > > But can we at least say that where there are no performance consequences,
> > > BPF should follow LKMM? You already mentioned smp_load_acquire()
> > > and smp_store_release(), but the void atomics (e.g., atomic_inc())
> > > should also work because they don't provide any ordering guarantees.
> > > The _relaxed(), _release(), and _acquire() variants of the value-returning
> > > atomics should be just fine as well.
> > >
> > > The other value-returning atomics have strong ordering, which is fine
> > > on many systems, but potentially suboptimal for the weakly ordered ones.
> > > Though you have to have pretty good locality of reference to be able to
> > > see the difference, because otherwise cache-miss overhead dominates.
> > >
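[ Roughly, the distinction being drawn between the unordered/relaxed forms and
  the fully ordered value-returning forms, as a kernel-style sketch with an
  illustrative counter: ]

    #include <linux/atomic.h>

    static atomic_t cnt = ATOMIC_INIT(0);

    static void atomics_example(void)
    {
            int old;

            /* void atomic: no return value, no ordering guarantee */
            atomic_inc(&cnt);

            /* value-returning but explicitly relaxed: still no ordering */
            old = atomic_fetch_add_relaxed(1, &cnt);

            /* value-returning with default (full) ordering: fine on x86,
             * potentially costlier on weakly ordered CPUs */
            old = atomic_fetch_add(1, &cnt);

            (void)old;
    }
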
> > > Things like cmpxchg() don't seem to fit BPF because they are normally
> > > used in spin loops, though there are some non-spinning use cases.
> > >
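[ The usual spin-style use of cmpxchg() referred to here, sketched with a
  made-up saturating counter: ]

    #include <linux/atomic.h>

    static atomic_t val = ATOMIC_INIT(0);

    /* Increment val, but never past max, via a cmpxchg() retry loop. */
    static void inc_saturating(int max)
    {
            int old = atomic_read(&val);

            for (;;) {
                    int prev;

                    if (old >= max)
                            return;         /* already saturated */
                    prev = atomic_cmpxchg(&val, old, old + 1);
                    if (prev == old)
                            return;         /* update succeeded */
                    old = prev;             /* lost the race, retry */
            }
    }
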
> > > You correctly pointed out that READ_ONCE() and WRITE_ONCE() are suboptimal
> > > on systems that don't support all sizes of loads, but I bet that there
> > > are some sizes for which they are just fine across systems, for example,
> > > pointer size and int size.
> > >
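[ For instance, pointer-sized and int-sized accesses; the names below are made
  up for illustration: ]

    #include <linux/compiler.h>     /* READ_ONCE(), WRITE_ONCE() */

    struct item;

    static struct item *cur;
    static int state;

    static void set_current_item(struct item *it, int s)
    {
            WRITE_ONCE(cur, it);    /* pointer-sized: a single store everywhere */
            WRITE_ONCE(state, s);   /* int-sized: likewise */
    }

    static struct item *snapshot_current(int *s)
    {
            *s = READ_ONCE(state);
            return READ_ONCE(cur);
    }
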
> > > Does that help? Or am I missing additional cases where performance
> > > could be degraded?
> >
> > bpf doesn't have smp_load_acquire, atomic_fetch_add, xchg, or fence instructions.
> > They can be added step by step. That's easy.
> > I believe folks have already started working on adding atomic_fetch_add.
> > What I have a problem with is making a statement today that bpf's end
> > goal is LKMM. Even after adding all sorts of instructions it may
> > not be practical.
> > Only when a real use case requires a new instruction do we add it.
> > Do you have a bpf program that needs smp_load_acquire?
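[ For reference, the one atomic the ISA has today is BPF_XADD, typically
  reached from C as below. This is only a sketch: the map name and layout are
  made up, the header is the selftests' bpf_helpers.h, and the 'fetch' part of
  the builtin isn't actually usable. ]

    #include <linux/bpf.h>
    #include "bpf_helpers.h"  /* SEC(), bpf_map_lookup_elem(), bpf_map_def */

    struct bpf_map_def SEC("maps") counters = {
            .type        = BPF_MAP_TYPE_ARRAY,
            .key_size    = sizeof(__u32),
            .value_size  = sizeof(__u64),
            .max_entries = 1,
    };

    SEC("xdp")
    int count_packets(struct xdp_md *ctx)
    {
            __u32 key = 0;
            __u64 *cnt = bpf_map_lookup_elem(&counters, &key);

            if (cnt)
                    /* clang emits BPF_STX | BPF_XADD; return value unused */
                    __sync_fetch_and_add(cnt, 1);
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";
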
>
> We seem to be talking past each other. Let me try again...
>
> I believe that if BPF adds a given concurrency feature, it should follow
> LKMM unless there is some specific problem with its doing so.
>
> My paragraphs in my previous email list the concurrency features for which
> BPF could follow LKMM without penalty, should BPF choose to add them.
>
> Does that help?
yeah. we're talking past each other indeed.
It doesn't look like more emails will help.
Let's resolve it either f2f during the next conference, or join our bi-weekly
bpf bluejeans call on Wednesdays at 11am pacific.
Reminders and links are on this list:
https://lists.iovisor.org/g/iovisor-dev/messages?p=created,0,,20,2,0,0