Message-ID: <20190128213710.vjxnc2eq5rsisgfx@ast-mbp>
Date: Mon, 28 Jan 2019 13:37:12 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net,
daniel@...earbox.net, jakub.kicinski@...ronome.com,
netdev@...r.kernel.org, kernel-team@...com, mingo@...hat.com,
will.deacon@....com, Paul McKenney <paulmck@...ux.vnet.ibm.com>,
jannh@...gle.com
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock

On Mon, Jan 28, 2019 at 09:43:10AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 25, 2019 at 03:42:43PM -0800, Alexei Starovoitov wrote:
> > On Fri, Jan 25, 2019 at 10:10:57AM +0100, Peter Zijlstra wrote:
>
> > > What about the progs that run from SoftIRQ ? Since that bpf_prog_active
> > > thing isn't inside BPF_PROG_RUN() what is to stop say:
> > >
> > > reuseport_select_sock()
> > > ...
> > > BPF_PROG_RUN()
> > > bpf_spin_lock()
> > > <IRQ>
> > > ...
> > > BPF_PROG_RUN()
> > > bpf_spin_lock() // forever more
> > >
> > > </IRQ>
> > >
> > > Unless you stick that bpf_prog_active stuff inside BPF_PROG_RUN itself,
> > > I don't see how you can fundamentally avoid this happening (now or in
> > > the future).
>
> > But your issue above is valid.
>
> > We don't use bpf_prog_active for networking progs, since we allow
> > for one level of nesting due to the classic SKF_AD_PAY_OFFSET legacy.
> > Also we allow tracing progs to nest with networking progs.
> > People are using this actively.
> > Typically it's not an issue, since in networking there is no
> > arbitrary nesting (unlike kprobe/nmi in tracing),
> > but for bpf_spin_lock it can be, since the same map can be shared
> > by networking and tracing progs and the above deadlock would be possible:
> > (first BPF_PROG_RUN will be from networking prog, then kprobe+bpf's
> > BPF_PROG_RUN accessing the same map with bpf_spin_lock)
> >
> > So for now I'm going to allow bpf_spin_lock in networking progs only,
> > since there is no arbitrary nesting there.
>
> Isn't that still broken? AFAIU networking progs can happen in task
> context (TX) and SoftIRQ context (RX), which can nest.
Sure. The sendmsg side of networking can be interrupted by napi receive.
Both can have bpf progs attached at different points, but napi won't run
while a bpf prog is running, because the bpf prog disables preemption.
Moreover, the whole networking stack can be recursive, and there is an
xmit_recursion counter to catch bad cases.
When bpf progs interact with networking they don't add to that recursion:
all of the *redirect*() helpers do their work outside of the bpf preempt-disabled context.
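For reference, that guard is roughly this shape (simplified, illustrative
sketch only; the real check lives in net/core/dev.c and the function name
and limit value here are made up):

/* Illustrative per-cpu recursion guard on the xmit path. The real code
 * runs with BHs disabled, so the per-cpu counter is stable.
 */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static DEFINE_PER_CPU(int, xmit_recursion);
#define XMIT_RECURSION_LIMIT	8	/* illustrative value */

static int xmit_guarded(struct sk_buff *skb)
{
	int rc;

	if (__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT) {
		/* Too deep: drop rather than recurse forever. */
		kfree_skb(skb);
		return -ENETDOWN;
	}

	__this_cpu_inc(xmit_recursion);
	rc = dev_queue_xmit(skb);	/* may re-enter the xmit path */
	__this_cpu_dec(xmit_recursion);

	return rc;
}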
Also there is no nesting of the same networking prog type;
xdp/tc/lwt/cgroup bpf progs cannot be called recursively by design.
There are no arbitrary entry points, unlike kprobe/tracepoint.
The only nesting is when a _classic_ socket filter bpf prog uses the legacy
SKF_AD_PAY_OFFSET. That calls the flow dissector, which may in turn call the
flow dissector bpf prog. Classic bpf doesn't use bpf maps, so there are no deadlock issues there.
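To make that concrete, the intended usage is roughly the following
(hypothetical sketch; map and prog names are made up, and the BTF-style
map definition plus SEC("tc") follow later libbpf conventions and are
only illustrative):

/* Hypothetical tc prog using the bpf_spin_lock/bpf_spin_unlock helpers
 * introduced in this series.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct counter_val {
	struct bpf_spin_lock lock;
	__u64 packets;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct counter_val);
} counters SEC(".maps");

SEC("tc")
int count_packets(struct __sk_buff *skb)
{
	__u32 key = 0;
	struct counter_val *v;

	v = bpf_map_lookup_elem(&counters, &key);
	if (!v)
		return 0;	/* TC_ACT_OK */

	bpf_spin_lock(&v->lock);	/* map may be shared with other progs */
	v->packets++;
	bpf_spin_unlock(&v->lock);	/* verifier requires unlock before exit */

	return 0;	/* TC_ACT_OK */
}

char _license[] SEC("license") = "GPL";

A kprobe/tracepoint prog that tried to take the same lock would be rejected
by the verifier, which is exactly the restriction being discussed here.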
> > And once we figure out the safety concerns for kprobe/tracepoint progs
> > we can enable bpf_spin_lock there too.
> > NMI bpf progs will never have bpf_spin_lock.
>
> kprobe is like NMI, since it pokes an INT3 instruction which can trigger
> in the middle of IRQ-disabled code or even in NMIs. Similar arguments can be
> made for tracepoints; they can happen 'anywhere'.
Exactly. That's why there is bpf_prog_active to protect the kernel in general
when running tracing bpf progs.
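Roughly, that guard looks like this (paraphrased sketch, not the exact
kernel code; the real version is trace_call_bpf() in
kernel/trace/bpf_trace.c and runs the attached prog array):

/* Sketch of the bpf_prog_active recursion guard on the tracing side. */
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(int, bpf_prog_active);	/* shared per-cpu flag in the real kernel */

static unsigned int run_tracing_prog(struct bpf_prog *prog, void *ctx)
{
	unsigned int ret = 0;

	preempt_disable();
	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
		/* A bpf prog is already running on this cpu
		 * (kprobe/tracepoint fired inside another prog):
		 * refuse to nest instead of risking deadlock.
		 */
		goto out;
	}

	ret = BPF_PROG_RUN(prog, ctx);
out:
	__this_cpu_dec(bpf_prog_active);
	preempt_enable();
	return ret;
}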