Message-Id: <20171020165623.GM3521@linux.vnet.ibm.com>
Date: Fri, 20 Oct 2017 09:56:23 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>,
Chris Mi <chrism@...lanox.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
David Miller <davem@...emloft.net>,
Jiri Pirko <jiri@...nulli.us>
Subject: Re: Get rid of RCU callbacks in TC filters?
On Thu, Oct 19, 2017 at 08:26:01PM -0700, Cong Wang wrote:
> On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
> >> Hi, all
> >>
> >> Recently, the RCU callbacks used in TC filters and TC actions have
> >> kept drawing my attention; they have introduced at least 4 race
> >> condition bugs:
> >>
> >> 1. A simple one fixed by Daniel:
> >>
> >> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
> >> Author: Daniel Borkmann <daniel@...earbox.net>
> >> Date: Wed May 20 17:13:33 2015 +0200
> >>
> >> net: sched: fix call_rcu() race on classifier module unloads
> >>
> >> 2. A very nasty one fixed by me:
> >>
> >> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
> >> Author: Cong Wang <xiyou.wangcong@...il.com>
> >> Date: Mon Sep 11 16:33:32 2017 -0700
> >>
> >> net_sched: carefully handle tcf_block_put()
> >>
> >> 3. Two more bugs found by Chris:
> >> https://patchwork.ozlabs.org/patch/826696/
> >> https://patchwork.ozlabs.org/patch/826695/
> >>
> >>
> >> Usually RCU callbacks are simple; however, for TC filters and
> >> actions they are complex, because at least TC actions could be
> >> destroyed together with the TC filter in one callback. Also, RCU
> >> callbacks are invoked in BH context, and without locking they run
> >> in parallel. All of this contributes to these nasty bugs. It looks
> >> like the callbacks bring us more problems than benefits.
> >>
> >> Therefore, I have been thinking about getting rid of these
> >> callbacks, because they are not strictly necessary: the callers of
> >> these call_rcu()'s are all on slow paths and hold the RTNL lock, so
> >> blocking is permitted in their contexts, and _I think_ it does no
> >> harm to use synchronize_rcu() on slow paths; at least I can argue
> >> the RTNL lock is already there and is the bottleneck if we really
> >> care. :)
> >>
> >> There are 3 solutions here:
> >>
> >> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
> >> downside is that this could hurt the performance of deleting TC
> >> filters, but again that is a slow path compared to the skb
> >> classification path. Note, it is _not_ merely a matter of replacing
> >> call_rcu() with synchronize_rcu(): many call_rcu()'s are actually
> >> inside list iterations, so we would have to move entries onto a
> >> local list with list_del_rcu()+list_add() before calling
> >> synchronize_rcu() (or is there any other API I am not aware of?).
> >> If people really hate synchronize_rcu() because of performance, we
> >> could also defer the work to a workqueue so that callers keep the
> >> performance they have today.
> >>
> >> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
> >> said in commit 1697c4bb5245 ("net_sched: carefully handle
> >> tcf_block_put()"), this is very hard to do because of
> >> tcf_chain_dump(). Potentially we would need to do a lot of work to
> >> make it possible, if it is possible at all.
> >>
> >> 3) Keep these RCU callbacks and fix all the race conditions, as
> >> Chris tries to do in his patchset. My argument is that we cannot
> >> prove we are really race-free even with Chris's patches, and his
> >> patches are already large enough.
> >>
> >>
> >> What do you think? Any other ideas?
> >
> > 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
> > synchronize_rcu() for multiple deletions/iterations.
>
> This is what I meant by using a local list; perhaps I didn't make it clear.
Ah, got it.
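For concreteness, here is a minimal (untested) sketch of that
local-list pattern; the tc_example_* names are invented purely for
illustration. Note the second list_head: entries removed with
list_del_rcu() must keep their original ->next intact for any
concurrent readers, so the private list needs its own linkage:

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Hypothetical filter type, for illustration only. */
struct tc_example_filter {
        struct list_head link;          /* on the RCU-protected list */
        struct list_head free_link;     /* private, teardown only */
        /* ... filter state ... */
};

/* Called with RTNL held, so blocking in synchronize_rcu() is fine. */
static void example_destroy_all(struct list_head *filters)
{
        struct tc_example_filter *f, *tmp;
        LIST_HEAD(to_free);

        /* Unlink everything; readers may still be traversing. */
        list_for_each_entry_safe(f, tmp, filters, link) {
                list_del_rcu(&f->link);
                list_add(&f->free_link, &to_free);
        }

        /* One grace period covers every filter unlinked above. */
        synchronize_rcu();

        /* No reader can still hold a reference; free synchronously. */
        list_for_each_entry_safe(f, tmp, &to_free, free_link)
                kfree(f);
}

The point being that the single synchronize_rcu() is paid once per
batch rather than once per deleted filter.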
> > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
> > The workqueue could then use blocking primitives, for example, acquiring
> > RTNL.
>
> Yeah, this could work too, but we would get one more asynchronous step...
>
> filter delete -> call_rcu() -> schedule_work() -> action destroy
True, but on the other hand you get to hold RTNL.
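For illustration, a minimal (untested) sketch of #5, again with
invented tc_example_* names:

#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Hypothetical action type, for illustration only. */
struct tc_example_action {
        struct rcu_head rcu;
        struct work_struct work;
        /* ... action state ... */
};

/* Process context: sleeping and taking RTNL are both fine here. */
static void example_destroy_workfn(struct work_struct *work)
{
        struct tc_example_action *a =
                container_of(work, struct tc_example_action, work);

        rtnl_lock();
        /* ... tear down the action and anything hanging off it ... */
        rtnl_unlock();
        kfree(a);
}

/* BH context: do nothing here except punt to process context. */
static void example_destroy_rcu(struct rcu_head *head)
{
        struct tc_example_action *a =
                container_of(head, struct tc_example_action, rcu);

        INIT_WORK(&a->work, example_destroy_workfn);
        schedule_work(&a->work);
}

/* Deletion path, with RTNL held: */
/*      call_rcu(&a->rcu, example_destroy_rcu); */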
> > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
> > workqueue scheduling using a timer. This would reduce the number of
> > RTNL acquisitions.
>
> Ouch, sounds like yet another asynchronous step:
>
> filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
> -> action destroy
>
> :-(
Indeed, the price of scalability and performance is often added
asynchronous action at a distance. But sometimes you can have
scalability, performance, -and- synchronous action. Not sure that this
is one of those cases, but perhaps someone will come up with some trick
that we are not yet seeing.
And again, one benefit you get from the added asynchrony is the ability
to acquire RTNL. Another is increased batching, allowing the overhead
of acquiring RTNL to be amortized over a larger number of updates.
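And to make the batching concrete, an (untested) sketch of #6 using
the current timer API, again with invented names: the RCU callback
pushes each action onto a lockless list and arms a short timer, and
the work item later drains the whole backlog under a single
rtnl_lock():

#include <linux/llist.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/timer.h>
#include <linux/workqueue.h>

struct tc_example_action {
        struct rcu_head rcu;
        struct llist_node lnode;
        /* ... action state ... */
};

static LLIST_HEAD(example_destroy_list);

/* Process context: drain the backlog under one RTNL acquisition. */
static void example_destroy_workfn(struct work_struct *work)
{
        struct llist_node *batch = llist_del_all(&example_destroy_list);
        struct tc_example_action *a, *tmp;

        rtnl_lock();
        llist_for_each_entry_safe(a, tmp, batch, lnode) {
                /* ... tear down the action ... */
                kfree(a);
        }
        rtnl_unlock();
}

static DECLARE_WORK(example_destroy_work, example_destroy_workfn);

static void example_destroy_timerfn(struct timer_list *unused)
{
        schedule_work(&example_destroy_work);
}

static DEFINE_TIMER(example_destroy_timer, example_destroy_timerfn);

/* BH context: queue the action and (re)arm the batching timer. */
static void example_destroy_rcu(struct rcu_head *head)
{
        struct tc_example_action *a =
                container_of(head, struct tc_example_action, rcu);

        llist_add(&a->lnode, &example_destroy_list);
        /*
         * mod_timer() pushes the deadline out on every call; real
         * code might prefer timer_reduce() so that a steady stream
         * of callbacks cannot defer the drain indefinitely.
         */
        mod_timer(&example_destroy_timer, jiffies + HZ / 10);
}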
> > 7) As with #5, have the RCU callback schedule a workqueue, but have each
> > iterator accumulate a list of things removed and do call_rcu() on the
> > list. This is an alternative way of aggregating to reduce the number
> > of RTNL acquisitions.
>
> Yeah, this seems working too.
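For completeness, an (untested) sketch of #7 as well: the iterator
collects everything it removes onto a batch structure and hands the
whole batch to a single call_rcu(), so there is one grace period and
one callback no matter how many filters were removed. Names are again
invented for illustration:

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct tc_example_filter {
        struct list_head link;          /* on the RCU-protected list */
        struct list_head free_link;     /* private to the batch */
        /* ... filter state ... */
};

/* A batch of removed filters sharing one rcu_head. */
struct tc_example_batch {
        struct rcu_head rcu;
        struct list_head items;
};

/* BH context: free the whole batch after one grace period. */
static void example_free_batch(struct rcu_head *head)
{
        struct tc_example_batch *batch =
                container_of(head, struct tc_example_batch, rcu);
        struct tc_example_filter *f, *tmp;

        list_for_each_entry_safe(f, tmp, &batch->items, free_link)
                kfree(f);
        kfree(batch);
}

/* Iterator, called with RTNL held. */
static void example_remove_all(struct list_head *filters)
{
        struct tc_example_filter *f, *tmp;
        struct tc_example_batch *batch;

        batch = kmalloc(sizeof(*batch), GFP_KERNEL);
        if (!batch)
                return;         /* real code would need a fallback */
        INIT_LIST_HEAD(&batch->items);

        list_for_each_entry_safe(f, tmp, filters, link) {
                list_del_rcu(&f->link);
                list_add(&f->free_link, &batch->items);
        }

        /* One callback and one grace period for the entire batch. */
        call_rcu(&batch->rcu, example_free_batch);
}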
>
> > There are many other ways to skin this cat.
>
> We still have to pick one. :) Any preference? I want to keep it as simple
> as possible; otherwise, some day I will not understand it either.
I must defer to the people who actually fully understand this code.
> Thanks for all the ideas!
Please let me know how it goes!
Thanx, Paul