Message-ID: <bbbffbae-2c05-b07c-21f3-0371f02aa7f7@gmail.com>
Date: Thu, 21 Dec 2017 19:36:51 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org
Cc: jakub.kicinski@...ronome.com
Subject: Re: [Patch net-next] net_sched: call qdisc_reset() with qdisc lock
On 12/21/2017 04:03 PM, Cong Wang wrote:
> qdisc_reset() should always be called with qdisc spinlock
> and with BH disabled, otherwise qdisc ->reset() could race
> with TX BH.
>
Hmm, I don't see how this fixes the issue. pfifo_fast is no longer
using the qdisc lock, so taking it here doesn't help. And on the
enqueue/dequeue side it is only a local_bh_disable().
> Fixes: 7bbde83b1860 ("net: sched: drop qdisc_reset from dev_graft_qdisc")
> Reported-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
> ---
> net/sched/sch_generic.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 10aaa3b615ce..00ddb5f8f430 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1097,8 +1097,11 @@ static void dev_qdisc_reset(struct net_device *dev,
>  {
>  	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
>  
> -	if (qdisc)
> +	if (qdisc) {
> +		spin_lock_bh(qdisc_lock(qdisc));
>  		qdisc_reset(qdisc);
> +		spin_unlock_bh(qdisc_lock(qdisc));
> +	}
>  }
>  
>  /**
> 
OK, first, the paths to qdisc_reset() that I've tracked down are:

  dev_shutdown()
    qdisc_destroy()

  dev_deactivate_many()
    dev_qdisc_reset()   <- for each txq
      qdisc_reset()
        chained calls from qdisc ->reset() ops
At the moment none of the lockless qdiscs care about chained calls,
so we can ignore that, but it would be nice to keep in mind.
Next, qdisc_reset() does a couple of things: it calls the qdisc ops
->reset(), but it also walks gso_skb and skb_bad_txq. The 'unlink'
operations there are not safe to call while an enqueue/dequeue op is
in flight. Likewise, pfifo_fast's reset op is not safe to call with
enqueue/dequeue ops in flight.
So I've made the assumption that qdisc_reset() is _only_ ever
called after a qdisc is no longer attached on the dev_xmit
enqueue side and after any in-progress tx_action calls have
completed. For what it's worth, this has always been the
assumption AFAIK.

So those are the assumptions; what did I miss?
The biggest gap I see is that dev_deactivate_many() is supposed
to wait for all tx_action calls to complete, in this bit:
	/* Wait for outstanding qdisc_run calls. */
	list_for_each_entry(dev, head, close_list) {
		while (some_qdisc_is_busy(dev))
			yield();
	}
where some_qdisc_is_busy() in the lockless case only does the
following:

		if (q->flags & TCQ_F_NOLOCK) {
			val = test_bit(__QDISC_STATE_SCHED, &q->state);
Oops, that only tells us something is scheduled, nothing about
whether something is running. Excerpt from the tx_action side:

		clear_bit(__QDISC_STATE_SCHED, &q->state);
		qdisc_run(q);
For comparison, here is the check in the qdisc-locked branch of
some_qdisc_is_busy():

		root_lock = qdisc_lock(q);
		spin_lock_bh(root_lock);
		val = (qdisc_is_running(q) ||
		       test_bit(__QDISC_STATE_SCHED, &q->state));
		spin_unlock_bh(root_lock);
Previously we had the qdisc_lock() to protect the bit clear, and then
in the some_qdisc_is_busy() case we still did a qdisc_is_running()
call. The easiest fix I see is to add the qdisc_is_running() check to
the TCQ_F_NOLOCK case in some_qdisc_is_busy(), AND, to be correct, I
guess we should also set the running bit before clearing the
__QDISC_STATE_SCHED bit.
Untested but perhaps as simple as,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ab497ef..720829e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1071,7 +1071,8 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		q = dev_queue->qdisc_sleeping;
 		if (q->flags & TCQ_F_NOLOCK) {
-			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+			val = (qdisc_is_running(q) ||
+			       test_bit(__QDISC_STATE_SCHED, &q->state));
 		} else {
 			root_lock = qdisc_lock(q);
 			spin_lock_bh(root_lock);
Thanks,
John