[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171221083908.GA1930@nanopsycho>
Date: Thu, 21 Dec 2017 09:39:08 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: John Fastabend <john.fastabend@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kubakici@...pl>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU
callback"
Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastabend@...il.com wrote:
>On 12/20/2017 03:23 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
>> <john.fastabend@...il.com> wrote:
>>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>>> <john.fastabend@...il.com> wrote:
>>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>>
>>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>>> adding skbs during removal.
>>>>
>>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>>> It doesn't work with your "lockless" patches?
>>>>
>>>
>>> Well this is only in the 'parent == NULL' case otherwise we call
>>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>>> sch_tree_lock().
>>>
>>> The only converted qdisc mq and mqprio at this point don't care
>>> though and do their own dev_deactivate/activate. So its not fixing
>>> anything in the above mentioned commit.
>>
>> Sure, removing a class does not impact the whole device,
>> but removing the root qdisc does.
>>
>> After your "lockless", skb_array_consume_bh() is called in
>> pfifo_fast_reset() and ptr_ring_cleanup() is called in
>> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
>> do we have here with datapath?
>>
>
>None at the moment.
>
>>
>>>
>>> I still think it will need to be done eventually. If it resolves
>>> the miniq case it seems like a good idea. Although per Jakub's comment
>>> perhaps I pulled too much into the RCU handler.
>>
>> The case Jakub reported is a RCU callback missing a rcu
>> barrier. I don't understand why you keep believing it is RCU
>> readers on datapath.>
>> Not even to mention ingress is not affected by your "lockless"
>> thing.
>>
>
>I was thinking about the case where we want a lockless qdisc
>with classes. Doing the qdisc destroy after a grace period would
>solve this. Also we could start to cleanup a lot of the locking
>and extra bits around 'running' qdisc and such by doing a clean
>xchg on the qdisc layer. It seems that a dev_activate/deactivate
>just to install a new qdisc is not needed.
>
>Anyways future work. However if it resolves the miniq issue, as
>Jiri indicated, seems like a clean fix. Although Jakub's issue
>with the patch would need to be addressed. Seems he gets a WARN_ON
>if the offload is not disabled but the device is unitialized.
Why just moving qdisc_free to rcu is not enough? It would resolve this
issue and also avoid using synchronize net. Something like:
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83a3e47..487288e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -100,6 +100,7 @@ struct Qdisc {
refcount_t refcnt;
spinlock_t busylock ____cacheline_aligned_in_smp;
+ struct rcu_head rcu;
};
static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd1b200..9beffd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc)
}
EXPORT_SYMBOL(qdisc_reset);
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_free_rcu(struct rcu_head *rcu)
{
+ struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu);
+
if (qdisc_is_percpu_stats(qdisc)) {
free_percpu(qdisc->cpu_bstats);
free_percpu(qdisc->cpu_qstats);
@@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(qdisc->gso_skb);
kfree_skb(qdisc->skb_bad_txq);
- qdisc_free(qdisc);
+ call_rcu(&qdisc->rcu, qdisc_free_rcu);
}
EXPORT_SYMBOL(qdisc_destroy);
--
2.9.5
Powered by blists - more mailing lists