lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 21 Dec 2017 09:39:08 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kubakici@...pl>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU
 callback"

Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastabend@...il.com wrote:
>On 12/20/2017 03:23 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
>> <john.fastabend@...il.com> wrote:
>>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>>> <john.fastabend@...il.com> wrote:
>>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>>
>>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>>> adding skbs during removal.
>>>>
>>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>>> It doesn't work with your "lockless" patches?
>>>>
>>>
>>> Well this is only in the 'parent == NULL' case otherwise we call
>>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>>> sch_tree_lock().
>>>
>>> The only converted qdisc mq and mqprio at this point don't care
>>> though and do their own dev_deactivate/activate. So its not fixing
>>> anything in the above mentioned commit.
>> 
>> Sure, removing a class does not impact the whole device,
>> but removing the root qdisc does.
>> 
>> After your "lockless", skb_array_consume_bh() is called in
>> pfifo_fast_reset() and ptr_ring_cleanup() is called in
>> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
>> do we have here with datapath?
>> 
>
>None at the moment.
>
>> 
>>>
>>> I still think it will need to be done eventually. If it resolves
>>> the miniq case it seems like a good idea. Although per Jakub's comment
>>> perhaps I pulled too much into the RCU handler.
>> 
>> The case Jakub reported is a RCU callback missing a rcu
>> barrier. I don't understand why you keep believing it is RCU
>> readers on datapath.> 
>> Not even to mention ingress is not affected by your "lockless"
>> thing.
>> 
>
>I was thinking about the case where we want a lockless qdisc
>with classes. Doing the qdisc destroy after a grace period would
>solve this. Also we could start to cleanup a lot of the locking
>and extra bits around 'running' qdisc and such by doing a clean
>xchg on the qdisc layer. It seems that a dev_activate/deactivate
>just to install a new qdisc is not needed.
>
>Anyways future work. However if it resolves the miniq issue, as
>Jiri indicated, seems like a clean fix. Although Jakub's issue
>with the patch would need to be addressed. Seems he gets a WARN_ON
>if the offload is not disabled but the device is unitialized.

Why just moving qdisc_free to rcu is not enough? It would resolve this
issue and also avoid using synchronize net. Something like:

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83a3e47..487288e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -100,6 +100,7 @@ struct Qdisc {
 	refcount_t		refcnt;
 
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
+	struct rcu_head		rcu;
 };
 
 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd1b200..9beffd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_reset);
 
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_free_rcu(struct rcu_head *rcu)
 {
+	struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu);
+
 	if (qdisc_is_percpu_stats(qdisc)) {
 		free_percpu(qdisc->cpu_bstats);
 		free_percpu(qdisc->cpu_qstats);
@@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	kfree_skb_list(qdisc->gso_skb);
 	kfree_skb(qdisc->skb_bad_txq);
-	qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_rcu);
 }
 EXPORT_SYMBOL(qdisc_destroy);
 
-- 
2.9.5

Powered by blists - more mailing lists