Message-ID: <bbbffbae-2c05-b07c-21f3-0371f02aa7f7@gmail.com>
Date: Thu, 21 Dec 2017 19:36:51 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org
Cc: jakub.kicinski@...ronome.com
Subject: Re: [Patch net-next] net_sched: call qdisc_reset() with qdisc lock
On 12/21/2017 04:03 PM, Cong Wang wrote:
> qdisc_reset() should always be called with qdisc spinlock
> and with BH disabled, otherwise qdisc ->reset() could race
> with TX BH.
>
Hmm, I don't see how this fixes the issue. pfifo_fast is no longer
using the qdisc lock, so taking it here doesn't help. And on the
enqueue/dequeue side it is only a local_bh_disable().
> Fixes: 7bbde83b1860 ("net: sched: drop qdisc_reset from dev_graft_qdisc")
> Reported-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
> ---
> net/sched/sch_generic.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 10aaa3b615ce..00ddb5f8f430 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1097,8 +1097,11 @@ static void dev_qdisc_reset(struct net_device *dev,
>  {
>  	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
>  
> -	if (qdisc)
> +	if (qdisc) {
> +		spin_lock_bh(qdisc_lock(qdisc));
>  		qdisc_reset(qdisc);
> +		spin_unlock_bh(qdisc_lock(qdisc));
> +	}
>  }
>  
>  /**
> 
OK, first, the paths to qdisc_reset() that I've tracked down are:

  dev_shutdown()
    qdisc_destroy()

  dev_deactivate_many()
    dev_qdisc_reset()   <- for each txq
      qdisc_reset()
        chained calls from qdisc ->reset() ops
At the moment none of the lockless qdiscs care about chained calls,
so we can ignore that, but it would be nice to keep in mind.
Next, qdisc_reset() does a couple of things: it calls the qdisc ops
->reset(), but it also walks gso_skb and skb_bad_txq. The 'unlink'
operations there are not safe to call while an enqueue/dequeue op is
in flight. Likewise, pfifo_fast's reset op is not safe to call with
enqueue/dequeue ops in flight.
So I've made the assumption that qdisc_reset() is _only_ ever
called after a qdisc is no longer attached on the dev_xmit
enqueue side and after any in-progress tx_action calls have
completed. For what it's worth, this has always been the
assumption AFAIK.

So those are the assumptions; what did I miss?
The biggest gap I see is that dev_deactivate_many() is supposed
to wait for all tx_action calls to complete, in this bit:
	/* Wait for outstanding qdisc_run calls. */
	list_for_each_entry(dev, head, close_list) {
		while (some_qdisc_is_busy(dev))
			yield();
	}
where some_qdisc_is_busy() in the lockless case only does the
following:

		if (q->flags & TCQ_F_NOLOCK) {
			val = test_bit(__QDISC_STATE_SCHED, &q->state);
Oops, that only tells us something is scheduled, nothing about
whether something is running. Excerpt from the tx_action side:

		clear_bit(__QDISC_STATE_SCHED, &q->state);
		qdisc_run(q);
For comparison, here is the check in the qdisc-locked branch of
some_qdisc_is_busy():

		root_lock = qdisc_lock(q);
		spin_lock_bh(root_lock);
		val = (qdisc_is_running(q) ||
		       test_bit(__QDISC_STATE_SCHED, &q->state));
		spin_unlock_bh(root_lock);
Previously we had the qdisc_lock() to protect the bit clear, and then
in the some_qdisc_is_busy() case we still did a qdisc_is_running()
call. The easiest fix I see is to add the qdisc_is_running() check to
the TCQ_F_NOLOCK case in some_qdisc_is_busy(), AND, to be correct, I
guess we should also set the running bit before clearing the
__QDISC_STATE_SCHED bit.
Untested but perhaps as simple as,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ab497ef..720829e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1071,7 +1071,8 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		q = dev_queue->qdisc_sleeping;
 		if (q->flags & TCQ_F_NOLOCK) {
-			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+			val = (qdisc_is_running(q) ||
+			       test_bit(__QDISC_STATE_SCHED, &q->state));
 		} else {
 			root_lock = qdisc_lock(q);
 			spin_lock_bh(root_lock);
Thanks,
John