Message-ID: <ZHgXL+Bsm2M+ZMiM@C02FL77VMD6R.googleapis.com>
Date: Wed, 31 May 2023 20:57:35 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Vlad Buslov <vladbu@...dia.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Jakub Kicinski <kuba@...nel.org>,
	Pedro Tammela <pctammela@...atatu.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
	Peilin Ye <peilin.ye@...edance.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	John Fastabend <john.fastabend@...il.com>,
	Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
	Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
 clsact Qdiscs before grafting

Hi Vlad and all,

On Tue, May 30, 2023 at 03:18:19PM +0300, Vlad Buslov wrote:
> >> If livelock with concurrent filters insertion is an issue, then it can
> >> be remedied by setting a new Qdisc->flags bit
> >> "DELETED-REJECT-NEW-FILTERS" and checking for it together with
> >> QDISC_CLASS_OPS_DOIT_UNLOCKED in order to force any concurrent filter
> >> insertion coming after the flag is set to synchronize on rtnl lock.
> >
> > Thanks for the suggestion!  I'll try this approach.
> >
> > Currently QDISC_CLASS_OPS_DOIT_UNLOCKED is checked after taking a refcnt on
> > the "being-deleted" Qdisc.  I'll try forcing "late" requests (those that
> > arrive after the Qdisc is flagged as being-deleted) to sync on the RTNL lock
> > without (i.e. before) taking the Qdisc refcnt (otherwise I think Task 1 will
> > replay for even longer?).
> 
> Yeah, I see what you mean. Looking at the code, __tcf_qdisc_find()
> already returns -EINVAL when q->refcnt is zero, so maybe returning
> -EINVAL from that function when the "DELETED-REJECT-NEW-FILTERS" flag is
> set is also fine? It would be much easier to implement than moving
> rtnl_lock there.

I implemented this suggestion [1] and tested the livelock issue in QEMU (-m
16G, CONFIG_NR_CPUS=8).  I deleted the ingress Qdisc (call this "request A")
while it had many ongoing filter requests, and here are the results:

                        #1         #2         #3         #4
  ----------------------------------------------------------
   a. refcnt            89         93        230        571
   b. replayed     167,568    196,450    336,291    878,027
   c. time real   0m2.478s   0m2.746s   0m3.693s   0m9.461s
           user   0m0.000s   0m0.000s   0m0.000s   0m0.000s
            sys   0m0.623s   0m0.681s   0m1.119s   0m2.770s

   a. is the Qdisc refcnt when A calls qdisc_graft() for the first time;
   b. is the number of times A has been replayed;
   c. is the time(1) output for A.

a. and b. were collected from printk() output.  This is better than before,
but A can still be replayed hundreds of thousands of times and hang for
several seconds (about 9.5 seconds in run #4).
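As a rough derived figure (not measured separately), dividing row b by the real time in row c gives the replay rate of A in each run:

```latex
\text{replay rate} = \frac{\text{b (replays)}}{\text{c (real time)}}:\quad
\frac{167{,}568}{2.478\,\mathrm{s}} \approx 6.76\times10^{4}\,\mathrm{s}^{-1},\;
\frac{196{,}450}{2.746\,\mathrm{s}} \approx 7.15\times10^{4}\,\mathrm{s}^{-1},\;
\frac{336{,}291}{3.693\,\mathrm{s}} \approx 9.11\times10^{4}\,\mathrm{s}^{-1},\;
\frac{878{,}027}{9.461\,\mathrm{s}} \approx 9.28\times10^{4}\,\mathrm{s}^{-1}
```

That is roughly 11 to 15 µs of wall time per replay, which suggests A's total hang time tracks how long the in-flight filter requests keep their references rather than the cost of any single replay.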

Is this acceptable?  If not, is it possible (and advisable) to make A
actually _wait_ on the Qdisc refcnt instead of busy-replaying?
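To make the question concrete, here is a minimal userspace sketch of "waiting" instead of "busy-replaying", under stated assumptions: the wait_qdisc type and the wq_* helpers are hypothetical, a pthread mutex stands in for whatever lock protects refcnt (it is NOT RTNL), and a condition variable replaces the -EAGAIN replay loop. The deleter sets a destroying flag so no new references are handed out, then sleeps until only the device's own reference remains.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: mutex-protected refcount plus a "destroying" flag and a
 * condition variable, so the deleter sleeps instead of busy-replaying. */
struct wait_qdisc {
	pthread_mutex_t lock;       /* protects refcnt/destroying; NOT RTNL */
	pthread_cond_t  released;   /* broadcast whenever a reference is dropped */
	int             refcnt;     /* 1 == only the device's own reference */
	bool            destroying;
};

static void wq_init(struct wait_qdisc *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->released, NULL);
	q->refcnt = 1;
	q->destroying = false;
}

/* Filter request: refuse new references once deletion has started. */
static int wq_get(struct wait_qdisc *q)
{
	int err = 0;

	pthread_mutex_lock(&q->lock);
	if (q->destroying || q->refcnt == 0)
		err = -EINVAL;
	else
		q->refcnt++;
	pthread_mutex_unlock(&q->lock);
	return err;
}

/* Filter request done: drop its reference and wake a waiting deleter. */
static void wq_put(struct wait_qdisc *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->refcnt > 1);      /* only holders from wq_get() call this */
	q->refcnt--;
	pthread_cond_broadcast(&q->released);
	pthread_mutex_unlock(&q->lock);
}

/* Deleter: mark the Qdisc, then sleep until in-flight requests are gone.
 * pthread_cond_wait() releases q->lock while sleeping, so wq_put() can run. */
static void wq_destroy(struct wait_qdisc *q)
{
	pthread_mutex_lock(&q->lock);
	q->destroying = true;
	while (q->refcnt != 1)
		pthread_cond_wait(&q->released, &q->lock);
	q->refcnt = 0;              /* drop the last reference: safe to free */
	pthread_mutex_unlock(&q->lock);
}
```

The catch in the kernel is RTNL: if any in-flight filter request can need RTNL before it drops its reference, a deleter that sleeps while still holding RTNL would deadlock, so RTNL would have to be released around the wait (much as the rtnl_unlock()/schedule()/rtnl_lock() sequence in the diff in [1] does). Kernel primitives such as wait_var_event() and wake_up_var() exist for sleeping on a variable, but whether they fit this path is part of the open question.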

Thanks,
Peilin Ye

[1] Diff against v5 patch 6 (printk() calls not included):

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3e9cc43cbc90..de7b0538b309 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -94,6 +94,7 @@ struct Qdisc {
 #define TCQ_F_INVISIBLE                0x80 /* invisible by default in dump */
 #define TCQ_F_NOLOCK           0x100 /* qdisc does not require locking */
 #define TCQ_F_OFFLOADED                0x200 /* qdisc is offloaded to HW */
+#define TCQ_F_DESTROYING       0x400 /* destroying, reject filter requests */
        u32                     limit;
        const struct Qdisc_ops  *ops;
        struct qdisc_size_table __rcu *stab;
@@ -185,6 +186,11 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
        return !READ_ONCE(qdisc->q.qlen);
 }

+static inline bool qdisc_is_destroying(const struct Qdisc *qdisc)
+{
+       return qdisc->flags & TCQ_F_DESTROYING;
+}
+
 /* For !TCQ_F_NOLOCK qdisc, qdisc_run_begin/end() must be invoked with
  * the qdisc root lock acquired.
  */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2621550bfddc..3e7f6f286ac0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1172,7 +1172,7 @@ static int __tcf_qdisc_find(struct net *net, struct Qdisc **q,
                *parent = (*q)->handle;
        } else {
                *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
-               if (!*q) {
+               if (!*q || qdisc_is_destroying(*q)) {
                        NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
                        err = -EINVAL;
                        goto errout_rcu;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 286b7c58f5b9..d6e47546c7fe 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1086,12 +1086,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
                                return -ENOENT;
                        }

-                       /* Replay if the current ingress (or clsact) Qdisc has ongoing
-                        * RTNL-unlocked filter request(s).  This is the counterpart of that
-                        * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
+                       /* If current ingress (clsact) Qdisc has ongoing filter requests, stop
+                        * accepting any more by marking it as "being destroyed", then tell the
+                        * caller to replay by returning -EAGAIN.
                         */
-                       if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
+                       q = dev_queue->qdisc_sleeping;
+                       if (!qdisc_refcount_dec_if_one(q)) {
+                               q->flags |= TCQ_F_DESTROYING;
+                               rtnl_unlock();
+                               schedule();
+                               rtnl_lock();
                                return -EAGAIN;
+                       }
                }

                if (dev->flags & IFF_UP)
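For anyone who wants to poke at the logic outside the kernel, here is a minimal userspace model of the diff above, under loudly stated assumptions: every model_* name is hypothetical, RTNL, RCU and qdisc_lookup_rcu() are not modeled, and both refcnt and flags are C11 atomics (the diff itself sets the bit with a plain `q->flags |=`). It shows why replays still happen: a request that took its reference before TCQ_F_DESTROYING was set keeps qdisc_refcount_dec_if_one() failing until it drops that reference, while requests arriving after the flag is set are turned away with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in; only the bit value mirrors TCQ_F_DESTROYING. */
#define MODEL_F_DESTROYING 0x400u

struct model_qdisc {
	atomic_int  refcnt;    /* models the Qdisc refcnt */
	atomic_uint flags;     /* models Qdisc->flags */
};

/* Models qdisc_refcount_dec_if_one(): 1 -> 0 only if ours is the last ref. */
static bool model_dec_if_one(struct model_qdisc *q)
{
	int expected = 1;
	return atomic_compare_exchange_strong(&q->refcnt, &expected, 0);
}

/* Models qdisc_refcount_inc_nz(): take a ref unless the count is already 0. */
static bool model_inc_nz(struct model_qdisc *q)
{
	int old = atomic_load(&q->refcnt);
	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&q->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static bool model_is_destroying(struct model_qdisc *q)
{
	return atomic_load(&q->flags) & MODEL_F_DESTROYING;
}

/* Filter-request side, loosely following __tcf_qdisc_find() after the diff. */
static int model_find(struct model_qdisc *q)
{
	if (model_is_destroying(q))
		return -EINVAL;    /* late request: rejected, takes no ref */
	if (!model_inc_nz(q))
		return -EINVAL;    /* Qdisc already being freed */
	return 0;                  /* caller now holds a reference */
}

/* Filter request finished: drop the reference taken in model_find(). */
static void model_put(struct model_qdisc *q)
{
	int old = atomic_fetch_sub(&q->refcnt, 1);
	assert(old > 1);           /* never drops the device's own reference */
}

/* Delete side, loosely following the qdisc_graft() hunk (RTNL not modeled). */
static int model_graft_delete(struct model_qdisc *q)
{
	if (!model_dec_if_one(q)) {
		atomic_fetch_or(&q->flags, MODEL_F_DESTROYING);
		return -EAGAIN;    /* caller replays the whole request */
	}
	return 0;
}
```

In this model, one in-flight request makes the first delete attempt return -EAGAIN and sets the flag; a second request is then rejected, and once the first request calls model_put() the replayed delete succeeds.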

