Message-ID: <30a35b1f-9f66-f7c7-61c6-048c1b68efce@mojatatu.com>
Date: Thu, 1 Jun 2023 10:03:22 -0300
From: Pedro Tammela <pctammela@...atatu.com>
To: Peilin Ye <yepeilin.cs@...il.com>, Vlad Buslov <vladbu@...dia.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, Peilin Ye <peilin.ye@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Hillf Danton <hdanton@...a.com>,
netdev@...r.kernel.org, Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
clsact Qdiscs before grafting
On 01/06/2023 00:57, Peilin Ye wrote:
> Hi Vlad and all,
>
> On Tue, May 30, 2023 at 03:18:19PM +0300, Vlad Buslov wrote:
>>>> If livelock with concurrent filters insertion is an issue, then it can
>>>> be remedied by setting a new Qdisc->flags bit
>>>> "DELETED-REJECT-NEW-FILTERS" and checking for it together with
>>>> QDISC_CLASS_OPS_DOIT_UNLOCKED in order to force any concurrent filter
>>>> insertion coming after the flag is set to synchronize on rtnl lock.
>>>
>>> Thanks for the suggestion! I'll try this approach.
>>>
>>> Currently QDISC_CLASS_OPS_DOIT_UNLOCKED is checked after taking a refcnt of
>>> the "being-deleted" Qdisc. I'll try forcing "late" requests (that arrive
>>> later than Qdisc is flagged as being-deleted) sync on RTNL lock without
>>> (before) taking the Qdisc refcnt (otherwise I think Task 1 will replay for
>>> even longer?).
>>
>> Yeah, I see what you mean. Looking at the code __tcf_qdisc_find()
>> already returns -EINVAL when q->refcnt is zero, so maybe returning
> >> -EINVAL from that function when the "DELETED-REJECT-NEW-FILTERS" flag is
>> set is also fine? Would be much easier to implement as opposed to moving
>> rtnl_lock there.
>
> I implemented [1] this suggestion and tested the livelock issue in QEMU (-m
> 16G, CONFIG_NR_CPUS=8). I tried deleting the ingress Qdisc (let's call it
> "request A") while it has a lot of ongoing filter requests, and here's the
> result:
>
>                #1          #2          #3          #4
> ----------------------------------------------------------
> a. refcnt      89          93          230         571
> b. replayed    167,568     196,450     336,291     878,027
> c. time  real  0m2.478s    0m2.746s    0m3.693s    0m9.461s
>          user  0m0.000s    0m0.000s    0m0.000s    0m0.000s
>          sys   0m0.623s    0m0.681s    0m1.119s    0m2.770s
>
> a. is the Qdisc refcnt when A calls qdisc_graft() for the first time;
> b. is the number of times A has been replayed;
> c. is the time(1) output for A.
>
> a. and b. are collected from printk() output. This is better than before,
> but A could still be replayed hundreds of thousands of times and hang
> for a few seconds.
>
> Is this okay? If not, would it be possible (and desirable) to make A
> really _wait_ on the Qdisc refcnt, instead of "busy-replaying"?
>
> Thanks,
> Peilin Ye
>
> [1] Diff against v5 patch 6 (printk() calls not included):
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 3e9cc43cbc90..de7b0538b309 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -94,6 +94,7 @@ struct Qdisc {
>  #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
>  #define TCQ_F_NOLOCK		0x100 /* qdisc does not require locking */
>  #define TCQ_F_OFFLOADED		0x200 /* qdisc is offloaded to HW */
> +#define TCQ_F_DESTROYING	0x400 /* destroying, reject filter requests */
>  	u32			limit;
>  	const struct Qdisc_ops	*ops;
>  	struct qdisc_size_table	__rcu *stab;
> @@ -185,6 +186,11 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
>  	return !READ_ONCE(qdisc->q.qlen);
>  }
>
> +static inline bool qdisc_is_destroying(const struct Qdisc *qdisc)
> +{
> +	return qdisc->flags & TCQ_F_DESTROYING;
> +}
> +
>  /* For !TCQ_F_NOLOCK qdisc, qdisc_run_begin/end() must be invoked with
>   * the qdisc root lock acquired.
>   */
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 2621550bfddc..3e7f6f286ac0 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -1172,7 +1172,7 @@ static int __tcf_qdisc_find(struct net *net, struct Qdisc **q,
>  		*parent = (*q)->handle;
>  	} else {
>  		*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
> -		if (!*q) {
> +		if (!*q || qdisc_is_destroying(*q)) {
>  			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
>  			err = -EINVAL;
>  			goto errout_rcu;
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 286b7c58f5b9..d6e47546c7fe 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1086,12 +1086,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>  			return -ENOENT;
>  		}
>
> -		/* Replay if the current ingress (or clsact) Qdisc has ongoing
> -		 * RTNL-unlocked filter request(s). This is the counterpart of that
> -		 * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
> +		/* If current ingress (clsact) Qdisc has ongoing filter requests, stop
> +		 * accepting any more by marking it as "being destroyed", then tell the
> +		 * caller to replay by returning -EAGAIN.
>  		 */
> -		if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
> +		q = dev_queue->qdisc_sleeping;
> +		if (!qdisc_refcount_dec_if_one(q)) {
> +			q->flags |= TCQ_F_DESTROYING;
> +			rtnl_unlock();
> +			schedule();
Was this schedule() call intended, or is it just a leftover?
rtnl_lock() would reschedule if needed, as it's a mutex_lock().
> +			rtnl_lock();
>  			return -EAGAIN;
> +		}
>  	}
>
>  	if (dev->flags & IFF_UP)
>
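
On the "really wait" question: just to illustrate, here is a rough and
untested sketch of what sleeping on the refcnt could look like, assuming a
hypothetical wait queue embedded in struct Qdisc (called "destroy_wait"
here) that qdisc_put() would wake whenever it drops a reference; neither
the field nor the wakeup exists in the tree today:

	/* Hypothetical sketch, not the patch above: sleep instead of
	 * busy-replaying. Once TCQ_F_DESTROYING is set, no new filter
	 * request can take a reference, so the refcount only goes down.
	 */
	q = dev_queue->qdisc_sleeping;
	if (!qdisc_refcount_dec_if_one(q)) {
		q->flags |= TCQ_F_DESTROYING;
		rtnl_unlock();
		/* wait for concurrent filter requests to drop their refs;
		 * refcnt == 1 means only our own reference is left
		 */
		wait_event(q->destroy_wait, refcount_read(&q->refcnt) == 1);
		rtnl_lock();
		return -EAGAIN;	/* the replay should now succeed first try */
	}

That would trade the hundreds of thousands of replays for a single one, at
the cost of a new field in struct Qdisc and a wakeup in the put path.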