Message-ID: <ZHG+AR8qgpJ6/Zhx@C02FL77VMD6R.googleapis.com>
Date: Sat, 27 May 2023 01:23:29 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>,
Pedro Tammela <pctammela@...atatu.com>
Cc: Pedro Tammela <pctammela@...atatu.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
Peilin Ye <peilin.ye@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
Cong Wang <cong.wang@...edance.com>,
Vlad Buslov <vladbu@...dia.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
clsact Qdiscs before grafting

Hi Jakub and all,

On Fri, May 26, 2023 at 07:33:24PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 16:09:51 -0700 Peilin Ye wrote:
> > Thanks a lot, I'll get right on it.
>
> Any insights? Is it just a live-lock inherent to the retry scheme
> or we actually forget to release the lock/refcnt?
I think it's just a thread holding the RTNL mutex for too long (replaying
too many times). We could replay an arbitrary number of times in
tc_{modify,get}_qdisc() if the user keeps sending RTNL-unlocked filter
requests for the old Qdisc.
I tested the new reproducer Pedro posted, on:
1. All 6 v5 patches, FWIW, which caused a hang similar to the one Pedro reported.
2. First 5 v5 patches, plus patch 6 in v1 (no replaying), did not trigger
any issues (in about 30 minutes).
3. All 6 v5 patches, plus this diff:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 286b7c58f5b9..988718ba5abe 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1090,8 +1090,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 			 * RTNL-unlocked filter request(s). This is the counterpart of that
 			 * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
 			 */
-			if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
+			if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping)) {
+				rtnl_unlock();
+				rtnl_lock();
 				return -EAGAIN;
+			}
 		}

 		if (dev->flags & IFF_UP)
This did not trigger any issues (in about 30 minutes) either.
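
To illustrate the difference between cases 1 and 3, here is a rough user-space
model. It is not kernel code, and every name in it (model_graft(), struct
replay_stats, refcnt_seen) is made up for this sketch. It assumes each replay
observes one refcount value for the old Qdisc, and it only counts how many
consecutive attempts happen without the lock ever being dropped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the replay loop; not kernel code. */
struct replay_stats {
	size_t attempts;     /* graft attempts made, including the successful one */
	size_t longest_hold; /* most consecutive attempts without dropping the lock */
	size_t lock_drops;   /* times the lock was released so that waiters could run */
	bool grafted;        /* true if some attempt saw refcount == 1 */
};

/*
 * refcnt_seen[i] is the refcount of the old Qdisc observed on attempt i.
 * A value of 1 means no filter request holds a reference, so the graft
 * can proceed; anything else models the -EAGAIN replay path.
 */
struct replay_stats model_graft(const int *refcnt_seen, size_t n,
				bool yield_on_retry)
{
	struct replay_stats st = {0};
	size_t hold = 0;

	assert(refcnt_seen != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		st.attempts++;
		hold++;
		if (hold > st.longest_hold)
			st.longest_hold = hold;

		if (refcnt_seen[i] == 1) {
			st.grafted = true;
			break;
		}

		/* Models the unlock/lock pair added before returning -EAGAIN. */
		if (yield_on_retry) {
			st.lock_drops++;
			hold = 0;
		}
	}
	return st;
}
```

With yield_on_retry set to false, longest_hold grows with the number of
replays, which is the "holding the RTNL mutex for too long" behaviour. With it
set to true, the lock is dropped once per failed attempt, so other RTNL waiters
get a chance to run between replays.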
What would you suggest?
Thanks,
Peilin Ye