Message-ID: <ZHG+AR8qgpJ6/Zhx@C02FL77VMD6R.googleapis.com>
Date: Sat, 27 May 2023 01:23:29 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>,
	Pedro Tammela <pctammela@...atatu.com>
Cc: Pedro Tammela <pctammela@...atatu.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
	Peilin Ye <peilin.ye@...edance.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	John Fastabend <john.fastabend@...il.com>,
	Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
	Cong Wang <cong.wang@...edance.com>,
	Vlad Buslov <vladbu@...dia.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
 clsact Qdiscs before grafting

Hi Jakub and all,

On Fri, May 26, 2023 at 07:33:24PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 16:09:51 -0700 Peilin Ye wrote:
> > Thanks a lot, I'll get right on it.
>
> Any insights? Is it just a live-lock inherent to the retry scheme
> or we actually forget to release the lock/refcnt?

I think it's just a thread holding the RTNL mutex for too long (replaying
too many times).  tc_{modify,get}_qdisc() could replay an arbitrary number
of times if the user keeps sending RTNL-unlocked filter requests for the
old Qdisc.
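
To make the replay behaviour above concrete, here is a minimal userspace
model of it.  It is only a sketch, not kernel code: struct model_qdisc,
model_dec_if_one(), model_graft(), model_tick() and model_replay() are
hypothetical names, and the max_replays cap exists only so the model
terminates (the replay loop described above has no such bound).  The
only thing modelled is the refcount rule: grafting returns -EAGAIN for
as long as some in-flight filter request still holds a reference to the
old Qdisc.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the old Qdisc; only its refcount is modelled. */
struct model_qdisc {
	int refcnt;	/* 1 means only the owner's reference is left */
};

/* Models the "decrement only if the count is exactly one" check. */
static int model_dec_if_one(struct model_qdisc *q)
{
	if (q->refcnt != 1)
		return 0;
	q->refcnt = 0;
	return 1;
}

/* Models the graft step: fail with -EAGAIN while others hold references. */
static int model_graft(struct model_qdisc *old)
{
	return model_dec_if_one(old) ? 0 : -EAGAIN;
}

/*
 * What happens between two graft attempts: up to 'completed' in-flight
 * filter requests finish and drop their reference, and 'arrived' new
 * requests take one.
 */
static void model_tick(struct model_qdisc *q, int completed, int arrived)
{
	int in_flight = q->refcnt - 1;
	int done = completed < in_flight ? completed : in_flight;

	q->refcnt -= done;
	q->refcnt += arrived;
}

/*
 * Replay loop.  Returns the number of replays needed before the graft
 * succeeded, or -1 if max_replays (a model-only cap) was reached first.
 */
static int model_replay(struct model_qdisc *old, int completed, int arrived,
			int max_replays)
{
	int replays = 0;

	assert(old->refcnt >= 1);
	while (model_graft(old) == -EAGAIN) {
		if (replays == max_replays)
			return -1;
		replays++;
		model_tick(old, completed, arrived);
	}
	return replays;
}
```

In this model, starting from refcnt == 4 with one request finishing per
replay and no new arrivals, model_replay() succeeds after 3 replays.  If
one new request arrives for every one that finishes, the refcount never
drops back to 1 and the loop only stops at the artificial cap, which is
the "arbitrary number of times" case.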

I tested the new reproducer Pedro posted, on:

1. All 6 v5 patches, FWIW, which caused a hang similar to the one Pedro
   reported.

2. First 5 v5 patches, plus patch 6 in v1 (no replaying), did not trigger
   any issues (in about 30 minutes).

3. All 6 v5 patches, plus this diff:

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 286b7c58f5b9..988718ba5abe 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1090,8 +1090,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
                         * RTNL-unlocked filter request(s).  This is the counterpart of that
                         * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
                         */
-                       if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
+                       if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping)) {
+                               rtnl_unlock();
+                               rtnl_lock();
                                return -EAGAIN;
+                       }
                }

                if (dev->flags & IFF_UP)

   Did not trigger any issues (in about 30 minutes) either.

What would you suggest?

Thanks,
Peilin Ye

