Message-ID: <ZIEjUobtdPCu648e@C02FL77VMD6R.googleapis.com>
Date: Wed, 7 Jun 2023 17:39:46 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Vlad Buslov <vladbu@...dia.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Jakub Kicinski <kuba@...nel.org>,
Pedro Tammela <pctammela@...atatu.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
Peilin Ye <peilin.ye@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
clsact Qdiscs before grafting
On Thu, Jun 01, 2023 at 09:20:39AM +0300, Vlad Buslov wrote:
> >> >> If livelock with concurrent filters insertion is an issue, then it can
> >> >> be remedied by setting a new Qdisc->flags bit
> >> >> "DELETED-REJECT-NEW-FILTERS" and checking for it together with
> >> >> QDISC_CLASS_OPS_DOIT_UNLOCKED in order to force any concurrent filter
> >> >> insertion coming after the flag is set to synchronize on rtnl lock.
> >> >
> >> > Thanks for the suggestion! I'll try this approach.
> >> >
> >> > Currently QDISC_CLASS_OPS_DOIT_UNLOCKED is checked after taking a refcnt of
> >> > the "being-deleted" Qdisc. I'll try forcing "late" requests (that arrive
> >> > later than Qdisc is flagged as being-deleted) sync on RTNL lock without
> >> > (before) taking the Qdisc refcnt (otherwise I think Task 1 will replay for
> >> > even longer?).
> >>
> >> Yeah, I see what you mean. Looking at the code __tcf_qdisc_find()
> >> already returns -EINVAL when q->refcnt is zero, so maybe returning
> >> -EINVAL from that function when "DELETED-REJECT-NEW-FILTERS" flag is
> >> set is also fine? Would be much easier to implement as opposed to moving
> >> rtnl_lock there.
> >
> > I implemented [1] this suggestion and tested the livelock issue in QEMU (-m
> > 16G, CONFIG_NR_CPUS=8). I tried deleting the ingress Qdisc (let's call it
> > "request A") while it has a lot of ongoing filter requests, and here's the
> > result:
> >
> >                      #1        #2        #3        #4
> > -----------------------------------------------------
> > a. refcnt            89        93       230       571
> > b. replayed     167,568   196,450   336,291   878,027
> > c. time real   0m2.478s  0m2.746s  0m3.693s  0m9.461s
> >         user   0m0.000s  0m0.000s  0m0.000s  0m0.000s
> >          sys   0m0.623s  0m0.681s  0m1.119s  0m2.770s
> >
> > a. is the Qdisc refcnt when A calls qdisc_graft() for the first time;
> > b. is the number of times A has been replayed;
> > c. is the time(1) output for A.
> >
> > a. and b. are collected from printk() output. This is better than before,
> > but A could still be replayed for hundreds of thousands of times and hang
> > for a few seconds.
>
> I don't get where the few-second waiting time comes from. I'm probably
> missing something obvious here, but the waiting time should be the
> maximum filter op latency of new/get/del filter request that is already
> in-flight (i.e. already passed qdisc_is_destroying() check) and it
> should take several orders of magnitude less time.
Yeah, I agree. Here's what I did:

In Terminal 1, I keep adding filters to eth1 in a naive and unrealistic
loop:

$ echo "1 1 32" > /sys/bus/netdevsim/new_device
$ tc qdisc add dev eth1 ingress
$ for (( i=1; i<=3000; i++ ))
> do
> tc filter add dev eth1 ingress proto all flower src_mac 00:11:22:33:44:55 action pass > /dev/null 2>&1 &
> done

When the loop is running, I delete the Qdisc in Terminal 2:

$ time tc qdisc delete dev eth1 ingress

This took a few seconds on average. However, if I specify a unique "prio"
when adding filters in that loop, e.g.:

$ for (( i=1; i<=3000; i++ ))
> do
> tc filter add dev eth1 ingress proto all prio $i flower src_mac 00:11:22:33:44:55 action pass > /dev/null 2>&1 &
                                           ^^^^^^^
> done

Then deleting the Qdisc in Terminal 2 becomes a lot faster:

real    0m0.712s
user    0m0.000s
sys     0m0.152s

In fact it's so fast that I couldn't even make qdisc->refcnt > 1, so I did
yet another test [1], which looks a lot better.

When I didn't specify "prio", sometimes the rhashtable_lookup_insert_fast()
call in fl_ht_insert_unique() returned -EEXIST. Is it because concurrent
add-filter requests auto-allocated the same "prio" number, so they collided
with each other? Do you think this is related to why it's slow?
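
For context, here is my (possibly wrong) understanding of how the prio is
auto-allocated when "prio" is omitted. The snippet below paraphrases
tcf_auto_prio() in net/sched/cls_api.c from memory rather than quoting it
verbatim, so please correct me if I've misread the code:

/* My reading: if concurrent no-prio requests all observe the same chain
 * head, they would compute the same prio, target the same flower
 * instance, and then collide on the identical key in
 * fl_ht_insert_unique().
 */
static u32 tcf_auto_prio(struct tcf_proto *tp)
{
        u32 first = TC_H_MAKE(0xC0000000U, 0U);

        if (tp)
                first = tp->prio - 1;

        return TC_H_MAJ(first);
}
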
Thanks,
Peilin Ye

[1] In a beefier QEMU setup (64 cores, -m 128G), I started 64 tc instances
in -batch mode, each repeatedly adding a unique filter (with "prio" and
"handle" specified) and then deleting it. Again, while they were running I
deleted the ingress Qdisc, and here's the result:

                     #1        #2        #3        #4
-----------------------------------------------------
a. refcnt            64        63        64        64
b. replayed         169     5,630       887     3,442
c. time real   0m0.171s  0m0.147s  0m0.186s  0m0.111s
        user   0m0.000s  0m0.009s  0m0.001s  0m0.000s
         sys   0m0.112s  0m0.108s  0m0.115s  0m0.104s
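
For reference, each instance's batch input was generated with something
along these lines (a rough sketch; the prio, handle, src_mac and iteration
count below are placeholders, and every instance used its own values):

$ for (( i=1; i<=1000; i++ )); do
>     echo "filter add dev eth1 ingress handle 0x7 prio 7 proto all flower src_mac 00:11:22:33:44:07 action pass"
>     echo "filter delete dev eth1 ingress handle 0x7 prio 7 proto all flower"
> done > batch_7.txt
$ tc -batch batch_7.txt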