Date: Wed, 7 Jun 2023 17:39:46 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Vlad Buslov <vladbu@...dia.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Jakub Kicinski <kuba@...nel.org>,
	Pedro Tammela <pctammela@...atatu.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
	Peilin Ye <peilin.ye@...edance.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	John Fastabend <john.fastabend@...il.com>,
	Hillf Danton <hdanton@...a.com>, netdev@...r.kernel.org,
	Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and
 clsact Qdiscs before grafting

On Thu, Jun 01, 2023 at 09:20:39AM +0300, Vlad Buslov wrote:
> >> >> If livelock with concurrent filters insertion is an issue, then it can
> >> >> be remedied by setting a new Qdisc->flags bit
> >> >> "DELETED-REJECT-NEW-FILTERS" and checking for it together with
> >> >> QDISC_CLASS_OPS_DOIT_UNLOCKED in order to force any concurrent filter
> >> >> insertion coming after the flag is set to synchronize on rtnl lock.
> >> >
> >> > Thanks for the suggestion!  I'll try this approach.
> >> >
> >> > Currently QDISC_CLASS_OPS_DOIT_UNLOCKED is checked after taking a refcnt of
> >> > the "being-deleted" Qdisc.  I'll try forcing "late" requests (that arrive
> >> > later than Qdisc is flagged as being-deleted) sync on RTNL lock without
> >> > (before) taking the Qdisc refcnt (otherwise I think Task 1 will replay for
> >> > even longer?).
> >> 
> >> Yeah, I see what you mean. Looking at the code __tcf_qdisc_find()
> >> already returns -EINVAL when q->refcnt is zero, so maybe returning
> >> -EINVAL from that function when the "DELETED-REJECT-NEW-FILTERS" flag is
> >> set is also fine? Would be much easier to implement as opposed to moving
> >> rtnl_lock there.
> >
> > I implemented [1] this suggestion and tested the livelock issue in QEMU (-m
> > 16G, CONFIG_NR_CPUS=8).  I tried deleting the ingress Qdisc (let's call it
> > "request A") while it has a lot of ongoing filter requests, and here's the
> > result:
> >
> >                         #1         #2         #3         #4
> >   ----------------------------------------------------------
> >    a. refcnt            89         93        230        571
> >    b. replayed     167,568    196,450    336,291    878,027
> >    c. time real   0m2.478s   0m2.746s   0m3.693s   0m9.461s
> >            user   0m0.000s   0m0.000s   0m0.000s   0m0.000s
> >             sys   0m0.623s   0m0.681s   0m1.119s   0m2.770s
> >
> >    a. is the Qdisc refcnt when A calls qdisc_graft() for the first time;
> >    b. is the number of times A has been replayed;
> >    c. is the time(1) output for A.
> >
> > a. and b. are collected from printk() output.  This is better than before,
> > but A could still be replayed for hundreds of thousands of times and hang
> > for a few seconds.
> 
> I don't get where the few seconds of waiting time come from. I'm probably
> missing something obvious here, but the waiting time should be the
> maximum filter op latency of new/get/del filter request that is already
> in-flight (i.e. already passed qdisc_is_destroying() check) and it
> should take several orders of magnitude less time.

Yeah I agree, here's what I did:

In Terminal 1 I keep adding filters to eth1 in a naive and unrealistic
loop:

  $ echo "1 1 32" > /sys/bus/netdevsim/new_device
  $ tc qdisc add dev eth1 ingress
  $ for (( i=1; i<=3000; i++ ))
  > do
  > tc filter add dev eth1 ingress proto all flower src_mac 00:11:22:33:44:55 action pass > /dev/null 2>&1 &
  > done

While the loop is running, I delete the Qdisc in Terminal 2:

  $ time tc qdisc delete dev eth1 ingress

The deletion took seconds on average.  However, if I specify a unique
"prio" when adding filters in that loop, e.g.:

  $ for (( i=1; i<=3000; i++ ))
  > do
  > tc filter add dev eth1 ingress proto all prio $i flower src_mac 00:11:22:33:44:55 action pass > /dev/null 2>&1 &
  > done                                     ^^^^^^^

Then deleting the Qdisc in Terminal 2 becomes a lot faster:

  real  0m0.712s
  user  0m0.000s
  sys   0m0.152s 

In fact it's so fast that I couldn't even make qdisc->refcnt > 1, so I did
yet another test [1], which looks a lot better.

When I didn't specify "prio", the rhashtable_lookup_insert_fast() call in
fl_ht_insert_unique() sometimes returned -EEXIST.  Is that because
concurrent add-filter requests auto-allocated the same "prio" number, so
they collided with each other?  Do you think this is related to why it's
slow?
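
One way to probe this, as a rough sketch rather than something that was
run for the numbers above, is to count how many filters ended up under each
"pref" once the loop has finished.  The helper name count_filters_per_pref
is hypothetical, and it assumes that "tc filter show" prints one
"filter ... pref N ... handle H" line per installed filter:

```shell
# Hypothetical helper: read "tc filter show" output on stdin and print
# "<pref> <number of filters>" for every pref that has at least one handle.
count_filters_per_pref() {
    awk '
        $1 == "filter" {
            pref = ""; has_handle = 0
            # Scan the tokens for "pref <N>" and for a "handle" keyword.
            for (i = 2; i < NF; i++) {
                if ($i == "pref") pref = $(i + 1)
                if ($i == "handle") has_handle = 1
            }
            # Lines without a handle describe the classifier, not a filter.
            if (pref != "" && has_handle) count[pref]++
        }
        END { for (p in count) print p, count[p] }
    ' | sort -n
}

# Assumed usage, after the add-filter loop has finished:
#   tc filter show dev eth1 ingress | count_filters_per_pref
```

Far fewer than 3000 filters in total, or many requests landing on the same
pref, would be consistent with the collision theory, but would not prove it.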

Thanks,
Peilin Ye

[1] In a beefier QEMU setup (64 cores, -m 128G), I started 64 tc instances
in -batch mode, each of which keeps adding a unique filter (with "prio" and
"handle" specified) and then deleting it.  Again, while they are running I
delete the ingress Qdisc, and here's the result:

                         #1         #2         #3         #4
   ----------------------------------------------------------
    a. refcnt            64         63         64         64
    b. replayed         169      5,630        887      3,442
    c. time real   0m0.171s   0m0.147s   0m0.186s   0m0.111s
            user   0m0.000s   0m0.009s   0m0.001s   0m0.000s
             sys   0m0.112s   0m0.108s   0m0.115s   0m0.104s
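
The batch workload in [1] could be generated roughly as follows.  The exact
commands are not given above, so this is only a sketch: the function name
gen_batch, the fixed round count, and the filter arguments (reused from the
earlier loop) are assumptions.  Each worker uses its own id as both "prio"
and "handle"; the id must be at least 1, since a prio of 0 asks the kernel
to pick one automatically.

```shell
# Hypothetical helper: print a tc batch file for one worker.
# Arguments: worker id (>= 1), number of add/delete rounds, device name.
gen_batch() {
    gb_id=$1
    gb_rounds=$2
    gb_dev=${3:-eth1}
    gb_i=0
    while [ "$gb_i" -lt "$gb_rounds" ]; do
        # Add a filter that is unique to this worker, then delete it again.
        echo "filter add dev $gb_dev ingress proto all prio $gb_id handle $gb_id flower src_mac 00:11:22:33:44:55 action pass"
        echo "filter del dev $gb_dev ingress proto all prio $gb_id handle $gb_id flower"
        gb_i=$((gb_i + 1))
    done
}

# Assumed usage: one batch file and one background tc instance per worker.
#   for id in $(seq 1 64); do
#       gen_batch "$id" 100000 > "batch.$id"
#       tc -batch "batch.$id" > /dev/null 2>&1 &
#   done
```

The round count only bounds the run; it needs to be large enough that the
workers are still busy when the ingress Qdisc is deleted.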

