lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 7 Jun 2024 19:22:02 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Jason Wang <jasowang@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>, 
	Heng Qi <hengqi@...ux.alibaba.com>, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, xuanzhuo@...ux.alibaba.com, 
	virtualization@...ts.linux.dev, ast@...nel.org, daniel@...earbox.net, 
	hawk@...nel.org, john.fastabend@...il.com, netdev@...r.kernel.org
Subject: Re: [patch net-next] virtio_net: add support for Byte Queue Limits

On Fri, Jun 7, 2024 at 5:57 PM Jiri Pirko <jiri@...nulli.us> wrote:
>
> Fri, Jun 07, 2024 at 08:47:43AM CEST, jasowang@...hat.com wrote:
> >On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko <jiri@...nulli.us> wrote:
> >>
> >> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasowang@...hat.com wrote:
> >> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko <jiri@...nulli.us> wrote:
> >> >>
> >> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasowang@...hat.com wrote:
> >> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >> >> >>
> >> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote:
> >> >> >> > > If the codes of orphan mode don't have an impact when you enable
> >> >> >> > > napi_tx mode, please keep it if you can.
> >> >> >> >
> >> >> >> > For example, it complicates BQL implementation.
> >> >> >> >
> >> >> >> > Thanks
> >> >> >>
> >> >> >> I very much doubt sending interrupts to a VM can
> >> >> >> *on all benchmarks* compete with not sending interrupts.
> >> >> >
> >> >> >It should not differ too much from the physical NIC. We can have one
> >> >> >more round of benchmarks to see the difference.
> >> >> >
> >> >> >But if NAPI mode needs to win all of the benchmarks in order to get
> >> >> >rid of orphan, that would be very difficult. Considering various bugs
> >> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most
> >> >> >of the benchmark doesn't show obvious differences.
> >> >> >
> >> >> >Looking at git history, there're commits that removes skb_orphan(), for example:
> >> >> >
> >> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340
> >> >> >Author: Eric Dumazet <edumazet@...gle.com>
> >> >> >Date:   Fri Sep 28 07:53:26 2012 +0000
> >> >> >
> >> >> >    mlx4: dont orphan skbs in mlx4_en_xmit()
> >> >> >
> >> >> >    After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX
> >> >> >    completions) we no longer need to orphan skbs in mlx4_en_xmit()
> >> >> >    since skb wont stay a long time in TX ring before their release.
> >> >> >
> >> >> >    Orphaning skbs in ndo_start_xmit() should be avoided as much as
> >> >> >    possible, since it breaks TCP Small Queue or other flow control
> >> >> >    mechanisms (per socket limits)
> >> >> >
> >> >> >    Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> >> >> >    Acked-by: Yevgeny Petrilin <yevgenyp@...lanox.com>
> >> >> >    Cc: Or Gerlitz <ogerlitz@...lanox.com>
> >> >> >    Signed-off-by: David S. Miller <davem@...emloft.net>
> >> >> >
> >> >> >>
> >> >> >> So yea, it's great if napi and hardware are advanced enough
> >> >> >> that the default can be changed, since this way virtio
> >> >> >> is closer to a regular nic and more or standard
> >> >> >> infrastructure can be used.
> >> >> >>
> >> >> >> But dropping it will go against *no breaking userspace* rule.
> >> >> >> Complicated? Tough.
> >> >> >
> >> >> >I don't know what kind of userspace is broken by this. Or why it is
> >> >> >not broken since the day we enable NAPI mode by default.
> >> >>
> >> >> There is a module option that explicitly allows user to set
> >> >> napi_tx=false
> >> >> or
> >> >> napi_weight=0
> >> >>
> >> >> So if you remove this option or ignore it, both breaks the user
> >> >> expectation.
> >> >
> >> >We can keep them, but I wonder what's the expectation of the user
> >> >here? The only thing so far I can imagine is the performance
> >> >difference.
> >>
> >> True.
> >>
> >> >
> >> >> I personally would vote for this breakage. To carry ancient
> >> >> things like this one forever does not make sense to me.
> >> >
> >> >Exactly.
> >> >
> >> >> While at it,
> >> >> let's remove all virtio net module params. Thoughts?
> >> >
> >> >I tend to
> >> >
> >> >1) drop the orphan mode, but we can have some benchmarks first
> >>
> >> Any idea which? That would be really tricky to find the ones where
> >> orphan mode makes difference I assume.
> >
> >True. Personally, I would like to just drop orphan mode. But I'm not
> >sure others are happy with this.
>
> How about to do it other way around. I will take a stab at sending patch
> removing it. If anyone is against and has solid data to prove orphan
> mode is needed, let them provide those.

Honestly, we in production have to use skb orphan mode. I cannot
provide enough and strong evidence about why default mode on earth
causes performance degradation in some cases. I mean I don't know the
root cause. The only thing I can tell is if I enable the skb orphan
mode, then 1)  will let more skb pass the TCP layer, 2) some key
members like snd_cwnd in tcp will behave normally.

I know without orphan mode the skb will be controlled/limited by the
combination of TSO and napi_tx mode thanks to sk_wmem_alloc. So I
_guess_ the root cause is: the possible delay of interrupts generated
by the host machine will cause the delay of freeing skbs, resulting in
the slowing down the tx speed. If the interval between two interrupts
is very short like the real NIC, I think the issue would disappear.

That's all I can tell. Just for record. Hope this information could
also be useful to other readers.

Thanks,
Jason

>
>
> >
> >Thanks
> >
> >>
> >>
> >> >2) keep the module parameters
> >>
> >> and ignore them, correct? Perhaps a warning would be good.
> >>
> >>
> >> >
> >> >Thanks
> >> >
> >> >>
> >> >>
> >> >>
> >> >> >
> >> >> >Thanks
> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >> MST
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ