lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJaqyWdhLCNs_B0gcxXHut7xufw23HMR6PaO11mqAQFoGkdfXQ@mail.gmail.com>
Date: Wed, 16 Apr 2025 12:15:46 +0200
From: Eugenio Perez Martin <eperezma@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Jon Kohler <jon@...anix.com>, "Michael S. Tsirkin" <mst@...hat.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Stefano Brivio <sbrivio@...hat.com>
Subject: Re: [PATCH] vhost/net: remove zerocopy support

On Tue, Apr 8, 2025 at 8:28 AM Jason Wang <jasowang@...hat.com> wrote:
>
> On Tue, Apr 8, 2025 at 9:18 AM Jon Kohler <jon@...anix.com> wrote:
> >
> >
> >
> > > On Apr 6, 2025, at 7:14 PM, Jason Wang <jasowang@...hat.com> wrote:
> > >
> > > !-------------------------------------------------------------------|
> > >  CAUTION: External Email
> > >
> > > |-------------------------------------------------------------------!
> > >
> > > On Fri, Apr 4, 2025 at 10:24 PM Jon Kohler <jon@...anix.com> wrote:
> > >>
> > >> Commit 098eadce3c62 ("vhost_net: disable zerocopy by default") disabled
> > >> the module parameter for the handle_tx_zerocopy path back in 2019,
> > >> nothing that many downstream distributions (e.g., RHEL7 and later) had
> > >> already done the same.
> > >>
> > >> Both upstream and downstream disablement suggest this path is rarely
> > >> used.
> > >>
> > >> Testing the module parameter shows that while the path allows packet
> > >> forwarding, the zerocopy functionality itself is broken. On outbound
> > >> traffic (guest TX -> external), zerocopy SKBs are orphaned by either
> > >> skb_orphan_frags_rx() (used with the tun driver via tun_net_xmit())
> > >
> > > This is by design to avoid DOS.
> >
> > I understand that, but it makes ZC non-functional in general, as ZC fails
> > and immediately increments the error counters.
>
> The main issue is HOL, but zerocopy may still work in some setups that
> don't need to care about HOL. One example the macvtap passthrough
> mode.
>
> >
> > >
> > >> or
> > >> skb_orphan_frags() elsewhere in the stack,
> > >
> > > Basically zerocopy is expected to work for guest -> remote case, so
> > > could we still hit skb_orphan_frags() in this case?
> >
> > Yes, you’d hit that in tun_net_xmit().
>
> Only for local VM to local VM communication.
>
> > If you punch a hole in that *and* in the
> > zc error counter (such that failed ZC doesn’t disable ZC in vhost), you get ZC
> > from vhost; however, the network interrupt handler under net_tx_action and
> > eventually incurs the memcpy under dev_queue_xmit_nit().
>
> Well, yes, we need a copy if there's a packet socket. But if there's
> no network interface taps, we don't need to do the copy here.
>

Hi!

I need more time diving into the issues. As Jon mentioned, vhost ZC is
so little used I didn't have the chance to experiment with this until
now :). But yes, I expect to be able to overcome these for pasta, by
adapting buffer sizes or modifying code etc.

> >
> > This is no more performant, and in fact is actually worse since the time spent
> > waiting on that memcpy to resolve is longer.
> >
> > >
> > >> as vhost_net does not set
> > >> SKBFL_DONT_ORPHAN.
>
> Maybe we can try to set this as vhost-net can hornor ulimit now.
>
> > >>
> > >> Orphaning enforces a memcpy and triggers the completion callback, which
> > >> increments the failed TX counter, effectively disabling zerocopy again.
> > >>
> > >> Even after addressing these issues to prevent SKB orphaning and error
> > >> counter increments, performance remains poor. By default, only 64
> > >> messages can be zerocopied, which is immediately exhausted by workloads
> > >> like iperf, resulting in most messages being memcpy'd anyhow.
> > >>
> > >> Additionally, memcpy'd messages do not benefit from the XDP batching
> > >> optimizations present in the handle_tx_copy path.
> > >>
> > >> Given these limitations and the lack of any tangible benefits, remove
> > >> zerocopy entirely to simplify the code base.
> > >>
> > >> Signed-off-by: Jon Kohler <jon@...anix.com>
> > >
> > > Any chance we can fix those issues? Actually, we had a plan to make
> > > use of vhost-net and its tx zerocopy (or even implement the rx
> > > zerocopy) in pasta.
> >
> > Happy to take direction and ideas here, but I don’t see a clear way to fix these
> > issues, without dealing with the assertions that skb_orphan_frags_rx calls out.
> >
> > Said another way, I’d be interested in hearing if there is a config where ZC in
> > current host-net implementation works, as I was driving myself crazy trying to
> > reverse engineer.
>
> See above.
>
> >
> > Happy to collaborate if there is something we could do here.
>
> Great, we can start here by seeking a way to fix the known issues of
> the vhost-net zerocopy code.
>

Happy to help here :).

Jon, could you share more details about the orphan problem so I can
speed up the help? For example, can you describe the code changes and
the code path that would lead to that assertion of
skb_orphan_frags_rx?

Thanks!


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ