Message-ID: <B32E2C5D-25FB-427F-8567-701C152DFDE6@nutanix.com>
Date: Tue, 8 Apr 2025 01:18:09 +0000
From: Jon Kohler <jon@...anix.com>
To: Jason Wang <jasowang@...hat.com>
CC: "Michael S. Tsirkin" <mst@...hat.com>,
        Eugenio Pérez <eperezma@...hat.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] vhost/net: remove zerocopy support



> On Apr 6, 2025, at 7:14 PM, Jason Wang <jasowang@...hat.com> wrote:
> 
> On Fri, Apr 4, 2025 at 10:24 PM Jon Kohler <jon@...anix.com> wrote:
>> 
>> Commit 098eadce3c62 ("vhost_net: disable zerocopy by default") disabled
>> the module parameter for the handle_tx_zerocopy path back in 2019,
>> noting that many downstream distributions (e.g., RHEL7 and later) had
>> already done the same.
>> 
>> Both upstream and downstream disablement suggest this path is rarely
>> used.
>> 
>> Testing the module parameter shows that while the path allows packet
>> forwarding, the zerocopy functionality itself is broken. On outbound
>> traffic (guest TX -> external), zerocopy SKBs are orphaned by either
>> skb_orphan_frags_rx() (used with the tun driver via tun_net_xmit())
> 
> This is by design to avoid DOS.

I understand that, but it makes ZC non-functional in general, as ZC fails
and immediately increments the error counters.
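
For reference, the orphaning helpers in include/linux/skbuff.h look roughly like
the sketch below (paraphrased from memory, so the exact code may differ across
kernel versions). The _rx variant, which tun_net_xmit() uses, copies the user
frags for *any* zerocopy skb, while the plain variant would skip the copy only
when SKBFL_DONT_ORPHAN is set, which vhost-net never sets:

	/* Sketch of the orphan helpers, paraphrased from include/linux/skbuff.h;
	 * exact code differs across kernel versions.
	 */
	static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
	{
		if (likely(!skb_zcopy(skb)))
			return 0;
		/* always copy the userspace frags and fire the completion callback */
		return skb_copy_ubufs(skb, gfp_mask);
	}

	static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
	{
		if (likely(!skb_zcopy(skb)))
			return 0;
		if (!skb_zcopy_is_nouarg(skb) &&
		    (skb_uarg(skb)->flags & SKBFL_DONT_ORPHAN))
			return 0;	/* vhost-net does not set SKBFL_DONT_ORPHAN */
		return skb_copy_ubufs(skb, gfp_mask);
	}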

> 
>> or
>> skb_orphan_frags() elsewhere in the stack,
> 
> Basically zerocopy is expected to work for guest -> remote case, so
> could we still hit skb_orphan_frags() in this case?

Yes, you’d hit that in tun_net_xmit(). If you punch a hole in that *and* in the
zc error counter (such that failed ZC doesn’t disable ZC in vhost), you get ZC
from vhost; however, the network interrupt handler under net_tx_action
eventually incurs the memcpy under dev_queue_xmit_nit().

This is no more performant; in fact, it is actually worse, since the time spent
waiting on that memcpy to resolve is longer.
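
For context, the ZC error counter gating I mentioned is the heuristic in
drivers/vhost/net.c, which looks roughly like this (again paraphrased; the
exact form may vary by kernel version):

	/* Sketch, paraphrased from drivers/vhost/net.c. Each orphan-induced copy
	 * completes the ubuf callback with "failed" status and bumps tx_zcopy_err,
	 * so this quickly starts returning false and vhost falls back to the copy
	 * path for all subsequent packets.
	 */
	static bool vhost_net_tx_select_zcopy(struct vhost_net *net)
	{
		return !net->tx_flush &&
		       net->tx_packets / 64 >= net->tx_zcopy_err;
	}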

> 
>> as vhost_net does not set
>> SKBFL_DONT_ORPHAN.
>> 
>> Orphaning enforces a memcpy and triggers the completion callback, which
>> increments the failed TX counter, effectively disabling zerocopy again.
>> 
>> Even after addressing these issues to prevent SKB orphaning and error
>> counter increments, performance remains poor. By default, only 64
>> messages can be zerocopied, which is immediately exhausted by workloads
>> like iperf, resulting in most messages being memcpy'd anyhow.
>> 
>> Additionally, memcpy'd messages do not benefit from the XDP batching
>> optimizations present in the handle_tx_copy path.
>> 
>> Given these limitations and the lack of any tangible benefits, remove
>> zerocopy entirely to simplify the code base.
>> 
>> Signed-off-by: Jon Kohler <jon@...anix.com>
> 
> Any chance we can fix those issues? Actually, we had a plan to make
> use of vhost-net and its tx zerocopy (or even implement the rx
> zerocopy) in pasta.

Happy to take direction and ideas here, but I don’t see a clear way to fix these
issues without dealing with the assertions that skb_orphan_frags_rx calls out.

Said another way, I’d be interested to hear if there is a config where ZC in the
current vhost-net implementation works, as I was driving myself crazy trying to
reverse engineer it.

Happy to collaborate if there is something we could do here.

> 
> Eugenio may explain more here.
> 
> Thanks
> 
