netdev - Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi


Message-ID: <1503498275.8694.23.camel@klaipeden.com>
Date:   Wed, 23 Aug 2017 23:24:35 +0900
From:   Koichiro Den <den@...ipeden.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Jason Wang <jasowang@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit
 path if no tx napi

On Tue, 2017-08-22 at 13:16 -0400, Willem de Bruijn wrote:
> > > > An issue of the referenced patch is that sndbuf could be smaller than
> > > > low watermark.
> > 
> > We cannot determine the low watermark properly, not only because of the
> > sndbuf size issue but also because the upper vhost-net cannot directly see
> > how many descriptors are currently available at the virtio-net tx queue.
> > That depends on multiqueue settings or on other senders that are also using
> > the same tx queue. Note that in the latter case, if they are constantly
> > transmitting, the deadlock could not occur(*); however, if they have only
> > temporarily filled some portion of the pool in the meantime, then the low
> > watermark cannot help.
> > (*: That is because it's reliable enough in the sense I mention below.)
> > 
> > Keeping this in mind, let me briefly describe the possible deadlock I
> > mentioned:
> > (1). vhost-net on the L1 guest has nothing to sendmsg until the upper layer
> > sets new descriptors, which depends only on the vhost-net zcopy callback and
> > adding newly used descriptors.
> > (2). vhost-net callback depends on the skb freeing on the xmit path only.
> > (3). the xmit path depends (possibly only) on the vhost-net sendmsg.
> > As you see, for the situation above to arise it is enough that L1
> > virtio-net reaches its limit earlier than the L0 host processing. The
> > vhost-net pool could be almost full or empty, whatever.
> 
> Thanks for the context. This issue is very similar to the one that used to
> exist when running out of transmit descriptors, before the removal of
> the timer and introduction of skb_orphan in start_xmit.
> 
> To make sure that I understand correctly, let me paraphrase:
> 
> A. guest socket cannot send because it exhausted its sk budget
>    (sndbuf, tsq, ..)
> 
> B. budget is not freed up until guest receives tx completion for this flow
> 
> C. tx completion is held back on the host side in vhost_zerocopy_signal_used
>    behind the completion for an unrelated skb
> 
> D. unrelated packet is delayed somewhere in the host stack, delaying
>    zerocopy completions, e.g., netem
> 
> The issue that is specific to vhost-net zerocopy is that (C) enforces strict
> ordering of transmit completions causing head of line blocking behind
> vhost-net zerocopy callbacks.
> 
> This is a different problem from
> 
> C1. tx completion is delayed until guest sends another packet and
>        triggers free_old_xmit_skbs
> 
> Both in host and guest, zerocopy packets should never be able to loop
> to a receive path where they can cause unbounded delay.
> 
> The obvious cases of latency are queueing, like netem. That leads
> to poor performance for unrelated flows, but I don't see how this
> could cause deadlock.

Thanks for the wrap-up. I see all the points now and also that C1 should not
cause deadlock.
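
To make sure we mean the same thing by (C), here is a minimal toy model of the head-of-line blocking in your paraphrase. It is only a sketch, not kernel code: the class ZerocopyRing and its methods are hypothetical names made up for illustration, and the only behaviour it assumes is the one you describe, namely that completions are signalled strictly in submission order.

```python
# Toy model only: not kernel code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ZerocopyRing:
    """Outstanding zerocopy descriptors, kept in submission order."""
    done: list = field(default_factory=list)  # done[i] is True once skb i was freed
    next_to_signal: int = 0                   # oldest descriptor not yet signalled

    def submit(self) -> int:
        """Queue a new descriptor and return its index."""
        self.done.append(False)
        return len(self.done) - 1

    def complete(self, idx: int) -> None:
        """Mark descriptor idx as completed (its skb was freed)."""
        self.done[idx] = True

    def signal_used(self) -> list:
        """Signal completed descriptors in order; stop at the first pending one."""
        signalled = []
        while (self.next_to_signal < len(self.done)
               and self.done[self.next_to_signal]):
            signalled.append(self.next_to_signal)
            self.next_to_signal += 1
        return signalled


ring = ZerocopyRing()
delayed = ring.submit()      # unrelated packet, held in a queue such as netem (D)
blocked = ring.submit()      # packet of the flow that exhausted its budget (A)

ring.complete(blocked)
first = ring.signal_used()   # nothing is signalled: held behind `delayed` (C)

ring.complete(delayed)
second = ring.signal_used()  # both completions are released together
```

In this model the second flow gets its budget back (B) only after the unrelated packet completes, which is the strict-ordering effect as opposed to C1.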