netdev - Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAF=yD-+AOnYafiwAf+v+fhyg_0fi-6LdQSnPYcoAcngJqYs9dg@mail.gmail.com>
Date:   Tue, 22 Aug 2017 13:16:26 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Koichiro Den <den@...ipeden.com>
Cc:     Jason Wang <jasowang@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit
 path if no tx napi

>> > An issue of the referenced patch is that sndbuf could be smaller than low
>> > watermark.
> We cannot determine the low watermark properly because of not only sndbuf size
> issue but also the fact that the upper vhost-net cannot directly see how much
> descriptor is currently available at the virtio-net tx queue. It depends on
> multiqueue settings or other senders which are also using the same tx queue.
> Note that in the latter case if they constantly transmitting, the deadlock could
> not occur(*), however if it has just temporarily fulfill some portion of the
> pool in the mean time, then the low watermark cannot be helpful.
> (*: That is because it's reliable enough in the sense I mention below.)
>
> Keep in this in mind, let me briefly describe the possible deadlock I mentioned:
> (1). vhost-net on L1 guest has nothing to do sendmsg until the upper layer sets
> new descriptors, which depends only on the vhost-net zcopy callback and adding
> newly used descriptors.
> (2). vhost-net callback depends on the skb freeing on the xmit path only.
> (3). the xmit path depends (possibly only) on the vhost-net sendmsg.
> As you see, it's enough to bring about the situation above that L1 virtio-net
> reaches its limit earlier than the L0 host processing. The vhost-net pool could
> be almost full or empty, whatever.

Thanks for the context. This issue is very similar to the one that used to
exist when running out of transmit descriptors, before the removal of
the timer and introduction of skb_orphan in start_xmit.

To make sure that I understand correctly, let me paraphrase:

A. guest socket cannot send because it exhausted its sk budget (sndbuf, tsq, ..)

B. budget is not freed up until guest receives tx completion for this flow

C. tx completion is held back on the host side in vhost_zerocopy_signal_used
   behind the completion for an unrelated skb

D. unrelated packet is delayed somewhere in the host stackf zerocopy
completions.
   e.g., netem

The issue that is specific to vhost-net zerocopy is that (C) enforces strict
ordering of transmit completions causing head of line blocking behind
vhost-net zerocopy callbacks.

This is a different problem from

C1. tx completion is delayed until guest sends another packet and
       triggers free_old_xmit_skb

Both in host and guest, zerocopy packets should never be able to loop
to a receive path where they can cause unbounded delay.

The obvious cases of latency are queueing, like netem. That leads
to poor performance for unrelated flows, but I don't see how this
could cause deadlock.