Message-ID: <4f70e086-7efd-ef23-8940-e86ec06ad74d@redhat.com>
Date: Tue, 22 Aug 2017 19:47:35 +0800
From: Jason Wang <jasowang@...hat.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Koichiro Den <den@...ipeden.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
virtualization@...ts.linux-foundation.org,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit
path if no tx napi
On 08/22/2017 11:10, Willem de Bruijn wrote:
>>>> Interesting, deadlock could be treated as a radical case of the
>>>> discussion
>>>> here https://patchwork.kernel.org/patch/3787671/.
>>>>
>>>> git grep shows more similar skb_orphan() cases. Do we need to change them
>>>> all (or part)?
>>> Most skb_orphan calls are not relevant to the issue of transmit delay.
>>
>> Yes, but at least we should audit the ones in drivers/net.
> Do you mean other virtual device driver transmit paths, like xen,
> specifically?
Git grep does not show skb_orphan() being used by xen for me. But
cxgb4/sge.c seems to call skb_orphan() for large packets, and it reclaims
transmitted packets when:
- doing ndo_start_xmit()
- or a timer fires.
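Roughly this pattern (a minimal sketch, not the actual cxgb4 code; all
the foo_* names and the threshold are made up for illustration):

/* Sketch of the driver pattern discussed above; foo_* is hypothetical. */
static netdev_tx_t foo_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct foo_txq *q = foo_select_txq(dev, skb);

        /* Reclaim descriptors the hardware has already completed. */
        foo_reclaim_completed_tx(q);

        /*
         * Orphan large packets so the socket's sndbuf accounting is
         * released now rather than whenever the hardware finishes with
         * the skb. This is exactly the spot where a zerocopy skb would
         * keep its ubuf_info pinned until hardware completion, unless
         * the driver also drops that reference (deep copy plus
         * skb_zcopy_clear()).
         */
        if (skb->len > FOO_ORPHAN_THRESHOLD)
                skb_orphan(skb);

        foo_queue_to_hw(q, skb);
        return NETDEV_TX_OK;
}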
>>>> Actually, we may meet similar issues in many other places (e.g. netem).
>>> Netem is an interesting case. Because it is intended to mimic network
>>> delay, at least in the case where it calls skb_orphan, it may make
>>> sense to release all references, including calling skb_zcopy_clear.
>>>
>>> In general, zerocopy reverts to copy on all paths that may cause
>>> unbounded delay due to another process. Guarding against delay
>>> induced by the administrator is infeasible. It is always possible to
>>> just pause the nic. Netem is one instance of that, and not unbounded.
>>
>> The problem is, the admin may only delay the traffic on e.g. one interface,
>> but it actually delays or stalls all traffic inside a VM.
> Understood. Ideally we can remove the cause of the HoL blocking itself.
>
>>>> Need to consider a complete solution for this. Figuring out all places
>>>> that could delay a packet is one approach.
>>> The issue described in the referenced patch seems like head of line
>>> blocking between two flows. If one flow delays zerocopy descriptor
>>> release from the vhost-net pool, it blocks all subsequent descriptors
>>> in that pool from being released, also delaying other flows that use
>>> the same descriptor pool. If the pool is empty, all transmission stops.
>>>
>>> Reverting to copy tx when the pool reaches a low watermark, as the
>>> patch does, fixes this.
>>
>> An issue with the referenced patch is that sndbuf could be smaller than
>> the low watermark.
>>
>>> Perhaps the descriptor pool should also be
>>> revised to allow out of order completions. Then there is no need to
>>> copy zerocopy packets whenever they may experience delay.
>>
>> Yes, but as replied in the referenced thread, the Windows driver may treat
>> out-of-order completion as a bug.
> Interesting. I missed that. Perhaps the zerocopy optimization
> could be gated on guest support for out of order completions.
Yes, we may plan to explicitly notify the driver about out-of-order
completion in a future virtio version.
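If that lands, vhost could gate the zerocopy path on the negotiated
feature before deciding per packet. A rough sketch (the
VIRTIO_NET_F_OUT_OF_ORDER bit is hypothetical, just a placeholder for
whatever the spec ends up defining; the heuristic at the end is roughly
what vhost_net_tx_select_zcopy() does today):

/* Sketch only: VIRTIO_NET_F_OUT_OF_ORDER does not exist yet. */
static bool tx_may_zerocopy(struct vhost_net *net,
                            struct vhost_virtqueue *vq)
{
        /* Guests that require in-order used buffers get copy tx. */
        if (!vhost_has_feature(vq, VIRTIO_NET_F_OUT_OF_ORDER))
                return false;

        /* Existing back-off: stop zerocopy when it keeps failing. */
        return !net->tx_flush &&
               net->tx_packets / 64 >= net->tx_zcopy_err;
}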
>
>>> On the point of counting copy vs zerocopy: the new msg_zerocopy
>>> variant of ubuf_info has a field to record whether a deep copy was
>>> made. This can be used with vhost-net zerocopy, too.
>>
>> Just to make sure I understand: it's still not clear to me how to reuse
>> this for vhost-net, e.g. the zerocopy flag is in a union member that is not
>> used by vhost_net.
> True, but that is not set in stone. I went back and forth on that when
> preparing fix 0a4a060bb204 ("sock: fix zerocopy_success regression
> with msg_zerocopy"). The field can be moved outside the union and
> initialized in the other zerocopy paths.
Ok. I see.
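To recap the layout for anyone following along (roughly what is in
include/linux/skbuff.h now, as I read it; the second struct in the
union is the msg_zerocopy part, the first is what vhost_net uses):

struct ubuf_info {
        void (*callback)(struct ubuf_info *, bool zerocopy_success);
        union {
                struct {
                        unsigned long desc;     /* vhost_net */
                        void *ctx;
                };
                struct {
                        u32 id;                 /* msg_zerocopy */
                        u16 len;
                        u16 zerocopy:1;         /* cleared on deep copy */
                        u32 bytelen;
                };
        };
        refcount_t refcnt;
        struct mmpin {
                struct user_struct *user;
                unsigned int num_pg;
        } mmp;
};

Moving the zerocopy bit out of the union would let the vhost_net
callback read it as well.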
Thanks