Date:   Fri, 14 Dec 2018 12:29:54 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 0/3] vhost: accelerate metadata access through
 vmap()


On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>> Hi:
>>
>> This series tries to access virtqueue metadata through kernel virtual
>> addresses instead of the copy_user() friends, since those have too
>> much overhead: checks, speculation barriers, or even hardware feature
>> toggling.
>>
>> Tests show about a 24% improvement in TX PPS. It should benefit other
>> cases as well.
>>
>> Please review
> I think the idea of speeding up userspace access is a good one.
> However, I think that moving all the checks to the start is way too
> aggressive.


So do packet sockets and AF_XDP. Anyway, sharing the address space and
accessing it directly is the fastest way. Performance is the major
consideration when people choose a backend. Compared to a userspace
implementation, vhost has no security advantage at any level. If vhost
is still slow, people will start to develop backends based on e.g.
AF_XDP.
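
To make the difference concrete, the metadata path basically goes from
the first helper below to the second (just a sketch, not the actual
patch code; "avail_uptr" and "avail_kva" are made-up names for the
userspace pointer and the kernel alias of the pinned avail ring pages):

/* Today: every read goes through the full user-access machinery:
 * access_ok() check, SMAP toggling (STAC/CLAC), speculation barrier
 * and exception fixup.
 */
static int avail_idx_uaccess(struct vhost_virtqueue *vq, u16 *idx)
{
        __virtio16 v;

        if (get_user(v, &vq->avail_uptr->idx))
                return -EFAULT;
        *idx = vhost16_to_cpu(vq, v);
        return 0;
}

/* With the same pages also mapped into kernel space (e.g. via vmap()
 * of the pinned pages), the read becomes a plain load.
 */
static u16 avail_idx_direct(struct vhost_virtqueue *vq)
{
        return vhost16_to_cpu(vq, READ_ONCE(vq->avail_kva->idx));
}

The second form is where the TX PPS improvement comes from.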


> Instead, let's batch things up, but let's not keep them
> around forever.
> Here are some ideas:
>
>
> 1. Disable preemption, process a small number of small packets
>     directly in an atomic context. This should cut latency
>     down significantly; the tricky part is to only do it
>     under light load and to disable it for the streaming case,
>     otherwise it's unfair. This might fail; if it does, just
>     bounce things out to a thread.


I'm not sure what context you mean here. Is this for the TX path of
TUN? A fundamental difference is that my series targets extremely heavy
load, not light load; 100% CPU for vhost is expected.


>
> 2. Switch to unsafe_put_user/unsafe_get_user,
>     and batch up multiple accesses.


As I said, this only helps if we can batch accesses across two of the
three places (avail, descriptor and used); batching accesses to a
single place like the used ring won't help. I'm not even sure it can be
done at all considering the packed virtqueue case, where we have a
single descriptor ring. Batching through the unsafe helpers may not
help there, since it's equivalent to the safe ones. It also requires
non-trivial refactoring of vhost, and such refactoring may itself have
a noticeable impact (e.g. it may introduce a regression).
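
The kind of batching that could pay off is amortizing the STAC/CLAC
toggle and the fixup over a whole run of accesses, along these lines
(sketch only, field names simplified, and I'm hand-waving the exact
access_ok()/user_access_begin() calling convention):

/* Pull a run of avail ring entries under a single user-access window
 * with one fixup label, instead of paying the overhead per element.
 * Endian conversion is left to the caller; vq->num is a power of two
 * for split rings.
 */
static int fetch_avail_batch(struct vhost_virtqueue *vq, u16 start,
                             __virtio16 *heads, unsigned int count)
{
        unsigned int i;

        if (!access_ok(VERIFY_READ, vq->avail,
                       sizeof(*vq->avail) +
                       vq->num * sizeof(vq->avail->ring[0])))
                return -EFAULT;

        user_access_begin();
        for (i = 0; i < count; i++)
                unsafe_get_user(heads[i],
                                &vq->avail->ring[(start + i) & (vq->num - 1)],
                                efault);
        user_access_end();
        return 0;

efault:
        user_access_end();
        return -EFAULT;
}

But for something like a single used ring update per packet there is
nothing to amortize, so it degenerates to what we already have.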


>
> 3. Allow adding a fixup point manually,
>     such that multiple independent get_user accesses
>     can get a single fixup (this will allow better
>     compiler optimizations).
>

So for metadata access, I don't see how what you suggest here can help
in the case of a heavy workload.

For data access, this may help, but I have already played with batching
the data copies to reduce the SMAP/spec-barrier overhead in vhost-net,
and I did not see a performance improvement.
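
(By batching I mean the usual trick of turning N independent uaccess
calls into one bulk copy, so the SMAP toggle and speculation barrier
are paid once per run, e.g. fetching a run of contiguous descriptors in
one go. Purely an illustration, not the exact code I tried:)

/* Copy a contiguous run of descriptors with one bulk uaccess instead
 * of one get_user() per field; ring wrap-around is ignored for
 * brevity.
 */
static int copy_desc_run(struct vhost_virtqueue *vq,
                         struct vring_desc *dst,
                         unsigned int head, unsigned int n)
{
        if (copy_from_user(dst, vq->desc + head, n * sizeof(*dst)))
                return -EFAULT;
        return 0;
}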

Thanks


>
>
>
>> Jason Wang (3):
>>    vhost: generalize adding used elem
>>    vhost: fine grain userspace memory accessors
>>    vhost: access vq metadata through kernel virtual address
>>
>>   drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
>>   drivers/vhost/vhost.h |  11 ++
>>   2 files changed, 266 insertions(+), 26 deletions(-)
>>
>> -- 
>> 2.17.1
