netdev - Re: [PATCH net-next V4 5/5] vhost: access vq metadata through kernel virtual address

Message-ID: <4521d3d8-561e-53f5-98e1-bf7ace003701@redhat.com>
Date:   Thu, 24 Jan 2019 12:11:28 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH net-next V4 5/5] vhost: access vq metadata through kernel
 virtual address


On 2019/1/24 12:07 PM, Jason Wang wrote:
>
> On 2019/1/23 10:08 PM, Michael S. Tsirkin wrote:
>> On Wed, Jan 23, 2019 at 05:55:57PM +0800, Jason Wang wrote:
>>> It was noticed that the copy_user() family of helpers used to access
>>> virtqueue metadata tends to be very expensive for a dataplane
>>> implementation like vhost, since it involves many software checks,
>>> a speculation barrier, and hardware feature toggling (e.g. SMAP). The
>>> extra cost is most visible when transferring small packets, since
>>> the time spent accessing metadata becomes more significant.
>>>
>>> This patch tries to eliminate those overheads by accessing the
>>> metadata through a kernel virtual address obtained with vmap(). So
>>> that the pages can still be migrated, instead of pinning them through
>>> GUP, we use MMU notifiers to invalidate the vmaps and re-establish
>>> them during each round of metadata prefetching if necessary. For
>>> devices that don't use metadata prefetching, the memory accessors
>>> gracefully fall back to the normal copy_user() implementation. The
>>> invalidation is synchronized with the datapath through the vq mutex,
>>> and to avoid holding the vq mutex during range checking, the MMU
>>> notifier is torn down when the vq metadata is being modified.
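The scheme described above (a cached direct mapping that an MMU notifier invalidates and the next metadata prefetch re-establishes, with a copy_user() fallback while no mapping exists) can be modelled in a few lines of user-space C. This is a simplified, single-threaded sketch under stated assumptions, not the patch's code: struct meta_map and the meta_*() functions are hypothetical names, a plain pointer stands in for the vmap()ed address, memcpy() stands in for both the direct access and copy_from_user(), and the vq mutex that serializes invalidation against the datapath is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one vq metadata area (e.g. the used ring). */
struct meta_map {
	unsigned char *uaddr;      /* userspace address of the ring */
	unsigned char *kva;        /* cached "vmap()ed" address, NULL if invalid */
	size_t len;                /* size of the ring in bytes */
	unsigned long slow_copies; /* times the copy_user()-style path ran */
};

/* Modelled MMU notifier callback: drop the cached mapping so the
 * underlying pages can be migrated (vhost does this under the vq mutex). */
static void meta_invalidate(struct meta_map *m)
{
	m->kva = NULL;
}

/* Modelled metadata prefetch: re-establish the mapping if it was dropped. */
static void meta_prefetch(struct meta_map *m)
{
	if (!m->kva)
		m->kva = m->uaddr; /* stand-in for vmap() of the ring pages */
}

/* Modelled datapath read: direct access when mapped, slow path otherwise.
 * Returns 0 on success, -1 if the range falls outside the ring. */
static int meta_read(struct meta_map *m, size_t off, void *dst, size_t n)
{
	assert(m != NULL && dst != NULL);
	if (off > m->len || n > m->len - off)
		return -1;
	if (m->kva) {
		memcpy(dst, m->kva + off, n);   /* fast path: via the mapping */
	} else {
		memcpy(dst, m->uaddr + off, n); /* stand-in for copy_from_user() */
		m->slow_copies++;
	}
	return 0;
}
```

Starting from an invalidated mapping, the first meta_read() goes through the fallback; after meta_prefetch() reads use the cached mapping until the next meta_invalidate(). The slow_copies counter makes it easy to see which path ran.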
>>>
>>> Another issue is that the kernel lacks an efficient way to track
>>> pages dirtied through vmap(); this causes problems if vhost is using
>>> file-backed memory, which requires writeback. This patch solves it by
>>> simply skipping file-backed VMAs and falling back to the normal
>>> copy_user() helpers. This might add some overhead for file-backed
>>> users, but since this use case is rare, optimizations can be done on
>>> top.
>>>
>>> Note that this is only done when the device IOTLB is not enabled. A
>>> similar method could be used to optimize that case in the future.
>>>
>>> Tests show up to about 22% improvement in TX PPS when using
>>> virtio-user + vhost_net + xdp1 + TAP on a 2.6GHz Broadwell:
>>>
>>>          SMAP on | SMAP off
>>> Before: 5.0Mpps | 6.6Mpps
>>> After:  6.1Mpps | 7.4Mpps
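As a quick check of the "about 22%" figure, the relative gain is (after - before) / before. With SMAP on this gives (6.1 - 5.0) / 5.0 = 22%, and with SMAP off (7.4 - 6.6) / 6.6, roughly 12.1%, so the headline number is the SMAP-on case. The helper name gain_pct() is illustrative only.

```c
#include <assert.h>

/* Relative throughput gain in percent: (after - before) / before * 100. */
static double gain_pct(double before_mpps, double after_mpps)
{
	assert(before_mpps > 0.0);
	return (after_mpps - before_mpps) / before_mpps * 100.0;
}
```

For example, gain_pct(5.0, 6.1) is about 22.0 and gain_pct(6.6, 7.4) is about 12.1, up to floating-point rounding.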
>>>
>>> Signed-off-by: Jason Wang <jasowang@...hat.com>
>>
>> So this is the bulk of the change.
>> Three things that I need to look into:
>> - Are there any security issues with bypassing the speculation barrier
>>    that is normally present after access_ok?
>
>
> If we can make sure the bypass is only used in a kthread (vhost),
> it should be fine, I think.
>
>
>> - How hard does the special handling for
>>    file backed storage make testing?
>
>
> It's as simple as un-commenting vhost_can_vmap()? Or I can try to hack 
> qemu or dpdk to test this.
>
>
>>    On the one hand, we could add a module parameter to
>>    force copy to/from user. On the other hand, that's
>>    another configuration we need to support.
>
>
> That sounds sub-optimal, since it leaves the choice to users.
>
>
>>    But iotlb is not using vmap, so maybe that's enough
>>    for testing.
>> - How hard is it to figure out which mode uses which code?


It's as simple as tracing __get_user() usage in vhost process?

Thanks


>>
>>
>>
>> Meanwhile, could you pls post data comparing this last patch with the
>> below?  This removes the speculation barrier replacing it with a
>> (useless but at least more lightweight) data dependency.
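Michael's diff swaps the double-underscore accessors for the checked ones, trading the speculation barrier for a data dependency. As a generic illustration of such a data dependency (not the code the x86 get_user() path actually uses), the sketch below computes an all-ones or all-zeros mask without a conditional branch, modelled on the generic array_index_mask_nospec() in the kernel's include/linux/nospec.h, and ANDs it into an index so a mispredicted bounds check cannot steer a speculative access out of range. index_mask_nospec() and clamp_index() are hypothetical names for this user-space model, which assumes size <= LONG_MAX and that converting to long wraps and right-shifting a negative long is arithmetic (as GCC and Clang implement them).

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL when index < size, 0 otherwise, with no conditional branch.
 * In range, neither index nor (size - 1 - index) has the sign bit set;
 * out of range, the wrapped subtraction or index itself has it set.
 * Inverting and arithmetic-shifting the sign bit yields the mask. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (MODEL_BITS_PER_LONG - 1);
}

/* Clamp an out-of-range index to 0 through a data dependency on the mask. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	assert(size <= LONG_MAX); /* precondition of the mask computation */
	return index & index_mask_nospec(index, size);
}
```

The kernel helper additionally hides the index from the optimizer (OPTIMIZER_HIDE_VAR()) so the compiler cannot reason the mask away; this model has no such guard.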
>
>
> SMAP off
>
> Your patch: 7.2Mpps
>
> vmap: 7.4Mpps
>
> I didn't test with SMAP on, since it would certainly be much slower.
>
> Thanks
>
>
>>
>> Thanks!
>>
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index bac939af8dbb..352ee7e14476 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -739,7 +739,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
>>       int ret;
>> 
>>       if (!vq->iotlb)
>> -        return __copy_to_user(to, from, size);
>> +        return copy_to_user(to, from, size);
>>       else {
>>           /* This function should be called after iotlb
>>            * prefetch, which means we're sure that all vq
>> @@ -752,7 +752,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
>>                        VHOST_ADDR_USED);
>> 
>>           if (uaddr)
>> -            return __copy_to_user(uaddr, from, size);
>> +            return copy_to_user(uaddr, from, size);
>> 
>>           ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
>>                        ARRAY_SIZE(vq->iotlb_iov),
>> @@ -774,7 +774,7 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
>>       int ret;
>> 
>>       if (!vq->iotlb)
>> -        return __copy_from_user(to, from, size);
>> +        return copy_from_user(to, from, size);
>>       else {
>>           /* This function should be called after iotlb
>>            * prefetch, which means we're sure that vq
>> @@ -787,7 +787,7 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
>>           struct iov_iter f;
>> 
>>           if (uaddr)
>> -            return __copy_from_user(to, uaddr, size);
>> +            return copy_from_user(to, uaddr, size);
>> 
>>           ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov,
>>                        ARRAY_SIZE(vq->iotlb_iov),
>> @@ -855,13 +855,13 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
>>   ({ \
>>       int ret = -EFAULT; \
>>       if (!vq->iotlb) { \
>> -        ret = __put_user(x, ptr); \
>> +        ret = put_user(x, ptr); \
>>       } else { \
>>           __typeof__(ptr) to = \
>>               (__typeof__(ptr)) __vhost_get_user(vq, ptr,    \
>>                         sizeof(*ptr), VHOST_ADDR_USED); \
>>           if (to != NULL) \
>> -            ret = __put_user(x, to); \
>> +            ret = put_user(x, to); \
>>           else \
>>               ret = -EFAULT;    \
>>       } \
>> @@ -872,14 +872,14 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
>>   ({ \
>>       int ret; \
>>       if (!vq->iotlb) { \
>> -        ret = __get_user(x, ptr); \
>> +        ret = get_user(x, ptr); \
>>       } else { \
>>           __typeof__(ptr) from = \
>>               (__typeof__(ptr)) __vhost_get_user(vq, ptr, \
>>                                  sizeof(*ptr), \
>>                                  type); \
>>           if (from != NULL) \
>> -            ret = __get_user(x, from); \
>> +            ret = get_user(x, from); \
>>           else \
>>               ret = -EFAULT; \
>>       } \