netdev - Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ff45ea43-1145-5ea6-767c-1a99d55a9c61@redhat.com>
Date:   Tue, 12 Mar 2019 10:52:15 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>, kvm@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, peterx@...hat.com,
        linux-mm@...ck.org, Jerome Glisse <jglisse@...hat.com>
Subject: Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel
 virtual address


On 2019/3/11 下午8:48, Michael S. Tsirkin wrote:
> On Mon, Mar 11, 2019 at 03:40:31PM +0800, Jason Wang wrote:
>> On 2019/3/9 上午3:48, Andrea Arcangeli wrote:
>>> Hello Jeson,
>>>
>>> On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
>>>> Just to make sure I understand here. For boosting through huge TLB, do
>>>> you mean we can do that in the future (e.g by mapping more userspace
>>>> pages to kenrel) or it can be done by this series (only about three 4K
>>>> pages were vmapped per virtqueue)?
>>> When I answered about the advantages of mmu notifier and I mentioned
>>> guaranteed 2m/gigapages where available, I overlooked the detail you
>>> were using vmap instead of kmap. So with vmap you're actually doing
>>> the opposite, it slows down the access because it will always use a 4k
>>> TLB even if QEMU runs on THP or gigapages hugetlbfs.
>>>
>>> If there's just one page (or a few pages) in each vmap there's no need
>>> of vmap, the linearity vmap provides doesn't pay off in such
>>> case.
>>>
>>> So likely there's further room for improvement here that you can
>>> achieve in the current series by just dropping vmap/vunmap.
>>>
>>> You can just use kmap (or kmap_atomic if you're in preemptible
>>> section, should work from bh/irq).
>>>
>>> In short the mmu notifier to invalidate only sets a "struct page *
>>> userringpage" pointer to NULL without calls to vunmap.
>>>
>>> In all cases immediately after gup_fast returns you can always call
>>> put_page immediately (which explains why I'd like an option to drop
>>> FOLL_GET from gup_fast to speed it up).
>>>
>>> Then you can check the sequence_counter and inc/dec counter increased
>>> by _start/_end. That will tell you if the page you got and you called
>>> put_page to immediately unpin it or even to free it, cannot go away
>>> under you until the invalidate is called.
>>>
>>> If sequence counters and counter tells that gup_fast raced with anyt
>>> mmu notifier invalidate you can just repeat gup_fast. Otherwise you're
>>> done, the page cannot go away under you, the host virtual to host
>>> physical mapping cannot change either. And the page is not pinned
>>> either. So you can just set the "struct page * userringpage = page"
>>> where "page" was the one setup by gup_fast.
>>>
>>> When later the invalidate runs, you can just call set_page_dirty if
>>> gup_fast was called with "write = 1" and then you clear the pointer
>>> "userringpage = NULL".
>>>
>>> When you need to read/write to the memory
>>> kmap/kmap_atomic(userringpage) should work.
>> Yes, I've considered kmap() from the start. The reason I don't do that is
>> large virtqueue may need more than one page so VA might not be contiguous.
>> But this is probably not a big issue which just need more tricks in the
>> vhost memory accessors.
>>
>>
>>> In short because there's no hardware involvement here, the established
>>> mapping is just the pointer to the page, there is no need of setting
>>> up any pagetables or to do any TLB flushes (except on 32bit archs if
>>> the page is above the direct mapping but it never happens on 64bit
>>> archs).
>> I see, I believe we don't care much about the performance of 32bit archs (or
>> we can just fallback to copy_to_user() friends).
> Using copyXuser is better I guess.


Ok.


>
>> Using direct mapping (I
>> guess kernel will always try hugepage for that?) should be better and we can
>> even use it for the data transfer not only for the metadata.
>>
>> Thanks
> We can't really. The big issue is get user pages. Doing that on data
> path will be slower than copyXuser.


I meant if we can find a way to avoid doing gup in datapath. E.g vhost 
maintain a range tree and add or remove ranges through MMU notifier. 
Then in datapath, if we find the range, then use direct mapping 
otherwise copy_to_user().

Thanks


>   Or maybe it won't with the
> amount of mitigations spread around. Go ahead and try.
>
>