Message-ID: <c24577db-80ca-8682-c7f8-466cf9dab0c9@redhat.com>
Date:   Thu, 29 Jun 2023 17:01:49 +0200
From:   Hans de Goede <hdegoede@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>,
        Sumitra Sharma <sumitraartsy@...il.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Ira Weiny <ira.weiny@...el.com>,
        Fabio <fmdefrancesco@...il.com>, Deepak R Varma <drv@...lo.com>
Subject: Re: [PATCH] fs/vboxsf: Replace kmap() with kmap_local_{page, folio}()

Hi,

On 6/29/23 16:42, Matthew Wilcox wrote:
> On Thu, Jun 29, 2023 at 02:28:44AM -0700, Sumitra Sharma wrote:
>> On Wed, Jun 28, 2023 at 06:15:04PM +0100, Matthew Wilcox wrote:
>>> Here's a more comprehensive read_folio patch.  It's not at all
>>> efficient, but then if we wanted an efficient vboxsf, we'd implement
>>> vboxsf_readahead() and actually do an async call with deferred setting
>>> of the uptodate flag.  I can consult with anyone who wants to do all
>>> this work.
>>
>> So, after reading the comments, I understood that the problem presented 
>> by Hans and Matthew is as follows:
>>
>> 1) In the current code, the buffers used by vboxsf_write()/vboxsf_read() are
>> translated to PAGELIST-s before being passed to the hypervisor, but
>> inefficiently: the code first maps a page in vboxsf_read_folio() and then
>> calls page_to_phys(virt_to_page()) in the function hgcm_call_init_linaddr().
> 
> It does ... and I'm not even sure that virt_to_page() works for kmapped
> pages.  Has it been tested with a 32-bit guest with, say, 4-8GB of memory?
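
For anyone following along, the path being discussed looks roughly like this
(a heavily simplified paraphrase of fs/vboxsf/file.c and
drivers/virt/vboxguest/vboxguest_utils.c, not the literal code):

	/* fs/vboxsf/file.c, read_folio path (simplified): the page is mapped
	 * only so that a linear address can be handed down the call chain. */
	buf = kmap(page);
	err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
	kunmap(page);

	/* drivers/virt/vboxguest/vboxguest_utils.c, hgcm_call_init_linaddr()
	 * (simplified): that linear address is then translated straight back
	 * into physical pages when the HGCM page list is built. */
	for (i = 0; i < page_count; i++) {
		pg_lst->pages[i] = page_to_phys(virt_to_page(buf)) + off_in_page;
		buf += PAGE_SIZE;
		off_in_page = 0;
	}

So the kmap() exists only to feed virt_to_page() further down, which is the
round-trip (and, per the highmem concern above, a questionable one) that could
be avoided by passing a page list directly.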
> 
>> The inefficiency in the current implementation arises from the unnecessary
>> mapping of the page in vboxsf_read_folio(): the mapping's output, i.e. the
>> linear address, is only consumed deep down in
>> 'drivers/virt/vboxguest/vboxguest_utils.c'. Hence, the mapping should be
>> done in that file instead; to do so, the folio must be passed down to that
>> point. This could be done by adding a new member, 'struct folio *folio', to
>> 'struct vmmdev_hgcm_function_parameter64'.
> 
> That's not the way to do it (as Hans already said).
> 
> The other problem is that vboxsf_read() is synchronous.  It makes the
> call to the host, then waits for the outcome.  What we really need is
> a vboxsf_readahead() that looks something like this:
> 
> static void vboxsf_readahead(struct readahead_control *ractl)
> {
> 	unsigned int nr = readahead_count(ractl);
> 	req = vbg_req_alloc(... something involving nr ...);
> 	... fill in the page array ...
> 	... submit the request ...
> }
> 
> You also need to set up a kthread that will sit on the hgcm_wq and handle
> the completions that come in (where you'd call folio_mark_uptodate() if
> the call is successful, folio_unlock() to indicate the I/O has completed,
> etc, etc).
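
To make the completion side concrete, it would presumably end up as something
like the untested sketch below. vboxsf_read_complete(), struct vboxsf_read_req
and its fields are made-up names for illustration; only the folio calls are
existing API:

	/* Hypothetical completion handler, run from the kthread/workqueue that
	 * waits on hgcm_wq; the request layout here is invented. */
	static void vboxsf_read_complete(struct vboxsf_read_req *req, int err)
	{
		unsigned int i;

		for (i = 0; i < req->nr_folios; i++) {
			struct folio *folio = req->folios[i];

			if (!err)
				folio_mark_uptodate(folio);	/* data arrived */
			folio_unlock(folio);			/* I/O finished */
		}

		/* free the vbg_req and any per-request bookkeeping here */
	}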
> 
> Then go back to read_folio() (which can be synchronous), and maybe factor
> out the parts of vboxsf_readahead() that can be reused for filling in
> the vbg_req.
> 
> Hans might well have better ideas about how this could be structured; I'm
> new to the vbox code.

I think that moving to directly passing vbox PAGELIST structs on
read is something which should not be too much work.
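
Roughly, the idea would be to let read_folio fill the HGCM page list straight
from the folio, with no kmap()/virt_to_page() round-trip. Untested, and the
struct/flag names below are from memory, so treat this as a sketch only:

	/* Sketch only: build the HGCM page list directly from the folio.
	 * Field/flag names are approximate. */
	pg_lst->flags = VMMDEV_HGCM_F_PARM_DIRECTION_FROM_HOST;
	pg_lst->offset_first_page = offset_in_folio(folio, off);
	pg_lst->page_count = folio_nr_pages(folio);

	for (i = 0; i < pg_lst->page_count; i++)
		pg_lst->pages[i] = page_to_phys(folio_page(folio, i));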

Moving to an async model is a lot more involved and has a much
larger chance of causing problems. So we not only need someone
to step up to make the change to async, but that person also
needs to commit to helping maintain vboxsf going forward,
esp. wrt debugging + fixing any problems stemming from the move
to async, which are likely to only pop up much later as more
and more distros move to the new code.

Regards,

Hans
