Message-ID: <32fd63ef-3a8f-a037-28dc-a63dc11087a3@redhat.com>
Date: Wed, 11 Aug 2021 18:17:09 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@...anix.com>,
akpm@...ux-foundation.org, viro@...iv.linux.org.uk,
christian.brauner@...ntu.com, ebiederm@...ssion.com,
adobriyan@...il.com, songmuchun@...edance.com, axboe@...nel.dk,
vincenzo.frascino@....com, catalin.marinas@....com,
peterz@...radead.org, chinwen.chang@...iatek.com,
linmiaohe@...wei.com, jannh@...gle.com, apopple@...dia.com,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, ivan.teterevkov@...anix.com,
florian.schmidt@...anix.com, carl.waldspurger@...anix.com,
jonathan.davies@...anix.com
Subject: Re: [PATCH 0/1] pagemap: swap location for shared pages
On 11.08.21 18:15, David Hildenbrand wrote:
> On 04.08.21 21:17, Peter Xu wrote:
>> On Wed, Aug 04, 2021 at 08:49:14PM +0200, David Hildenbrand wrote:
>>> TBH, I tend to really dislike the PTE marker idea. IMHO, we shouldn't store
>>> any state information regarding shared memory in per-process page tables: it
>>> just doesn't make too much sense.
>>>
>>> And this is similar to SOFTDIRTY or UFFD_WP bits: this information actually
>>> belongs to the shared file ("did *someone* write to this page", "is
>>> *someone* interested in changes to that page", "is there something"). I
>>> know, that screams for a completely different design with respect to these
>>> features.
>>>
>>> I guess we start learning the hard way that shared memory is just different
>>> and requires different interfaces than per-process page table interfaces we
>>> have (pagemap, userfaultfd).
>>>
>>> I didn't have time to explore any alternatives yet, but I wonder if tracking
>>> such stuff per an actual fd/memfd and not via process page tables is
>>> actually the right and clean approach. There are certainly many issues to
>>> solve, but conceptually to me it feels more natural to have these shared
>>> memory features not mangled into process page tables.
>>
>> Yes, we can explore all the possibilities, I'm totally fine with it.
>>
>> I just want to say that I still don't think that, just because there's a page
>> cache, we must put all the page-relevant things into it.
>
> [sorry for the late reply]
>
> Right, but for the case of shared, swapped out pages, the information is
> already there, in the page cache :)
>
>>
>> They're shared by processes, but a process can still have its own way to describe
>> its relationship to that page in the cache. To me it's as simple as "we allow
>> process A to write to page cache P", while "we don't allow process B to write
>> to the same page", like the write bit.
>
> The issue I'm having with uffd-wp as it was proposed for shared memory is
> that there is hardly a sane use case where we would *want* it to work
> that way.
>
> A UFFD-WP flag in a page table for shared memory means "please notify
> once this process modifies the shared memory (via page tables, not via
> any other fd modification)". Do we have an example application where
> these semantics make sense and don't over-complicate the whole
> approach? I don't know any, thus I'm asking dumb questions :)
>
>
> For background snapshots in QEMU the flow would currently be like this,
> assuming all processes have the shared guest memory mapped.
>
> 1. Background snapshot preparation: QEMU requests all processes
>    to uffd-wp the range
>    a) All processes register a uffd handler on guest RAM
>    b) All processes fault in all guest memory (essentially populating all
>       memory): with a uffd-WP extension we might be able to get rid of
>       that, I remember you were working on that.
>    c) All processes uffd-WP the range to set the bit in their page table
>
> 2. Background snapshot runs:
>    a) A process either receives a UFFD-WP event and forwards it to QEMU or
>       QEMU polls all other processes for UFFD events.
>    b) QEMU writes the to-be-changed page to the migration stream.
>    c) QEMU triggers all processes to un-protect the page and wake up any
>       waiters. All processes clear the uffd-WP bit in their page tables.

Oh, and I forgot: whenever we save any page to the migration stream, we
have to trigger all processes to un-protect that page.
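
To make the fan-out in the flow above concrete, here is a toy model in
plain Python. It is only a sketch: it does not touch userfaultfd, the
kernel, or QEMU. The names Process, SnapshotCoordinator, and the example
process names are made up for illustration. It assumes that the uffd-wp
bit lives in each process's own page table, so every page saved to the
stream costs one un-protect request per process.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for one process mapping the shared guest RAM."""
    name: str
    # Pages whose uffd-wp bit is set in *this* process's page table.
    wp_pages: set = field(default_factory=set)

    def write_protect(self, pages):
        self.wp_pages.update(pages)

    def unprotect(self, page):
        self.wp_pages.discard(page)


class SnapshotCoordinator:
    """Hypothetical stand-in for QEMU driving a background snapshot."""

    def __init__(self, processes, num_pages):
        self.processes = processes
        self.num_pages = num_pages
        self.saved = []              # pages in the order they hit the stream
        self._saved_set = set()
        self.unprotect_requests = 0  # one per (page, process) pair

    def prepare(self):
        # Step 1: every process write-protects the whole range.
        for proc in self.processes:
            proc.write_protect(range(self.num_pages))

    def save_page(self, page):
        # Steps 2b and 2c: write the page to the stream, then have
        # *every* process clear its own uffd-wp bit for that page.
        if page in self._saved_set:
            return
        self._saved_set.add(page)
        self.saved.append(page)
        for proc in self.processes:
            proc.unprotect(page)
            self.unprotect_requests += 1

    def on_write_fault(self, page):
        # Step 2a: some process reported a write fault on this page.
        self.save_page(page)

    def run_background_pass(self):
        # Pages saved without a fault still need the same fan-out.
        for page in range(self.num_pages):
            self.save_page(page)


procs = [Process("qemu"), Process("backend-a"), Process("backend-b")]
snap = SnapshotCoordinator(procs, num_pages=4)
snap.prepare()
snap.on_write_fault(2)
snap.run_background_pass()
print(snap.saved)               # [2, 0, 1, 3]
print(snap.unprotect_requests)  # 12 (4 pages x 3 processes)
```

In this model the number of un-protect requests always ends up as
(number of pages) x (number of processes), whether a page is saved
because of a write fault or by the background pass.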
--
Thanks,
David / dhildenb