Date:   Wed, 11 Aug 2021 18:17:09 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     Tiberiu A Georgescu <tiberiu.georgescu@...anix.com>,
        akpm@...ux-foundation.org, viro@...iv.linux.org.uk,
        christian.brauner@...ntu.com, ebiederm@...ssion.com,
        adobriyan@...il.com, songmuchun@...edance.com, axboe@...nel.dk,
        vincenzo.frascino@....com, catalin.marinas@....com,
        peterz@...radead.org, chinwen.chang@...iatek.com,
        linmiaohe@...wei.com, jannh@...gle.com, apopple@...dia.com,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, ivan.teterevkov@...anix.com,
        florian.schmidt@...anix.com, carl.waldspurger@...anix.com,
        jonathan.davies@...anix.com
Subject: Re: [PATCH 0/1] pagemap: swap location for shared pages

On 11.08.21 18:15, David Hildenbrand wrote:
> On 04.08.21 21:17, Peter Xu wrote:
>> On Wed, Aug 04, 2021 at 08:49:14PM +0200, David Hildenbrand wrote:
>>> TBH, I tend to really dislike the PTE marker idea. IMHO, we shouldn't store
>>> any state information regarding shared memory in per-process page tables: it
>>> just doesn't make much sense.
>>>
>>> And this is similar to SOFTDIRTY or UFFD_WP bits: this information actually
>>> belongs to the shared file ("did *someone* write to this page", "is
>>> *someone* interested in changes to that page", "is there something"). I
>>> know, that screams for a completely different design with respect to these
>>> features.
>>>
>>> I guess we are starting to learn the hard way that shared memory is just
>>> different and requires different interfaces than the per-process page table
>>> interfaces we have (pagemap, userfaultfd).
>>>
>>> I didn't have time to explore any alternatives yet, but I wonder if tracking
>>> such state per actual fd/memfd, and not via process page tables, is
>>> actually the right and clean approach. There are certainly many issues to
>>> solve, but conceptually it feels more natural to me not to have these
>>> shared memory features mangled into process page tables.
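A toy model of the alternative suggested above, assuming write-protect state is kept once per shared object (think memfd) instead of in each process's page tables. All names (`toy_shared_file`, `toy_file_write`, `toy_file_protect_all`) are hypothetical and no such kernel interface exists; one byte stands in for one page. What it illustrates is that every write path funnels through the same check, so "did *someone* write to this page" falls out naturally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: write-protect state lives with the shared
 * object (e.g. a memfd), not in any process's page tables. */
#define TOY_PAGES 8

struct toy_shared_file {
    char data[TOY_PAGES];      /* one byte stands in for one page */
    bool wp[TOY_PAGES];        /* protection is tracked per file page */
    unsigned notifications;    /* events delivered to the tracker */
};

/* Every write path goes through here: a store via any process's
 * mapping, or write()/pwrite() on the fd. */
static void toy_file_write(struct toy_shared_file *f, size_t page, char val)
{
    if (f->wp[page]) {
        f->notifications++;    /* "someone" is about to modify the page */
        f->wp[page] = false;   /* assume the tracker saved it; unprotect */
    }
    f->data[page] = val;
}

/* Protect the whole object once, independent of who maps it. */
static void toy_file_protect_all(struct toy_shared_file *f)
{
    for (size_t i = 0; i < TOY_PAGES; i++)
        f->wp[i] = true;
}
```

With this layout there is a single protect call and a single unprotect per page, no matter how many processes map the object.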
>>
>> Yes, we can explore all the possibilities; I'm totally fine with it.
>>
>> I just want to say that I still don't think that, when there's a page cache,
>> we must put all the page-relevant things into the page cache.
> 
> [sorry for the late reply]
> 
> Right, but for the case of shared, swapped out pages, the information is
> already there, in the page cache :)
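For a swapped-out shmem page, the mapping process's PTE is typically empty, so its `/proc/PID/pagemap` entry reports neither "present" nor "swapped", even though the swap entry is recorded in the page cache. The decoder below is a sketch that uses only a subset of the entry layout from the kernel's pagemap documentation (bit 63 present, bit 62 swapped, bit 61 file page or shared anon, and, when swapped, bits 0-4 swap type and bits 5-54 swap offset); the `pm_info` struct and `pm_decode` function are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the /proc/PID/pagemap entry layout. */
#define PM_PRESENT          (1ULL << 63)
#define PM_SWAPPED          (1ULL << 62)
#define PM_FILE_SHARED      (1ULL << 61)       /* file page or shared anon */
#define PM_SWAP_TYPE_MASK   0x1fULL            /* bits 0-4 when swapped */
#define PM_SWAP_OFFSET_MASK ((1ULL << 50) - 1) /* bits 5-54 when swapped */

struct pm_info {
    bool present, swapped, shared;
    unsigned swap_type;
    uint64_t swap_offset;
};

/* Decode one 64-bit pagemap entry; swap fields stay 0 unless swapped. */
static struct pm_info pm_decode(uint64_t e)
{
    struct pm_info i = {
        .present = e & PM_PRESENT,
        .swapped = e & PM_SWAPPED,
        .shared  = e & PM_FILE_SHARED,
    };
    if (i.swapped) {
        i.swap_type   = (unsigned)(e & PM_SWAP_TYPE_MASK);
        i.swap_offset = (e >> 5) & PM_SWAP_OFFSET_MASK;
    }
    return i;
}
```

To use it on a live process, read the 8-byte entry at offset `(vaddr / page_size) * 8` of `/proc/self/pagemap`. For the shared, swapped-out case discussed here, that entry would decode with both `present` and `swapped` false.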
> 
>>
>> They're shared by processes, but each process can still have its own way to
>> describe its relationship to that page in the cache. To me it's as simple as
>> "we allow process A to write to page cache P" while "we don't allow process B
>> to write to the same page", like the write bit.
> 
> The issue I have with uffd-wp as it was proposed for shared memory is
> that there is hardly a sane use case where we would *want* it to work
> that way.
> 
> A UFFD-WP flag in a page table for shared memory means "please notify
> once this process modifies the shared memory (via page tables, not via
> any other fd modification)". Do we have an example application where
> these semantics make sense and don't over-complicate the whole
> approach? I don't know of any, thus I'm asking dumb questions :)
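A toy model of the semantics described above, with hypothetical names (`toy_proc`, `toy_store_via_mapping`, `toy_store_via_fd`) and one byte per page; it does not mirror any kernel data structure. The protection bit lives in each process's own "page table", so a store by another process, or a write() to the shmem fd, modifies the shared page without raising any event for the process that protected it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: uffd-wp as a per-process page-table bit
 * on memory that several processes share. */
#define TOY_PAGES 4

struct toy_shmem { char data[TOY_PAGES]; };

struct toy_proc {
    bool uffd_wp[TOY_PAGES];   /* lives in this process's page tables */
    unsigned events;           /* uffd-wp faults raised for this process */
};

/* Store through this process's mapping: only its own bit is checked. */
static void toy_store_via_mapping(struct toy_proc *p, struct toy_shmem *s,
                                  size_t page, char val)
{
    if (p->uffd_wp[page]) {
        p->events++;           /* handler would save and un-protect */
        p->uffd_wp[page] = false;
    }
    s->data[page] = val;
}

/* write()/pwrite() on the shmem fd: no page tables are consulted. */
static void toy_store_via_fd(struct toy_shmem *s, size_t page, char val)
{
    s->data[page] = val;
}
```

For a snapshot tracker, the first two stores in the usage below are exactly the modifications it needs to see, yet neither produces an event.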
> 
> 
> For background snapshots in QEMU the flow would currently be like this,
> assuming all processes have the shared guest memory mapped.
> 
> 1. Background snapshot preparation: QEMU requests all processes
>      to uffd-wp the range
> a) All processes register a uffd handler on guest RAM
> b) All processes fault in all guest memory (essentially populating all
>      memory): with a uffd-WP extension we might be able to get rid of
>      that; I remember you were working on that.
> c) All processes uffd-WP the range to set the bit in their page tables
> 
> 2. Background snapshot runs:
> a) A process either receives a UFFD-WP event and forwards it to QEMU or
>      QEMU polls all other processes for UFFD events.
> b) QEMU writes the to-be-changed page to the migration stream.
> c) QEMU triggers all processes to un-protect the page and wake up any
>      waiters. All processes clear the uffd-WP bit in their page tables.
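The flow above can be sketched as a single-threaded toy model, assuming NPROC processes share NPAGES pages of guest RAM. In the real interface each process would register the range with `UFFDIO_REGISTER` using `UFFDIO_REGISTER_MODE_WP` (step 1a) and set or clear protection with `UFFDIO_WRITEPROTECT` (steps 1c and 2c); here those become plain bit flips, and all names (`struct snap`, `snap_prepare`, `snap_save_page`, `snap_guest_write`) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical toy model of the background snapshot flow. */
#define NPROC  3
#define NPAGES 4

struct snap {
    char guest[NPAGES];        /* shared guest RAM */
    bool wp[NPROC][NPAGES];    /* per-process uffd-wp page-table bit */
    bool saved[NPAGES];        /* already in the migration stream */
    char stream[NPAGES];       /* contents captured by the snapshot */
    unsigned unprotect_ops;    /* per-process un-protect requests sent */
};

/* Step 1c: every process write-protects the whole range. */
static void snap_prepare(struct snap *s)
{
    for (int p = 0; p < NPROC; p++)
        for (int i = 0; i < NPAGES; i++)
            s->wp[p][i] = true;
}

/* Steps 2b + 2c: save the old contents, then un-protect in ALL processes. */
static void snap_save_page(struct snap *s, int page)
{
    if (s->saved[page])
        return;
    s->stream[page] = s->guest[page];
    s->saved[page] = true;
    for (int p = 0; p < NPROC; p++) {
        s->wp[p][page] = false;
        s->unprotect_ops++;
    }
}

/* A process stores to guest RAM through its own mapping. */
static void snap_guest_write(struct snap *s, int proc, int page, char val)
{
    if (s->wp[proc][page])     /* step 2a: uffd-wp event reaches QEMU */
        snap_save_page(s, page);
    s->guest[page] = val;
}
```

Note the inner loop in `snap_save_page`: a single write by a single process costs one un-protect request in every process.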

Oh, and I forgot: whenever we save any page to the migration stream, we
have to trigger all processes to un-protect it.
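A hypothetical sketch of that remark: QEMU also saves pages on its own in a background pass, and each of those saves needs the same fan-out. The per-process un-protect is passed in as a callback (in a real setup presumably an IPC request that ends in `UFFDIO_WRITEPROTECT` with the WP mode bit cleared); `background_pass` and `unprotect_fn` are illustrative names, and the model assumes every page is saved exactly once, either by a fault-driven save or by this pass.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPROC  3
#define NPAGES 8

/* Hypothetical hook: ask process `proc` to clear uffd-wp on `page`. */
typedef void (*unprotect_fn)(void *ctx, int proc, int page);

/* Background pass: save every page not yet saved by a fault-driven
 * save, and fan the un-protect out to all processes each time.
 * Returns the number of pages this pass saved. */
static size_t background_pass(bool saved[NPAGES], unprotect_fn unprotect,
                              void *ctx)
{
    size_t saved_now = 0;
    for (int page = 0; page < NPAGES; page++) {
        if (saved[page])
            continue;          /* already saved when a write faulted */
        /* (write the page to the migration stream here) */
        saved[page] = true;
        saved_now++;
        for (int proc = 0; proc < NPROC; proc++)
            unprotect(ctx, proc, page);
    }
    return saved_now;
}
```

Under that assumption a full snapshot issues NPAGES * NPROC un-protect requests in total, regardless of how many pages the guest actually dirtied.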


-- 
Thanks,

David / dhildenb
