lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 23 Jan 2023 15:03:45 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     "Huang, Kai" <kai.huang@...el.com>,
        "chao.p.peng@...ux.intel.com" <chao.p.peng@...ux.intel.com>
Cc:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "jmattson@...gle.com" <jmattson@...gle.com>,
        "Hocko, Michal" <mhocko@...e.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "Lutomirski, Andy" <luto@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "tabba@...gle.com" <tabba@...gle.com>,
        "david@...hat.com" <david@...hat.com>,
        "michael.roth@....com" <michael.roth@....com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "corbet@....net" <corbet@....net>,
        "qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
        "dhildenb@...hat.com" <dhildenb@...hat.com>,
        "bfields@...ldses.org" <bfields@...ldses.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>,
        "ddutile@...hat.com" <ddutile@...hat.com>,
        "rppt@...nel.org" <rppt@...nel.org>,
        "shuah@...nel.org" <shuah@...nel.org>,
        "vkuznets@...hat.com" <vkuznets@...hat.com>,
        "mail@...iej.szmigiero.name" <mail@...iej.szmigiero.name>,
        "naoya.horiguchi@....com" <naoya.horiguchi@....com>,
        "qperret@...gle.com" <qperret@...gle.com>,
        "arnd@...db.de" <arnd@...db.de>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "yu.c.zhang@...ux.intel.com" <yu.c.zhang@...ux.intel.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "wanpengli@...cent.com" <wanpengli@...cent.com>,
        "vannapurve@...gle.com" <vannapurve@...gle.com>,
        "hughd@...gle.com" <hughd@...gle.com>,
        "aarcange@...hat.com" <aarcange@...hat.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "jlayton@...nel.org" <jlayton@...nel.org>,
        "joro@...tes.org" <joro@...tes.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Wang, Wei W" <wei.w.wang@...el.com>,
        "steven.price@....com" <steven.price@....com>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linmiaohe@...wei.com" <linmiaohe@...wei.com>
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to
 create restricted user memory

On 12/22/22 01:37, Huang, Kai wrote:
>>> I argue that this page pinning (or page migration prevention) is not
>>> tied to where the page comes from, instead related to how the page will
>>> be used. Whether the page is restrictedmem backed or GUP() backed, once
>>> it's used by current version of TDX then the page pinning is needed. So
>>> such page migration prevention is really TDX thing, even not KVM generic
>>> thing (that's why I think we don't need change the existing logic of
>>> kvm_release_pfn_clean()). 
>>>
> This essentially boils down to who "owns" page migration handling, and sadly,
> page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle page
> migration by itself -- it's just a passive receiver.
> 
> For normal pages, page migration is totally done by the core-kernel (i.e. it
> unmaps page from VMA, allocates a new page, and uses migrate_pape() or a_ops-
>> migrate_page() to actually migrate the page).
> In the sense of TDX, conceptually it should be done in the same way. The more
> important thing is: yes KVM can use get_page() to prevent page migration, but
> when KVM wants to support it, KVM cannot just remove get_page(), as the core-
> kernel will still just do migrate_page() which won't work for TDX (given
> restricted_memfd doesn't have a_ops->migrate_page() implemented).
> 
> So I think the restricted_memfd filesystem should own page migration handling,
> (i.e. by implementing a_ops->migrate_page() to either just reject page migration
> or somehow support it).

While this thread seems to be settled on refcounts already, just wanted
to point out that it wouldn't be ideal to prevent migrations by
a_ops->migrate_page() rejecting them. It would mean cputime wasted (i.e.
by memory compaction) by isolating the pages for migration and then
releasing them after the callback rejects it (at least we wouldn't waste
time creating and undoing migration entries in the userspace page tables
as there's no mmap). Elevated refcount on the other hand is detected
very early in compaction so no isolation is attempted, so from that
aspect it's optimal.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ