lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YzOF7MT15nfBX0Ma@google.com>
Date:   Tue, 27 Sep 2022 23:23:24 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Chao Peng <chao.p.peng@...ux.intel.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-doc@...r.kernel.org, qemu-devel@...gnu.org,
        Jonathan Corbet <corbet@....net>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
        Hugh Dickins <hughd@...gle.com>,
        Jeff Layton <jlayton@...nel.org>,
        "J . Bruce Fields" <bfields@...ldses.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shuah Khan <shuah@...nel.org>, Mike Rapoport <rppt@...nel.org>,
        Steven Price <steven.price@....com>,
        "Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        Vishal Annapurve <vannapurve@...gle.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>, luto@...nel.org,
        jun.nakajima@...el.com, dave.hansen@...el.com, ak@...ux.intel.com,
        aarcange@...hat.com, ddutile@...hat.com, dhildenb@...hat.com,
        Quentin Perret <qperret@...gle.com>,
        Michael Roth <michael.roth@....com>, mhocko@...e.com,
        Muchun Song <songmuchun@...edance.com>, wei.w.wang@...el.com
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

On Mon, Sep 26, 2022, David Hildenbrand wrote:
> On 26.09.22 16:48, Kirill A. Shutemov wrote:
> > On Mon, Sep 26, 2022 at 12:35:34PM +0200, David Hildenbrand wrote:
> > > When using DAX, what happens with the shared <->private conversion? Which
> > > "type" is supposed to use dax, which not?
> > > 
> > > In other word, I'm missing too many details on the bigger picture of how
> > > this would work at all to see why it makes sense right now to prepare for
> > > that.
> > 
> > IIUC, KVM doesn't really care about pages or folios. They need PFN to
> > populate SEPT. Returning page/folio would make KVM do additional steps to
> > extract PFN and one more place to have a bug.
> 
> Fair enough. Smells KVM specific, though.

TL;DR: I'm good with either approach, though providing a "struct page" might avoid
       refactoring the API in the nearish future.

Playing devil's advocate for a second, the counter argument is that KVM is the
only user for the foreseeable future.

That said, it might make sense to return a "struct page" from the core API and
force KVM to do page_to_pfn().  KVM already does that for HVA-based memory, so
it's not exactly new code.

More importantly, KVM may actually need/want the "struct page" in the not-too-distant
future to support mapping non-refcounted "struct page" memory into the guest.  The
ChromeOS folks have a use case involving virtio-gpu blobs where KVM can get handed a
"struct page" that _isn't_ refcounted[*].  Once the lack of mmu_notifier integration
is fixed, the remaining issue is that KVM doesn't currently have a way to determine
whether or not it holds a reference to the page.  Instead, KVM assumes that if the
page is "normal", it's refcounted, e.g. see kvm_release_pfn_clean().

KVM's current workaround for this is to refuse to map these pages into the guest,
i.e. KVM simply forces its assumption that normal pages are refcounted to be true.
To remove that workaround, the likely solution will be to pass around a tuple of
page+pfn, where "page" is non-NULL if the pfn is a refcounted "struct page".

At that point, getting handed a "struct page" from the core API would be a good
thing as KVM wouldn't need to probe the PFN to determine whether or not it's a
refcounted page.

Note, I still want the order to be provided by the API so that KVM doesn't need
to run through a bunch of helpers to try and figure out the allowed mapping size.

[*] https://lore.kernel.org/all/CAD=HUj736L5oxkzeL2JoPV8g1S6Rugy_TquW=PRt73YmFzP6Jw@mail.gmail.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ