linux-kernel - Re: [PATCH v5 00/38] New page table range API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230712103543.4304235c@p-imbrenda>
Date:   Wed, 12 Jul 2023 10:35:43 +0200
From:   Claudio Imbrenda <imbrenda@...ux.ibm.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-arch@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        linux-s390 <linux-s390@...r.kernel.org>
Subject: Re: [PATCH v5 00/38] New page table range API

On Wed, 12 Jul 2023 06:29:21 +0100
Matthew Wilcox <willy@...radead.org> wrote:

> On Tue, Jul 11, 2023 at 05:24:40PM +0200, Claudio Imbrenda wrote:
> > On Tue, 11 Jul 2023 13:36:27 +0100
> > Matthew Wilcox <willy@...radead.org> wrote:  
> > > > I think we do use PG_arch_1 on s390 for our secure page handling and
> > > > making this perf folio instead of physical page really seems wrong
> > > > and it probably breaks this code.    
> > > 
> > > Per-page flags are going away in the next few years, so you're going to  
> > 
> > For each 4k physical page frame, we need to keep track whether it is
> > secure or not.  
> 
> Do you?  Wouldn't it make more sense to track that per allocation instead

no

> of per page?  ie if we allocate a 16kB anon folio for a VMA, don't you
> want the entire folio to be marked as secure vs insecure?

if we allocate a 16k folio, it would actually be initially marked as
non-secure until the guest touches any of it, then only those 4k pages
that are needed get marked as secure.

the guest can also share the pages with the host, in which case the
individual 4k pages get marked as non-secure once I/O is attempted on
them (e.g. direct I/O)

userspace (i.e. QEMU) can also try to look into the guest, causing
individual pages to be exported (securely encrypted and then marked as
non-secure) if they were secure and not shared.

I/O cannot trigger exports, it will just fail, and that should not
happen because in some cases it can bring down the whole system. Which
is one of the main reasons why we need to keep track of the state.

> 
> I don't really know what secure means in this context.  I think it has
> something to do with which of the VM or the hypervisor can access it, but
> it feels like something new that I've never had properly explained to me.

Secure means it belongs to a secure guest (confidential VM,
protected virtualisation, Secure Execution, there are many names...).

Hardware will prevent the host (or any other entity except for the
secure guest itself) from accessing those 4k physical page frames,
regardless of how the host might try. An exception will be presented
for any attempts.

I/O will not trigger any exception, and will instead just fail.

I hope this explains why we need to track the property for each 4k
physical page frame.

> 
> > A bit in struct page seems the most logical choice. If that's not
> > possible anymore, how would you propose we should do?  
> 
> The plan is to shrink struct page down to a single pointer (which

interesting

> includes a few tag bits to say what type that pointer is -- a page
> table, anon mem, file mem, slab, etc).  So there won't be any bits
> available for something like "secure or not".  You could use a side
> structure if you really need to keep track on a per page basis.

I guess that's something we will need to work on