linux-kernel - Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210519120642.GA19235@ashkalra_ubuntu_server>
Date:   Wed, 19 May 2021 12:06:42 +0000
From:   Ashish Kalra <ashish.kalra@....com>
To:     Steve Rutherford <srutherford@...gle.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Joerg Roedel <joro@...tes.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        X86 ML <x86@...nel.org>, KVM list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Venu Busireddy <venu.busireddy@...cle.com>,
        Brijesh Singh <brijesh.singh@....com>
Subject: Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption
 status is changed

On Mon, May 17, 2021 at 07:01:09PM -0700, Steve Rutherford wrote:
> On Fri, May 14, 2021 at 3:05 AM Ashish Kalra <ashish.kalra@....com> wrote:
> >
> > On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> > > On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > > > Ideally we should fail/stop migration even if a single guest page
> > > > encryption status cannot be notified and that should be the way to
> > > > proceed in this case, the guest kernel should notify the source
> > > > userspace VMM to block/stop migration in this case.
> > >
> > > Yap, and what I'm trying to point out here is that if the low level
> > > machinery fails for whatever reason and it cannot recover, we should
> > > propagate that error up the chain so that the user is aware as to why it
> > > failed.
> > >
> >
> > I totally agree.
> >
> > > WARN is a good first start but in some configurations those splats don't
> > > even get shown as people don't look at dmesg, etc.
> > >
> > > And I think it is very important to propagate those errors properly
> > > because there's a *lot* of moving parts involved in a guest migration
> > > and you have encrypted memory which makes debugging this probably even
> > > harder, etc, etc.
> > >
> > > I hope this makes more sense.
> > >
> > > > From a practical side, i do see Qemu's migrate_add_blocker() interface
> > > > but that looks to be a static interface and also i don't think it will
> > > > force stop an ongoing migration, is there an existing mechanism
> > > > to inform userspace VMM from kernel about blocking/stopping migration ?
> > >
> > > Hmm, so __set_memory_enc_dec() which calls
> > > notify_addr_enc_status_changed() is called by the guest, right, when it
> > > starts migrating.
> > >
> >
> > No, actually notify_addr_enc_status_changed() is called whenever a range
> > of memory is marked as encrypted or decrypted, so it has nothing to do
> > with migration as such.
> >
> > This is basically modifying the encryption attributes on the page tables
> > and correspondingly also making the hypercall to inform the hypervisor about
> > page status encryption changes. The hypervisor will use this information
> > during an ongoing or future migration, so this information is maintained
> > even though migration might never be initiated here.
> >
> > > Can an error value from it be propagated up the callchain so it can be
> > > turned into an error messsage for the guest owner to see?
> > >
> >
> > The error value cannot be propogated up the callchain directly
> > here, but one possibility is to leverage the hypercall and use Sean's
> > proposed hypercall interface to notify the host/hypervisor to block/stop
> > any future/ongoing migration.
> >
> > Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
> > more ideal.
> >
> > Thanks,
> > Ashish
> 
> How realistic is this type of failure? If you've gotten this deep, it
> seems like something has gone very wrong if the memory you are about
> to mark as shared (or encrypted) doesn't exist and isn't mapped. In
> particular, is the kernel going to page fault when it tries to
> reinitialize the page it's currently changing the c-bit of? From what
> I recall, most paths that do either set_memory_encrypted or
> set_memory_decrypted memset the region being toggled. Note: dma_pool
> doesn't immediately memset, but the VA it's providing to set_decrypted
> is freshly fetched from a recently allocated region (something has to
> have gone pretty wrong if this is invalid if I'm not mistaken. No one
> would think twice if you wrote to that freshly allocated page).
> 
> The reason I mention this is that SEV migration is going to be the
> least of your concerns if you are already on a one-way train towards a
> Kernel oops. I'm not certain I would go so far as to say this should
> BUG() instead (I think the page fault on access might be easier to
> debug a BUG here), but I'm pretty skeptical that the kernel is going
> to do too well if it doesn't know if its kernel VAs are valid.
> 
> If, despite the above, we expect to infrequently-but-not-never disable
> migration with no intention of reenabling it, we should signal it
> differently than we currently signal migration enablement. Currently,
> if you toggle migration from on to off there is an implication that
> you are about to reboot, and you are only ephemerally unable to
> migrate. Having permanent disablement be indistinguishable from a
> really long reboot is a recipe for a really sad long timeout in
> userspace.

Also looking at set_memory_encrypted and set_memory_decrypted usage
patterns, the persistent decrypted regions like the dma pool,
bss_decrypted, etc., will be setup at boot time. Later the unused
bss_decrypted section will be freed and set_memory_encrypted called on
the freed memory at end of kernel boot.

Most of the runtime set_memory_encrypted and set_memory_decrypted calls
will be on dynamically allocated dma buffers via dma_alloc_coherent()
and dma_free_coherent().

For example, A dma_alloc_coherent() request will allocate pages and then call
set_memory_decrypted() against them. When dma_free_coherent() is called,
set_memory_encrypted() is called against the pages about to be freed 
before they are actually freed.

Now these buffers have very short life and only used for immediate I/O
and then freed, so they may not be of major concern for SEV
migration ?

So disabling migration for failure of address lookup or mapping failures
on such pages will really be an overkill. 

Might be in favor of Steve's thoughts above of doing a BUG() here
instead.

Thanks,
Ashish