linux-kernel - Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

Message-ID: <973c6f79-38ad-aa30-bfec-c2a1c7db5d70@suse.cz>
Date:   Wed, 16 Nov 2022 10:08:05 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     "Kalra, Ashish" <ashish.kalra@....com>,
        Borislav Petkov <bp@...en8.de>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
        jroedel@...e.de, thomas.lendacky@....com, hpa@...or.com,
        ardb@...nel.org, pbonzini@...hat.com, seanjc@...gle.com,
        vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org,
        dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com,
        peterz@...radead.org, srinivas.pandruvada@...ux.intel.com,
        rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com,
        michael.roth@....com, kirill@...temov.name, ak@...ux.intel.com,
        tony.luck@...el.com, marcorr@...gle.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com,
        dgilbert@...hat.com, jarkko@...nel.org,
        "Kaplan, David" <David.Kaplan@....com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR
 allocation when SNP is enabled

On 11/15/22 19:15, Kalra, Ashish wrote:
> 
> On 11/15/2022 11:24 AM, Kalra, Ashish wrote:
>> Hello Vlastimil,
>>
>> On 11/15/2022 9:14 AM, Vlastimil Babka wrote:
>>> Cc'ing memory failure folks, the beginning of this subthread is here:
>>>
>>> https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@amd.com/
>>>
>>> On 11/15/22 00:36, Kalra, Ashish wrote:
>>>> Hello Boris,
>>>>
>>>> On 11/2/2022 6:22 AM, Borislav Petkov wrote:
>>>>> On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
>>>>>>        if (snp_lookup_rmpentry(pfn, &rmp_level)) {
>>>>>>               do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
>>>>>>               return RMP_PF_RETRY;
>>>>>
>>>>> Does this issue some halfway understandable error message why the
>>>>> process got killed?
>>>>>
>>>>>> Will look at adding our own recovery function for the same, but that will
>>>>>> again mark the pages as poisoned, right ?
>>>>>
>>>>> Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
>>>>> Semantically, it'll be handled the same way, ofc.
>>>>
>>>> Added a new PG_offlimits flag and a simple corresponding handler for it.
>>>
>>> One thing is, there's not enough page flags to be adding more (except
>>> aliases for existing) for cases that can avoid it, but as Boris says, if
>>> using alias to PG_hwpoison it depends what will become confused with the
>>> actual hwpoison.
>>>
>>>> But there is still added complexity of handling hugepages as part of
>>>> reclamation failures (both HugeTLB and transparent hugepages) and that
>>>> means calling more static functions in mm/memory-failure.c
>>>>
>>>> There is probably a more appropriate handler in mm/memory-failure.c:
>>>>
>>>> soft_offline_page() - this will mark the page as HWPoisoned and also has
>>>> handling for hugepages. And we can avoid adding a new page flag too.
>>>>
>>>> soft_offline_page - Soft offline a page.
>>>> Soft offline a page, by migration or invalidation, without killing
>>>> anything.
>>>>
>>>> So, this looks like a good option to call
>>>> soft_offline_page() instead of memory_failure() in case of
>>>> failure to transition the page back to HV/shared state via SNP_RECLAIM_CMD
>>>> and/or RMPUPDATE instruction.
>>>
>>> So it's a bit unclear to me what exact situation we are handling here. The
>>> original patch here seems to me to be just leaking back pages that are
>>> unsafe for further use. soft_offline_page() seems to fit that scenario of a
>>> graceful leak before something is irreparably corrupt and we page fault
>>> on it.
>>> But then in the thread you discuss PF handling and killing. So what is the
>>> case here? If we detect this need to call snp_leak_pages() does it mean:
>>>
>>> a) nobody that could page fault at them (the guest?) is running anymore, we
>>> are tearing it down, we just can't reuse the pages further on the host
>>
>> The host can page fault on them, if anything on the host tries to write to
>> these pages. Host reads will return garbage data.
>>
>>> - seem like soft_offline_page() could work, but maybe we could just put the
>>> pages on some leaked lists without special page? The only thing that should
>>> matter is not to free the pages to the page allocator so they would be
>>> reused by something else.
>>>
>>> b) something can still page fault at them (what?) - AFAIU can't be resolved
>>> without killing something, memory_failure() might limit the damage
>>
>> As I mentioned above, host writes will cause RMP violation page fault.
>>
> 
> And to add here, if it's a guest private page, then the above fault cannot be
> resolved, so the faulting process is terminated.

BTW would this not be mostly resolved as part of rebasing to UPM?
- host will not have these pages mapped in the first place (both kernel
directmap and qemu userspace)
- guest will have them mapped, but I assume that the conversion from private
to shared (that might fail?) can only happen after guest's mappings are
invalidated in the first place?
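
For illustration only, option (a) quoted above (keep pages that could not be
reclaimed on a leaked list instead of returning them to the page allocator)
can be modelled in plain userspace C. This is a toy sketch, not kernel code:
`toy_page`, `leak_list` and `release_page()` are hypothetical names, and the
`reclaim_ok` argument merely stands in for the result of the
SNP_RECLAIM_CMD / RMPUPDATE transition discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: every page is in use, free for reuse, or leaked. */
enum page_state { PAGE_IN_USE, PAGE_FREE, PAGE_LEAKED };

struct toy_page {
	unsigned long pfn;            /* page frame number, for bookkeeping only */
	enum page_state state;
	struct toy_page *next_leaked; /* link used only while on the leaked list */
};

/* Hypothetical container for pages that must never be reused. */
struct leak_list {
	struct toy_page *head;
	size_t count;
};

/*
 * Release a page after use. reclaim_ok is an assumption standing in for
 * "the page was successfully transitioned back to the shared state".
 * On failure the page is parked on the leaked list and is never handed
 * back to the free pool, so nothing else can be given it later.
 */
static void release_page(struct leak_list *leaked, struct toy_page *page,
			 bool reclaim_ok)
{
	assert(page->state == PAGE_IN_USE);

	if (reclaim_ok) {
		page->state = PAGE_FREE;
		return;
	}

	page->state = PAGE_LEAKED;
	page->next_leaked = leaked->head;
	leaked->head = page;
	leaked->count++;
}
```

The only property the sketch tries to capture is the one stated in the quoted
text: a page whose reclaim failed never becomes free again. It says nothing
about faults on mappings that already exist, which is the case (b) question.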

> Thanks,
> Ashish
> 
>>
>>>>
>>>>>
>>>>>> Still waiting for some/more feedback from mm folks on the same.
>>>>>
>>>>> Just send the patch and they'll give it.
>>>>>
>>>>> Thx.
>>>>>
>>>