[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <711e6027-1b4f-4aed-47a6-305396d05893@amd.com>
Date: Wed, 16 Nov 2022 12:01:11 -0600
From: "Kalra, Ashish" <ashish.kalra@....com>
To: Vlastimil Babka <vbabka@...e.cz>, Borislav Petkov <bp@...en8.de>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-coco@...ts.linux.dev, linux-mm@...ck.org,
linux-crypto@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
jroedel@...e.de, thomas.lendacky@....com, hpa@...or.com,
ardb@...nel.org, pbonzini@...hat.com, seanjc@...gle.com,
vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org,
dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com,
peterz@...radead.org, srinivas.pandruvada@...ux.intel.com,
rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com,
michael.roth@....com, kirill@...temov.name, ak@...ux.intel.com,
tony.luck@...el.com, marcorr@...gle.com,
sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com,
dgilbert@...hat.com, jarkko@...nel.org,
"Kaplan, David" <David.Kaplan@....com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Miaohe Lin <linmiaohe@...wei.com>,
Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR
allocation when SNP is enabled
On 11/16/2022 4:25 AM, Vlastimil Babka wrote:
> On 11/16/22 11:19, Kalra, Ashish wrote:
>> On 11/16/2022 3:08 AM, Vlastimil Babka wrote:
>>> On 11/15/22 19:15, Kalra, Ashish wrote:
>>>>
>>>> On 11/15/2022 11:24 AM, Kalra, Ashish wrote:
>>>>> Hello Vlastimil,
>>>>>
>>>>> On 11/15/2022 9:14 AM, Vlastimil Babka wrote:
>>>>>> Cc'ing memory failure folks, the beinning of this subthread is here:
>>>>>>
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra%40amd.com%2F&data=05%7C01%7Cashish.kalra%40amd.com%7C174b7caaf99a473194cd08dac7bcebf3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041911481429347%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=CFkAXNQqangvCqhnwDyIUJUkfiUrX67OpKDTtLGj6PU%3D&reserved=0
>>>>>>
>>>>>> On 11/15/22 00:36, Kalra, Ashish wrote:
>>>>>>> Hello Boris,
>>>>>>>
>>>>>>> On 11/2/2022 6:22 AM, Borislav Petkov wrote:
>>>>>>>> On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
>>>>>>>>> if (snp_lookup_rmpentry(pfn, &rmp_level)) {
>>>>>>>>> do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
>>>>>>>>> return RMP_PF_RETRY;
>>>>>>>>
>>>>>>>> Does this issue some halfway understandable error message why the
>>>>>>>> process got killed?
>>>>>>>>
>>>>>>>>> Will look at adding our own recovery function for the same, but that
>>>>>>>>> will
>>>>>>>>> again mark the pages as poisoned, right ?
>>>>>>>>
>>>>>>>> Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
>>>>>>>> Semantically, it'll be handled the same way, ofc.
>>>>>>>
>>>>>>> Added a new PG_offlimits flag and a simple corresponding handler for it.
>>>>>>
>>>>>> One thing is, there's not enough page flags to be adding more (except
>>>>>> aliases for existing) for cases that can avoid it, but as Boris says, if
>>>>>> using alias to PG_hwpoison it depends what will become confused with the
>>>>>> actual hwpoison.
>>>>>>
>>>>>>> But there is still added complexity of handling hugepages as part of
>>>>>>> reclamation failures (both HugeTLB and transparent hugepages) and that
>>>>>>> means calling more static functions in mm/memory_failure.c
>>>>>>>
>>>>>>> There is probably a more appropriate handler in mm/memory-failure.c:
>>>>>>>
>>>>>>> soft_offline_page() - this will mark the page as HWPoisoned and also has
>>>>>>> handling for hugepages. And we can avoid adding a new page flag too.
>>>>>>>
>>>>>>> soft_offline_page - Soft offline a page.
>>>>>>> Soft offline a page, by migration or invalidation, without killing
>>>>>>> anything.
>>>>>>>
>>>>>>> So, this looks like a good option to call
>>>>>>> soft_offline_page() instead of memory_failure() in case of
>>>>>>> failure to transition the page back to HV/shared state via
>>>>>>> SNP_RECLAIM_CMD
>>>>>>> and/or RMPUPDATE instruction.
>>>>>>
>>>>>> So it's a bit unclear to me what exact situation we are handling here. The
>>>>>> original patch here seems to me to be just leaking back pages that are
>>>>>> unsafe for further use. soft_offline_page() seems to fit that scenario
>>>>>> of a
>>>>>> graceful leak before something is irrepairably corrupt and we page fault
>>>>>> on it.
>>>>>> But then in the thread you discus PF handling and killing. So what is the
>>>>>> case here? If we detect this need to call snp_leak_pages() does it mean:
>>>>>>
>>>>>> a) nobody that could page fault at them (the guest?) is running
>>>>>> anymore, we
>>>>>> are tearing it down, we just can't reuse the pages further on the host
>>>>>
>>>>> The host can page fault on them, if anything on the host tries to write to
>>>>> these pages. Host reads will return garbage data.
>>>>>
>>>>>> - seem like soft_offline_page() could work, but maybe we could just put
>>>>>> the
>>>>>> pages on some leaked lists without special page? The only thing that
>>>>>> should
>>>>>> matter is not to free the pages to the page allocator so they would be
>>>>>> reused by something else.
>>>>>>
>>>>>> b) something can stil page fault at them (what?) - AFAIU can't be resolved
>>>>>> without killing something, memory_failure() might limit the damage
>>>>>
>>>>> As i mentioned above, host writes will cause RMP violation page fault.
>>>>>
>>>>
>>>> And to add here, if its a guest private page, then the above fault cannot be
>>>> resolved, so the faulting process is terminated.
>>>
>>> BTW would this not be mostly resolved as part of rebasing to UPM?
>>> - host will not have these pages mapped in the first place (both kernel
>>> directmap and qemu userspace)
>>> - guest will have them mapped, but I assume that the conversion from private
>>> to shared (that might fail?) can only happen after guest's mappings are
>>> invalidated in the first place?
>>>
>>
>> Yes, that will be true for guest private pages. But then there are host
>> allocated pages for firmware use which will remain in firmware page state or
>> reclaim state if they can't be transitioned back to HV/shared state once the
>> firmware releases them back to the host and accessing them at this point can
>> potentially cause RMP violation #PF.
>>
>> Again i don't think this is going to happen regularly or frequently so it
>> will be a rare error case where the page reclamation, i.e., the transition
>> back to HV/shared state fails and now these pages are no longer safe to be
>> used.
>>
>> Referring back to your thoughts about putting these pages on some leaked
>> pages list, do any such leaked pages list exist currently ?
>
> Not AFAIK, you could just create a list_head somewhere appropriate (some snp
> state structure?) and put the pages there, maybe with a counter exposed in
> debugs. The point would be mostly that if something goes so wrong it would
> be leaking substantial amounts of memory, we can at least recognize the
> cause (but I suppose the dmesg will be also full of messages) and e.g. find
> the pages in a crash dump.
>
Ok, so i will work on implementing this leaked pages list and put it on
a sev/snp associated structure.
Also to add here, we will actually get a not-present #PF instead of the
RMP violation #PF on writing to these leaked pages, as these pages would
have been removed from the kernel direct map.
Thanks,
Ashish
>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Still waiting for some/more feedback from mm folks on the same.
>>>>>>>>
>>>>>>>> Just send the patch and they'll give it.
>>>>>>>>
>>>>>>>> Thx.
>>>>>>>>
>>>>>>
>>>
>
Powered by blists - more mailing lists