lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6188f3ad-721a-cd00-2398-5ce3b0cd12d8@amd.com>
Date:   Mon, 26 Sep 2022 10:42:24 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     Dionna Amalie Glaze <dionnaglaze@...gle.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Marcelo Cerri <marcelo.cerri@...onical.com>,
        tim.gardner@...onical.com,
        Khalid ElMously <khalid.elmously@...onical.com>,
        philip.cox@...onical.com,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-coco@...ts.linux.dev, linux-efi <linux-efi@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory

On 9/26/22 08:38, Tom Lendacky wrote:
> On 9/26/22 07:10, Kirill A. Shutemov wrote:
>> On Sat, Sep 24, 2022 at 04:03:02AM +0300, Kirill A. Shutemov wrote:
>>> On Thu, Sep 22, 2022 at 09:31:12AM -0500, Tom Lendacky wrote:
>>>> On 9/8/22 14:28, Mike Rapoport wrote:
>>>>> On Thu, Sep 08, 2022 at 09:23:07AM -0700, Dionna Amalie Glaze wrote:
>>>>>>>
>>>>>>> Looks like the first access to the memory map fails, although I think
>>>>>>> it's not in INIT_LIST_HEAD() but rather in init_page_count().
>>>>>>>
>>>>>>> I'd start with making sure that page_alloc::memmap_alloc() actually 
>>>>>>> returns
>>>>>>> accepted memory. If you build kernel with CONFIG_DEBUG_VM=y the 
>>>>>>> memory map
>>>>>>> will poisoned in this function, so my guess is it'd crash there.
>>>>>>>
>>>>>>
>>>>>> That's a wonderful hint, thank you! I did not run this test
>>>>>> CONFIG_DEBUG_VM set, but you think it's possible it could still be
>>>>>> here?
>>>>>
>>>>> It depends on how you configured your kernel. Say, defconfig does not 
>>>>> set
>>>>> it.
>>>>>
>>>>
>>>> I also hit the issue at 256GB. My config is using 
>>>> CONFIG_SPARSEMEM_VMEMMAP
>>>> and fails in memmap_init_range() when attempting to add the first PFN. It
>>>> looks like the underlying page that is backing the vmemmap has not been
>>>> accepted (I receive a #VC 0x404 => page not validated).
>>>>
>>>> Kirill, is this a path that you've looked at? It would appear that 
>>>> somewhere
>>>> in the vmemmap_populate_hugepages() path, some memory acceptance needs 
>>>> to be
>>>> done for the pages that are used to back vmemmap. I'm not very 
>>>> familiar with
>>>> this code, so I'm not sure why everything works for a guest with 255GB of
>>>> memory, but then fails for a guest with 256GB of memory.
>>>
>>> Hm. I don't have machine that large at hands at the moment. And I have not
>>> looked at the codepath before.
>>>
>>> I will try to look into the issue.
>>
>> I'm not able to trigger the bug.
>>
>> With help of vm.overcommit_memory=1, I was managed boot TDX guest to shell
>> with 256G and 1T of guest memory just fine.
>>
>> Any chance it is SEV-SNP specific?
> 
> There's always a chance. I'll do some more tracing and see what I can find 
> to try and be certain.

Indeed, it was an issue in the SNP path, shifting the number of pages as 
an unsigned int by PAGE_SIZE and adding to an unsigned long. The value to 
be shifted was 0x100000 and resulted in 0. Changing the number of pages to 
an unsigned long fixed it.

There are a few places where that is being done, so I'll address those 
with some pre-patches as fixes.

Thanks for verifying it was working for you.

Tom

> 
>>
>> Or maybe there some difference in kernel config? Could you share yours?
> 
> Yes, I'll send that to you off-list.
> 
> Thanks,
> Tom
> 
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ