linux-kernel - Re: [PATCH 1/5] mm: Add support for unaccepted memory

Message-ID: <f445da8b-c044-3765-65f2-f911dbf6a507@intel.com>
Date:   Tue, 10 Aug 2021 12:46:30 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Andi Kleen <ak@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tom Lendacky <thomas.lendacky@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Varad Gautam <varad.gautam@...e.com>,
        Dario Faggioli <dfaggioli@...e.com>, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-kernel@...r.kernel.org,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH 1/5] mm: Add support for unaccepted memory

On 8/10/21 12:23 PM, Andi Kleen wrote:
>> But, not everyone is going to agree with me.
> 
> Both the Intel TDX and the AMD SEV side independently came to opposite
> conclusions. In general people care a lot about boot time of VM guests.

I was also saying that getting to userspace fast is important to me.
Almost everyone agrees there.

>> This also begs the question of how folks know when this "blip" is over.
>>   Do we have a counter for offline pages?  Is there any way to force page
>> acceptance?  Or, are we just stuck allocating a bunch of memory to warm
>> up the system?
>>
>> How do folks who care about these new blips avoid them?
> 
> It's not different than any other warmup. At warmup time you always have
> lots of blips until the working set stabilizes. For example in
> virtualization first touch of a new page is usually an EPT violation
> handled to the host. Or in the native case you may need to do IO or free
> memory. Everybody who based their critical latency percentiles around a
> warming up process would be foolish, the picture would be completely
> distorted.
> 
> So the basic operation is adding some overhead, but I don't think
> anything is that unusual compared to the state of the art.

Except that today, you can totally avoid the allocation latency (not
sure about the EPT violation/fill latency) from things like QEMU's
-mem-prealloc.

> Now perhaps the locking might be a problem if the other operations all
> run in parallel, causing unnecessary serialization. If that's really a
> problem I guess we can optimize later. I don't think there's anything
> fundamental about the current locking.

These boot blips are not the biggest issue in the world.  But they are
fully under the guest's control, and I think the guest has some
responsibility to provide *some* mitigation for them.

1. Do background acceptance, as opposed to relying 100% on demand-driven
   acceptance.  Guarantees a limited window in which blips can occur.
2. Do acceptance based on user input, like from sysfs.
3. Add a command-line argument to accept everything up front, or at
   least before userspace runs.
4. Add some statistic for how much unaccepted memory remains.
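
To make option 1 concrete, here is a minimal userspace sketch of
background acceptance running alongside demand-driven acceptance.  It is
not code from the patch series: struct accept_state, accept_on_demand()
and accept_background_step() are hypothetical names, and the actual
platform "accept" call is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: unaccepted[i] stays true until unit i is accepted. */
struct accept_state {
    bool *unaccepted;
    size_t nunits;
    size_t cursor;   /* next unit the background pass will visit */
};

/* Demand path: accept a single unit because an allocation needs it now. */
void accept_on_demand(struct accept_state *s, size_t unit)
{
    assert(unit < s->nunits);
    s->unaccepted[unit] = false;   /* stand-in for the platform accept call */
}

/*
 * Background path: visit at most 'budget' units per call, accepting any
 * that are still unaccepted.  Returns true once the whole range has been
 * visited, after which no demand-driven acceptance can occur.
 */
bool accept_background_step(struct accept_state *s, size_t budget)
{
    while (budget > 0 && s->cursor < s->nunits) {
        s->unaccepted[s->cursor] = false;
        s->cursor++;
        budget--;
    }
    return s->cursor == s->nunits;
}
```

With a fixed budget per step, the pass finishes after at most
ceil(nunits / budget) steps, which is what bounds the window in which
blips can occur.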

Those are at least four ways we could mitigate it.  A sysfs
statistic file would probably take ~30 lines of code to loop over the
bitmap.  A command-line option would probably be <10 lines of code to
just short-circuit the bitmap and accept everything up front.  A file to
force acceptance would probably be pretty quick too.
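
As a rough illustration of how small the statistic and the accept-
everything paths could be, here is a self-contained userspace sketch.
It assumes a bitmap with one set bit per still-unaccepted unit and a
2 MiB unit size; struct unaccepted_map, mark_unaccepted(),
unaccepted_bytes() and accept_all() are hypothetical names, not
interfaces from the patch series, and the accept_unit callback stands in
for whatever the platform's accept operation is.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_SIZE     (2ULL << 20)   /* assumed: each bit covers 2 MiB */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap: a set bit means the unit is not yet accepted. */
struct unaccepted_map {
    unsigned long *words;
    size_t nbits;
};

void mark_unaccepted(struct unaccepted_map *m, size_t bit)
{
    assert(bit < m->nbits);
    m->words[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Statistic: loop over the bitmap and report the bytes still unaccepted. */
uint64_t unaccepted_bytes(const struct unaccepted_map *m)
{
    uint64_t units = 0;

    for (size_t bit = 0; bit < m->nbits; bit++)
        if (m->words[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD)))
            units++;
    return units * UNIT_SIZE;
}

/*
 * Accept everything up front: walk the bitmap once, call the optional
 * accept_unit callback for each set bit, and clear it.  Returns the
 * number of units accepted.
 */
size_t accept_all(struct unaccepted_map *m, void (*accept_unit)(size_t unit))
{
    size_t accepted = 0;

    for (size_t bit = 0; bit < m->nbits; bit++) {
        unsigned long mask = 1UL << (bit % BITS_PER_WORD);
        unsigned long *word = &m->words[bit / BITS_PER_WORD];

        if (!(*word & mask))
            continue;
        if (accept_unit)
            accept_unit(bit);
        *word &= ~mask;
        accepted++;
    }
    return accepted;
}
```

A file that forces acceptance could call the same accept_all() loop from
its write handler, while a read-only statistic file would only need
unaccepted_bytes().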

Nothing there seems too onerous.