linux-kernel - Re: [PATCH 1/5] mm: Add support for unaccepted memory

Message-ID: <17b6a3a3-bd7d-f57e-8762-96258b16247a@intel.com>
Date:   Tue, 10 Aug 2021 11:56:12 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Andi Kleen <ak@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tom Lendacky <thomas.lendacky@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Varad Gautam <varad.gautam@...e.com>,
        Dario Faggioli <dfaggioli@...e.com>, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-kernel@...r.kernel.org,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH 1/5] mm: Add support for unaccepted memory

On 8/10/21 11:30 AM, Andi Kleen wrote:
>> So, this is right in the fast path of the page allocator.  It's a
>> one-time thing per 2M page, so it's not permanent.
>>
>> *But* there's both a global spinlock and a firmware call hidden in
>> clear_page_offline().  That's *GOT* to hurt if you were, for instance,
>> running a benchmark while this code path is being tickled.  Not just to
>>
>> That could be just downright catastrophic for scalability, albeit
>> temporarily
> 
> This would be only a short blip at initialization until the system
> reaches steady state. So yes it would be temporary, but very short at that.

But it can't be *that* short or we wouldn't be going to all this trouble
in the first place.  This can't simultaneously be both bad enough that
this series exists, but minor enough that nobody will notice or care at
runtime.

In general, I'd rather have a system which is running userspace, slowly,
than one where I'm waiting for the kernel.  The trade-off being made is
a *good* trade-off for me.  But, not everyone is going to agree with me.

This also raises the question of how folks know when this "blip" is
over.  Do we have a counter for offline pages?  Is there any way to force
page acceptance?  Or, are we just stuck allocating a bunch of memory to
warm up the system?
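
For illustration only, the "allocate a bunch of memory to warm up" fallback
could look roughly like the user-space sketch below.  warm_up_memory() is a
hypothetical helper, not an existing interface, and it assumes that
faulting in anonymous pages is enough to push the allocator through its
one-time per-page work.  It does not guarantee that every page in the
system gets accepted, and it says nothing about a counter or a way to
force acceptance.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical warm-up helper (an assumption for illustration, not a
 * kernel or libc API): map `bytes` of anonymous memory, write one byte
 * to every page so the kernel has to back it with real pages, then
 * unmap it again.
 *
 * Returns the number of pages touched, or -1 on failure.
 */
long warm_up_memory(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || bytes == 0)
        return -1;

    unsigned char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    long touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        /* volatile store so the compiler cannot drop the write */
        ((volatile unsigned char *)buf)[off] = 1;
        touched++;
    }

    munmap(buf, bytes);
    return touched;
}
```

A benchmark harness would call something like warm_up_memory(size) once
before starting its timed run, with size chosen close to the workload's
expected footprint.  Whether that actually removes the blips depends on
which pages the allocator hands out later, which is exactly why a counter
or an explicit way to force acceptance would be nicer.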

How do folks who care about these new blips avoid them?

Again, I don't particularly care about how this affects the
benchmarkers.  But, I do care that they're going to hound us when these
blips start impacting their 99th percentile tail latencies.