Message-Id: <C6941572-4380-4E07-A622-1BB63AE30622@redhat.com>
Date: Thu, 3 Sep 2020 21:35:35 +0200
From: David Hildenbrand <david@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: David Hildenbrand <david@...hat.com>,
Pavel Tatashin <pasha.tatashin@...een.com>,
linux-kernel@...r.kernel.org, mhocko@...e.com, linux-mm@...ck.org,
osalvador@...e.de, richard.weiyang@...il.com, vbabka@...e.cz,
rientjes@...gle.com
Subject: Re: [PATCH v2] mm/memory_hotplug: drain per-cpu pages again during memory offline
> On 03.09.2020 at 21:31, Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Thu, 3 Sep 2020 19:36:26 +0200 David Hildenbrand <david@...hat.com> wrote:
>
>> (still on vacation, back next week on Tuesday)
>>
>> I didn't look into discussions in v1, but to me this looks like we are
>> trying to hide an actual bug by implementing hacks in the caller
>> (repeated calls to drain_all_pages()). What about alloc_contig_range()
>> users - they would see more allocation errors just because the PCP code
>> doesn't play along.
>>
>> There *is* strong synchronization with the page allocator - however,
>> there seems to be one corner-case race where we allow pages to be
>> allocated from isolated pageblocks.
>>
>> I want that fixed instead if possible, otherwise this is just an ugly
>> hack to make the obvious symptoms (offlining looping forever) disappear.
>>
>> If that is not easily possible, I'd much rather see all
>> drain_all_pages() calls moved to the caller and have the expected
>> behavior documented instead of specifying "there is no strong
>> synchronization with the page allocator" - which is wrong in all but PCP
>> cases (and there only in one possible race?).
>>
>
> It's a two-line hack which fixes a bug in -stable kernels, so I'm
> inclined to proceed with it anyway. We can undo it later on as part of
> a better fix, OK?
Agreed as a stable fix, but I really want to see a proper fix (e.g., disabling PCP while having isolated pageblocks) on top.
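
Just to illustrate the direction I mean - a purely conceptual sketch, not an
existing interface (pcp_disable(), pcp_enable() and isolate_and_offline() are
made-up helper names): keep the pcp lists empty for as long as the range has
isolated pageblocks, so freed pages go straight to the buddy and cannot be
missed.

static int offline_range_pcp_disabled(struct zone *zone,
				      unsigned long start_pfn,
				      unsigned long end_pfn)
{
	int ret;

	/* hypothetical: set pcp->high/->batch to 0 and drain all pcp lists */
	pcp_disable(zone);

	/*
	 * With the pcp lists kept empty, freed pages in the isolated
	 * range end up in the buddy immediately.
	 */
	ret = isolate_and_offline(start_pfn, end_pfn);	/* hypothetical */

	/* hypothetical: restore pcp->high/->batch */
	pcp_enable(zone);
	return ret;
}
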
>
> Unless you think there's some new misbehaviour which we might see as a
> result of this approach?
>
We basically disable the PCP lists by repeatedly flushing them. But performance shouldn't matter.
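
For reference, roughly what the retry loop ends up doing with the extra drain
(simplified sketch, not the literal patch; the wrapper function name is made
up):

static void wait_until_range_is_free(struct zone *zone,
				     unsigned long start_pfn,
				     unsigned long end_pfn)
{
	int ret;

	do {
		/* are all pages in the isolated range free in the buddy? */
		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
		if (ret)
			/* a racing free may sit on a pcp list - flush and retry */
			drain_all_pages(zone);
	} while (ret);
}
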