lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 5 Nov 2021 14:02:55 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Naoya Horiguchi <naoya.horiguchi@...ux.dev>
Cc:     Naoya Horiguchi <naoya.horiguchi@....com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Ding Hui <dinghui@...gfor.com.cn>,
        Tony Luck <tony.luck@...el.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>, Peter Xu <peterx@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/3] mm/hwpoison: fix unpoison_memory()

On 05.11.21 12:49, Naoya Horiguchi wrote:
> On Fri, Nov 05, 2021 at 11:58:15AM +0100, David Hildenbrand wrote:
>> On 05.11.21 06:50, Naoya Horiguchi wrote:
>>> Hi,
>>>
>>> I updated the unpoison patchset based ou discussions over v2.
>>> Please see individual patches for details of updates.
>>>
>>> ----- (cover letter copied from v2) -----
>>> Main purpose of this series is to sync unpoison code to recent changes
>>> around how hwpoison code takes page refcount.  Unpoison should work or
>>> simply fail (without crash) if impossible.
>>>
>>> The recent works of keeping hwpoison pages in shmem pagecache introduce
>>> a new state of hwpoisoned pages, but unpoison for such pages is not
>>> supported yet with this series.
>>>
>>> It seems that soft-offline and unpoison can be used as general purpose
>>> page offline/online mechanism (not in the context of memory error).
>>
>> I'm not sure what the target use case would be TBH ... for proper memory
>> offlining/memory hotunplug we have to offline whole memory blocks. For
>> memory ballooning based mechanisms we simply allocate random free pages
>> and eventually trigger reclaim to make more random free pages available.
>> For memory hotunplug via virtio-mem we're using alloc_contig_range() to
>> allocate ranges of interest we logically unplug.
> 
> I heard about it from two people independently and I think that that's maybe
> a rough idea, so if no one shows the clear use case or someone logically
> shows that we don't need it, I do not head for it.

I'd love to learn about use cases!

> 
>>
>> The only benefit compared to alloc_contig_range() might be that we can
>> offline smaller chunks -- alloc_contig_range() isn't optimized for
>> sub-MAX_ORDER granularity yet. But then, alloc_contig_range() should
>> much rather be extended.
> 
> If alloc_contig_range() supports memory offline in arbitrary size of
> granurality (including a single page), maybe soft offline can be (partially
> I guess) unified to it.

Conceptually, memory offlining, alloc_contig_range(), soft-offlining all
perform roughly the same thing just with different flavors: evacuate a
given PFN area and decide what shall happen with it.

After memory offlining, memory cannot get reused (e.g., allocated via
the buddy) before re-onlining that memory. It's some kind of "fake
allocation" + advanced magic to actually remove the memory from the system.

alloc_contig_range() is just a "real" allocation, and can be used (e.g.,
by virtio-mem) for fake offlining by some additional magic on top.

soft-offlining is just another special type of "special allocation",
however, focused on individual page.


The biggest difference between soft-offlining and
alloc_contig_range()+memory offlining is right now the granularity.
While alloc_contig_range() can be used to allocate individual pages on
ZONE_MOVABLE and MIGRATE_CMA, it cannot deal with other zones with such
small granularity yet -- too many false negatives, meaning an allocation
might fail although the single page actually could get allocated.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ