Date:   Tue, 1 Mar 2022 10:53:25 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Miaohe Lin <linmiaohe@...wei.com>, akpm@...ux-foundation.org,
        naoya.horiguchi@....com, osalvador@...e.de
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] mm/memory-failure.c: fix memory failure race with
 memory offline

On 26.02.22 10:40, Miaohe Lin wrote:
> There is a theoretical race window between memory failure and memory
> offline. Consider the following scenario:
> 
>   CPU A					  CPU B
> memory_failure				offline_pages
>   mutex_lock(&mf_mutex);
>   TestSetPageHWPoison(p)
> 					  start_isolate_page_range
> 					    has_unmovable_pages
> 					      --PageHWPoison is movable
> 					  do {
> 					    scan_movable_pages
> 					    do_migrate_range
> 					      --PageHWPoison isn't migrated
> 					  }
> 					  test_pages_isolated
> 					    --PageHWPoison is isolated
> 					remove_memory
>   access page... bang
>   ...
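
To make the interleaving above concrete, here is a minimal sequential C sketch of CPU B's path. It assumes a hypothetical `struct page_state` and hypothetical helpers `offline_pages_model()` and `memory_failure_can_access()`. None of these are kernel APIs. The sketch only mirrors the decisions annotated in the diagram: a PageHWPoison page counts as movable, is not migrated, and counts as isolated. It does not model locking, real migration, or the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; field names are illustrative only and do
 * not correspond to the kernel's struct page. */
struct page_state {
	bool hwpoisoned;	/* PageHWPoison() set by memory_failure() on CPU A */
	bool unmovable;		/* e.g. a pinned kernel allocation */
	bool present;		/* false once remove_memory() has run */
};

/* Replays CPU B's steps from the diagram. Returns true if the range was
 * offlined and removed. */
static bool offline_pages_model(struct page_state *pages, size_t n)
{
	assert(pages != NULL || n == 0);

	/* start_isolate_page_range() -> has_unmovable_pages():
	 * a PageHWPoison page is treated as movable, so it never fails here. */
	for (size_t i = 0; i < n; i++)
		if (pages[i].unmovable && !pages[i].hwpoisoned)
			return false;

	/* scan_movable_pages()/do_migrate_range(): ordinary pages would be
	 * migrated away here, PageHWPoison pages are skipped. Migration itself
	 * is not modelled. */

	/* test_pages_isolated(): a PageHWPoison page counts as isolated,
	 * so nothing stops the range from being removed. */

	/* remove_memory() */
	for (size_t i = 0; i < n; i++)
		pages[i].present = false;
	return true;
}

/* CPU A, still inside memory_failure(), touching the page afterwards.
 * A false return models the "bang" in the diagram. */
static bool memory_failure_can_access(const struct page_state *p)
{
	return p->present;
}
```

Replaying the diagram with this model, a range in which CPU A has just poisoned one page is still offlined and removed, and CPU A's later access then hits memory that is gone. A range containing a genuinely unmovable page is refused instead, which is the behaviour the poisoned page escapes.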

I think the motivation for the offlining code was to avoid blocking memory
hotunplug (especially on ZONE_MOVABLE) just because there is a
hwpoisoned page. But how often does that happen?

It's all semi-broken either way. Assume you just offlined a memory block
with a hwpoisoned page. The memmap is now stale and the hwpoison
information is lost. You can happily re-online that memory block and use
*all* of its memory, including the previously hwpoisoned page. Note that
this used to be different in the past, when the memmap was initialized
when adding memory rather than when onlining it.


IMHO, we should stop special-casing hwpoison. Either fail offlining
completely if we stumble over a hwpoisoned page, or allow offlining only
if the refcount==0 -- just as for any other page.
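
The two alternatives above can be sketched as a per-page check. This is a minimal C model under stated assumptions: `struct page_info`, the policy names and both helpers are hypothetical, and non-poisoned pages are assumed to go through the normal migration path unchanged, so they are not judged here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct page_info {
	bool hwpoisoned;	/* models PageHWPoison() */
	int refcount;		/* models page_count() */
};

/* The two options proposed above (names are illustrative only). */
enum hwpoison_offline_policy {
	HWPOISON_FAIL_OFFLINE,		/* any hwpoisoned page fails offlining */
	HWPOISON_REQUIRE_REFCOUNT_ZERO,	/* handle it like any other page */
};

/* Returns true if this page prevents the range from being offlined. */
static bool hwpoison_blocks_offline(const struct page_info *page,
				    enum hwpoison_offline_policy policy)
{
	assert(page != NULL);
	if (!page->hwpoisoned)
		return false;	/* left to the normal migration path */
	switch (policy) {
	case HWPOISON_FAIL_OFFLINE:
		return true;
	case HWPOISON_REQUIRE_REFCOUNT_ZERO:
		/* A poisoned page cannot be migrated, so it may only be
		 * dropped if nobody still holds a reference to it. */
		return page->refcount != 0;
	}
	return true;	/* unknown policy: be conservative */
}

static bool range_may_be_offlined(const struct page_info *pages, size_t n,
				  enum hwpoison_offline_policy policy)
{
	for (size_t i = 0; i < n; i++)
		if (hwpoison_blocks_offline(&pages[i], policy))
			return false;
	return true;
}
```

With the first policy a single poisoned page anywhere in the range fails offlining. With the second, a poisoned page with no remaining references is offlined like any free page, while one that is still referenced blocks offlining, exactly as an unmigratable in-use page would.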


-- 
Thanks,

David / dhildenb
