Message-ID: <542473d1-b687-55b8-24d1-96af715aed56@huawei.com>
Date: Sat, 24 Sep 2022 20:27:35 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: Naoya Horiguchi <naoya.horiguchi@...ux.dev>, <linux-mm@...ck.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Yang Shi <shy828301@...il.com>,
Oscar Salvador <osalvador@...e.de>,
Muchun Song <songmuchun@...edance.com>,
Jane Chu <jane.chu@...cle.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
On 2022/9/23 22:12, Naoya Horiguchi wrote:
> There seems another build error in aarch64 with MEMORY_HOTPLUG disabled.
> https://lore.kernel.org/lkml/20220923110144.GA1413812@ik1-406-35019.vs.sakura.ne.jp/
> , so let me revise this patch again to handle it.
>
> - Naoya Horiguchi
>
> ---
> From: Naoya Horiguchi <naoya.horiguchi@....com>
> Date: Fri, 23 Sep 2022 22:51:20 +0900
> Subject: [PATCH v5 4/4] mm/hwpoison: introduce per-memory_block hwpoison counter
>
> Currently PageHWPoison flag does not behave well when experiencing memory
> hotremove/hotplug. Any data field in struct page is unreliable when the
> associated memory is offlined, and the current mechanism can't tell whether
> a memory section is onlined because a new memory device is installed or
> because previous failed offline operations are undone. Especially if
> there's a hwpoisoned memory, it's unclear what the best option is.
>
> So introduce a new mechanism to make struct memory_block remember that
> a memory block has hwpoisoned memory inside it. And make any online event
> fail if the onlined memory block contains hwpoison. struct memory_block
> is freed and reallocated over ACPI-based hotremove/hotplug, but not over
> sysfs-based hotremove/hotplug. So it's desirable to implement hwpoison
> counter on this struct.
>
> Note that clear_hwpoisoned_pages() is relocated to be called earlier than
> now, just before unregistering struct memory_block. Otherwise, the
> per-memory_block hwpoison counter is freed and we fail to adjust global
> hwpoison counter properly.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>
> Reported-by: kernel test robot <lkp@...el.com>
LGTM with some nits below. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@...wei.com>
> ---
> ChangeLog v4 -> v5:
> - add Reported-by of lkp bot,
> - check both CONFIG_MEMORY_FAILURE and CONFIG_MEMORY_HOTPLUG in introduced #ifdefs,
> intending to fix "undefined reference" errors in aarch64.
>
> ChangeLog v3 -> v4:
> - fix build error (https://lore.kernel.org/linux-mm/202209231134.tnhKHRfg-lkp@intel.com/)
> by using memblk_nr_poison() to access to the member ->nr_hwpoison
> ---
> drivers/base/memory.c | 34 ++++++++++++++++++++++++++++++++++
> include/linux/memory.h | 3 +++
> include/linux/mm.h | 24 ++++++++++++++++++++++++
> mm/internal.h | 8 --------
> mm/memory-failure.c | 31 ++++++++++---------------------
> mm/sparse.c | 2 --
> 6 files changed, 71 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 9aa0da991cfb..99e0e789616c 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -183,6 +183,9 @@ static int memory_block_online(struct memory_block *mem)
> struct zone *zone;
> int ret;
>
> + if (memblk_nr_poison(start_pfn))
> + return -EHWPOISON;
> +
> zone = zone_for_pfn_range(mem->online_type, mem->nid, mem->group,
> start_pfn, nr_pages);
>
> @@ -864,6 +867,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> mem = find_memory_block_by_id(block_id);
> if (WARN_ON_ONCE(!mem))
> continue;
> + clear_hwpoisoned_pages(memblk_nr_poison(start));
clear_hwpoisoned_pages() no longer seems like a fitting name, since the PageHWPoison info is kept now. But this is trivial.
> unregister_memory_block_under_nodes(mem);
> remove_memory_block(mem);
> }
> @@ -1164,3 +1168,33 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> }
> return ret;
> }
> +
> +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
> +void memblk_nr_poison_inc(unsigned long pfn)
> +{
> + const unsigned long block_id = pfn_to_block_id(pfn);
> + struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> + if (mem)
> + atomic_long_inc(&mem->nr_hwpoison);
> +}
> +
> +void memblk_nr_poison_sub(unsigned long pfn, long i)
> +{
> + const unsigned long block_id = pfn_to_block_id(pfn);
> + struct memory_block *mem = find_memory_block_by_id(block_id);
> +
> + if (mem)
> + atomic_long_sub(i, &mem->nr_hwpoison);
> +}
> +
> +unsigned long memblk_nr_poison(unsigned long pfn)
memblk_nr_poison() is only used inside this file. Make it static?
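
For reference, here is a minimal userspace sketch of the per-block counter pattern this patch introduces: a counter that is incremented and decremented as pages are poisoned or unpoisoned, and an online step that refuses to proceed while the counter is nonzero. It uses C11 atomics rather than the kernel's atomic_long_t, and struct block, block_init(), block_poison_inc(), block_poison_sub(), block_nr_poison(), block_online() and SKETCH_EHWPOISON are all hypothetical names for illustration, not the kernel API. The read helper is file-local (static), in line with the suggestion above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's EHWPOISON errno value. */
#define SKETCH_EHWPOISON 133

/* Hypothetical analogue of struct memory_block; only the counter matters. */
struct block {
	atomic_long nr_hwpoison;	/* poisoned pages recorded for this block */
	int online;			/* 1 once the block has been onlined */
};

static void block_init(struct block *b)
{
	atomic_init(&b->nr_hwpoison, 0);
	b->online = 0;
}

/* Record one more poisoned page; a missing block is silently ignored. */
static void block_poison_inc(struct block *b)
{
	if (b)
		atomic_fetch_add(&b->nr_hwpoison, 1);
}

/* Drop n poisoned pages from the count, e.g. after unpoisoning. */
static void block_poison_sub(struct block *b, long n)
{
	assert(n >= 0);
	if (b)
		atomic_fetch_sub(&b->nr_hwpoison, n);
}

/* File-local reader: returns 0 when there is no block to look at. */
static long block_nr_poison(struct block *b)
{
	return b ? atomic_load(&b->nr_hwpoison) : 0;
}

/* Refuse to online a block that still has poisoned pages recorded. */
static int block_online(struct block *b)
{
	if (block_nr_poison(b))
		return -SKETCH_EHWPOISON;
	b->online = 1;
	return 0;
}
```

In this sketch, a block with two recorded poisoned pages fails block_online() until block_poison_sub(&b, 2) brings the counter back to zero.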
Thanks,
Miaohe Lin