Message-ID: <83a8698a-fe11-42e2-8a4b-ea236721f93f@oracle.com>
Date: Fri, 6 Dec 2024 16:17:17 -0800
From: jane.chu@...cle.com
To: Miaohe Lin <linmiaohe@...wei.com>,
"Tomohiro Misono (Fujitsu)" <misono.tomohiro@...itsu.com>,
'Jiaqi Yan' <jiaqiyan@...gle.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: Re: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf_stats
>>> And
>>> 1. total = recovered + ignored + failed + delayed
>>> 2. recovered = soft_offline + hard_offline
>> Do you mean mf_stats now have 7 entries in sysfs?
>> (total, ignored, failed, delayed, recovered, hard_offline, soft_offline, then recovered = hard_offline + soft_offline)
>> Or 6 entries ? (in that case, hard_offline = recovered - soft_offline)
>> It might be simpler to understand for user if total is just the sum of other entries like this RFC,
>> but I'd like to know other opinions.
> Will it be better to have below items?
> "
> total
> ignored
> failed
> delayed
> hard_offline
> soft_offline
> "
The existing "ignored, failed, delayed, recovered" counters apply to
uncorrectable errors (UEs), while "soft_offline" applies to correctable
errors (CEs). The difference between the two is that even a recovered UE
page keeps PG_hwpoison set, whereas a soft-offlined page does not and
thus could be re-deployed.
So if we want to flag CE pages, they seem to belong to a different
category, something like -
/sys/devices/system/node/node0/memory_failure/Uncorrected/{ignored, delayed, failed, recovered}
/sys/devices/system/node/node0/memory_failure/Corrected/{offlined}
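For reference, the invariant quoted above for the existing counters
(total = recovered + ignored + failed + delayed) can be checked from
userspace. Below is a rough Python sketch, not taken from any patch in
this thread. It assumes the current flat layout, with one file per
counter directly under the node's memory_failure directory; the helper
names read_mf_stats() and total_is_consistent() are made up for
illustration.

```python
from pathlib import Path

# Counter names as discussed in the thread. The flat per-node layout
# (one file per counter) is an assumption about the running kernel.
UE_COUNTERS = ("ignored", "failed", "delayed", "recovered")


def read_mf_stats(stats_dir):
    """Read total and the UE counters from stats_dir into a dict of ints.

    Files that do not exist are skipped, so the result may be partial.
    """
    stats = {}
    for name in ("total",) + UE_COUNTERS:
        path = Path(stats_dir) / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def total_is_consistent(stats):
    """Return True if total == ignored + failed + delayed + recovered."""
    return stats["total"] == sum(stats[name] for name in UE_COUNTERS)
```

On a machine that exposes these files, it could be called as
total_is_consistent(read_mf_stats("/sys/devices/system/node/node0/memory_failure")).
If CE counts were moved under a separate directory as suggested above,
this check would stay unchanged, which is part of the appeal of keeping
the two categories apart.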
Thanks,
-jane
>
> though this will break the previous interface.
> Any thoughts?
>
> Thanks.
> .
>