linux-kernel - Re: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf


Message-ID: <83a8698a-fe11-42e2-8a4b-ea236721f93f@oracle.com>
Date: Fri, 6 Dec 2024 16:17:17 -0800
From: jane.chu@...cle.com
To: Miaohe Lin <linmiaohe@...wei.com>,
        "Tomohiro Misono (Fujitsu)" <misono.tomohiro@...itsu.com>,
        'Jiaqi Yan' <jiaqiyan@...gle.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: Re: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf_stats

>>> And
>>> 1. total = recovered + ignored + failed + delayed
>>> 2. recovered = soft_offline + hard_offline
>> Do you mean mf_stats now have 7 entries in sysfs?
>> (total, ignored, failed, delayed, recovered, hard_offline, soft_offline, then recovered = hard_offline + soft_offline)
>> Or 6 entries ? (in that case, hard_offline = recovered - soft_offline)
>> It might be simpler to understand for user if total is just the sum of other entries like this RFC,
>> but I'd like to know other opinions.
> Will it be better to have below items?
> "
> total
> ignored
> failed
> delayed
> hard_offline
> soft_offline
> "

The existing "ignored, failed, delayed, recovered" counters apply to
uncorrected errors (UEs), while "soft_offline" applies to corrected errors
(CEs). The difference is that even a recovered UE page keeps PG_hwpoison
set, whereas a soft-offlined page does not and thus could be re-deployed.

So if we want to flag CE pages, they seem to belong to a different 
category, something like -

/sys/devices/system/node/node0/memory_failure/Uncorrected/{ignored, delayed, failed, recovered}
/sys/devices/system/node/node0/memory_failure/Corrected/{offlined}
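
To make the split concrete, here is a rough userspace sketch of how a tool
might read counters laid out this way. The Uncorrected/ and Corrected/
subdirectories are only the proposal above and do not exist in any kernel;
the helper names read_mf_stats() and uncorrected_total() are made up for
illustration, and one integer per file is assumed.

```python
from pathlib import Path

# Hypothetical layout taken from the proposal in this message; these
# subdirectories are an assumption, not an existing sysfs interface.
PROPOSED_LAYOUT = {
    "Uncorrected": ("ignored", "delayed", "failed", "recovered"),
    "Corrected": ("offlined",),
}


def read_mf_stats(base):
    """Return {category: {counter: value}} for counter files found under base.

    base would be something like
    /sys/devices/system/node/node0/memory_failure; missing files are skipped.
    """
    base = Path(base)
    stats = {}
    for category, names in PROPOSED_LAYOUT.items():
        stats[category] = {}
        for name in names:
            path = base / category / name
            if path.is_file():
                # Assumes each file holds a single decimal integer.
                stats[category][name] = int(path.read_text().strip())
    return stats


def uncorrected_total(stats):
    """Sum the UE counters; CE soft-offline counts are kept out of this sum."""
    return sum(stats.get("Uncorrected", {}).values())
```

With this shape, a UE total is just the sum of the four Uncorrected counters,
and the CE count never has to be folded into "recovered".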

Thanks,

-jane

>
> though this will break the previous interface.
> Any thoughts?
>
> Thanks.
>