Message-ID:
 <TYCPR01MB96175844C1C9DD89BC16675EE53D2@TYCPR01MB9617.jpnprd01.prod.outlook.com>
Date: Tue, 10 Dec 2024 08:46:20 +0000
From: "Tomohiro Misono (Fujitsu)" <misono.tomohiro@...itsu.com>
To: "'jane.chu@...cle.com'" <jane.chu@...cle.com>, 'Miaohe Lin'
	<linmiaohe@...wei.com>, 'Jiaqi Yan' <jiaqiyan@...gle.com>
CC: "'linux-mm@...ck.org'" <linux-mm@...ck.org>,
	"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>, 'Andrew
 Morton' <akpm@...ux-foundation.org>, 'Naoya Horiguchi'
	<nao.horiguchi@...il.com>
Subject: RE: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf_stats

> >>> And
> >>> 1. total = recovered + ignored + failed + delayed
> >>> 2. recovered = soft_offline + hard_offline
> >> Do you mean mf_stats now have 7 entries in sysfs?
> >> (total, ignored, failed, delayed, recovered, hard_offline, soft_offline, then recovered = hard_offline + soft_offline)
> >> Or 6 entries ? (in that case, hard_offline = recovered - soft_offline)
> >> It might be simpler to understand for user if total is just the sum of other entries like this RFC,
> >> but I'd like to know other opinions.
> > Will it be better to have below items?
> > "
> > total
> > ignored
> > failed
> > delayed
> > hard_offline
> > soft_offline
> > "
> 
> The existing "ignored, failed, delayed, recovered" apply to UEs while
> "soft_offline" applies to CE. The difference between UE and CE is that
> even a recovered UE page has PG_hwpoison set, but a soft offlined page
> does not and thus could be re-deployed.

Hi, thanks for your comments.

If I understand correctly, PG_hwpoison is also set on a soft-offlined page (and the
page is therefore counted in HardwareCorrupted as well):
  https://github.com/torvalds/linux/blob/v6.13-rc2/mm/memory-failure.c#L206
Also, unpoison works, but it is only available through debugfs via the hwpoison-inject module.
Is this correct?
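
One way to check the HardwareCorrupted part of this from userspace is to compare the value in /proc/meminfo before and after a soft offline. The following is only a minimal sketch: the helper name hardware_corrupted_kb is made up for illustration, and it assumes the usual "HardwareCorrupted: <n> kB" line format.

```python
import re

def hardware_corrupted_kb(meminfo_text):
    """Return the HardwareCorrupted value in kB, or None if the field is absent.

    meminfo_text is the full contents of /proc/meminfo as a string.
    (Hypothetical helper, not part of any kernel tooling.)
    """
    match = re.search(r"^HardwareCorrupted:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None
```

Calling it with the text of /proc/meminfo before and after soft-offlining a page should show whether the counter grows, on kernels built with memory-failure support.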

> 
> So if we want to flag CE pages, they seem to belong to a different
> category, something like -
> 
> /sys/devices/system/node/node0/memory_failure/Uncorrected/{ignored, delayed, failed, recovered}
> /sys/devices/system/node/node0/memory_failure/Corrected/{offlined}

This makes sense. But as I stated in the other thread, I don't think we can change the
current interface for "Uncorrected". Is it worth creating only a "Corrected" directory?
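
For reference, the "total is just the sum of the other entries" property discussed above can be checked from userspace roughly as follows. This is a sketch under assumptions: the function names are hypothetical, it treats every file in the per-node memory_failure directory as an integer counter, and a soft_offline entry exists only if this RFC (or something like it) is applied.

```python
from pathlib import Path

def read_mf_stats(node_dir):
    """Read each counter file in a nodeN/memory_failure/ directory into a dict.

    Assumes every regular file in the directory holds a single integer.
    """
    stats = {}
    for entry in Path(node_dir).iterdir():
        if entry.is_file():
            stats[entry.name] = int(entry.read_text().strip())
    return stats

def total_is_sum(stats):
    """Return True if 'total' equals the sum of all other entries in stats."""
    others = sum(value for name, value in stats.items() if name != "total")
    return stats["total"] == others
```

For example, total_is_sum(read_mf_stats("/sys/devices/system/node/node0/memory_failure")) would be expected to hold with the current entries (ignored, failed, delayed, recovered). It would stop holding if soft_offline were added as a sub-count of recovered rather than as a separate term in total.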

Regards
Tomohiro Misono
