lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 21 Mar 2023 11:58:12 +0100
From:   Tomas Mudrunka <tomas.mudrunka@...il.com>
To:     akpm@...ux-foundation.org
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, rppt@...nel.org, linux-doc@...r.kernel.org,
        corbet@....net, tomas.mudrunka@...il.com
Subject: Re: Re: [PATCH] Add results of early memtest to /proc/meminfo

> meminfo is rather top-level and important.  Is this data sufficiently
> important to justify a place there?

There is already "HardwareCorrupted" which is similar, but for
errors reported by ECC, so it makes sense to have both in single place.

> Please describe the value.  The use-case(s).  Why would people want
> this?

When running large fleet of devices without ECC RAM it's currently not
easy to do bulk monitoring for memory corruption. You have to parse dmesg,
but that's ring buffer so the error might disappear after some time.
In general i do not consider dmesg to be great API to query RAM status.

In several companies i've seen such errors remain undetected and cause
issues for way too long. So i think it makes sense to provide monitoring
API, so that we can safely detect and act upon them.

> Comment layout is unconventional.

Fixed in PATCH v2

> Coding style is unconventional (white spaces).
> I expect this code would look much cleaner if some temporaries were used.

Fixed in PATCH v2

> I don't understand this logic anyway.  Why not just print the value of
> early_memtest_bad_size>>10 and be done with it.

I think 0 should be reported only when there was no error found at all.
If memtest detect 256 corrupt bytes it would report 0 kB without such logic.
Rounding down to 0 like that is not a good idea in my opinion,
because it will hide the fact something is wrong with the RAM in such case.
Therefore i've added logic that prevents rounding down to 0.

> The name implies a bool, but the comment says otherwise.

Fixed in PATCH v2

> It's a counter, but it's used as a boolean.  Why not make it bool, and do

Fixed in PATCH v2

> Also, your email client is replacing tabs with spaces.

Fixed in PATCH v2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ