Message-ID: <CAChUvXPgntzMbAhcg8y2HJyyZWwaqNPe66efmpfyHLgN1JSW_w@mail.gmail.com>
Date:   Wed, 5 Dec 2018 10:37:52 -0600
From:   Tracy Smith <tlsmith3777@...il.com>
To:     bp@...en8.de
Cc:     york.sun@....com, linux-edac@...r.kernel.org,
        util-linux@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>
Subject: Re: edac driver injection of uncorrected errors & utils

This was very helpful. Tracing through the code, I see that it doesn't
call panic() before Linux crashes from multi-bit errors because, as York
has indicated, this type of memory controller doesn't limit the number
of errors.

I do have a general question about single-bit errors. Is it correct
that the EDAC driver corrects single-bit errors by doing a scrub? The
EDAC code does not do periodic scrubs, but I see scrubs when a
correctable error is found (edac_mc_scrub_block and edac_atomic_scrub
in edac_mc.c). Is that right?
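
For illustration, the scrub-on-correctable-error idea asked about here
can be sketched in user-space C. This is only a rough analogue under
stated assumptions, not the kernel's code: the name scrub_words and its
signature are hypothetical, and the sketch assumes an ECC memory
controller that corrects a single-bit error on the read, so that
writing the same value back stores corrected data. An atomic add of
zero is used as the read-modify-write so that a concurrent writer is
not overwritten with stale data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a software scrub over a block of words.
 * Each word is read and written back unchanged with an atomic
 * read-modify-write (add of zero). On ECC hardware the read is
 * assumed to return corrected data, so the write-back refreshes
 * the stored word and its check bits. The values never change.
 */
static void scrub_words(_Atomic uint32_t *words, size_t count)
{
    size_t i;

    assert(words != NULL || count == 0);

    for (i = 0; i < count; i++)
        atomic_fetch_add(&words[i], 0);
}
```

Called as scrub_words(buf, n) on the block that reported the
correctable error, this leaves the contents of buf as they were.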

This is directed more toward York for Layerscape. I see some EDAC code
that seems to do periodic scrubs based on an interval or scrub rate. Is
that not needed for the Layerscape driver to correct errors because
errors are scrubbed when found by the EDAC scrub block, or because the
memory controller itself does the correction/scrubbing when an error
is found?

thx,
Tracy



On Wed, Nov 28, 2018 at 5:44 PM Borislav Petkov <bp@...en8.de> wrote:
>
> On Wed, Nov 28, 2018 at 04:14:24PM -0600, Tracy Smith wrote:
> > Is there another way of creating an uncorrected error without crashing
> > Linux using the layerscape driver? I would like to see a UE error
> > collected without a Linux crash scenario because I need to validate
> > UEs are being collected.
>
> It depends on whether the hardware is causing the crash on uncorrectable
> error to prevent data corruption or the error handler is calling panic()
> or somesuch. If it is the former, then you need to disable that feature
> - if at all possible (no clue what that platform does).
>
> If it is the latter, you can comment out the panic() for testing
> purposes only and inject then. For an example what x86 does, see
> "tolerant" here:
>
> Documentation/x86/x86_64/machinecheck
>
> HTH.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.



-- 
Confidentiality notice: This e-mail message, including any
attachments, may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), please
immediately notify the sender and delete this e-mail message.
