Message-ID: <CAChUvXPgntzMbAhcg8y2HJyyZWwaqNPe66efmpfyHLgN1JSW_w@mail.gmail.com>
Date: Wed, 5 Dec 2018 10:37:52 -0600
From: Tracy Smith <tlsmith3777@...il.com>
To: bp@...en8.de
Cc: york.sun@....com, linux-edac@...r.kernel.org,
util-linux@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>
Subject: Re: edac driver injection of uncorrected errors & utils
This was very helpful. Tracing through the code, I see that it does not
call panic() before Linux crashes on multi-bit errors because, as York
has indicated, this type of memory controller doesn't limit the number
of errors.
I do have a general question about single-bit errors. Does the EDAC
driver correct single-bit errors by doing a scrub? The EDAC code does
not appear to do periodic scrubs, but I do see a scrub when a
correctable error is found (edac_mc_scrub_block and edac_atomic_scrub
in edac_mc.c). Is that right?
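To make the question concrete, here is roughly how I picture
scrub-on-error, as a user-space sketch only. This is not the kernel
code: scrub_block_sketch is a made-up name, and I am assuming the scrub
amounts to an atomic read and write-back of each word, so that ECC
hardware would return corrected data on the read and store it again
with fresh check bits on the write.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the kernel's edac_atomic_scrub().
 * Rewrites every 32-bit word in [addr, addr + size) in place using an
 * atomic read-modify-write that adds zero, so the stored value is
 * unchanged but each word is read and written back.
 * Uses the GCC/Clang __atomic_fetch_add builtin.
 */
static size_t scrub_block_sketch(void *addr, size_t size)
{
    uint32_t *word = addr;
    size_t n = size / sizeof(*word);
    size_t i;

    for (i = 0; i < n; i++)
        __atomic_fetch_add(&word[i], 0, __ATOMIC_SEQ_CST);

    return n; /* number of words rewritten */
}
```

On ordinary memory this leaves the buffer contents untouched; the
point is only the read-then-write-back access pattern.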
This one is directed more at York, for layerscape. I see some EDAC code
that seems to do periodic scrubs based on an interval or scrub rate.
Is that not needed for the layerscape driver because errors are
scrubbed by the EDAC scrub block when they are found, or because the
memory controller itself does the correction/scrubbing when an error
is found?
thx,
Tracy
On Wed, Nov 28, 2018 at 5:44 PM Borislav Petkov <bp@...en8.de> wrote:
>
> On Wed, Nov 28, 2018 at 04:14:24PM -0600, Tracy Smith wrote:
> > Is there another way of creating an uncorrected error without crashing
> > Linux using the layerscape driver? I would like to see a UE error
> > collected without a Linux crash scenario because I need to validate
> > UEs are being collected.
>
> It depends on whether the hardware is causing the crash on uncorrectable
> error to prevent data corruption or the error handler is calling panic()
> or somesuch. If it is the former, then you need to disable that feature
> - if at all possible (no clue what that platform does).
>
> If it is the latter, you can comment out the panic() for testing
> purposes only and inject then. For an example what x86 does, see
> "tolerant" here:
>
> Documentation/x86/x86_64/machinecheck
>
> HTH.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.