Date:	Fri, 30 Oct 2015 18:51:15 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	Mark Rutland <mark.rutland@....com>,
	Brijesh Singh <brijeshkumar.singh@....com>
Cc:	linux-arm-kernel@...ts.infradead.org, robh+dt@...nel.org,
	pawel.moll@....com, ijc+devicetree@...lion.org.uk,
	galak@...eaurora.org, dougthompson@...ssion.com,
	mchehab@....samsung.com, devicetree@...r.kernel.org,
	guohanjun@...wei.com, andre.przywara@....com, arnd@...db.de,
	sboyd@...eaurora.org, linux-kernel@...r.kernel.org,
	linux-edac@...r.kernel.org
Subject: Re: [PATCH v4] EDAC: Add ARM64 EDAC

On Fri, Oct 30, 2015 at 05:06:06PM +0000, Mark Rutland wrote:
> > * Correctable errors do not generate any interrupt:
> >   If we have to implement error parsing inside the firmware, then the work
> >   needs to be split between the OS and firmware. Maybe the OS can execute an
> >   SMC instruction to call into firmware, and firmware can then check the
> >   error syndrome registers; if it finds a correctable error, it builds the
> >   HEST table. This method will introduce a performance issue because it
> >   requires the OS to execute an SMC every 100ms or so just to poll for
> >   correctable errors. If you have any other recommendation, please share it.
> 
> I agree that this is a problem, and is an unfortunate hardware
> limitation.
> 
> I am still wary of making use of IMPLEMENTATION DEFINED features like
> this in the kernel.

Well, you could do all the correctable error collection in the firmware
and only report those errors to the OS when they overflow or reach a
certain threshold.

The idea behind it being that you don't really want to upset the user
about *every* correctable error happening because it was correctable and
the hardware, well, doh, corrected it. No problem.

But when those errors start repeating and hitting the same DIMM and
addresses in close proximity, there might be a problem which you should
report.
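
To make the thresholding idea concrete, here is a rough sketch of how
such a collector could count correctable errors per 4 KiB page and only
flag a page once it has been hit repeatedly. All names and constants
(ce_tracker, ce_record, CE_THRESHOLD, CE_SLOTS) are made up for
illustration; this is not code from any existing firmware or kernel
patch, and a real implementation would also want counts to decay over
time and a smarter eviction policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CE_SLOTS      64  /* hypothetical table size */
#define CE_THRESHOLD  4   /* hypothetical: report on the 4th hit, must be >= 2 */
#define CE_PAGE_SHIFT 12  /* group error addresses by 4 KiB page */

struct ce_slot {
	uint64_t pfn;         /* page frame number of the failing address */
	unsigned int count;   /* correctable errors seen on this page */
	bool used;
};

struct ce_tracker {
	struct ce_slot slots[CE_SLOTS];
};

static void ce_init(struct ce_tracker *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/*
 * Record one correctable error at physical address addr.
 * Returns true when the page reaches CE_THRESHOLD, i.e. when the error
 * is worth reporting; the page's slot is released afterwards so that
 * counting starts over.
 */
static bool ce_record(struct ce_tracker *t, uint64_t addr)
{
	uint64_t pfn = addr >> CE_PAGE_SHIFT;
	struct ce_slot *free_slot = NULL;
	size_t i;

	assert(t != NULL);

	for (i = 0; i < CE_SLOTS; i++) {
		struct ce_slot *s = &t->slots[i];

		if (s->used && s->pfn == pfn) {
			if (++s->count >= CE_THRESHOLD) {
				s->used = false;  /* reported, release the slot */
				return true;
			}
			return false;
		}
		if (!s->used && !free_slot)
			free_slot = s;
	}

	/* Page not tracked yet. If the table is full, evict slot 0 (simplistic). */
	if (!free_slot)
		free_slot = &t->slots[0];

	free_slot->pfn = pfn;
	free_slot->count = 1;
	free_slot->used = true;
	return false;
}
```

With something like this, the poller would call ce_record() for every
correctable error it finds and only raise a report to the OS when the
call returns true; isolated errors on unrelated pages never reach the
user.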

Btw, we have been looking at doing something like that on x86:

https://lkml.kernel.org/r/1404242623-10094-1-git-send-email-bp@alien8.de

and one of these days I'll upstream the damn thing!

:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.