Message-ID: <1ae5e7a3464f9d8e16b112cd371957ea20472864.camel@kernel.crashing.org>
Date:   Tue, 11 Jun 2019 15:50:40 +1000
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     James Morse <james.morse@....com>,
        "Hawa, Hanna" <hhhawa@...zon.com>,
        "robh+dt@...nel.org" <robh+dt@...nel.org>,
        "Woodhouse, David" <dwmw@...zon.co.uk>,
        "paulmck@...ux.ibm.com" <paulmck@...ux.ibm.com>,
        "mchehab@...nel.org" <mchehab@...nel.org>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>,
        "devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
        "Shenhar, Talel" <talel@...zon.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Chocron, Jonathan" <jonnyc@...zon.com>,
        "Krupnik, Ronen" <ronenk@...zon.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "Hanoch, Uri" <hanochu@...zon.com>
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Sat, 2019-06-08 at 11:05 +0200, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote:
> > Those IP blocks don't need any SW coordination at runtime. The drivers
> > don't share data nor communicate with each other. There is absolutely
> > no reason to go down that path.
> 
> Let me set one thing straight: the EDAC "subsystem" if you will - or
> that pile of code which does error counting and reporting - has its
> limitations in supporting one EDAC driver per platform. And whenever we
> have two drivers loadable on a platform, we have to do dirty hacks like
> 
>   301375e76432 ("EDAC: Add owner check to the x86 platform drivers")
> 
> What that means is, that if you need to call EDAC logging routines or
> whatnot from two different drivers, there's no locking, no nothing. So
> it might work or it might set your cat on fire.

Should we fix that instead, then? What are the big issues with adding
some basic locking? Being called from NMIs?

If the separate drivers operate on distinct counters I don't see a big
problem there.

> IOW, having multiple separate "drivers" or representations of RAS
> functionality using EDAC facilities is something that hasn't been
> done. Well, almost. highbank_mc_edac.c and highbank_l2_edac.c is one
> example but they make sure they don't step on each other's toes by using
> different EDAC pieces - a device vs a memory controller abstraction.

That sounds like a reasonable requirement.

> And now the moment all of a sudden you decide you want for those
> separate "drivers" to synchronize on something, you need to do something
> hacky like the amd_register_ecc_decoder() thing, for example, because we
> need to call into the EDAC memory controller driver to decode a DRAM ECC
> error properly, while the rest of the error types get decoded somewhere
> else...
> 
> Then there comes the issue with code reuse - wouldn't it be great if a
> memory controller driver can be shared between platform drivers instead of
> copying it in both?
> 
> We already do that - see fsl_ddr_edac.c which gets shared between PPC
> *and* ARM. drivers/edac/skx_common.c is another example for Intel chips.
> 
> Now, if you have a platform with 10 IP blocks which each have RAS
> functionality, are you saying you'll do 10 different pieces called
> 
> <platform_name>_<ip_block#>_edac.c
> 
> ?
> 
> And if <next_platform> has an old IP block with the old RAS
> functionality, you load <platform_name>_<ip_block>_edac.c on the new
> platform too?

I'm not sure why <platform_name> ...

Anyway, let's get back to the specific case of our Amazon platform here
since it's a concrete example.

Hanna, can you give us a reasonably exhaustive list of how many such
"drivers" we'll want in the EDAC subsystem and whether you envision any
coordination requirement between them or not?

Cheers,
Ben.


