linux-kernel - Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC


Message-ID: <20190612084213.4fb9e054@coco.lan>
Date:   Wed, 12 Jun 2019 08:42:13 -0300
From:   Mauro Carvalho Chehab <mchehab@...nel.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        James Morse <james.morse@....com>,
        "Hawa, Hanna" <hhhawa@...zon.com>,
        "robh+dt@...nel.org" <robh+dt@...nel.org>,
        "Woodhouse, David" <dwmw@...zon.co.uk>,
        "paulmck@...ux.ibm.com" <paulmck@...ux.ibm.com>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>,
        "devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
        "Shenhar, Talel" <talel@...zon.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Chocron, Jonathan" <jonnyc@...zon.com>,
        "Krupnik, Ronen" <ronenk@...zon.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "Hanoch, Uri" <hanochu@...zon.com>
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 12 Jun 2019 13:00:39 +0200
Borislav Petkov <bp@...en8.de> wrote:

> On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> > That's said, from the admin PoV, it makes sense to have a single
> > daemon that collect errors from all error sources and take the
> > needed actions.  
> 
> Doing recovery actions in userspace is too flaky. Daemon can get killed
> at any point in time and there are error types where you want to do
> recovery *before* you return to userspace.

Yeah, some actions would work a lot better in kernelspace. Yet some
actions would work a lot better if implemented in userspace.

For example, a server with multiple network interfaces may re-route
the traffic to a backup interface if the main one has too many errors.

This can easily be done in userspace.
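
A minimal sketch of such a userspace policy, assuming the daemon polls the
per-interface error counters that Linux exposes under
/sys/class/net/<iface>/statistics/. The function names, the threshold and
the interface names are hypothetical and only illustrate the idea; the
actual route change is left out.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard Linux sysfs location


def read_error_count(iface, root=SYSFS_NET):
    """Sum the rx_errors and tx_errors counters of one interface."""
    stats = Path(root) / iface / "statistics"
    return sum(
        int((stats / name).read_text()) for name in ("rx_errors", "tx_errors")
    )


def pick_interface(primary, backup, previous_errors, current_errors, threshold):
    """Return the interface to use, given error counts from two polls.

    Hypothetical policy: fail over to the backup when the primary gained
    more than `threshold` errors since the previous poll.
    """
    new_errors = current_errors - previous_errors
    return backup if new_errors > threshold else primary
```

A daemon would call read_error_count("eth0") once per polling interval and
pass two successive readings to pick_interface() before touching the
routing table.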

> Yes, we do have different error reporting facilities but I still think
> that concentrating all the error information needed in order to do
> proper recovery action is the better approach here. And make that part
> of the kernel so that it is robust. Userspace can still configure it and
> so on.

If the error reporting facilities are for the same hardware "group"
(like the machine's memory controllers), I agree with you: it makes
sense to have a single driver. 

If they are for completely independent hardware, then implementing
them as separate drivers would work equally well, with the advantage of
making them easier to maintain and generic enough to support
different vendors using the same IP block.

Thanks,
Mauro