linux-kernel - Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi

Message-ID: <1502810766.2042.149.camel@hpe.com>
Date:   Tue, 15 Aug 2017 15:35:51 +0000
From:   "Kani, Toshimitsu" <toshi.kani@....com>
To:     "bp@...en8.de" <bp@...en8.de>
CC:     "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "mchehab@...nel.org" <mchehab@...nel.org>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: [PATCH v2 4/7] ghes_edac: avoid multiple calls to dmi_walk()

On Mon, 2017-08-14 at 22:39 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 08:17:54PM +0000, Kani, Toshimitsu wrote:
> > I think the current code design of allocating mci & ghes_edac_pvt
> > for each GHES source entry makes sense.
> 
> And I don't.
> 
> > edac_raw_mc_handle_error() also has the same expectation that the
> > call is serialized per mci.
> 
> There's no such thing as "per mci" if the driver scans *all DIMMs*
> per register call. If it does it this way, then it is only one mci.

ghes_edac instantiates an mci as a pseudo device representing a GHES
error source.  Each error source associates with all DIMMs, and may
report errors independently.  As ghes_edac is a GHES error-reporting
wrapper to edac, this abstraction makes sense.

> It is actually wrong right now because if you register more than one
> mci and you do edac_inc_ce_error()/edac_inc_ue_error(), potentially
> different counters get incremented for the same errors. Exactly
> because each instance registered is *wrongly* responsible for all
> DIMMs on the system.

I do not see a problem in having counters for each GHES error source. 
This is just statistics info, and ghes_edac does not expect any OS
action from the counters.
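
As a rough illustration of the per-source view, here is a minimal
user-space sketch.  The toy_* names are hypothetical and are not the
edac library API; it assumes, as discussed above, that calls are
serialized per instance, so the counters take no lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mci; NOT the kernel's struct mem_ctl_info. */
struct toy_mci {
	int source_id;          /* GHES error source this instance represents */
	unsigned long ce_count; /* corrected errors reported via this source */
	unsigned long ue_count; /* uncorrected errors reported via this source */
};

enum toy_severity { TOY_CE, TOY_UE };

/*
 * Count one error against the source that reported it.
 * Assumption: the caller serializes calls per instance, so no locking here.
 */
static void toy_report_error(struct toy_mci *mci, enum toy_severity sev)
{
	assert(mci != NULL);
	if (sev == TOY_CE)
		mci->ce_count++;
	else
		mci->ue_count++;
}

/* The counters are statistics only; a total is just the sum over sources. */
static unsigned long toy_total_ce(const struct toy_mci *mcis, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += mcis[i].ce_count;
	return total;
}
```

Each instance only counts what its own source reported, and nothing
acts on the values beyond reading them.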

> So you either need to partition the DIMMs per mci (which I can't
> imagine how it would work) or introduce locking when incrementing the
> mci->counters.

I do not think changing the calling convention to edac library
interfaces is a good idea for a special case like ghes_edac.  Such
changes can be a burden for us going forward.  I think ghes_edac just
needs to work with the current prerequisite.

User apps like ras-mc-ctl work as expected with the given (not-so-great)
DIMM info from SMBIOS as well.  I do not see a problem from a user
perspective, either.

Thanks,
-Toshi