lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Feb 2019 17:31:24 +0000
From:   James Morse <james.morse@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Rui Zhao <ruizhao@...rosoft.com>, Sasha Levin <sashal@...nel.org>,
        "mchehab@...nel.org" <mchehab@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linux Kernel <linux-kernel@...rosoft.com>,
        "will.deacon@....com" <will.deacon@....com>,
        "okaya@...nel.org" <okaya@...nel.org>
Subject: Re: [PATCH] EDAC, dmc520:: add DMC520 EDAC driver

Hi Boris,

On 23/01/2019 18:46, Borislav Petkov wrote:
> On Wed, Jan 23, 2019 at 06:36:23PM +0000, James Morse wrote:
>>> Would like to know what's the impact if this error happens, and how to fit it
>>> with current reporting in EDAC core.
>>
>> At a guess the interrupt triggers when link_err_count increases. (link_err has
>> an overflow bit, so the interrupt must be related to a counter).
>>
>> If we could associate a link with a layer in edac, we could report errors
>> against that point. But I've no idea how 'links' correspond with 'ranks and banks'!


> Well, I have no clue what kind of links you guys are talking but if
> those are per-chance coherent links used by cores to communicate in a
> coherent fabric, or cores and devices, what would showing those errors
> to the user bring ya?

(I mentioned this because its the next interrupt in the register, its an example
of something that may be added for another platform in the future, which affects
the DT and probing)


> Or are ya talking about different kinds of links?

... whatever the manual means by 'link', good point, it could be the
interconnect side.

'alert_mode_next', in the feature control register talks about DIMM training,
and says 'dfi_err' is treated a a link error. DFI is defined earlier as the 'DDR
PHY interface', so these must be links between the DMC520 and DDR.


> In any case, the first question to ask would be, can some agent or the
> user do something with the information that X or Y link errors happened?
> 
> If not, then why bother?
> If yes, then that's a different story.

I agree. Surely if the DIMMs are socketed link-errors are another reason to
replace the DIMM.

It looks like this doesn't matter on Rui's platform,


Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ