Message-ID: <e2105800-db52-513f-e163-3769a26d1dfb@arm.com>
Date: Tue, 5 Feb 2019 17:31:24 +0000
From: James Morse <james.morse@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: Rui Zhao <ruizhao@...rosoft.com>, Sasha Levin <sashal@...nel.org>,
"mchehab@...nel.org" <mchehab@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Linux Kernel <linux-kernel@...rosoft.com>,
"will.deacon@....com" <will.deacon@....com>,
"okaya@...nel.org" <okaya@...nel.org>
Subject: Re: [PATCH] EDAC, dmc520:: add DMC520 EDAC driver
Hi Boris,
On 23/01/2019 18:46, Borislav Petkov wrote:
> On Wed, Jan 23, 2019 at 06:36:23PM +0000, James Morse wrote:
>>> Would like to know what's the impact if this error happens, and how to fit it
>>> with current reporting in EDAC core.
>>
>> At a guess the interrupt triggers when link_err_count increases. (link_err has
>> an overflow bit, so the interrupt must be related to a counter).
>>
>> If we could associate a link with a layer in edac, we could report errors
>> against that point. But I've no idea how 'links' correspond with 'ranks and banks'!
> Well, I have no clue what kind of links you guys are talking but if
> those are per-chance coherent links used by cores to communicate in a
> coherent fabric, or cores and devices, what would showing those errors
> to the user bring ya?
(I mentioned this because it's the next interrupt in the register; it's an example
of something that may be added for another platform in the future, which affects
the DT and probing.)
> Or are ya talking about different kinds of links?
... whatever the manual means by 'link', good point, it could be the
interconnect side.
'alert_mode_next', in the feature control register talks about DIMM training,
and says 'dfi_err' is treated as a link error. DFI is defined earlier as the 'DDR
PHY interface', so these must be links between the DMC520 and DDR.
> In any case, the first question to ask would be, can some agent or the
> user do something with the information that X or Y link errors happened?
>
> If not, then why bother?
> If yes, then that's a different story.
I agree. Surely if the DIMMs are socketed, link errors are another reason to
replace the DIMM.
It looks like this doesn't matter on Rui's platform.
Thanks,
James