Message-ID: <20190422171532.GH21457@zn.tnic>
Date: Mon, 22 Apr 2019 19:15:32 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time
On Mon, Apr 22, 2019 at 03:59:16PM +0000, Luck, Tony wrote:
> > Err, this all sounds to me like the storm detection code should
> > *automatically* disable the CEC in such cases, I'd say.
>
> Sounds good. But we should distinguish storms that have many different
> addresses from storms that just ping a few addresses. CEC will see counts
> hit the threshold in the latter case, but it might not be able to take the pages
> offline (because they are locked, or in-use by kernel).
>
> So I think the change might be to the return value from NOTIFY_STOP to NOTIFY_DONE
> ... but only if we are in the middle of a storm AND the CEC array is full.
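Tony's proposed condition can be sketched as a small decision helper. This is a userspace sketch under stated assumptions: cec_notify_result() and its two boolean inputs are hypothetical, and the NOTIFY_* values are redefined here to mirror include/linux/notifier.h so the code builds outside the kernel. In a notifier chain, a return value with NOTIFY_STOP_MASK set ends the walk, while NOTIFY_DONE lets later notifiers (for example one that hands the error to an external logging agent) still see the error.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the values in include/linux/notifier.h; redefined here so
 * this sketch compiles in userspace. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/*
 * Hypothetical helper: choose what the CEC notifier returns.
 *
 * NOTIFY_STOP: the CEC consumed the error; the chain stops, so later
 *              notifiers never see it.
 * NOTIFY_DONE: the chain continues, so the error still reaches other
 *              consumers. Only chosen during a storm that has filled
 *              the CEC array, per the condition in the quoted mail.
 */
int cec_notify_result(bool in_storm, bool cec_full)
{
	if (in_storm && cec_full)
		return NOTIFY_DONE;
	return NOTIFY_STOP;
}
```

Boris's reply suggests dropping the in_storm part and falling back whenever the array is full; in this sketch that would reduce the condition to cec_full alone.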
Well, regardless of this specific use case, isn't that a generic enough
action that we should always take? I mean, falling back to logging to an
external agent.

However, currently we don't signal that the CEC is full - we simply
remove the LRU element in cec_add_elem() before we insert the new one.
We can either return a specific retval to say "the CEC is full and we had
to delete an elem", or we can add a cec_is_full() accessor...
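The two options can be compared on a simplified model of the array. This is a sketch, not drivers/ras/cec.c: only the names cec_add_elem() and cec_is_full() come from the mail, while struct cec_array, enum cec_add_ret, CEC_MAX_ELEMS, the function signatures and the timestamp-based LRU are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CEC_MAX_ELEMS 4		/* hypothetical, tiny capacity for illustration */

/* Option 1: a specific retval telling the caller what happened. */
enum cec_add_ret {
	CEC_ADDED = 0,		/* new PFN stored in a free slot */
	CEC_UPDATED,		/* PFN already tracked, count bumped */
	CEC_EVICTED,		/* array was full, LRU element dropped first */
};

struct cec_elem {
	uint64_t pfn;
	unsigned int count;
	uint64_t last_used;	/* logical timestamp used to find the LRU */
};

struct cec_array {
	struct cec_elem elems[CEC_MAX_ELEMS];
	unsigned int n;
	uint64_t clock;
};

/* Option 2: an accessor any caller can query. */
static bool cec_is_full(const struct cec_array *ca)
{
	return ca->n == CEC_MAX_ELEMS;
}

/* Drop the least recently used element; caller ensures n > 0. */
static void cec_del_lru(struct cec_array *ca)
{
	unsigned int i, lru = 0;

	for (i = 1; i < ca->n; i++)
		if (ca->elems[i].last_used < ca->elems[lru].last_used)
			lru = i;
	/* Fill the hole with the last element. */
	ca->elems[lru] = ca->elems[--ca->n];
}

static enum cec_add_ret cec_add_elem(struct cec_array *ca, uint64_t pfn)
{
	enum cec_add_ret ret = CEC_ADDED;
	unsigned int i;

	ca->clock++;
	for (i = 0; i < ca->n; i++) {
		if (ca->elems[i].pfn == pfn) {
			ca->elems[i].count++;
			ca->elems[i].last_used = ca->clock;
			return CEC_UPDATED;
		}
	}

	/* Today's behaviour: silently evict; here we also report it. */
	if (cec_is_full(ca)) {
		cec_del_lru(ca);
		ret = CEC_EVICTED;
	}

	assert(ca->n < CEC_MAX_ELEMS);
	ca->elems[ca->n++] = (struct cec_elem){
		.pfn = pfn, .count = 1, .last_used = ca->clock,
	};
	return ret;
}
```

The retval tells the caller that an eviction just happened on this insert, while the accessor lets other code query the state without changing cec_add_elem()'s signature; the model supports both so they can be compared side by side.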
--
Regards/Gruss,
Boris.