Message-ID: <20150216113959.GA4458@pd.tnic>
Date: Mon, 16 Feb 2015 12:40:00 +0100
From: Borislav Petkov <bp@...en8.de>
To: Daniel J Blueman <daniel@...ascale.com>
Cc: Doug Thompson <dougthompson@...ssion.com>,
Mauro Carvalho Chehab <mchehab@....samsung.com>,
linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
Steffen Persvold <sp@...ascale.com>
Subject: Re: [PATCH] x86: Prevent oops with >16 memory controllers

On Sat, Feb 14, 2015 at 11:18:40AM +0800, Daniel J Blueman wrote:
> When ECC interrupts occur on memory controllers after EDAC_MAX_MCS (16), the

I knew this artificial limit would come back to bite us someday :-\

> kernel fatally dereferences unallocated structures [1]; this occurs on at
> least NumaConnect systems.
>
> Minimally fix by checking if a memory controller info structure is allocated;
> candidate for stable.
>
> Signed-off-by: Daniel J Blueman <daniel@...ascale.com>
>
> -- [1]
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
> IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
> PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
> Oops: 0000 [#2] SMP
> Modules linked in:
> CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1

CPU 224?! What node is that? :)

> ---
> drivers/edac/amd64_edac.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 17638d7..baccc0e 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -2175,7 +2175,7 @@ static void __log_bus_error(struct mem_ctl_info *mci, struct err_info *err,
>  static inline void decode_bus_error(int node_id, struct mce *m)
>  {
>  	struct mem_ctl_info *mci = mcis[node_id];
> -	struct amd64_pvt *pvt = mci->pvt_info;
> +	struct amd64_pvt *pvt;
>  	u8 ecc_type = (m->status >> 45) & 0x3;
>  	u8 xec = XEC(m->status, 0x1f);
>  	u16 ec = EC(m->status);
> @@ -2190,6 +2190,11 @@ static inline void decode_bus_error(int node_id, struct mce *m)
>  	if (xec && xec != F10_NBSL_EXT_ERR_ECC)
>  		return;
>
> +	/* Unable to decode on memory controllers after EDAC_MAX_MCS, as no mci is allocated */
> +	if (!mci)
> +		return;
> +	pvt = mci->pvt_info;

Hmm, so we have all the facilities to fix that properly, IINM:
edac_mc_find(), add_mc_to_global_list() and so on.

Would looking through the list of the memory controllers help instead,
i.e. if you do:

static inline void decode_bus_error(int node_id, struct mce *m)
{
	struct mem_ctl_info *mci = edac_mc_find(node_id);

	if (!mci)
		return;

?

Then we can get rid of that local mcis dumbness and do it properly...
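
To make the idea concrete, here is a rough userspace sketch of a list-based
lookup with a NULL check, as opposed to indexing a fixed-size array. It is
not the kernel code: struct mc_info, mc_add(), mc_find() and decode_error()
are hypothetical stand-ins for struct mem_ctl_info, the EDAC core's list
handling, edac_mc_find() and decode_bus_error(), and the list is a plain
singly linked one with no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_ctl_info; only the fields used here. */
struct mc_info {
	int mc_idx;		/* controller (node) index */
	void *pvt_info;		/* driver-private data */
	struct mc_info *next;
};

/* Hypothetical stand-in for the EDAC core's global controller list. */
static struct mc_info *mc_list;

/* Register a controller; the list has no fixed upper bound on entries. */
static void mc_add(struct mc_info *mci)
{
	assert(mci != NULL);
	mci->next = mc_list;
	mc_list = mci;
}

/* Walk the list and return the controller with index idx, or NULL. */
static struct mc_info *mc_find(int idx)
{
	struct mc_info *mci;

	for (mci = mc_list; mci; mci = mci->next)
		if (mci->mc_idx == idx)
			return mci;

	return NULL;
}

/* Returns 0 if the error could be decoded, -1 if node_id is unknown. */
static int decode_error(int node_id)
{
	struct mc_info *mci = mc_find(node_id);

	if (!mci)
		return -1;

	/* mci is only dereferenced after the NULL check above. */
	(void)mci->pvt_info;
	return 0;
}
```

With a lookup like this, a controller index above 16 is found as long as it
was registered, and an unregistered one yields NULL instead of a stray
dereference.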
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.