Message-ID: <cd947c4ec6044521a92e2cc39eae5406@huawei.com>
Date: Thu, 27 Aug 2020 14:02:27 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Borislav Petkov <bp@...en8.de>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mchehab@...nel.org" <mchehab@...nel.org>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"james.morse@....com" <james.morse@....com>,
"rrichter@...vell.com" <rrichter@...vell.com>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
ghes_edac_register()
Hello Boris,
Thanks for reviewing.
>-----Original Message-----
>From: linux-edac-owner@...r.kernel.org [mailto:linux-edac-owner@...r.kernel.org] On Behalf Of Borislav Petkov
>Sent: 26 August 2020 09:52
>To: Shiju Jose <shiju.jose@...wei.com>
>Cc: linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org;
>mchehab@...nel.org; tony.luck@...el.com; james.morse@....com;
>rrichter@...vell.com; Linuxarm <linuxarm@...wei.com>
>Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
>ghes_edac_register()
>
>On Tue, Aug 25, 2020 at 02:01:08PM +0100, Shiju Jose wrote:
>> After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on
>> driver init")' applied, following error has occurred in
>> ghes_edac_register() when CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled.
>> The null ghes_hw.dimms pointer in the mci_for_each_dimm() of
>> ghes_edac_register() caused the error.
>>
>> The error occurs when all the previously initialized ghes instances
>> are removed and then probe a new ghes instance. In this case, the
>> ghes_refcount would be 0, ghes_hw.dimms and mci already freed. The
>> ghes_hw.dimms would be null because ghes_scan_system() would not call
>> enumerate_dimms() again.
>
>Try the below instead and see if it fixes the issue for you too.
>
>If it does, pls send it as v2 but do not add the splat to the commit message -
>that's a lot of noise for something which is clear why it happens and you
>explain it properly in text anyway.
I tested with your changes and it fixes the issue. I will send v2.
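
For anyone following the thread, the failure mode and the effect of
resetting the flag can be sketched in a small user-space C program. This
is only an illustration under stated assumptions, not the driver code:
dimms, refcount, do_register(), do_unregister() and
probe_after_teardown() are hypothetical stand-ins for ghes_hw.dimms,
ghes_refcount and the register/unregister paths, and the reset_flag
parameter exists only so both behaviours can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; this is not the kernel code. */
static int *dimms;		/* plays the role of ghes_hw.dimms */
static int refcount;		/* plays the role of ghes_refcount */
static bool system_scanned;	/* file scope, so teardown can reset it */

static void scan_system(void)
{
	if (system_scanned)
		return;

	dimms = calloc(4, sizeof(*dimms));
	system_scanned = true;
}

static int do_register(void)
{
	if (refcount > 0) {	/* already set up by an earlier instance */
		refcount++;
		return 0;
	}

	scan_system();
	if (!dimms)
		return -1;	/* the driver would dereference NULL here */

	refcount = 1;
	return 0;
}

static void do_unregister(bool reset_flag)
{
	if (reset_flag)
		system_scanned = false;	/* allow a rescan on the next probe */

	if (--refcount)
		return;

	free(dimms);
	dimms = NULL;
}

/* Register, remove the last instance, then probe again. */
static int probe_after_teardown(bool reset_flag)
{
	int ret;

	system_scanned = false;	/* start from a clean state */
	dimms = NULL;
	refcount = 0;

	ret = do_register();
	assert(ret == 0);
	do_unregister(reset_flag);

	ret = do_register();
	if (ret == 0)
		do_unregister(true);	/* clean up the second instance */

	return ret;
}
```

In this sketch probe_after_teardown(false) returns -1, because the stale
flag skips the rescan and the table stays NULL, while
probe_after_teardown(true) returns 0.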
>
>Thx.
>
>---
>diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>index da60c29468a7..54ebc8afc6b1 100644
>--- a/drivers/edac/ghes_edac.c
>+++ b/drivers/edac/ghes_edac.c
>@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);
> static bool __read_mostly force_load;
> module_param(force_load, bool, 0);
>
>+static bool system_scanned;
>+
> /* Memory Device - Type 17 of SMBIOS spec */
> struct memdev_dmi_entry {
> 	u8 type;
>@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct dmi_header *dh, void *arg)
>
> static void ghes_scan_system(void)
> {
>-	static bool scanned;
>-
>-	if (scanned)
>+	if (system_scanned)
> 		return;
>
> 	dmi_walk(enumerate_dimms, &ghes_hw);
>
>-	scanned = true;
>+	system_scanned = true;
> }
>
> void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>@@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes *ghes)
>
> 	mutex_lock(&ghes_reg_mutex);
>
>+	system_scanned = false;
>+
> 	if (!refcount_dec_and_test(&ghes_refcount))
> 		goto unlock;
>
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette
Thanks,
Shiju