[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c58ba57-96ff-4d9b-a014-dfe3fc0c44b5@stanley.mountain>
Date: Tue, 11 Mar 2025 11:12:54 +0300
From: Dan Carpenter <dan.carpenter@...aro.org>
To: David Thompson <davthompson@...dia.com>
Cc: Borislav Petkov <bp@...en8.de>, Shravan Ramani <shravankr@...dia.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bug report] EDAC, mellanox: Add ECC support for BlueField DDR4
On Tue, Mar 11, 2025 at 02:22:20AM +0000, David Thompson wrote:
> > -----Original Message-----
> > From: Borislav Petkov <bp@...en8.de>
> > Sent: Monday, March 3, 2025 9:52 AM
> > To: Dan Carpenter <dan.carpenter@...aro.org>; David Thompson
> > <davthompson@...dia.com>
> > Cc: Shravan Ramani <sramani@...lanox.com>; linux-edac@...r.kernel.org; lkml
> > <linux-kernel@...r.kernel.org>
> > Subject: Re: [bug report] EDAC, mellanox: Add ECC support for BlueField DDR4
> >
> > On Thu, Oct 24, 2024 at 11:20:45AM +0300, Dan Carpenter wrote:
> > > Hello Shravan Kumar Ramani,
> > >
> > > Commit 82413e562ea6 ("EDAC, mellanox: Add ECC support for BlueField
> > > DDR4") from Jun 25, 2019 (linux-next), leads to the following Smatch
> > > static checker warning:
> > >
> > > drivers/edac/bluefield_edac.c:205 bluefield_gather_report_ecc() error:
> > uninitialized symbol 'dram_syndrom'.
> > > drivers/edac/bluefield_edac.c:219 bluefield_gather_report_ecc() error:
> > uninitialized symbol 'dram_additional_info'.
> > > drivers/edac/bluefield_edac.c:231 bluefield_gather_report_ecc() error:
> > uninitialized symbol 'edea0'.
> > > drivers/edac/bluefield_edac.c:231 bluefield_gather_report_ecc() error:
> > uninitialized symbol 'edea1'.
> > > drivers/edac/bluefield_edac.c:256 bluefield_edac_check() error: uninitialized
> > symbol 'ecc_count'.
> > >
> > > drivers/edac/bluefield_edac.c
> > > 173 static void bluefield_gather_report_ecc(struct mem_ctl_info *mci,
> > > 174 int error_cnt,
> > > 175 int is_single_ecc)
> > > 176 {
> > > 177 struct bluefield_edac_priv *priv = mci->pvt_info;
> > > 178 u32 dram_additional_info, err_prank, edea0, edea1;
> > > 179 u32 ecc_latch_select, dram_syndrom, serr, derr, syndrom;
> > > 180 enum hw_event_mc_err_type ecc_type;
> > > 181 u64 ecc_dimm_addr;
> > > 182 int ecc_dimm, err;
> > > 183
> > > 184 ecc_type = is_single_ecc ? HW_EVENT_ERR_CORRECTED :
> > > 185 HW_EVENT_ERR_UNCORRECTED;
> > > 186
> > > 187 /*
> > > 188 * Tell the External Memory Interface to populate the relevant
> > > 189 * registers with information about the last ECC error occurrence.
> > > 190 */
> > > 191 ecc_latch_select = MLXBF_ECC_LATCH_SEL__START;
> > > 192 err = bluefield_edac_writel(priv, MLXBF_ECC_LATCH_SEL,
> > ecc_latch_select);
> > > 193 if (err)
> > > 194 dev_err(priv->dev, "ECC latch select write failed.\n");
> > > 195
> > > 196 /*
> > > 197 * Verify that the ECC reported info in the registers is of the
> > > 198 * same type as the one asked to report. If not, just report the
> > > 199 * error without the detailed information.
> > > 200 */
> > > 201 err = bluefield_edac_readl(priv, MLXBF_SYNDROM,
> > &dram_syndrom);
> > > 202 if (err)
> > > 203 dev_err(priv->dev, "DRAM syndrom read failed.\n");
> > >
> > > If bluefield_edac_readl() fails then dram_syndrom is uninitialized.
> > >
> > > 204
> > > --> 205 serr = FIELD_GET(MLXBF_SYNDROM__SERR, dram_syndrom);
> > > 206 derr = FIELD_GET(MLXBF_SYNDROM__DERR, dram_syndrom);
> > > 207 syndrom = FIELD_GET(MLXBF_SYNDROM__SYN, dram_syndrom);
> > >
> >
> > This looks forgotten.
> >
> > I'm thinking of:
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 8e0736dc2ee0..061149ade8c0 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8222,8 +8222,7 @@ F:
> > Documentation/devicetree/bindings/edac/aspeed-sdram-edac.txt
> > F: drivers/edac/aspeed_edac.c
> >
> > EDAC-BLUEFIELD
> > -M: Shravan Kumar Ramani <shravankr@...dia.com>
> > -S: Supported
> > +S: Orphan
> > F: drivers/edac/bluefield_edac.c
> >
> > EDAC-CALXEDA
> >
> > but lemme Cc people who have touched this recently first.
> >
> > Thx.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
>
> Shravan and I will investigate solving this 'smatch' warning.
>
> I setup a recent linux-next repo, installed the sparse package, and downloaded + built the smatch tool.
> I execute the following command to run 'smatch' over the entire linux kernel, including bluefield_edac.c:
> $ make CHECK="/path/to/smatch -p=kernel --file-output" C=1 -j64
>
> I don't see any warnings related to the bluefield_edac.c module. That is, the generated file
> "drivers/edac/bluefield_edac.c.smatch" has size 0, and thus no content.
>
> What is the recommended command line options to pass to Linux kernel build?
> Is there something I am missing here?
The actual issue is the first "return -1" in secure_readl().
You need build the cross function database to see the warning in Smatch.
In fact you'd need to build it twice because it's two functions away.
It's not hard to build the database but it takes a few hours.
~/progs/smatch/release/smatch_scripts/build_kernel_data.sh
I wouldn't bother trying to silence the warning. We don't worry about
false positives. The only thing to consider is the kernel code.
Probably bluefield_edac_readl() can't actually fail in real life can
it? There are a lot of functions like that in the kernel. If reading
from the PCI stops working, there is nothing you can do in the software
to recover from that, you need to buy new hardware. If you want I can
just add bluefield_edac_readl() to the list of functions which don't
actually fail.
regards,
dan carpenter
Powered by blists - more mailing lists