Message-ID: <20250811134339.GA616@yaz-khff2.amd.com>
Date: Mon, 11 Aug 2025 09:43:39 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Naik, Avadhut" <avadnaik@....com>
Cc: Avadhut Naik <avadhut.naik@....com>, linux-edac@...r.kernel.org,
bp@...en8.de, john.allen@....com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] EDAC/amd64: Incorporate DRAM Address in EDAC message
On Wed, Aug 06, 2025 at 04:08:07PM -0500, Naik, Avadhut wrote:
[...]
> >> @@ -2808,11 +2824,13 @@ static void umc_get_err_info(struct mce *m, struct err_info *err)
> >> static void decode_umc_error(int node_id, struct mce *m)
> >> {
> >> u8 ecc_type = (m->status >> 45) & 0x3;
> >> + struct dram_addr dram_addr;
> >> struct mem_ctl_info *mci;
> >> unsigned long sys_addr;
> >> struct amd64_pvt *pvt;
> >> struct atl_err a_err;
> >> struct err_info err;
> >> + int ret;
> >>
> >> node_id = fixup_node_id(node_id, m);
> >>
> >> @@ -2822,6 +2840,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> >>
> >> pvt = mci->pvt_info;
> >>
> >> + memset(&dram_addr, 0, sizeof(dram_addr));
> >> memset(&err, 0, sizeof(err));
> >>
> >> if (m->status & MCI_STATUS_DEFERRED)
> >> @@ -2853,6 +2872,10 @@ static void decode_umc_error(int node_id, struct mce *m)
> >> goto log_error;
> >> }
> >>
> >> + ret = amd_convert_umc_mca_addr_to_dram_addr(&a_err, &dram_addr);
> >> + if (!ret)
> >> + err.dram_addr = &dram_addr;
> >
> > I feel like it is not necessary to pass a second struct if it is already
> > contained in another.
> >
> > You could just clear (or not set) that field if an error occurs.
> >
> Slightly confused here.
> Do you mean we should avoid passing dram_addr as second parameter
> for amd_convert_umc_mca_addr_to_dram_addr() and instead just pass
> struct err_info instance err?
>
> And, in case some error occurs, we should just do
> err.dram_addr = 0x0;
> ?
>
Sorry, I think I misread this before.

I was thinking you can add 'struct dram_addr' to 'struct atl_err'. The
intent of 'struct atl_err' is to collect all needed parameters for the
translation functions.

Additionally, I just realized, we should have an 'invalid' default value
for dram_addr. Technically, bank=0, row=0, col=0, etc., would be a valid
DRAM address.
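
A minimal userspace sketch of the two points above, not the actual patch.
Everything here is an assumption made for illustration: the field names and
widths in 'struct dram_addr', the 'valid' flag as the invalid marker, the
helpers sketch_atl_err_init() and sketch_convert_to_dram_addr(), and the
bit slicing, which is a placeholder rather than a real UMC decode. The
'struct atl_err' shown is a simplified stand-in with the DRAM address
embedded in it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; names and widths are assumptions. */
struct dram_addr {
	bool valid;		/* false until a translation succeeds */
	uint8_t bank;
	uint32_t row;
	uint16_t col;
};

/* Simplified stand-in for 'struct atl_err' with the result embedded. */
struct atl_err {
	uint64_t addr;
	uint64_t ipid;
	uint32_t cpu;
	struct dram_addr dram_addr;
};

/* Start from a known-invalid DRAM address. */
static void sketch_atl_err_init(struct atl_err *err)
{
	memset(err, 0, sizeof(*err));
	err->dram_addr.valid = false;
}

/*
 * Placeholder translation: the bit slicing is arbitrary and only
 * illustrates the flow. An all-ones address stands in for "translation
 * failed". Returns 0 on success, -1 on failure.
 */
static int sketch_convert_to_dram_addr(struct atl_err *err)
{
	/* Invalidate first so a failure never leaves stale data marked valid. */
	err->dram_addr.valid = false;

	if (err->addr == UINT64_MAX)
		return -1;

	err->dram_addr.col  = err->addr & 0x3ff;
	err->dram_addr.bank = (err->addr >> 10) & 0x3;
	err->dram_addr.row  = (err->addr >> 12) & 0xffff;
	err->dram_addr.valid = true;

	return 0;
}
```

With an explicit marker like this, a successful translation that yields
bank=0, row=0, col=0 stays distinguishable from "no translation available",
and the caller only needs to pass the one struct.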
Thanks,
Yazen