Message-ID: <20230322164505.GA1641600@bgt-140510-bm03>
Date: Wed, 22 Mar 2023 16:45:15 +0000
From: Fan Ni <fan.ni@...sung.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: "alison.schofield@...el.com" <alison.schofield@...el.com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
"ira.weiny@...el.com" <ira.weiny@...el.com>,
"bwidawsk@...nel.org" <bwidawsk@...nel.org>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
Adam Manzanares <a.manzanares@...sung.com>,
"dave@...olabs.net" <dave@...olabs.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cxl/hdm: Fix hdm decoder init by adding COMMIT field check
On Fri, Mar 03, 2023 at 02:36:25PM -0800, Dan Williams wrote:
> Fan Ni wrote:
> [..]
> > > I think a separate fix for that crash is needed, can you send the
> > > backtrace? I.e. I worry that crash can be triggered by other means.
> > Hi Dan,
> > See backtrace below.
>
> Thanks, I'll take a look.
>
> [..]
> > > > @@ -710,10 +711,11 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> > > > base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> > > > size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which));
> > > > committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED);
> > > > + should_commit = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMIT);
> > >
> > > This change looks like a good idea in general given the ambiguity of
> > > 'committed'. However just combine the two checks into the @committed
> > > variable with something like this:
> > >
> > > commit_mask = CXL_HDM_DECODER0_CTRL_COMMITTED|CXL_HDM_DECODER0_CTRL_COMMIT;
> > > committed = (ctrl & commit_mask) == commit_mask;
>
> Did you also notice this ^^^ request for a fixed up version of the
> current patch?
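For reference, the combined check suggested above can be sketched in isolation as follows. This is only a minimal user-space illustration: the DEMO_* names and the helper are hypothetical, and the bit positions are assumptions chosen for the example rather than values taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit positions only (assumed for this sketch); the real
 * definitions live in the kernel's CXL headers.
 */
#define DEMO_CTRL_COMMIT     (UINT32_C(1) << 9)
#define DEMO_CTRL_COMMITTED  (UINT32_C(1) << 10)

/*
 * Hypothetical helper: treat the decoder as committed only when both
 * the COMMIT and COMMITTED bits are set in the control value.
 */
static bool demo_decoder_committed(uint32_t ctrl)
{
	const uint32_t commit_mask = DEMO_CTRL_COMMITTED | DEMO_CTRL_COMMIT;

	return (ctrl & commit_mask) == commit_mask;
}
```

With this form, a control value that has COMMITTED set but COMMIT clear is not reported as committed.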
Hi Dan,
Jonathan sent out a QEMU patch that fixes the reset of the committed
field, and that patch fixes the system crash discussed here:
https://lore.kernel.org/linux-cxl/20230322102731.4219-1-Jonathan.Cameron@huawei.com/T/#me5283349b37d53abc93904a2428910a2f6a354f6
Do you think we still need a separate fix on the kernel side for the
possible system crash when cxl_dpa_release() is called while dpa_res is
NULL? I noticed that some callers, for example cxl_dpa_free(), check
dpa_res before calling cxl_dpa_release(), but other callers have no such
guard. If a fix is needed, I have a simple one ready to send out.
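To illustrate the kind of guard in question, here is a minimal user-space sketch. It is not the actual kernel code or the proposed patch: the demo_* types and function are hypothetical stand-ins for the endpoint decoder and its DPA resource, and the return codes are assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's resource or decoder types. */
struct demo_resource {
	unsigned long long start;
	unsigned long long size;
};

struct demo_endpoint_decoder {
	struct demo_resource *dpa_res;	/* NULL when no DPA is reserved */
	unsigned long long skip;
};

/*
 * Release the decoder's DPA reservation.
 * Returns 0 on release, -1 if there was nothing to release.
 * Checking inside the release path protects every caller, not only
 * the ones that remember to test dpa_res first.
 */
static int demo_dpa_release(struct demo_endpoint_decoder *cxled)
{
	if (!cxled || !cxled->dpa_res)
		return -1;

	free(cxled->dpa_res);
	cxled->dpa_res = NULL;
	cxled->skip = 0;
	return 0;
}
```

Placing the check in the release function itself means a second release, or a release on a decoder that never had a reservation, becomes a harmless no-op instead of a NULL dereference.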
Fan