[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140909071252.GB27821@gchen.bj.intel.com>
Date: Tue, 9 Sep 2014 03:12:52 -0400
From: "Chen, Gong" <gong.chen@...ux.intel.com>
To: Bjorn Helgaas <bhelgaas@...gle.com>
Cc: rdunlap@...radead.org, bp@...en8.de, tony.luck@...el.com,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error mask
On Fri, Sep 05, 2014 at 05:34:21PM -0600, Bjorn Helgaas wrote:
> Date: Fri, 5 Sep 2014 17:34:21 -0600
> From: Bjorn Helgaas <bhelgaas@...gle.com>
> To: "Chen, Gong" <gong.chen@...ux.intel.com>
> Cc: rdunlap@...radead.org, bp@...en8.de, tony.luck@...el.com,
> linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
> Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error
> mask
> User-Agent: Mutt/1.5.21 (2010-09-15)
>
> On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote:
> > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register
> > is redefined and it has an explicit requirement that when writing
> > this field, a value of 1b is the only choice. So change previous
> > initial maks from 0 to 1.
> >
> > Signed-off-by: Chen, Gong <gong.chen@...ux.intel.com>
> > ---
> > NOTE: After scratching all use cases, this is the most obvious use
> > case to violate the SPEC. Most of use cases just read first and
> > then overwrite for clear purpose. Even so, such fix is obvious to
> > not compatiable with previous SPEC definition. Do we need a dirty
> > hack?
> >
> > arch/mips/pci/pci-octeon.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> > index 59cccd95688b..f1bfdc201297 100644
> > --- a/arch/mips/pci/pci-octeon.c
> > +++ b/arch/mips/pci/pci-octeon.c
> > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev)
> > dconfig);
> > /* Enable reporting of all uncorrectable errors */
> > /* Uncorrectable Error Mask - turned on bits disable errors */
> > - pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0);
> > + pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1);
>
> I see the text in the spec that says we should only write 1 to bit 0 (sec
> 7.10.3, for anybody following along). It looks like that change was made
> between PCIe r1.0 and r1.1. It would really be nice to have more context
> about why the change was made, because if there's hardware in the field
> that implements r1.0 behavior, this patch will change the way it works, and
> I don't know how to verify that is safe.
>
> Does this actually change fix a problem? If it fixes a problem that
> happens on real hardware, that's a much better reason to make a change than
> just to comply with the spec.
>
> Sec 7.10.2 also says we should ignore the value of bit 0 in the
> Uncorrectable Error Status register, and I don't see any place where we
> follow that advice.
>
That's why I mark this patch as RFC. As you mentioned above, these are my
concerns, too. I submit such a patch not for merging but throwing a potential
issue. As I noted above, I don't know if it is deserved to fix all affected
placed to comply with spec change. After all, no one reports such an
issue (or maybe have happened :-))
Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)
Powered by blists - more mailing lists