[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAErSpo5=LOxrcSZZy2+9rTNcgJvzTXMoYBDPHNvz+e+j9P5xkg@mail.gmail.com>
Date: Fri, 30 Sep 2011 11:57:37 -0600
From: Bjorn Helgaas <bhelgaas@...gle.com>
To: Jon Mason <mason@...i.com>
Cc: Avi Kivity <avi@...hat.com>, Sven Schnelle <svens@...ckframe.org>,
Simon Kirby <sim@...tway.ca>,
Eric Dumazet <eric.dumazet@...il.com>,
Niels Ole Salscheider <niels_ole@...scheider-online.de>,
Jesse Barnes <jbarnes@...tuousgeek.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: Workaround for Intel MPS errata
On Fri, Sep 30, 2011 at 11:38 AM, Jon Mason <mason@...i.com> wrote:
>
> On Fri, Sep 30, 2011 at 12:17 PM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
> > On Fri, Sep 30, 2011 at 9:35 AM, Jon Mason <mason@...i.com> wrote:
> >>
> >> On Fri, Sep 30, 2011 at 12:01 AM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
> >> > On Thu, Sep 29, 2011 at 6:16 PM, Jon Mason <mason@...i.com> wrote:
> >> >> Hey Avi,
> >> >> Can you try this patch? It should resolve the issue you are seeing.
> >> >>
> >> >> Thanks,
> >> >> Jon
> >> >>
> >> >> PCI: Workaround for Intel MPS errata
> >> >>
> >> >> Intel 5000 and 5100 series memory controllers have a known issue if read
> >> >> completion coalescing is enabled (the default setting) and the PCI-E
> >> >> Maximum Payload Size is set to 256B. To work around this issue, disable
> >> >> read completion coalescing if the MPS is 256B.
> >> >
> >> > I'd much rather see this done as an early quirk so it doesn't clutter probe.c.
> >> >
> >> > I don't know how you decide whether
> >> > - no coalescing with MPS=256, or
> >> > - coalescing with MPS=128
> >> > is better. I suspect that having a quirk that doesn't change the
> >> > setting, but merely limits MPS to 128 if the BIOS enabled coalescing,
> >> > would be simplest and would stay in the best-tested chipset
> >> > configuration.
> >>
> >> This is what I was debating yesterday. Is it better to disable
> >> coalescing and get better throughput (which could be a net negative if
> >> the MPS isn't 256) or never allow it to be greater than 128? There is
> >> no way of knowing at quirk time if the disable is necessary or not,
> >> only when setting the MPS is it known (which is why I did it this
> >> way). I could, as you suggest, simply read the bit and see if it is
> >> enabled by the BIOS (which I'd bet it is every single time), and then
> >> limit the MPS to 128 as a quirk. This would be fairly simple to do.
> >> However, the errata from Intel says Windows 2008 always disables the
> >> coalescing and sets the MPS to 256B. With this known, Linux's I/O
> >> performance would be less than Windows on these systems. ...
> >
> > Presumably coalescing improves performance, too, and I don't have the
> > evidence that says "no coalescing with MPS=256" performs better than
> > "coalescing with MPS=128."
> >
> > But the fact that Windows 2008 disables coalescing is worth a lot (if
> > this is in a public erratum, a URL would be good). Given that, I'd
>
> The URLs were in the top of the patch, but perhaps they weren't obvious. ...
> Search for "MPS", as they are several pages into the errata chapter.
I saw the errata, but searching for "Windows" failed. But now I see
the "Microsoft Server 2008" reference in the 5100 errata. The wording
is interesting, though... it says:
Microsoft Server 2008* forces the maximum payload size to 256 B in the
PEXDEVCTRL register; therefore, the workaround with coalescing disabled
must be used for Microsoft Server 2008*.
I would read that as saying "Windows Server 2008 always sets MPS to
256 but doesn't do anything with the coalescing setting. Therefore,
if you want to run Server 2008 on your system, make sure the BIOS
leaves coalescing disabled."
> > Maybe the quirk could be moved out of the generic code by making
> > pcie_set_mps() a weak function, so x86 could supply a version that
> > disables coalescing if MPS=256?
>
> Not sure what you mean here. Are you saying to make the function
> defined differently on each arch?
You can define the generic pcie_set_mps() with the __weak attribute in
drivers/pci/probe.c. Then if you put a normal (non-weak)
pcie_set_mps() definition in the arch code, it will be used instead of
the weak definition. If the arch doesn't supply a definition, the
generic weak version will be used.
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists