Message-ID: <20111004164430.GH19130@kvack.org>
Date:	Tue, 4 Oct 2011 12:44:30 -0400
From:	Benjamin LaHaise <bcrl@...ck.org>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	Jon Mason <mason@...i.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH 2/3] pci: Clamp pcie_set_readrq() when using "performance" settings

On Tue, Oct 04, 2011 at 06:19:57PM +0200, Benjamin Herrenschmidt wrote:
> On Tue, 2011-10-04 at 11:59 -0400, Benjamin LaHaise wrote:
...
> > My concern, which I forgot to put in the original message, is that allowing 
> > a bridge to have a larger MPS than the endpoints will result in things 
> > failing when a large write occurs.  Aiui, we don't restrict the size of 
> > memcpy_toio() type functions, and there are PCIe devices which do not 
> > perform DMA.  Clamping MPS on the bridge is a requirement of correctness.
> 
> Yes, this is a problem if your bridge can generate large writes. Ours
> can't so this is safe. Our bridges can't write combine beyond 128 bytes.
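
The clamping rule being debated above can be sketched as follows -- a bridge's MPS must not exceed the smallest MPS of the devices behind it, or a sufficiently large write through the bridge can exceed what an endpoint accepts. This is an illustrative helper, not the kernel's actual API:

```c
/* Clamp a bridge's Max Payload Size (in bytes) to the smallest MPS
 * among the endpoints below it.  Hypothetical sketch of the rule
 * under discussion; names are illustrative. */
static int clamp_bridge_mps(int bridge_mps, const int *ep_mps, int n)
{
	int i, mps = bridge_mps;

	for (i = 0; i < n; i++)
		if (ep_mps[i] < mps)
			mps = ep_mps[i];
	return mps;
}
```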
> 
> Originally, I had that whole logic implemented in arch specific code,
> but when Jon started fiddling with MPS/MRRS in generic code, I suggested
> to have the generic code have the option of doing what I want. Maybe we
> should remove it completely as a user-visible option and leave it back
> to platform code only. In any case, it isn't the default.

Ah, I see now.  Yes, if that can be put into arch specific code, it makes 
far more sense since things will work correctly assuming the bridge also 
limits read completions with data to 128 bytes?
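
For illustration, the "performance" behaviour being discussed amounts to clamping the Max Read Request Size to the Max Payload Size before programming it. PCIe encodes both as 3-bit fields where size = 128 << field; the helpers below are a sketch, not the kernel's pcie_set_readrq() implementation:

```c
/* Convert a payload/read-request size in bytes to its 3-bit PCIe
 * register field encoding (128 -> 0, 256 -> 1, ..., 4096 -> 5).
 * Illustrative only. */
static int pcie_size_to_field(int size)
{
	int field = 0;

	while ((128 << field) < size)
		field++;
	return field;
}

/* "Performance" setting: never let MRRS exceed MPS. */
static int clamp_readrq(int rq, int mps)
{
	return rq > mps ? mps : rq;
}
```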

> Our performance guys say that between 128 and 512 or 2048 which is what
> we face in some cases here, it's worth it. I haven't yet had a chance to
> verify by myself.

Just to give you an idea of the problems with generating small read 
requests: the on-the-wire cost of sending a 1500 byte ethernet packet 
with 128 byte read requests is the difference between 1x 12-16 byte TLP 
on the transmit link of the endpoint vs 12x 12-16 byte TLPs (the overhead 
for a TLP is at least 8 bytes of header and CRC iirc, maybe more for the 
DLLP flag bytes), i.e. going from 20-24 bytes per tx packet to 240-288 bytes 
of read requests.  This won't matter much for devices with lots of excess 
bandwidth, but for PCIe 1x devices (ie common gige chipsets) that really 
hurts.  It also means that only 2 or 3 packets can have read requests 
outstanding with the default limit of 32 tags, but hopefully the device 
supports extended tags.
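
Working through the numbers above: a 1500 byte frame fetched with 128 byte read requests needs ceil(1500/128) = 12 request TLPs, each costing roughly 20-24 bytes on the wire (12-16 byte header plus ~8 bytes of framing/CRC), and 12 requests per frame against a 32-tag budget leaves room for only 2 complete frames in flight. A sketch of that arithmetic (helper names are illustrative):

```c
/* Number of read request TLPs needed to fetch one frame. */
static int requests_per_frame(int frame_bytes, int mrrs)
{
	return (frame_bytes + mrrs - 1) / mrrs;
}

/* Total bytes of read requests on the wire for one frame, given
 * the per-TLP cost (header plus framing/CRC overhead). */
static int request_bytes(int frame_bytes, int mrrs, int per_tlp)
{
	return requests_per_frame(frame_bytes, mrrs) * per_tlp;
}

/* How many whole frames can have all their read requests
 * outstanding at once, given the tag budget. */
static int frames_in_flight(int tags, int frame_bytes, int mrrs)
{
	return tags / requests_per_frame(frame_bytes, mrrs);
}
```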

		-ben
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
