Date:	Wed, 22 Aug 2012 16:41:15 +0100
From:	Ben Hutchings <bhutchings@...arflare.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	"H. Peter Anvin" <hpa@...or.com>,
	David Miller <davem@...emloft.net>, <tglx@...utronix.de>,
	<mingo@...hat.com>, <netdev@...r.kernel.org>,
	<linux-net-drivers@...arflare.com>, <x86@...nel.org>
Subject: Re: [PATCH 2/3] x86_64: Define 128-bit memory-mapped I/O operations

On Wed, 2012-08-22 at 07:50 -0700, Linus Torvalds wrote:
> On Wed, Aug 22, 2012 at 7:24 AM, Ben Hutchings
> <bhutchings@...arflare.com> wrote:
> >
> > Sorry, I'll paste it below.
> 
> The thing you pasted isn't actually the thing in the subject line.
> It's just you *using* it.
> 
> I wanted to see what that "writeo()" looks like for x86-64.
> 
> But I got google to find it for me by looking for "__raw_writeo", so I
> can see the patch now. It looks like it might work. Does it really
> help performance despite always doing those TS games in CR0 for each
> access?

I haven't run the experiment myself, but my colleagues observed a net
reduction of hundreds of nanoseconds of latency.  That may not sound
like much, but for a small packet traversing an uncongested twinax link
it's around 5-10% of the total latency from the descriptor pointer write
to DMA completion on the peer.

Later, you wrote:
> Btw, are we even certain that a 128-bit PCIe write is going to remain
> atomic across a bus (ie over various PCIe bridges etc)?

I don't think PCIe bridges are allowed to split up TLPs (this is why the
PCI core has to be so careful about programming Max Payload Size).  What
happens between the processor core and the host bridge is another
matter, though.

> Do you
> care? Is it just a "one transaction is cheaper than two", and it
> doesn't really have any ordering constraints? If the thing gets split
> into two 64-bit transactions (in whatever order) by a bridge on the
> way, would that be ok?

We care if the two transactions are not in ascending address order;
that's why we had to abandon write combining.

> We've seen buses split accesses before (ie 64-bit writes on 32-bit
> PCI).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

