Message-ID: <AE90C24D6B3A694183C094C60CF0A2F6026B6FC1@saturn3.aculab.com>
Date: Wed, 22 Aug 2012 16:27:56 +0100
From: "David Laight" <David.Laight@...LAB.COM>
To: "H. Peter Anvin" <hpa@...or.com>,
"Ben Hutchings" <bhutchings@...arflare.com>
Cc: "Benjamin LaHaise" <bcrl@...ck.org>,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"David Miller" <davem@...emloft.net>, <tglx@...utronix.de>,
<mingo@...hat.com>, <netdev@...r.kernel.org>,
<linux-net-drivers@...arflare.com>, <x86@...nel.org>
Subject: RE: [PATCH 2/3] x86_64: Define 128-bit memory-mapped I/O operations
> Your architecture sounds similar to one I once worked on (Orion
> Microsystems CNIC/OPA-2). That architecture had a descriptor ring in
> device memory, and a single trigger bit would move the head pointer.
>
> We used write combining to write out a set of descriptors, and then used
> a non-write-combining write to do the final write which bumps the head
> pointer. The UC write flushes the write combiners ahead of it, so it
> ends up with two transactions (one for the WC data and one for the UC
> trigger) but it could frequently push quite a few descriptors in that
> operation.
The code actually looks more like a normal ethernet ring interface
with an 'owner' bit in each entry.
So it is important to write the owner bit last.
It might be possible to set multiple ring entries in two TLPs
by first writing all of them (maybe with write combining)
but without changing the ownership of the first entry,
then doing a second transfer to update the owner bit in
the first entry.
The order of the writes in the first transfer would then not
matter.
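
As a rough sketch of that two-step scheme (not the actual driver code):
the ring below lives in ordinary memory rather than device memory, and
the entry layout, RING_SIZE, OWNER_DEVICE and ring_post() are all names
made up for illustration. A real driver would go through an I/O mapping
and the platform's own barriers; the C11 release store here only stands
in for "the second transfer happens after the first".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE    8u
#define OWNER_DEVICE 0x80000000u  /* hypothetical owner bit */

/* Hypothetical ring entry; a real device defines its own layout. */
struct ring_entry {
    uint64_t addr;
    uint32_t len;
    _Atomic uint32_t flags;
};

static struct ring_entry ring[RING_SIZE];

/* Post 'count' entries starting at 'head'; returns the new head index. */
static unsigned ring_post(unsigned head, const uint64_t *addrs,
                          const uint32_t *lens, unsigned count)
{
    assert(count > 0 && count <= RING_SIZE);

    /*
     * First transfer: fill in every entry.  Entries after the first
     * are handed over immediately, but the first one stays owned by
     * the host, so a device that consumes entries in order ignores
     * the whole batch.  The order of these writes does not matter.
     */
    for (unsigned i = 0; i < count; i++) {
        struct ring_entry *e = &ring[(head + i) % RING_SIZE];

        e->addr = addrs[i];
        e->len = lens[i];
        atomic_store_explicit(&e->flags, i == 0 ? 0 : OWNER_DEVICE,
                              memory_order_relaxed);
    }

    /*
     * Second transfer: one ordered write that flips the owner bit of
     * the first entry and so publishes the whole batch.
     */
    atomic_store_explicit(&ring[head % RING_SIZE].flags, OWNER_DEVICE,
                          memory_order_release);

    return (head + count) % RING_SIZE;
}
```

This only works if the device stops at the first entry it does not own,
which is how I'd expect an 'owner' bit ring to behave.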
FWIW, can you even guarantee an atomic 64-bit PCIe transfer
on many systems (without resorting to a DMA unit)?
David