Message-Id: <200806030952.10360.jbarnes@virtuousgeek.org>
Date: Tue, 3 Jun 2008 09:52:09 -0700
From: Jesse Barnes <jbarnes@...tuousgeek.org>
To: Nick Piggin <nickpiggin@...oo.com.au>
Cc: Jes Sorensen <jes@....com>, Jeremy Higdon <jeremy@....com>,
Roland Dreier <rdreier@...co.com>, benh@...nel.crashing.org,
Arjan van de Ven <arjan@...radead.org>,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
tpiepho@...escale.com, linuxppc-dev@...abs.org,
scottwood@...escale.com, torvalds@...ux-foundation.org,
David Miller <davem@...emloft.net>, alan@...rguk.ukuu.org.uk
Subject: Re: MMIO and gcc re-ordering issue
On Monday, June 02, 2008 9:33 pm Nick Piggin wrote:
> On Monday 02 June 2008 19:56, Jes Sorensen wrote:
> > Jeremy Higdon wrote:
> > > We don't actually have that problem on the Altix. All writes issued
> > > by CPU X will be ordered with respect to each other. But writes by
> > > CPU X and CPU Y will not be, unless an mmiowb() is done by the
> > > original CPU before the second CPU writes. I.e.
> > >
> > > CPU X writel
> > > CPU X writel
> > > CPU X mmiowb
> > >
> > > CPU Y writel
> > > ...
> > >
> > > Note that this implies some sort of locking. Also note that if in
> > > the above, CPU Y did the mmiowb, that would not work.
> >
> > Hmmm,
> >
> > Then it's less bad than I thought - my apologies for the confusion.
> >
> > Would we be able to use Ben's trick of setting a per cpu flag in
> > writel() then and checking that in spin unlock issuing the mmiowb()
> > there if needed?
>
> Yes you could, but your writels would still not be strongly ordered
> within (or outside) spinlock regions, which is what Linus wants (and
> I kind of agree with).

I think you mean wrt cacheable memory accesses here (though iirc on ia64
spin_unlock has release semantics, so it will at least act as a barrier for
other stores).

> This comes back to my posting about mmiowb and io_*mb barriers etc.
>
> Despite what you say, what you've done really _does_ change the semantics
> of wmb() for all drivers. It is a really sad situation we've got ourselves
> into somehow, AFAIKS in the hope of trying to save ourselves a tiny bit
> of work upfront :( (this is not just the sgi folk with mmiowb I'm talking
> about, but the whole random undefinedness of ordering and io barriers).
>
> The right way to make any change is never to weaken the postcondition of
> an existing interface *unless* you are willing to audit the entire tree
> and fix it. Impossible for drivers, so the correct thing to do is introduce
> a new interface, and move things over at an easier pace. Not rocket
> science.

Well, given how undefined things have been in the past, each arch has had to
figure out what things mean (based on looking at drivers & core code), then
come up with appropriate primitives. On Altix, we went both directions: we
made regular PIO reads (readX etc.) *very* expensive to preserve
compatibility with what existing drivers expect, and added a readX_relaxed to
give a big performance boost to tuned drivers.
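
To illustrate the readX vs. readX_relaxed split, here is a rough userspace
sketch. Everything in it (struct rd_dev, rd_wait_for_dma, rd_readl,
rd_readl_relaxed) is a hypothetical stand-in, not the Altix code, and it
assumes the expensive part of the ordered read is waiting for outstanding
DMA to land before returning the register value.

```c
#include <assert.h>

/* Hypothetical device model: one status register plus DMA bookkeeping. */
struct rd_dev {
    unsigned int status;   /* value the PIO read returns */
    int dma_in_flight;     /* nonzero while device DMA has not landed */
    int dma_waits;         /* how many times we paid for the expensive wait */
};

/* Assumed expensive step: wait until prior DMA is visible in memory. */
static void rd_wait_for_dma(struct rd_dev *d)
{
    assert(d);
    d->dma_in_flight = 0;
    d->dma_waits++;
}

/* Ordered read: what existing drivers expect, at a cost. */
static unsigned int rd_readl(struct rd_dev *d)
{
    rd_wait_for_dma(d);
    return d->status;
}

/* Relaxed read: no ordering against DMA, for drivers tuned to cope. */
static unsigned int rd_readl_relaxed(struct rd_dev *d)
{
    assert(d);
    return d->status;
}
```

A tuned driver would use the relaxed variant only where it does not go on to
look at DMA'd data based on the value it just read.
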

OTOH, given that posted PCI writes were nothing new to Linux, but the Altix
network topology was, we introduced mmiowb() (with lots of discussion, I might
add), which has clear and relatively simple usage guidelines.
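
The usage guideline boils down to: do your MMIO writes under the lock, then
mmiowb() on the same CPU before dropping it. A minimal userspace sketch of
that pattern follows; the sim_* names and struct sim_dev are made up for
illustration and only model posting with counters, they are not the kernel
primitives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model: two registers and a count of posted writes. */
struct sim_dev {
    uint32_t regs[2];
    int lock_held;
    int posted;    /* writes issued but not yet pushed out */
    int flushed;   /* writes pushed out by sim_mmiowb() */
};

static void sim_spin_lock(struct sim_dev *d)   { d->lock_held = 1; }
static void sim_spin_unlock(struct sim_dev *d) { d->lock_held = 0; }

/* Stand-in for writel(): the write is posted, not yet ordered vs. other CPUs. */
static void sim_writel(struct sim_dev *d, uint32_t val, int reg)
{
    assert(d->lock_held);          /* guideline: MMIO happens under the lock */
    assert(reg >= 0 && reg < 2);
    d->regs[reg] = val;
    d->posted++;
}

/* Stand-in for mmiowb(): orders this CPU's posted writes before later ones. */
static void sim_mmiowb(struct sim_dev *d)
{
    d->flushed += d->posted;
    d->posted = 0;
}

/* Returns the number of writes still posted when the lock is dropped. */
static int sim_program_device(struct sim_dev *d, uint32_t addr, uint32_t len)
{
    int pending;

    sim_spin_lock(d);
    sim_writel(d, addr, 0);
    sim_writel(d, len, 1);
    sim_mmiowb(d);                 /* issued by the writing CPU, before unlock */
    pending = d->posted;
    sim_spin_unlock(d);
    return pending;
}
```

Note the barrier is issued by the CPU that did the writes; having the next
lock holder do it instead would not work, as Jeremy pointed out.
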

Now, in hindsight, setting a flag on PIO writes in writeX and testing it in
spin_unlock (à la powerpc) might have been a better approach, but iirc
that never came up in the discussion, probably because we were focused on PCI
posting and not uncached vs. cached ordering.
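
For reference, the flag approach would look roughly like the sketch below.
It is a userspace model only: the flag_* names, the explicit cpu argument and
the counter are hypothetical, and it does not claim to match what powerpc
actually does.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_NR_CPUS 2

static bool flag_io_pending[FLAG_NR_CPUS];  /* per-CPU "did MMIO" flag */
static int flag_mmiowb_count;               /* how many barriers were issued */

/* Stand-in for the expensive barrier. */
static void flag_mmiowb(void)
{
    flag_mmiowb_count++;
}

/* Stand-in for writeX(): do the write and remember that this CPU did MMIO. */
static void flag_writel(int cpu, volatile unsigned int *reg, unsigned int val)
{
    assert(cpu >= 0 && cpu < FLAG_NR_CPUS);
    *reg = val;
    flag_io_pending[cpu] = true;
}

static void flag_spin_lock(int cpu, int *lock)
{
    assert(cpu >= 0 && cpu < FLAG_NR_CPUS);
    *lock = 1;
}

/* Stand-in for spin_unlock(): only pay for the barrier if MMIO happened. */
static void flag_spin_unlock(int cpu, int *lock)
{
    assert(cpu >= 0 && cpu < FLAG_NR_CPUS);
    if (flag_io_pending[cpu]) {
        flag_mmiowb();
        flag_io_pending[cpu] = false;
    }
    *lock = 0;
}
```

The point is that drivers would not need an explicit mmiowb() at all, and
critical sections that do no MMIO skip the barrier.
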
> The argument that "Altix only uses a few drivers so this way we can just
> fix these up rather than make big modifications to large numbers of
> drivers" is bogus. It is far worse even for Altix if you make incompatible
> changes, because you first *break* every driver on your platform, then you
> have to audit and fix them. If you make compatible changes, then you have
> to do exactly the same audits to move them over to the new API, but you go
> from slower->faster rather than broken->non broken. As a bonus, you haven't
> got random broken stuff all over the tree that you forgot to audit.

I agree, but afaik the only change Altix ended up forcing on people was
mmiowb(), and that turned out to be necessary on mips64 (and maybe some other
platforms?) anyway.

> I don't know how there is still so much debate about this :(
>
> I have a proposal: I am a neutral party here, not being an arch maintainer,
> so I'll take input and write up a backward compatible API specification
> and force everybody to conform to it ;)

Aside from the obvious performance impact of making all the readX/writeX
routines strongly ordered, both in terms of PCI posting and cacheable vs.
uncacheable accesses, it also makes things inconsistent. Both core code &
drivers will still have to worry about regular, cacheable memory barriers for
correctness, but it looks like you're proposing that they not have to think
about I/O ordering.

At any rate, I don't think anyone would argue against defining the ordering
semantics of all of these routines (btw you should also include ordering wrt
DMA & PCI posting); the question is what's the best balance between keeping
the driver requirements simple and the performance cost on complex arches.

Jesse