[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1212073289.3428.30.camel@localhost.localdomain>
Date: Thu, 29 May 2008 10:01:29 -0500
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Jes Sorensen <jes@....com>
Cc: Roland Dreier <rdreier@...co.com>, benh@...nel.crashing.org,
Arjan van de Ven <arjan@...radead.org>,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
tpiepho@...escale.com, linuxppc-dev@...abs.org,
scottwood@...escale.com, torvalds@...ux-foundation.org,
David Miller <davem@...emloft.net>, alan@...rguk.ukuu.org.uk
Subject: Re: MMIO and gcc re-ordering issue
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
> >>>>> "Roland" == Roland Dreier <rdreier@...co.com> writes:
>
> >> This is a different issue. We deal with it on powerpc by having
> >> writel set a per-cpu flag and spin_unlock() test it, and do the
> >> barrier if needed there.
>
> Roland> Cool... I assume you do this for mutex_unlock() etc?
>
> Roland> Is there any reason why ia64 can't do this too so we can kill
> Roland> mmiowb and save everyone a lot of hassle? (mips, sh and frv
> Roland> have non-empty mmiowb() definitions too but I'd guess that
> Roland> these are all bugs based on misunderstandings of the mmiowb()
> Roland> semantics...)
>
> Hi Roland,
>
> Thats not going to solve the problem on Altix. On Altix the issue is
> that there can be multiple paths through the NUMA fabric from cpuX to
> PCI bridge Y.
>
> Consider this uber-cool<tm> ascii art - NR is my abbrevation for NUMA
> router:
>
> ------- -------
> |cpu X| |cpu Y|
> ------- -------
> | \____ ____/ |
> | \/ |
> | ____/\____ |
> | / \ |
> ----- ------
> |NR 1| |NR 2|
> ------ ------
> \ /
> \ /
> -------
> | PCI |
> -------
>
> The problem is that your two writel's, despite being both issued on
> cpu X, due to the spin lock, in your example, can end up with the
> first one going through NR 1 and the second one going through NR 2. If
> there's contention on NR 1, the write going via NR 2 may hit the PCI
> bridge prior to the one going via NR 1.
>
> Of course, the bigger the system, the worse the problem....
>
> The only way to guarantee ordering in the above setup, is to either
> make writel() fully ordered or adding the mmiowb()'s inbetween the two
> writel's. On Altix you have to go and read from the PCI brige to
> ensure all writes to it have been flushed, which is also what mmiowb()
> is doing. If writel() was to guarantee this ordering, it would make
> every writel() call extremely expensive :-(
So if a read from the bridge achieves the same effect, can't we just put
one after the writes within the spinlock (an unrelaxed one). That way
this whole sequence will look like a well understood PCI posting flush
rather than have to muck around with little understood (at least by most
driver writers) io barriers?
James
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists