linux-kernel - Re: [RFC] MMIO accessors & barriers documentation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200609111506.01389.jbarnes@virtuousgeek.org>
Date:	Mon, 11 Sep 2006 15:05:59 -0700
From:	Jesse Barnes <jbarnes@...tuousgeek.org>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	Linux Kernel list <linux-kernel@...r.kernel.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"David S. Miller" <davem@...emloft.net>,
	Jeff Garzik <jgarzik@...ox.com>,
	Paul Mackerras <paulus@...ba.org>,
	Linus Torvalds <torvalds@...l.org>,
	Andrew Morton <akpm@...l.org>,
	Segher Boessenkool <segher@...nel.crashing.org>
Subject: Re: [RFC] MMIO accessors & barriers documentation

On Monday, September 11, 2006 2:45 pm, Benjamin Herrenschmidt wrote:
> > These sound fine.  I think PPC64 is the only platform that will need
> > them?
>
> Ah ? What about the comment in e1000 saying that it needs a wmb()
> between descriptor updates in memory and the mmio to kick them ? That
> would typically be a memory_to_io_wb(). Or are your MMIOs ordered cs.
> your cacheable stores ?

I think that's a separate issue?  As Jeff points out, those macros are 
intended to provide memory vs. I/O ordering, but isn't PPC the only platform 
that will reorder accesses so aggressively and independently?  I don't think 
ia64 for example will reorder them separately, so a regular memory barrier 
*should* be enough to ensure ordering in both domains.

> They are, but I was thinking about providing more IO-like examples. I
> suppose I could refer to memory-barriers.txt from here and update it
> with IO-like examples.

Yeah, either way.  Not sure if adding more I/O examples to the existing doc is 
better or worse than an I/O specific document.

> > But isn't this how you'll implement io_to_lock_wb() on PPC anyway?  If
> > so, might be best to name it and document it that way (though keeping the
> > idea of barriering before unlocking prominent in the documentation).
>
> Well, the whole question is what does the linux semantics guarantee to
> driver writers (accross archs), not what PowerPC implements :) I'd
> rather not add guarantees that aren't useful to drivers even if all
> current implementations happen to provide them. I'm trying to find a
> case where ordering MMIO W + memory W is useful and I can't see any
> since the MMIO W will take any time to go to the device anyway. The lock
> rule seems to be the only useful, thus the only I think I'll guarantee.

Sure, that's fair.  If any potential application of the more precise semantics 
is just theoretical, we may as well limit our guarantees to locks only.

> Well, as far as I'm concerned, the whole point is rule #2 and #3 :)
> Those are the ones biting us on PowerPC (we haven't seen the lock
> problem but then it can't happen the way our current accessors are
> written. However, if we change our accessors to provide rule #2 more
> specifically, we'll end up with 2 sync instructions in writel, one for
> rule #2 before the store and one for rule #4, thus we go from expensive
> to very expensive). It's also my understanding that mmiowb is very
> expensive on ia64 and gets worse as the box grows bigger.

Yeah, that's true (I see your point about being more worried about other 
things on PPC as well ;).

> Hence the question: do we provide -fully- ordered accessors in class 1,
> or do we provide -mostly- ordered accessors, ordered in all means except
> rule #4 vs locks. ia64 is afaik by far the platform taking the biggest
> hit if you have to provide #4, so I'm interesting in your point of view
> here.

Either way is fine with me as long as we have a way to get at the fast and 
loose stuff (and required barriers of course) in a portable way.  And that we 
don't regress the existing users of mmiowb().

> We don't need counters, just a flag. We did a test implementation, seems
> to work. We also clear the flag in spin_lock. That means that MMIOs
> issued before a lock aren't ordered vs. the locked section. But because
> of rule #1, they should be ordered vs. other MMIOs inside the locked
> section and thus implicitely get ordered anyway.

Oh right, a flag would be enough.  Is it good enough for -mm yet?  Might be 
fun to run on an Altix machine with a bunch of supported devices (not that I 
work with them anymore...).

> > For ia64 in particular it doesn't matter, though there was speculation
> > several years that it might be necessary.  No actual examples stepped
> > forward though, so the current implementation doesn't take an argument.
>
> Ok. My question is wether it would improve the implementation to take
> it. If we define a new macro with a new name, we can do it....

Right, but unless there's a real need at this point, we probably shouldn't 
bother.  Let the poor sucker with the future machine needing the device 
argument do the work. :)

Thanks,
Jesse
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/