Message-ID: <47780B93.4050606@reed.com>
Date: Sun, 30 Dec 2007 16:20:19 -0500
From: "David P. Reed" <dpreed@...d.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Rene Herman <rene.herman@...access.nl>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Ingo Molnar <mingo@...e.hu>, Islam Amer <pharon@...il.com>,
hpa@...or.com, Pavel Machek <pavel@....cz>,
Ingo Molnar <mingo@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: provide a DMI based port 0x80 I/O delay override
I am so happy that there will be a way for people who don't build their
own kernels to run Linux on their HP and Compaq laptops that have
problems with gazillions of writes to port 80, and I'm also happy that
some of the strange driver code will be cleaned up over time. Thank you
all. Some thoughts you all might consider, take or leave, in this
process, from an old engineering manager who once had to worry about QA
for software on nearly every personal computer model in the 1980-1992
period:
You know, there is a class of devices that are defined to use port
0x80... it's that historically useful class of devices that show/record
the POST diagnostics. It certainly was not designed for "delay"
purposes. In fact, some of those same silly devices are still used in
industry during manufacturing test. I wonder what would happen if
Windows were not part of manufacturing test, and instead Linux were the
"standard" for some category of machines...
When I was still working at Lotus in the late '80s, when we still
supported machines like 286s, there were lots of problems with timing
loops in the drivers bundled with applications (even Win 3.0 had some in
hard disk drivers, as did some of our printer drivers, ...), as clock
speeds continued to ramp. There were major news stories of machines that
"crashed when xyz application or zyx peripheral was added". It was
Intel, as I recall, that started "publicly" berating companies in the PC
industry for using the "two short jumps" solutions, and suggesting that
they measure the processor speed at bootup, using the BIOS standard for
doing that with the int 15 BIOS elapsed time calls, and always use
"calibrated" timing loops. Which all of us who supported device
drivers started to do (remember, apps had device drivers in those days
for many devices that talked directly with the registers).
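To illustrate the "calibrated" timing loop idea described above, here is a minimal user-space sketch in C. It is not the BIOS int 15 method and not the kernel's own calibration code: it assumes a POSIX system with clock_gettime(), and every name in it (spin, now_ns, calibrate_loops_per_usec, calibrated_udelay) is hypothetical, chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; a user-space illustration, not kernel code. */

static volatile uint64_t spin_sink;

/* The loop whose speed we calibrate. The volatile store keeps the
 * compiler from deleting the loop. */
static void spin(uint64_t iterations)
{
    for (uint64_t i = 0; i < iterations; i++)
        spin_sink = i;
}

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Measure, once at startup, how many loop iterations fit into one
 * microsecond on this particular machine. The sample is doubled until
 * it runs for at least 10 ms so that clock granularity does not
 * dominate the result. */
static uint64_t calibrate_loops_per_usec(void)
{
    uint64_t iterations = 1024;

    for (;;) {
        uint64_t start = now_ns();
        spin(iterations);
        uint64_t elapsed = now_ns() - start;

        if (elapsed >= 10000000ULL) {
            uint64_t loops = iterations * 1000ULL / elapsed;
            return loops ? loops : 1;
        }
        iterations *= 2;
    }
}

/* Delay for roughly usec microseconds using the measured rate,
 * instead of a hard-coded loop count. */
static void calibrated_udelay(uint64_t loops_per_usec, uint64_t usec)
{
    spin(loops_per_usec * usec);
}
```

A caller would run calibrate_loops_per_usec() once and pass the result to every calibrated_udelay() call. The point of the design is that the iteration count comes from a measurement on the running machine, so a faster processor gets a proportionally longer loop rather than a shorter delay.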
I was impressed when I dug into Linux eventually, that this operating
system "got it right" by measuring the timing during boot and creating a
udelay function that really worked!
So I have to say that when I was tracing down the problem that
originally kicked off this thread, which was that just accessing the RTC
using the standard CMOS_READ macros in a loop caused a hang, I found
these "outb al,80h" things there. And I noticed your skeptical comment
in the code, Linus. Knowing that there was never in any of the
documented RTC chipsets a need for a pause between accesses (going back
to my days at Software Arts working on just about every old machine
there was...) I changed it on a lark to do no pause at all. And my
machine never hung...
Now what's interesting is that the outb to port 80 is *faster* than an
outb to an unused port, on my machine. So there's something there -
actually accepting the bus transaction. In the ancient 5150 PC, 80 was
unused because it was the DMA controller port that drove memory refresh,
and had no meaning.
Now my current hypothesis (not having access to Quanta's design specs
for a board they designed and have shipped in quantity, or having taken
the laptop apart recently) is that there is logic there on port 80,
doing something. Perhaps even "POST diagnostic recording" as every PC
since the XT has supported... perhaps supporting post-crash
diagnostics... And that that something has a buffer, perhaps even in
the "Embedded Controller" that may need emptying periodically. It
takes several tens of thousands of "outb" to port 80 to hang the
hardware solid - so something is either rare or overflowing. In any
case, if this hypothesis is correct - the hardware may have an erratum,
but the hardware is doing a very desirable thing - standardizing on an
error mechanism that was already in the "standard" as an option... It's
Linux that is using a "standard" in a wrong way (a diagnostic port as a
delay).
So I say all this, mainly to point out that Linux has done timing loops
right (udelay and ndelay) - except one place where there was some
skepticism expressed, right there in the code. Linus may have some
idea why it was thought important to do an essential delay with a bus
transaction that had uncertain timing. My hypothesis is that
"community" projects have the danger of "magical theories" and
"coolness" overriding careful engineering design practices.
Cleaning up that "clever hack" that seemed so good at the time is hugely
difficult, especially when the driver writer didn't write down why he
used it.
Thus I would suggest that the _p functions be deprecated, and if there
needs to be a timing-delay after in/out instructions, define
in_pause(port, nsec_delay) with an explicit delay. And if the delay is
dependent on bus speeds, define a bus-speed ratio calibration.
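A rough sketch of what such an in_pause(port, nsec_delay) might look like follows. It is an assumption-laden illustration, not an existing kernel interface: the port read is replaced by a fake register array so the code runs unprivileged in user space (a driver would use inb() and ndelay() instead), and the names fake_inb, delay_ns, mono_ns and bus_delay_percent are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Stand-in for real port I/O so the sketch runs in user space.
 * In a kernel driver this would be inb(port). */
static uint8_t fake_ports[65536];

static uint8_t fake_inb(uint16_t port)
{
    return fake_ports[port];
}

/* Hypothetical bus-speed ratio in percent: 100 means the nominal bus
 * speed, 200 means every requested delay is doubled. A real system
 * would have to calibrate or configure this per bus. */
static unsigned int bus_delay_percent = 100;

/* Monotonic clock reading in nanoseconds. */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Busy-wait for at least nsec nanoseconds (stand-in for ndelay()). */
static void delay_ns(uint64_t nsec)
{
    uint64_t start = mono_ns();

    while (mono_ns() - start < nsec)
        ;
}

/* Read a port, then wait for an explicit, caller-chosen time.
 * The delay is stated in nanoseconds at the call site rather than
 * being implied by a side-effect bus write. */
static uint8_t in_pause(uint16_t port, uint64_t nsec_delay)
{
    uint8_t value = fake_inb(port);

    delay_ns(nsec_delay * bus_delay_percent / 100);
    return value;
}
```

A call such as in_pause(0x71, 2000) then documents, in the driver source itself, that the device is believed to need about 2 microseconds after the read; the 0x71 and 2000 here are example values only, not a claim about any real chipset.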
Thus in future driver writing, people will be forced to think clearly
about what the timing characteristics of their device on its bus must
be. That presupposes that driver writers understand the timing
issues. If they do not, they should not be writing drivers.