linux-kernel - Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops

Message-ID: <4758D584.90105@keyaccess.nl>
Date:	Fri, 07 Dec 2007 06:09:24 +0100
From:	Rene Herman <rene.herman@...access.nl>
To:	Robert Hancock <hancockr@...w.ca>
CC:	"David P. Reed" <dpreed@...d.com>, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64
 with MCP51 laptops

On 07-12-07 01:23, Robert Hancock wrote:

> David P. Reed wrote:
>> After much, much testing (months, off and on, pursuing hypotheses), 
>> I've discovered that the use of "outb al,0x80" instructions to "delay" 
>> after inb and outb instructions causes solid freezes on my HP dv9000z 
>> laptop, when ACPI is enabled.
>>
>> It takes a fair number of out's to 0x80, but the hard freeze is 
>> reliably reproducible by writing a driver that solely does a loop of 
>> 50 outb's to 0x80 and calling it in a loop 1000 times from user 
>> space.  !!!
>>
>> The serious impact is that the /dev/rtc and /dev/nvram devices are 
>> very unreliable - thus "hwclock" freezes very reliably while looping 
>> waiting for a new second value and calling "cat /dev/nvram" in a loop 
>> freezes the machine if done a few times in a row.
>>
>> This is reproducible, but requires a fair number of outb's to the 0x80 
>> diagnostic port, and seems to require ACPI to be on.
>>
>> io_64.h is the source of these particular instructions, via the 
>> CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h.  
>> (I wonder if the same problem occurs in 32-bit mode).
>>
>> I'm happy to complete and test a patch, but I'm curious what the right 
>> approach ought to be.  I have to say I have no clue as to what ACPI is 
>> doing on this chipset  (nvidia MCP51) that would make port 80 do 
>> this.  A raw random guess is that something is logging POST codes, but 
>> if so, not clear what is problematic in ACPI mode.
>>
>> Any help/suggestions?
>>
>> Changing the delay instruction sequence from the outb to short jumps 
>> might be the safe thing.  But Linus, et al. may have experience with 
>> that on other architectures like older Pentiums etc.
> 
> The fact that these "pausing" calls are needed in the first place seems 
> rather cheesy. If there's hardware that's unable to respond to IO port 
> writes as fast as possible, then surely there's a better solution than 
> trying to stall the IOs by an arbitrary and hardware-dependent amount of 
> time, like udelay calls, etc. Does any remotely recent hardware even 
> need this?

The idea is that the delay is not in fact hardware dependent. In the 
absence of a POST board, port 0x80 is more or less guaranteed not to be 
decoded on PCI but to be forwarded to ISA/LPC and left to die there, so one 
should get the effect that the _next_ write will have survived an ISA/LPC 
bus address cycle acknowledgement timeout.

I believe.

And no, I don't believe any remotely recent hardware needs it, and I have in 
fact wondered about it since actual 386 days, having since that time never 
found a device that wouldn't in fact take even back-to-back I/O. Even back 
then (ie, legacy-only systems, no forwarding from PCI or anything) BIOSes 
provided ISA bus wait-state settings which should be involved in getting 
insanely stupid and old hardware to behave...

Port 0xed has been suggested as an alternate port. Probably not a great 
"fix" but if replacing the out with a simple udelay() isn't that simple 
(during early boot I gather) then it might at least be something for you to 
try. I'd hope that the 0x80 in include/asm/io.h:native_io_delay() would be 
the only one you are running into, so you could change that to 0xed and see 
what catches fire.
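
To make that experiment concrete, here is a rough user-space sketch of a 
delay whose port is a variable rather than a hardcoded 0x80. Everything in 
it is hypothetical: mock_outb() merely records the port instead of doing a 
real "out", and io_delay_port, io_delay() and outb_p_sketch() are names made 
up for illustration, not the kernel's actual native_io_delay() code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's actual code. */
#define IO_DELAY_PORT_DEFAULT 0x80   /* traditional POST diagnostic port */
#define IO_DELAY_PORT_ALT     0xed   /* suggested alternate port */

/* The port used for the dummy write; switchable at run time. */
static uint16_t io_delay_port = IO_DELAY_PORT_DEFAULT;

/* Mock log standing in for the real "out" instruction, so the
 * sketch can run in user space without touching hardware. */
#define MOCK_LOG_SIZE 16
static uint16_t mock_ports[MOCK_LOG_SIZE];
static size_t mock_count;

static void mock_outb(uint8_t value, uint16_t port)
{
    (void)value;                      /* the value is irrelevant for the log */
    if (mock_count < MOCK_LOG_SIZE)
        mock_ports[mock_count] = port;
    mock_count++;
}

/* The "delay": one dummy write to whichever port is selected. */
static void io_delay(void)
{
    mock_outb(0, io_delay_port);
}

/* A pausing write: the real write followed by the dummy delay write. */
static void outb_p_sketch(uint8_t value, uint16_t port)
{
    mock_outb(value, port);
    io_delay();
}
```

With the default setting a pausing write to, say, port 0x70 is followed by 
a write to 0x80; after setting io_delay_port to IO_DELAY_PORT_ALT the same 
call is followed by a write to 0xed instead.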

If there are no sensible fixes, a 0x80/0xed choice could I assume be hung 
off DMI or something (if that _is_ parsed soon enough).
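
Roughly what I have in mind for the DMI idea, as a user-space sketch only. 
The table layout, the function name pick_delay_port() and above all the 
vendor/product strings are assumptions for illustration; the strings a 
dv9000z actually reports would have to be read off the machine, and 
substring matching is simply one plausible choice.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: machines that need the alternate port. */
struct delay_quirk {
    const char *vendor;    /* substring expected in the DMI vendor string */
    const char *product;   /* substring expected in the DMI product string */
    uint16_t port;         /* delay port to use on a match */
};

/* Placeholder strings; the real DMI values are NOT known here. */
static const struct delay_quirk delay_quirks[] = {
    { "Hewlett-Packard", "dv9000z", 0xed },
};

/* Return the delay port for a machine, defaulting to 0x80. */
static uint16_t pick_delay_port(const char *vendor, const char *product)
{
    size_t n = sizeof(delay_quirks) / sizeof(delay_quirks[0]);

    for (size_t i = 0; i < n; i++) {
        const struct delay_quirk *q = &delay_quirks[i];

        if (strstr(vendor, q->vendor) && strstr(product, q->product))
            return q->port;
    }
    return 0x80;
}
```

Anything not listed keeps the traditional 0x80, so only machines known to 
misbehave would see a change.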

Rene.