Message-Id: <201007110133.36973.zbiggy@o2.pl>
Date:	Sun, 11 Jul 2010 01:33:27 +0200
From:	Zbigniew Luszpinski <zbiggy@...pl>
To:	linux-kernel@...r.kernel.org
Subject: [PATCH]: partially fixes APIC interrupts to almost eliminate usb ohci hang on Nvidia MCP78S (nForce7xx, 8200, etc...) chipsets - help needed to fix this fully

Hello,

Long story short:

The io_apic2.patch provides two kernel parameters:
nofasteoiapic - replaces the fasteoi handler with the level handler for
all fasteoi interrupts.

nofasteoiapic=<list of irq numbers> - replaces the fasteoi handler with
the level handler for the given irqs only. This parameter does not work
yet: I made a mistake in its code which I cannot find.
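
To make the intended behaviour of nofasteoiapic=<list of irq numbers>
concrete, here is a minimal user-space sketch of parsing such a list. It is
not the code from io_apic2.patch. The names parse_irq_list, force_level and
MAX_IRQS are hypothetical, and the bound of 256 irqs is an assumption made
only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IRQS 256 /* hypothetical upper bound, not taken from the patch */

/* One flag per irq: nonzero means "use the level handler, not fasteoi". */
static unsigned char force_level[MAX_IRQS];

/*
 * Parse a comma-separated list such as "20,21,23" and mark each irq.
 * Returns the number of distinct irqs marked, or -1 on malformed input.
 * The table is replaced only if the whole list parses cleanly.
 */
int parse_irq_list(const char *s)
{
    unsigned char tmp[MAX_IRQS] = {0};
    int count = 0;

    assert(s != NULL);
    if (*s == '\0')
        return -1;                      /* empty list */

    while (*s) {
        char *end;
        long irq;

        errno = 0;
        irq = strtol(s, &end, 10);
        if (end == s || errno != 0 || irq < 0 || irq >= MAX_IRQS)
            return -1;                  /* not a number, or out of range */
        if (*end != ',' && *end != '\0')
            return -1;                  /* junk after the number */
        if (!tmp[irq]) {
            tmp[irq] = 1;
            count++;
        }
        if (*end == ',') {
            end++;
            if (*end == '\0')
                return -1;              /* trailing comma */
        }
        s = end;
    }

    memcpy(force_level, tmp, sizeof(force_level));
    return count;
}
```

With this sketch, parse_irq_list("20,21,23") marks irqs 20, 21 and 23 and
leaves all others on fasteoi, while a malformed list such as "20," is
rejected and the previous table is kept.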

Why it is needed:
With the nofasteoiapic parameter activated, this patch improves ohci
stability by 80% for medium speed usb devices on the Nvidia nForce MCP78S
chipset (10de:077b, 10de:077d usb ohci controllers). Without the patch any
usb 1.1 device works for a few minutes and then hangs at a random moment
with a timeout: the usb device is not responding.
The patch does not help fast devices like usb audio - they keep hanging.
Only Linux has a hanging ohci. Windows XP does not. So this is a software
incompatibility.

What can be done, but I cannot do myself:
- find a better solution which makes usb ohci 100% stable with all usb
devices without changing fasteoi to level.
- add autodetection so that the patch is applied only to the interrupt
handlers of 10de:077b and 10de:077d. In the interrupt setup code Linux does
not know which device owns which interrupt, so it is hard or impossible to
apply the patch only to the devices which need it.
- find the bug in the nofasteoiapic=<list of irq numbers> code.
- do not use interrupts for ohci at all: poll the i/o registers instead.
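
Until autodetection exists, the irq numbers of the ohci controllers have to
be looked up by hand, for example in /proc/interrupts. The following is a
rough user-space sketch of that lookup, not part of the patch. The helper
names irq_for_driver and print_driver_irqs are hypothetical, and it assumes
the usual /proc/interrupts layout where each device line starts with
"<irq>:" and ends with the names of the drivers sharing that irq.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the irq number from one /proc/interrupts line if the line
 * mentions the given driver name, or -1 otherwise. Lines which do not
 * start with a number (NMI:, LOC:, the CPU header) also give -1.
 */
int irq_for_driver(const char *line, const char *driver)
{
    char *end;
    long irq;

    assert(line != NULL && driver != NULL);
    if (strstr(line, driver) == NULL)
        return -1;
    irq = strtol(line, &end, 10);   /* strtol skips the leading spaces */
    if (end == line || *end != ':' || irq < 0)
        return -1;
    return (int)irq;
}

/* Print every irq used by the driver; returns how many were found. */
int print_driver_irqs(FILE *f, const char *driver)
{
    char line[1024];
    int found = 0;

    while (fgets(line, sizeof(line), f) != NULL) {
        int irq = irq_for_driver(line, driver);
        if (irq >= 0) {
            printf("%d\n", irq);
            found++;
        }
    }
    return found;
}
```

Calling print_driver_irqs() on an open /proc/interrupts with the driver
name "ohci_hcd" lists the irqs which would go into the
nofasteoiapic=<list of irq numbers> parameter.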

These tasks are for someone brave and skilled here. I do not feel skilled
enough to handle them; I barely managed to make the attached patch. If you
have any suggestions or pieces of code I could test (experimental fixes
which may help, or debugging/diagnostic aids), please send them to me. I
would especially like to test code which uses polling instead of
interrupts for ohci only.

I reported this bug to Nvidia; they reproduced it and confirmed its
existence. The level interrupt handler improves ohci stability.
Unfortunately they also do not know so far how to fix this.
This mailing list is my last hope. If nothing can be done we should
blacklist these mcp78s ohci controllers as broken, so that people do not
report all their usb devices as broken when it is actually the ohci
controller that breaks everything.

----

Full story:

All Nvidia MCP78S family chipsets (nForce7xx, 8200, 9x00) probably have a
silicon bug which causes the integrated usb ohci controllers 10de:077b and
10de:077d to hang on Linux only. Windows XP SP3 is not affected, even on a
clean install from the bare Windows CD without external drivers. I am very
curious how they managed it so that only Linux hangs.
The oldest kernel tested is 2.6.18 from RHEL5, the newest is 2.6.34.1.

The moment of the ohci hang depends on the usb load: the bigger and more
constant the transfer, the sooner the hang happens. Let's divide usb 1.1
devices into classes:
idle - usb devices are connected but do nothing. Rock solid, no hang.
slow - usb keyboard/mouse. Never hangs in normal use. A usb mouse can hang
ohci only if it is waved around like crazy.
medium speed - usb adsl modem on a 1Mbit ISP subscription. Without the
patch ohci hangs after a few minutes of use; checking several rss channels
for news is enough to hang it. With the patch it does not hang. However,
opening 63 tabs in firefox at once hangs ohci even with the patch enabled.
Without the patch, connecting a usb pendrive/hdd hangs ohci on plug-in or
soon after. With the patch enabled there is no hang.
fast - usb fm radio using alsa usb audio as the transport: 16bit 96kHz
stereo stream. Always hangs in less than 2 minutes, whether the patch is
enabled or not. The same goes for the 4Mbit IrDA usb dongle. Only the
noapic kernel boot parameter makes it stable 90% of the time.

I checked the acpi tables and they are clean, so there is no Linux trap in
them. The bug exists not only on my mainboard but also on boards from
different manufacturers. All the affected mainboards have only one thing
in common: the Nvidia MCP78S chipset. So this must be a silicon bug in the
chipset.

After playing with kernel boot parameters I found that the noapic or
acpi=noirq parameters work around the bug in 95% of cases. acpi=noirq just
disables the APIC interrupt controller, so it does the same as noapic.

To fix this bug on Linux we have to make Linux behave like Windows XP.
I made the first step with the attached patch. Linux by default uses the
fasteoi interrupt handler; Windows XP uses the level handler. When Linux is
forced by the patch to use the level interrupt handler, ohci is stable 80%
of the time. In noapic mode it is 90% stable.

The noapic solution is bad: it limits the CPU to 1 core only and ohci is
still not 100% stable :(
The nofasteoiapic parameter provided by the patch is better: 80% stability
and all cpu cores active, but usb audio still hangs and the stability of
other devices is weak.

My previous mainboard, based on the Nvidia MCP51 chipset, worked
excellently. The usb ohci bug appeared after I replaced it with a mainboard
based on the Nvidia MCP78S chipset.

List of hardware used:
previous mainboard: Asus A8N-VM CSM (MCP51 chipset works excellent)
current mainboard: Asrock K10N78FullHD-hSLI rev. 3.0 with current bios 
(broken ohci usb only on Linux everything else excellent).

usb devices used:
pendrive: Kingston 8 GB
usb hdd: Seagate 80GB SATA1 in ICY BOX usb case
usb irda dongle: Stir4200 module/chipset/Linux driver
usb adsl modems: Speedtouch 330 and ZXDSL852 unicorn2 chipset/Linux driver
usb radio: Silabs fm radio usb: radio_usb_si470x linux driver
usb printer: hp deskjet 5940
usb keyboard: genius
usb mouse: Logitech pilot mouse and logitech trackman trackball and pixart 
mouse

my bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=13405
(I no longer think this is an acpi problem)

list of attached files:
io_apic2.patch - copy it to /usr/src/linux-2.6.34.1/arch/x86/kernel/apic
and run patch -p0 < io_apic2.patch in that directory.
After compiling the kernel, boot the new kernel with the nofasteoiapic
parameter added.
ohcifail.tar.gz - dumps of dmesg, interrupts and the important /proc and
/sys files.

have a nice day,
Zbigniew Luszpinski

View attachment "io_apic2.patch" of type "text/x-patch" (2311 bytes)

Download attachment "ohcifail.tar.gz" of type "application/x-compressed-tar" (35143 bytes)

Download attachment "smime.p7s" of type "application/pkcs7-signature" (4595 bytes)
