linux-kernel - Re: 2.6.{30,31} x86_64 ahci problem

Open Source and information security mailing list archives
Message-ID: <4AE5B9E7.3070500@sbg.ac.at>
Date:	Mon, 26 Oct 2009 16:01:59 +0100
From:	Alexander Huemer <alexander.huemer@....ac.at>
To:	Jean Delvare <jdelvare@...e.de>
CC:	Tejun Heo <tj@...nel.org>, Frans Pop <elendil@...net.nl>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Jeff Garzik <jgarzik@...ox.com>, alexander.huemer@....ac.at
Subject: Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared

Jean Delvare wrote:
> On Wednesday, 21 October 2009, Alexander Huemer wrote:
>   
>> Jean Delvare wrote:
>>     
>>> OK, here I am, sorry for the delay. I've read the discussion thread.
>>> Here are the few data points I can offer, in the hope it will help:
>>>
>>> * While the i2c-i801 driver received some changes in kernel 2.6.30,
>>>   none of these are related to PCI nor interrupts. So as the problem
>>>   is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
>>>   cause it. This may, however, be a combination of something i2c-i801
>>>   does and something the pci subsystem does since kernel 2.6.30. For
>>>   this reason, I would still recommend a bisection if the problem can
>>>   be reliably reproduced. I know it takes time, but it is always
>>>   easier to fix a bug when we know which commit introduced it.
>>>
>>> * The i2c-i801 driver does _not_ make use of interrupts. It is
>>>   poll-based (I am not exactly proud of that, but that's the way it
>>>   is.)
>>>
>>>   #define ENABLE_INT9		0	/* set to 0x01 to enable - untested */
>>>
>>>   So I am very surprised to read that this driver would cause an IRQ
>>>   storm.
>>>
>>> * One thing the i2c-i801 driver does on the PCI device is:
>>>
>>>   err = pci_enable_device(dev);
>>>
>>>   I presume this is what causes the following message in dmesg:
>>>
>>>   i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
>>>
>>>   Basically, even though the driver doesn't make use of interrupts,
>>>   the IRQ is still registered because this is how the hardware is
>>>   set up.
>>>
>>> In conclusion, I suspect that one of two things may be happening: either
>>> the SMBus is triggering interrupts when told not to. The ICH6 is a
>>> bit different from all the other supported chips; I'll double check
>>>       
>
> My bad, it's a 63xxESB-based board, not ICH6. I must have been
> mixing up data from a different bug.
>
>   
>>> whether we have missed something. Or something else is triggering
>>> SMBus transactions. SMI and ACPI come to mind. If this is the case
>>> then you do not want to use i2c-i801 on this motherboard.
>>>
>>> Questions to Alexander :
>>>
>>> * Can I please see the output of "sensors" on your system?
>>> * What are the brand and model of your motherboard?
>>> * Can we get an acpidump for your system?
>>>
>>>   
>>>       
>> Many thanks for your response. I appreciate that.
>> First, the data you requested:
>>
>>     sensors:        http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt
>>     acpidump:       http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt
>>     
>
> The good news is that I can't see any access to the SMBus in the
> ACPI tables. Nothing can be said about the SMIs though, without an
> intimate knowledge of the BIOS.
>
>   
>>     motherboard:    Tyan Tempest i5400PW/S5397 with one Intel Xeon E5420.
>>
>> The output of sensors was captured _without_ i801_smbus in the kernel.
>>     
>
> Then please run it once more with i2c-i801 loaded. My whole point was to know whether
> there was any hardware monitoring chip connected to the SMBus. Your
> initial kernel configuration suggests that you have a W83793G chip
> there.
>
>   
>> I noticed that the data from w83627hf-isa-0290 is quite weird. I do not
>> have an explanation for that.
>>     
>
> I do. This happens when the manufacturer decides that the hardware
> monitoring features of the Super-I/O are insufficient for their
> needs. They add a dedicated chip for the hardware monitoring. This
> is particularly frequent on server boards from Tyan and SuperMicro.
> Ideally they would _also_ disable the feature on the Super-I/O side,
> but often they do not, so the driver still loads, but outputs
> garbage.
>
> You can see the following messages in your log:
> [    3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense
> [    3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense
> This is a good hint that this is the case (if the nonsensical data
> displayed by "sensors" wasn't enough to convince you.)
>
> So you should stop loading (or building in) the w83627hf kernel module.
>
>   
>> If a bisection is what will shed light on this, I am willing to take
>> the time.
>> So that would be a bisection between 2.6.29 and 2.6.30?
>> A quicker test case would be good for that, but I don't have one yet,
>> just the compilation of gcc, which takes time, even on this machine with
>> tmpfs and ccache.
>>     
>
>   
Here is the output you requested:
http://xx.vu/~ahuemer/sensors_ahuemer_with_i801_20091026.txt
I am currently in the middle of a bisection between 2.6.29 and 2.6.30, with 8
steps left.
Many thanks for the info on hardware monitoring.
I'll report back when the bisection is finished.

regards
-alex