Message-ID: <CAGS+omA84dm8QxJS0vdrYkvD0YXshRh4EkQfqVpbwvNVH+HxQg@mail.gmail.com>
Date:	Fri, 17 May 2013 17:54:33 +0800
From:	Daniel Kurtz <djkurtz@...omium.org>
To:	Jean Delvare <khali@...ux-fr.org>
Cc:	Robert Norris <robn@...ra.com>, linux-kernel@...r.kernel.org,
	Linux I2C <linux-i2c@...r.kernel.org>
Subject: Re: PROBLEM: modprobe hang at startup (3.8.x, 3.9.x, IBM x3550)

On Fri, May 17, 2013 at 4:36 PM, Jean Delvare <khali@...ux-fr.org> wrote:
> Hi Robert,
>
> On Thu, 16 May 2013 13:44:55 +1000, Robert Norris wrote:
>> On Wed, May 15, 2013 at 09:49:23PM +0200, Jean Delvare wrote:
>> > >     Interrupt: pin B routed to IRQ 0
>> >
>> > Hmm, this "IRQ 0" is quite odd. I'm wondering if this could be the
>> > reason for this hang. Was it with the i2c-i801 driver loaded, or
>> > blacklisted? Please check if it makes a difference.
>>
>> That was without the driver loaded (blacklisted). After loading (with
>> interrupts enabled) we get:
>>
>>     Interrupt: pin B routed to IRQ 20
>
> For the record, I also see the IRQ value change after loading the
> i2c-i801 driver on my system (with an ICH10 south bridge.) From 14 to
> 22 in my case. So it's a bit different (no IRQ 0) but still
> somewhat similar, so I'm still not sure if this has anything to do with
> your issue.
>
>>
>> > Do you see the same (and more generally, this issue) on one, some or
>> > all of your x3550 servers?
>>
>> The issue has occurred on at least three x3550s (we have 11). I haven't
>> tested more, because knowingly crashing production machines sucks.
>
> Yes of course, I understand, I did not expect you to do that ;)
>
>> This appears to be the case on other machines. With the module
>> blacklisted (never loaded), lspci shows IRQ 0. After load, IRQ 20.
>> (tested on 3.4 and 3.9).
>
> OK.
>
>> > Are you using IPMI on these machines?
>>
>> Yes, but only for monitoring/sensors, if that makes a difference.
>
> IPMI is still likely to access the SMBus controller. If there's a BMC
> in the machine, it can also access the SMBus slave with its own
> controller. It would be good to rule this out by disabling IPMI
> completely, removing the BMC from the machine if it has one, and
> checking if it makes the issue go away or not.
>
>> > I would appreciate if you could test the following:
>> > * Blacklist i2c-i801 and ics932s401 so that none of them get
>> >   auto-loaded.
>>
>> Done.
>>
>> > * Manually load i2c-i801 with interrupts enabled, and see what
>> >   happens.
>>
>> Returned immediately:
>>
>> [   60.527140] i801_smbus 0000:00:1f.3: SMBus using PCI Interrupt
>
> This confirms that the i2c-i801 driver loading itself isn't the problem.
>
>> > * If no hang happens, load i2c-dev, find the i801 bus number with
>> >   i2cdetect -l (from the i2c-tools package - it should be 4 according
>> >   to what you reported so far but there is no guarantee that it won't
>> >   change across reboots.)
>>
>> $ i2cdetect -l
>> i2c-0   i2c         Radeon i2c bit bus DVI_DDC          I2C adapter
>> i2c-1   i2c         Radeon i2c bit bus VGA_DDC          I2C adapter
>> i2c-2   i2c         Radeon i2c bit bus MONID            I2C adapter
>> i2c-3   i2c         Radeon i2c bit bus CRT2_DDC         I2C adapter
>> i2c-4   smbus       SMBus I801 adapter at 0440          SMBus adapter
>>
>> > Then do a simple read from a random address
>> >   with:
>> >   # i2cget 4 0x50 0x00
>> >   (Adjust the bus number as needed.)
>> >   I am curious if this will hang as well or only when accessing the
>> >   clock chip at address 0x69.
>>
>> Yep, that one hangs. The hung task handler picked it up after a few
>> minutes.
>
> OK, this means that any transaction request to the SMBus controller
> causes the hang.
>
> The i2c-i801 driver is optimistically using wait_event() when waiting
> for an interrupt to arrive. I suppose that the interrupt is never
> delivered in your case (all 0 in /proc/interrupts.)
>
> Daniel, shouldn't we use wait_event_timeout() instead to catch issues
> like this and fail cleanly? Maybe even fall back to polling
> automatically?

We could try to do something like that, I guess.  The only question is
how long to wait, because SMBus can be pretty slow.
But that kind of hack sounds more like something you'd do if irqs were
getting sporadically lost on an otherwise correctly configured system.
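
For illustration only, here is a rough userspace sketch of the "wait with
a timeout, then fall back to polling" idea. It is not i2c-i801 code: it
uses pthreads in place of the kernel's wait_event_timeout(), and the names
fake_ctrl, wait_for_xfer, irq_seen and hw_done are made up for the example.
The timeout value is left to the caller, since how long to wait is exactly
the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a controller; not the i2c-i801 structures. */
struct fake_ctrl {
	pthread_mutex_t lock;
	pthread_cond_t  irq_cond;  /* signalled by the "interrupt handler" */
	bool            irq_seen;  /* set when the "interrupt" fires */
	bool            hw_done;   /* status bit that polling can read */
};

enum xfer_result { XFER_IRQ, XFER_POLLED, XFER_TIMEOUT };

static void fake_ctrl_init(struct fake_ctrl *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->irq_cond, NULL);
	c->irq_seen = false;
	c->hw_done = false;
}

/*
 * Wait up to timeout_ms for the "interrupt". If it never arrives, check
 * the status bit once by polling instead of blocking forever.
 */
static enum xfer_result wait_for_xfer(struct fake_ctrl *c, long timeout_ms)
{
	struct timespec deadline;
	enum xfer_result res;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups; stop on timeout or any error. */
	while (!c->irq_seen && rc == 0)
		rc = pthread_cond_timedwait(&c->irq_cond, &c->lock, &deadline);

	if (c->irq_seen)
		res = XFER_IRQ;      /* normal interrupt-driven completion */
	else if (c->hw_done)
		res = XFER_POLLED;   /* no interrupt, but the transfer finished */
	else
		res = XFER_TIMEOUT;  /* fail cleanly instead of hanging */
	pthread_mutex_unlock(&c->lock);

	return res;
}
```

With something like this, a missing interrupt turns into an error return
(or a polled completion) rather than a hung task, but it does not explain
why the interrupt never arrives in the first place.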

In this case, it sounds like there are never any interrupts, but we are
expecting some due to incorrectly assuming that irqs are supported.
What is different about his configuration such that there would be no
IRQs?

Was Robert able to get the system working without hangs by disabling
the IRQ feature of the i2c-i801 module when it was built in?

>
> --
> Jean Delvare