linux-kernel - Re: sky2 panic in 2.6.32.1 under load (new oops)


Message-id: <4B3B92B8.7010208@majjas.com>
Date:	Wed, 30 Dec 2009 12:49:44 -0500
From:	Michael Breuer <mbreuer@...jas.com>
To:	Stephen Hemminger <shemminger@...tta.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"Berck E. Nash" <flyboy@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	netdev@...r.kernel.org
Subject: Re: sky2 panic in 2.6.32.1 under load (new oops)

On 12/30/2009 2:58 AM, Stephen Hemminger wrote:
> On Wed, 30 Dec 2009 02:23:20 -0500
> Michael Breuer <mbreuer@...jas.com> wrote:
>
>    
>> Ok - I called dump_txring from sky2_err_intr:
>> --- a/drivers/net/sky2.c
>> +++ b/drivers/net/sky2.c
>> @@ -2725,8 +2791,10 @@ static void sky2_watchdog(unsigned long arg)
>>    /* Hardware/software error handling */
>>    static void sky2_err_intr(struct sky2_hw *hw, u32 status)
>>    {
>> -       if (net_ratelimit())
>> +       if (net_ratelimit()) {
>>                   dev_warn(&hw->pdev->dev, "error interrupt status=%#x\n", status);
>> +               dump_txring(hw, 0);
>> +       }
>>
>>           if (status & Y2_IS_HW_ERR)
>>                   sky2_hw_intr(hw);
>>
>> And got this:
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>>
>>      
> I notice that you have NOUVEAU Nvidia drivers loaded? The one difference in HW
> between your board and mine is that I have ATI video card.
>
>    
Seems the problem is linked to auditd and X11 (but not nouveau).

Today I ran a number of scenarios. I first determined that the problem 
only manifests in runlevel 5. It occurred with or without KMS and with or 
without nouveau, whether or not I was logged in (locally or remotely), and 
regardless of display manager (xdm, gdm, kdm). I then checked what else 
differed between runlevels 3 and 5: the only thing was auditd. I disabled 
auditd and reran the test - no errors.

Now for the odd stuff:

The errors only manifest if the high throughput data transfer is 
initiated when the system is in runlevel 5 and auditd was started by 
init when transitioning from runlevel 3 to 5. For example, the following 
scenarios do not cause the errors to manifest:

runlevel 3; start auditd; runlevel 5; start transfer
runlevel 3; chkconfig auditd off; runlevel 5; start auditd; start transfer
runlevel 3; start transfer (note: errors do not occur if I transition to 
runlevel 5 after the high-bandwidth transfer has started)
runlevel 3; startx; start transfer

The only way I can get the problem to manifest is to transition to 
runlevel 5 with chkconfig auditd on (level 5 only) and then initiate the 
Windows backup.

I'm guessing that there is some sort of race condition happening between 
X (xdm/gdm/kdm/greeter?) and auditd that is somehow corrupting 
something. I'd hazard a more or less obvious guess that whatever's being 
corrupted differs when there is already a high throughput transfer under 
way.
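
For comparing the two status words in the quoted log (0x40000008 and 
0x8), a small userspace helper that lists the set bit positions may be 
handy. This is only a sketch: the function names are made up for 
illustration and are not part of the driver, and mapping each bit 
position to its Y2_IS_* name still has to be done by hand against the 
enum in drivers/net/sky2.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not from sky2: store the positions of the set bits
 * of a 32-bit status word in out[], highest bit first. out must have room
 * for 32 entries. Returns the number of set bits. */
static int status_bit_positions(uint32_t status, int out[32])
{
	int count = 0;

	assert(out != NULL);
	for (int bit = 31; bit >= 0; bit--) {
		if (status & (UINT32_C(1) << bit))
			out[count++] = bit;
	}
	return count;
}

/* Hypothetical helper: print one line per status word so that two log
 * entries can be compared side by side. */
static void print_status_bits(uint32_t status)
{
	int bits[32];
	int n = status_bit_positions(status, bits);

	printf("status=%#x:", (unsigned int)status);
	for (int i = 0; i < n; i++)
		printf(" bit %d", bits[i]);
	printf("\n");
}
```

Calling print_status_bits(0x40000008) and print_status_bits(0x8) from a 
small main() shows that the first word has bits 30 and 3 set and the 
second only bit 3, so the two logged values differ only in bit 30.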
