linux-kernel - Re: SATA resume slowness, e1000 MSI warning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45EEEC2C.5090609@intel.com>
Date:	Wed, 07 Mar 2007 08:45:32 -0800
From:	"Kok, Auke" <auke-jan.h.kok@...el.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	Ingo Molnar <mingo@...e.hu>, Jeff Garzik <jeff@...zik.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Michael S. Tsirkin" <mst@...lanox.co.il>,
	Pavel Machek <pavel@....cz>,
	Jens Axboe <jens.axboe@...cle.com>,
	Adrian Bunk <bunk@...sta.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-pm@...ts.osdl.org,
	Michal Piotrowski <michal.k.k.piotrowski@...il.com>
Subject: Re: SATA resume slowness, e1000 MSI warning

Kok, Auke wrote:
> Eric W. Biederman wrote:
>> "Kok, Auke" <auke-jan.h.kok@...el.com> writes:
>>
>>> Ingo Molnar wrote:
>>>> * Kok, Auke <auke-jan.h.kok@...el.com> wrote:
>>>>
>>>>>>> BUG: at drivers/pci/msi.c:611 pci_enable_msi()
>>>>>> I would poke Eric Biederman(sp?) about this one.  Maybe its even solved by
>>>>>> the MSI-enable-related patch he posted in the past 24-48 hours.
>>>>> I tried the 3-patch series "[PATCH 0/3] Basic msi bug fixes.." and they fix
>>>>> this problem for me. Were you expecting the OOPS in the first place? [...]
>>>> the bug was the warning message (a WARN_ON()) above - not an oops. So that
>>>> warning message is gone in your testing?
>>> yes.
>> Sorry for the slow delay.  I was out of town for my brothers wedding the last few
>> days.
>>
>> I wasn't exactly expecting the WARN_ON to trigger.  What I fixed was
>> an inconsistency in handling our state bits.  Fixing that
>> inconsistency appears to have fixed the e1000 usage scenario mostly by
>> accident.
>>
>> The basic issue is that pci_save_state saves the current msi state
>> along with other registers, and then the e1000 driver goes and
>> disables the msi irq after we have saved the irq state as on.
>>
>> My code notices that the msi irq was disabled before restore time, so
>> it skips the restore.  However we now have a leak of the msi saved cap
>> because we are not freeing it. 
>>
>> This leaves with some basic questions.
>> - Does it make sense for suspend/resume methods to request/free irqs?
>> - Does it make sense for suspend/resume methods to allocate/free msi irqs?
>> - Do we want pci_save/restore_cap to save/restore msi state?
>>
>> The path of least resistance is to just free the extra state and we
>> are good.  I'm just not quite certain that is sane and it has been a
>> long day.
> 
> we used to have a lengthy e1000_pci_save|restore_state in our code, which is now 
> gone, so I'm all for that. A separate pci_save_pxie|msi(x)_state for every 
> driver seems completely unnecessary. I can't think of a use case where 
> saving+restoring everything hurts. That's what you want I presume.
> 
> We currently free all irq's and msi before going into suspend in e1000, and I 
> think that is probably a good thing, somehow I can think of bad things happening 
> if we dont, but I admit that I haven't tried it without alloc/free. We do this 
> in e100 as well and it works.
> 
> Another motivation would be to leave this up to the driver: if the driver 
> chooses to free/alloc interrupts because it makes sense, you probably would want 
> to keep that choice available. Devices that don't need this can skip the 
> alloc/free, but leave the choice open for others.

ah, looking at the code in e1000 we do:

_suspend:
	pci_save_state();
	free_irq()

_resume:
	pci_restore_state();
	alloc_irq();

I suppose that's not good either, and the major cause of the warning in the 
first place.

Maybe I can rollback your latest patches and try to fix that mess by postponing 
the pci_save_state until after we free'd the irq's.

Auke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/