linux-kernel - Re: [BUG 2.6.30+] e100 sometimes causes oops during resume


Message-ID: <4AB29F4A.3030102@intel.com>
Date:	Thu, 17 Sep 2009 13:42:50 -0700
From:	"Graham, David" <david.graham@...el.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
CC:	Karol Lewandowski <karol.k.lewandowski@...il.com>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUG 2.6.30+] e100 sometimes causes oops during resume

Rafael J. Wysocki wrote:
> On Tuesday 15 September 2009, Karol Lewandowski wrote:
>> Hello,
>>
>> I'm getting following oops sometimes during resume on my Thinkpad T21
>> (where "sometimes" means about 10/1 good/bad ratio):
>>
>> ifconfig: page allocation failure. order:5, mode:0x8020
> 
> Well, this only tells you that an attempt to make order 5 allocation failed,
> which is not unusual at all.
> 
> Allocations of this order are quite likely to fail if memory is fragmented,
> the probability of which rises with the number of suspend-resume cycles already
> carried out.
> 
> I guess the driver releases its DMA buffer during suspend and attempts to
> allocate it back on resume, which is not really smart (if that really is the
> case).
> 
Yes, we free a 70KB block (0x80 by 0x230 bytes) on suspend and
reallocate it on resume, and so that's an order 5 request. The free and
allocate paths look symmetric, and they haven't changed for years. I
don't think we are leaking memory, which points back to the memory
being too fragmented to satisfy the request.
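
As a rough check of the arithmetic above, here is a minimal userspace
sketch. It assumes 4 KiB pages (typical for 32-bit x86); the names
sketch_order_for_size and sketch_e100_block_size are hypothetical
helpers for illustration, not functions from the driver or the kernel.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as is typical on 32-bit x86. */
#define SKETCH_PAGE_SIZE 4096u

/* Smallest order n such that (SKETCH_PAGE_SIZE << n) >= size. */
static unsigned int sketch_order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t block = SKETCH_PAGE_SIZE;

    while (block < size) {
        block <<= 1;
        order++;
    }
    return order;
}

/* Size of the block described above: 0x80 entries of 0x230 bytes each. */
static size_t sketch_e100_block_size(void)
{
    return (size_t)0x80 * 0x230;
}
```

With these assumptions the block is 71680 bytes (70KB). That is just
over the 64KB an order 4 allocation provides, so the request rounds up
to order 5, i.e. 32 physically contiguous pages (128KB).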

I also concur that Rafael's commit 6905b1f1 shouldn't change the logic
in the driver for systems with e100 (like yours, Karol) that could
already sleep, and I don't see anything else in the driver that looks
relevant. I expect that your test without commit 6905b1f1 will still
show the problem.

So I wonder whether this new issue is triggered by some other change in
the memory subsystem?

Karol, how much physical RAM do you have in this system? I'd expect
fragmentation to be less of an issue if there's simply more memory in
total.

Unfortunately, I still can't reproduce this in house.

I can try to rework the code paths around suspend & resume so that we
don't free & reallocate this order 5 memory, but I think it's risky. I'm
looking into that now.
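
To make the idea concrete, here is a minimal userspace sketch of the
"allocate once, keep across suspend" pattern. It is only an illustration
under stated assumptions: struct sketch_nic and the sketch_nic_*
functions are hypothetical, calloc() stands in for the driver's DMA
allocation, and none of this is the actual e100 code or a proposed
patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for driver state; not the e100 structures. */
struct sketch_nic {
    void  *ring;        /* large buffer, allocated once */
    size_t ring_size;
    int    suspended;
};

/* Allocate the large buffer once, when the device is first set up. */
static int sketch_nic_init(struct sketch_nic *nic, size_t ring_size)
{
    nic->ring = calloc(1, ring_size);  /* stand-in for a DMA allocation */
    if (!nic->ring)
        return -1;
    nic->ring_size = ring_size;
    nic->suspended = 0;
    return 0;
}

/* Suspend: mark the device quiesced, but keep the buffer. */
static void sketch_nic_suspend(struct sketch_nic *nic)
{
    nic->suspended = 1;
}

/* Resume: reinitialise the buffer in place; no large allocation needed. */
static int sketch_nic_resume(struct sketch_nic *nic)
{
    memset(nic->ring, 0, nic->ring_size);
    nic->suspended = 0;
    return 0;
}

/* Free the buffer only when the device goes away. */
static void sketch_nic_remove(struct sketch_nic *nic)
{
    free(nic->ring);
    nic->ring = NULL;
    nic->ring_size = 0;
}
```

The trade-off in this sketch is that the memory stays pinned while the
system is suspended, in exchange for resume no longer depending on a
large contiguous allocation succeeding.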

Thanks

> Thanks,
> Rafael
