linux-kernel - Re: [Bug #14141] order 2 page allocation failures in iwlagn

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1255557317.21134.82.camel@rc-desk>
Date:	Wed, 14 Oct 2009 14:55:17 -0700
From:	reinette chatre <reinette.chatre@...el.com>
To:	Frans Pop <elendil@...net.nl>
Cc:	Mel Gorman <mel@....ul.ie>, David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
	Karol Lewandowski <karol.k.lewandowski@...il.com>,
	"Abbas, Mohamed" <mohamed.abbas@...el.com>,
	"John W. Linville" <linville@...driver.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Wed, 2009-10-14 at 14:33 -0700, Frans Pop wrote:
> On Wednesday 14 October 2009, reinette chatre wrote:
> > We do queue the GFP_KERNEL allocations when there are only a few buffers
> > remaining in the queue (8 right now) ...
> 
> Are you sure of this? I have zero messages in my logs about allocation 
> failures with GFP_KERNEL, but I do have plenty with "Only 0 free buffers 
> remaining" with GFP_ATOMIC.

That does make sense to me. We do not expect allocations with GFP_KERNEL
to fail. Considering how I understand how things work I am considering
the following scenario:

* start with system low on available memory
* now introduce incoming traffic (causing the RX code to run)
* upon receipt of frame we attempt an allocation (to reclaim the buffer)
with GFP_ATOMIC (state: num RX buffer free > watermark)
  * this fails since memory is not available
  * num RX buffer free reduces
  * does _not_ queue replenishment of buffers with GFP_KERNEL
* repeat above until we hit the watermark (currently 8)
* upon receipt of frame we attempt an allocation (to reclaim the buffer)
with GFP_ATOMIC (state: num RX buffer free <= watermark) 
  * this fails (now user sees big warning)
  * queue replenishment of buffers with GFP_KERNEL

Essentially what I suspect could happen is that
we do attempt to replenish the buffers with GFP_KERNEL after several
failures with GFP_ATOMIC, but at that point we have already run out
completely.

One way to test this theory is to queue the GFP_KERNEL allocation
earlier (when we still have a significant number of RX buffers
available), 8 may turn out to be too small.

> Does that indicate a bug or could they fall under the ratelimit somehow?

In your kernel log I do see that the driver's error messages related to
GFP_ATOMIC are rate limited (we see many more "order-2 allocation
failure" messages than the "Failed to allocate" messages). All of these
allocation failures are from the "replenish_now" code though, which is
GFP_ATOMIC. So even though we do not see the "Failed to allocate" errors
(which are rate limited) it seems that all allocation failures are from
that (the GFP_ATOMIC) code.

Reinette

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/