linux-kernel - Re: GFP_ATOMIC page allocation failures.

Message-ID: <47F3D285.7030404@intel.com>
Date:	Wed, 02 Apr 2008 11:37:57 -0700
From:	"Kok, Auke" <auke-jan.h.kok@...el.com>
To:	Jeff Garzik <jeff@...zik.org>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Chris Snook <csnook@...hat.com>,
	Dave Jones <davej@...emonkey.org.uk>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	NetDev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: GFP_ATOMIC page allocation failures.

Jeff Garzik wrote:
> Andrew Morton wrote:
>> After you've read Nick's comments (which I pray you have not), and after
>> you've convinced us and yourself of their wrongness, you might like to
>> consider adding a __GFP_NOWARN to netdev_alloc_skb().
> 
> Already done so.   Adding __GFP_NOWARN to netdev_alloc_skb() is wrong
> for several reasons.
> 
> It doesn't change the underlying conditions.
> It doesn't fix the desire to stamp other drivers in this manner.
> 
> And most importantly, it is not even correct:  the handling of the
> allocation failure remains delegated to the netdev_alloc_skb() users,
> which may or may not be properly handling allocation failures.
> 
> Put simply, you don't know if the caller is stupid or smart.  And there
> are a _lot_ of callers, do you really want to flag all of them?
> 
> Many modern net drivers are smart, and quite gracefully handle
> allocation failure without skipping a beat.
> 
> But some are really dumb, and leave big holes in their DMA rings when
> allocations fail.
> 
> The warnings are valid _sometimes_, but not for others.  So adding
> __GFP_NOWARN to netdev_alloc_skb() unconditionally makes no sense,
> except as an admission that the "spew when there is memory pressure"
> idea was silly.
> 
> 
> 
> Turning to Nick's comment,
> 
>> It's still actually nice to know how often it is happening even for
>> these known good sites because too much can indicate a problem and
>> that you could actually bring performance up by tuning some things.
> 
> Then create a counter or accumulation buffer somewhere.
> 
> We don't need spew every time there is memory pressure of this magnitude.
> 
> IMO there are much better ways than printk(), to inform tasks, and
> humans, of allocation failures.


FYI, e1000 and family already count the various levels of alloc failure that
result from this:

  alloc_rx_buff_failed - page alloc failure (might be harmless)
  rx_no_buffer_count - no buffer available for HW to use (harmless, hw will retry)
  rx_missed_errors - hw dropped a packet because of above failures
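
To make the relationship between the three counters above concrete, here is a
minimal userspace sketch in C. It is not the e1000 code: only the counter names
come from the list above, while the ring layout, the small FIFO, the allocator
stub and every function name are assumptions made for illustration. The refill
loop stops at the first failed allocation and retries that slot on the next
call, so the ring never ends up with a hole in the middle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE  8   /* hypothetical number of RX descriptors */
#define FIFO_DEPTH 2   /* hypothetical number of packets the HW can hold back */

/* Counter names follow the list above; everything else is hypothetical. */
struct rx_stats {
    unsigned long alloc_rx_buff_failed; /* a buffer allocation failed */
    unsigned long rx_no_buffer_count;   /* packet arrived, no buffer posted */
    unsigned long rx_missed_errors;     /* packet actually dropped */
};

struct rx_ring {
    void *buf[RING_SIZE];  /* NULL means the slot has no buffer */
    size_t next_to_fill;   /* first slot still waiting for a buffer */
    size_t filled;         /* number of buffers currently posted */
    size_t fifo_held;      /* packets waiting in the FIFO for a buffer */
    struct rx_stats stats;
};

/* Stand-in for an atomic allocation: fails once *budget is used up. */
static void *try_alloc(int *budget)
{
    if (*budget <= 0)
        return NULL;
    (*budget)--;
    return malloc(2048);
}

/* Hand the oldest posted buffer to a packet and release it. */
static void consume_one(struct rx_ring *ring)
{
    assert(ring->filled > 0);
    size_t oldest = (ring->next_to_fill + RING_SIZE - ring->filled) % RING_SIZE;
    free(ring->buf[oldest]);
    ring->buf[oldest] = NULL;
    ring->filled--;
}

/* Post buffers in order; on failure, count it and stop so no hole is left. */
static size_t rx_refill(struct rx_ring *ring, int *budget)
{
    size_t added = 0;
    while (ring->filled < RING_SIZE) {
        void *b = try_alloc(budget);
        if (!b) {
            ring->stats.alloc_rx_buff_failed++;
            break; /* this slot is retried on the next refill */
        }
        ring->buf[ring->next_to_fill] = b;
        ring->next_to_fill = (ring->next_to_fill + 1) % RING_SIZE;
        ring->filled++;
        added++;
    }
    /* Packets held in the FIFO can now be delivered. */
    while (ring->fifo_held > 0 && ring->filled > 0) {
        consume_one(ring);
        ring->fifo_held--;
    }
    return added;
}

/* One packet arrives. Returns false only if it had to be dropped. */
static bool rx_receive(struct rx_ring *ring)
{
    if (ring->filled == 0) {
        ring->stats.rx_no_buffer_count++;   /* harmless while the FIFO has room */
        if (ring->fifo_held < FIFO_DEPTH) {
            ring->fifo_held++;
            return true;
        }
        ring->stats.rx_missed_errors++;     /* FIFO full: the packet is lost */
        return false;
    }
    consume_one(ring);
    return true;
}
```

In this model a failed allocation by itself costs nothing, an empty ring only
bumps rx_no_buffer_count while the FIFO still has room, and rx_missed_errors
grows only when packets are really lost. That is the escalation the three
counters are meant to show.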

Still, I personally think the page alloc warnings are a good thing, and we've had
several issues resolved quickly because of them.

Shutting them up completely moves the focus to our driver, which ends up being a
victim of suspicion, and we then have to work hard to convince the user otherwise.

Auke