Date:	Mon, 13 Apr 2009 13:11:46 -0700
From:	Reeve Yang <reeve.yang@...il.com>
To:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"bugzilla-daemon@...zilla.kernel.org" 
	<bugzilla-daemon@...zilla.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"Allan, Bruce W" <bruce.w.allan@...el.com>,
	"Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>,
	"Ronciak, John" <john.ronciak@...el.com>
Subject: Re: [Bugme-new] [Bug 13084] New: page allocation failure. order:0, 
	mode:0x20

Here is a memory snapshot taken while the problem is happening:

MemTotal:      8307844 kB
MemFree:       6091208 kB
Buffers:          6524 kB
Cached:        1121528 kB
SwapCached:          0 kB
Active:        1361052 kB
Inactive:        25784 kB
HighTotal:     7470464 kB
HighFree:      6083688 kB
LowTotal:       837380 kB
LowFree:          7520 kB
SwapTotal:     2047992 kB
SwapFree:      2047992 kB
Dirty:          744488 kB
Writeback:           0 kB
Mapped:         285532 kB
Slab:           797500 kB
CommitLimit:   6201912 kB
Committed_AS:   459788 kB
PageTables:       3532 kB
VmallocTotal:   118776 kB
VmallocUsed:      2432 kB
VmallocChunk:   116084 kB

You can see I have lots of physical RAM available, yet LowFree is
almost exhausted. LowFree is dropping at about 10 MB per second.
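
To make the low-memory picture easier to see, here is a minimal sketch that
parses a few fields from the snapshot above and prints the free fraction of
each zone. The helper names (parse_meminfo, free_fraction) are made up for
illustration, and the Slab line assumes the usual 32-bit highmem layout in
which slab allocations come out of low memory.

```python
# Illustrative sketch only: the field values are copied from the snapshot
# above; the function names are hypothetical helpers, not kernel or libc APIs.
SNAPSHOT = """\
MemTotal:      8307844 kB
MemFree:       6091208 kB
HighTotal:     7470464 kB
HighFree:      6083688 kB
LowTotal:       837380 kB
LowFree:          7520 kB
Slab:           797500 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each /proc/meminfo field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def free_fraction(fields, zone):
    """Fraction of the given zone ("Low" or "High") that is still free."""
    return fields[zone + "Free"] / fields[zone + "Total"]


info = parse_meminfo(SNAPSHOT)
print("LowFree / LowTotal:   %.2f%%" % (100 * free_fraction(info, "Low")))
print("HighFree / HighTotal: %.2f%%" % (100 * free_fraction(info, "High")))
# Assumption: on a 32-bit highmem kernel the slab lives in low memory, so
# comparing Slab with LowTotal hints at what is consuming the low zone.
print("Slab / LowTotal:      %.2f%%" % (100 * info["Slab"] / info["LowTotal"]))
```

With these numbers, less than 1% of low memory is free while more than 80% of
high memory is free, which is why the total MemFree figure looks healthy even
though order:0 atomic allocations are failing.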

On Mon, Apr 13, 2009 at 1:06 PM, Brandeburg, Jesse
<jesse.brandeburg@...el.com> wrote:
> On Mon, 13 Apr 2009, Andrew Morton wrote:
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Mon, 13 Apr 2009 19:27:27 GMT
>> bugzilla-daemon@...zilla.kernel.org wrote:
>>
>> > http://bugzilla.kernel.org/show_bug.cgi?id=13084
>> >
>> >            Summary: page allocation failure. order:0, mode:0x20
>> >            Product: Memory Management
>> >            Version: 2.5
>> >     Kernel Version: 2.6.17.4
>> >           Platform: All
>> >         OS/Version: Linux
>> >               Tree: Mainline
>> >             Status: NEW
>> >           Severity: high
>> >           Priority: P1
>> >          Component: Page Allocator
>> >         AssignedTo: akpm@...ux-foundation.org
>> >         ReportedBy: reeve.yang@...il.com
>> >         Regression: No
>> >
>> >
>> > Created an attachment (id=20964)
>> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=20964)
>> > kernel config file.
>> >
>> > The system is an Intel Xeon quad-core with 8G of physical RAM. When it's under
>> > UDP load, e.g., DNS queries, the box gets stuck: it cannot be pinged or
>> > logged into. Checking syslog, I'm seeing the following traceback from various
>> > daemons/processes. The network controller is an E1000 82571 with NAPI enabled
>> > in the kernel.
>> >
>> > page allocation failure. order:0, mode:0x20
>>
>> This is very common.  e1000 attempts to do large memory allocations
>> from within interrupt context and the page allocator cannot satisfy the
>> allocation and is not allowed to do the necessary work to make the
>> allocation attempt succeed.  It's the same with all net drivers, but
>> e1000 is especially prone, apparently because of hardware suckiness.
>
> while in jumbo mode, andrew's statement is true, but with order:0
> allocation failures it is just normal networking goo that causes the
> memory allocator to run out of free pages, seems much less frequent in
> newer kernels.
>
>> However the networking stack should just drop the packet and the system
>> will recover.
>
> I think at that point the kernel gets quite busy printing warnings about
> how much it is out of memory.
>
>> Your report is unclear.  Yes, the machine wedges up under the UDP load.
>> But does it recover when the other machine stops spraying UDP packets
>> at this machine?  It _should_ recover.  If it does not, we have a bug
>> somewhere.
>
> In this case kmem_cache_alloc is failing to get memory, being called by
> the route_dst code, maybe someone on netdev can comment if this has been
> fixed along the way.
>
>> The usual workaround for these problems is to increase the value in
>> /proc/sys/vm/min_free_kbytes.
>
> this should help a lot in my experience.
>
>> 2.6.17 is fairly old.  If we need to do additional work on this report
>> then we'll be asking you to test something more recent - ideally
>> 2.6.29.
>
> If you must run 2.6.17, then you might want to try the e1000e driver (*not
> e1000*) from sourceforge for your 82571.
>
> Otherwise I will also soon be asking you to try a newer kernel.
>
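
For reference, the min_free_kbytes workaround suggested above could be
scripted roughly as follows. This is a sketch under stated assumptions: the
function names are hypothetical, writing to the real /proc file requires
root, and no particular value is recommended here since the right setting
depends on the machine and the load.

```python
# Illustrative sketch only: helper names are hypothetical. The path is the
# sysctl file named in the thread; writing to it needs root privileges.
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current min_free_kbytes value as an integer (in kB)."""
    with open(path) as f:
        return int(f.read().strip())


def raise_min_free_kbytes(new_kb, path=MIN_FREE_PATH):
    """Raise min_free_kbytes to new_kb if that is larger than the current
    value, and return the value in effect afterwards. Never lowers it."""
    current = read_min_free_kbytes(path)
    if new_kb <= current:
        return current
    with open(path, "w") as f:
        f.write("%d\n" % new_kb)
    return read_min_free_kbytes(path)
```

Calling read_min_free_kbytes() shows the current reserve; calling
raise_min_free_kbytes(n) as root with a larger n asks the kernel to keep more
pages free for atomic allocations such as the ones failing here. The path
parameter exists so the helpers can be tried against an ordinary file first.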
