linux-kernel - Re: Page alloc failures under network/disk IO load


Message-ID: <20081204135443.7c24234a@rockhopper.limebrokerage.com>
Date:	Thu, 4 Dec 2008 13:54:43 -0500
From:	Dan Noé <dpn@...merica.net>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Page alloc failures under network/disk IO load

On Thu, 04 Dec 2008 09:42:08 +0100
Peter Zijlstra <peterz@...radead.org> wrote:

> On Thu, 2008-12-04 at 09:23 +0100, Peter Zijlstra wrote:
> > On Wed, 2008-12-03 at 22:27 -0500, Dan Noé wrote:
> > > This is on Linux 2.6.28-rc7, on a Core 2 Duo.  The system has
> > > plenty of memory:
> > > 
> > >              total       used       free     shared    buffers     cached
> > > Mem:          1893       1822         70          0          0       1573
> > 
> > filled to the brim with data
> > 
> > > -/+ buffers/cache:        249       1644
> > > Swap:         1906         37       1868
> > > 
> > > I am using rsync to transfer data onto this system.  The
> > > filesystem is XFS, and the target drive is a 1TB Western Digital
> > > on ata_piix.  The system files are on a RAID 1 (Linux md, also on
> > > ata_piix).
> > > 
> > > Periodically I get page allocation failures, from
> > > __netdev_alloc_skb. I suppose this causes the driver to drop
> > > packets and thus hurts performance.
> > 
> > There isn't much we can do about that, memory is filled and your
> > network card tries to allocate memory in a mode that doesn't allow
> > freeing some.
> > 
> > Looking at the timestamps it's not very frequent, so it doesn't hurt
> > performance much if anything. If you're really bothered with this,
> > you could quiet it by sticking in a __GFP_NOWARN in
> > __netdev_alloc_skb() or something..
> 
> Another thing you can do is increase /proc/sys/vm/min_free_kbytes

I'm a bit confused because on another system (2.6.26.3) I never see
messages like this despite having the same amount of physical RAM in
each.  The 2.6.26.3 system is also under more active use, and has more
userspace memory usage.  On that system:

             total       used       free     shared    buffers     cached
Mem:          2017       1681        335          0         99        603
-/+ buffers/cache:        979       1037
Swap:          972        137        835
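
For reference, the "-/+ buffers/cache" row of free can be recomputed
from the Mem row, which makes it easier to see how much of "used" is
really reclaimable cache.  A minimal Python sketch using the megabyte
figures quoted in this thread; the function name is made up for
illustration:

```python
def buffers_cache_row(used, free, buffers, cached):
    """Recompute the "-/+ buffers/cache" row of free(1) from the Mem row.

    Returns (used excluding buffers/cache, free including buffers/cache).
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# 2.6.26.3 system (MB): used=1681, free=335, buffers=99, cached=603
print(buffers_cache_row(1681, 335, 99, 603))

# 2.6.28-rc7 system (MB): used=1822, free=70, buffers=0, cached=1573
print(buffers_cache_row(1822, 70, 0, 1573))
```

The second result can differ by 1 MB from what free printed, because
free rounds each column from kilobytes independently.  On the
2.6.28-rc7 system, cache accounts for 1573 MB of the 1822 MB in use.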

dpn@...obus:~$ cat /proc/sys/vm/min_free_kbytes
3816

Yet on the system where I saw the allocation failures:

dpn@...ut:~/kernels/linux-2.6$ cat /proc/sys/vm/min_free_kbytes
5711
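
One possible reason the two values differ: as far as I can tell, the
boot-time default in mm/page_alloc.c of this era is roughly
sqrt(16 * lowmem_kbytes), clamped to the range 128..65536.  The sketch
below assumes that formula and assumes neither value was set by hand;
both function names are hypothetical helpers, not kernel interfaces.

```python
import math


def default_min_free_kbytes(lowmem_kbytes):
    """Assumed boot-time default: sqrt(lowmem_kbytes * 16), clamped."""
    value = math.isqrt(lowmem_kbytes * 16)
    return max(128, min(value, 65536))


def implied_lowmem_kbytes(min_free_kbytes):
    """Invert the assumed formula (ignores the clamp and rounding)."""
    return min_free_kbytes ** 2 // 16


for name, value in (("2.6.26.3", 3816), ("2.6.28-rc7", 5711)):
    lowmem_mb = implied_lowmem_kbytes(value) // 1024
    print(name, "implies about", lowmem_mb, "MB of low memory")
```

If the assumption holds, the smaller value would correspond to a
machine where only part of the RAM counts as low memory (for example a
32-bit kernel with highmem), but nothing in this thread confirms that.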

If I understand it correctly, the issue is that __netdev_alloc_skb must
make a GFP_ATOMIC allocation, which cannot wait for reclaim and so fails
whenever the page cache would have to evict pages before enough memory
is free.  And min_free_kbytes allows tuning the point at which
try_to_free_pages is called, and thus the amount of "reserve" memory
available.  Is that correct?
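
To make the question concrete, here is a simplified model of how the
watermarks relate to min_free_kbytes.  It is only a sketch under stated
assumptions: a single zone, 4 kB pages, order-0 allocations, no
lowmem_reserve, and the 2.6-era rules that low = min + min/4,
high = min + min/2, and that an atomic allocation is checked against a
floor reduced first by half and then by a further quarter.  The
function names are hypothetical.

```python
PAGE_KB = 4  # assumed page size in kilobytes


def watermarks(pages_min):
    """Simplified per-zone watermarks derived from pages_min (in pages)."""
    return {
        "min": pages_min,
        "low": pages_min + pages_min // 4,    # kswapd is woken below this
        "high": pages_min + pages_min // 2,   # kswapd stops above this
    }


def atomic_floor(pages_min):
    """Free-page floor an atomic allocation is checked against (assumed)."""
    floor = pages_min
    floor -= floor // 2   # high-priority allocation may dip into the reserve
    floor -= floor // 4   # caller cannot sleep, so it may dip further
    return floor


pages_min = 5711 // PAGE_KB   # min_free_kbytes from the failing system
print(watermarks(pages_min))
print("atomic floor:", atomic_floor(pages_min), "pages")
```

In this model a larger min_free_kbytes raises both the level at which
background reclaim starts and the floor left for atomic allocations,
so it should give GFP_ATOMIC callers more headroom rather than less.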

Wouldn't a higher min_free_kbytes mean less likelihood of GFP_ATOMIC
allocations failing?  Or are these allocations failing on my 2.6.26.3
system and I don't know it because of different config options?

Why am I seeing this on the system with the *higher* min_free_kbytes?

Cheers,
Dan

-- 
Dan Noé
Software Engineer
Lime Brokerage LLC
781-370-2518