Message-ID: <alpine.DEB.2.00.1008251951230.7034@chino.kir.corp.google.com>
Date: Wed, 25 Aug 2010 20:09:21 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: "Ted Ts'o" <tytso@....edu>, Peter Zijlstra <peterz@...radead.org>,
Jens Axboe <jaxboe@...ionio.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Neil Brown <neilb@...e.de>, Alasdair G Kergon <agk@...hat.com>,
Chris Mason <chris.mason@...cle.com>,
Steven Whitehouse <swhiteho@...hat.com>,
Jan Kara <jack@...e.cz>,
Frederic Weisbecker <fweisbec@...il.com>,
"linux-raid@...r.kernel.org" <linux-raid@...r.kernel.org>,
"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
"cluster-devel@...hat.com" <cluster-devel@...hat.com>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"reiserfs-devel@...r.kernel.org" <reiserfs-devel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc
On Wed, 25 Aug 2010, Ted Ts'o wrote:
> > We certainly hope that nobody will reimplement the same function without
> > the __deprecated warning, especially for order < PAGE_ALLOC_COSTLY_ORDER
> > where there's no looping at a higher level. So perhaps the best
> > alternative is to implement the same _nofail() functions but do a
> > WARN_ON(get_order(size) > PAGE_ALLOC_COSTLY_ORDER) instead?
>
> Yeah, that sounds better.
>
Ok, and we'll make it a WARN_ON_ONCE() to be nice to the kernel log.
The current patchset warns unconditionally with WARN_ON_ONCE(1, ...)
instead. Warning only on the order check ensures that we aren't dependent
on the page allocator's implementation always looping for
order < PAGE_ALLOC_COSTLY_ORDER; if it ever stops doing so, the loop in
the _nofail() functions would actually do something.
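
To make the proposal concrete, here is a rough user-space sketch of the idea, not the actual patch. It assumes 4 KiB pages and PAGE_ALLOC_COSTLY_ORDER == 3. malloc() stands in for kmalloc(), and kmalloc_nofail_sketch() and warnings_printed are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumptions for this sketch: 4 KiB pages, costly order of 3. */
#define PAGE_SHIFT 12
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical counter so the warn-once behaviour can be observed. */
static int warnings_printed;

/* Smallest order such that (4 KiB << order) >= size. */
static int get_order(size_t size)
{
    int order = 0;
    size_t pages;

    if (size == 0)
        return 0;
    pages = (size - 1) >> PAGE_SHIFT;
    while (pages) {
        order++;
        pages >>= 1;
    }
    return order;
}

/*
 * Hypothetical stand-in for a _nofail() allocator: warn once when the
 * request is above the costly order, then loop until the underlying
 * allocator (malloc() here, not kmalloc()) returns memory.
 */
static void *kmalloc_nofail_sketch(size_t size)
{
    static bool warned;
    void *p;

    if (get_order(size) > PAGE_ALLOC_COSTLY_ORDER && !warned) {
        warned = true;
        warnings_printed++;
        fprintf(stderr, "warning: nofail allocation of order %d\n",
                get_order(size));
    }

    /* The loop only matters if the allocator below can return NULL. */
    do {
        p = malloc(size ? size : 1);
    } while (!p);

    return p;
}
```

With these assumptions a 64-byte request never warns, while repeated 64 KiB requests (order 4) warn exactly once.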
> > I think it's really sad that the caller can't know what the upper bounds
> > of its memory requirement are ahead of time or at least be able to
> > implement a memory freeing function when kmalloc() returns NULL.
>
> Oh, we can determine an upper bound. You might just not like it.
> Actually ext3/ext4 shouldn't be as bad as XFS, which Dave estimated to
> be around 400k for a transaction. My guess is that the worst case for
> ext3/ext4 is probably around 256k or so; like XFS, most of the time,
> it would be a lot less. (At least, if data != journalled; if we are
> doing data journalling and every single data block begins with
> 0xc03b3998U, we'll need to allocate a 4k page for every single data
> block written.) We could dynamically calculate an upper bound if we
> had to. Of course, if ext3/ext4 is attached to a network block
> device, then it could get a lot worse than 256k, of course.
>
On my 8GB machine, /proc/zoneinfo says the min watermark for ZONE_NORMAL
is 5086 pages, or ~20MB. GFP_ATOMIC would allow access to ~12MB of that,
so perhaps we should consider this an acceptable abuse of GFP_ATOMIC as
a fallback behavior when GFP_NOFS or GFP_NOIO fails?
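
The arithmetic behind those figures can be sketched as below. This assumes 4 KiB pages and that the watermark check lowers min by one half for ALLOC_HIGH and then by a further quarter for ALLOC_HARDER, which is my reading of zone_watermark_ok() in kernels of this period. The function names are hypothetical and this is plain user-space C, not kernel code.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_KB 4

/*
 * Lowest free-page count a GFP_ATOMIC allocation may dip to, assuming
 * the min watermark is reduced by 1/2 (ALLOC_HIGH) and then by a
 * further 1/4 of the remainder (ALLOC_HARDER).
 */
static long atomic_floor_pages(long min_pages)
{
    long min = min_pages;

    min -= min / 2;   /* ALLOC_HIGH */
    min -= min / 4;   /* ALLOC_HARDER */
    return min;
}

/* Portion of the min-watermark reserve reachable by GFP_ATOMIC, in KiB. */
static long atomic_accessible_kb(long min_pages)
{
    return (min_pages - atomic_floor_pages(min_pages)) * PAGE_SIZE_KB;
}

/* Size of the whole min-watermark reserve, in KiB. */
static long reserve_kb(long min_pages)
{
    return min_pages * PAGE_SIZE_KB;
}
```

For min = 5086 pages this gives a reserve of 20344 KiB (~20MB), a floor of 1908 pages, and 3178 pages or 12712 KiB (~12MB) reachable by GFP_ATOMIC.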