lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100826000940.GR31488@dastard>
Date:	Thu, 26 Aug 2010 10:09:40 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ted Ts'o <tytso@....edu>, David Rientjes <rientjes@...gle.com>,
	Jens Axboe <jaxboe@...ionio.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Neil Brown <neilb@...e.de>, Alasdair G Kergon <agk@...hat.com>,
	Chris Mason <chris.mason@...cle.com>,
	Steven Whitehouse <swhiteho@...hat.com>,
	Jan Kara <jack@...e.cz>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"linux-raid@...r.kernel.org" <linux-raid@...r.kernel.org>,
	"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
	"cluster-devel@...hat.com" <cluster-devel@...hat.com>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	"reiserfs-devel@...r.kernel.org" <reiserfs-devel@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and
 kzalloc

On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-25 at 23:24 +1000, Dave Chinner wrote:
> > 
> > That is, the guarantee that we will always make progress simply does
> > not exist in filesystems, so a mempool-like concept seems to me to
> > be doomed from the start.... 
> 
> While I appreciate that it might be somewhat (a lot) harder for a
> filesystem to provide that guarantee, I'd be deeply worried about your
> claim that its impossible.

I didn't say impossible, just that there's no way we can always
guarantee of forward progress with a specific, bound pool of memory.

Sure, we know what the worst case amount of log space is needed for
each transaction (i.e. how many pages that will be dirtied), but
that does not take into account all the blocks that need to be read
to make those modifications, the memory needed for stuff like btree
cursors, log tickets, transaction commit vectors, btree blocks
needed to do the searches, etc.  A typical transaction reservation
on a 4k block filesystem is between 200-400k (it's worst case), and
if you add in all the other allocations that might be required,
we're at the order of requiring megabytes of RAM to guarantee a
single transaction will succeed in low memory conditions. The exact
requirement is very difficult to quantify algorithmically, but for a
single transaction it should be possible.

However, consider the case of running a thousand concurrent
transactions and in the middle of that the system runs out of
memory. All the transactions need memory allocation to succeed, some
are blocked waiting for resources held in other transactions, etc.
Firstly, how to you stop all the transactions from making further
progress to serialise access to the low memory pool?  Secondly, how
do you select which transaction you want to use the low memory pool?
What do you do if the selected transaction then blocks on a resource
held by another transaction (which you can't know ahead of time)? Do
you switch to another thread and hope the pool doesn't run dry? What
do you do when (not if) the memory pool runs dry?

I'm sure this could be done, but it's lot of difficult, unrewarding
work that greatly increases code complexity, touches a massive
amount of the filesystem code base, exponentially increases the test
matrix, is likely to have significant operational overhead, and even
then there's no guarantee that we've got it right. That doesn't
sound like a good solution to me.

> It would render a system without swap very prone to deadlocks. Even with
> the very tight dirty page accounting we currently have you can fill all
> your memory with anonymous pages, at which point there's nothing free
> and you require writeout of dirty pages to succeed.

Then don't allow anonymous pages to fill all of memory when there is
no swap available - i.e. keep a larger pool of free memory when
there is no swap available. That's a much simpler solution than
turning all the filesystems upside down to try to make them not need
allocation....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ