Message-ID: <20070918211628.71cba770@lappy>
Date: Tue, 18 Sep 2007 21:16:28 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Daniel Phillips <phillips@...nq.net>
Cc: "Mike Snitzer" <snitzer@...il.com>,
"Christoph Lameter" <clameter@....com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
dkegel@...gle.com, "David Miller" <davem@...emloft.net>,
"Nick Piggin" <npiggin@...e.de>, "Wouter Verhelst" <w@...r.be>,
"Evgeniy Polyakov" <johnpol@....mipt.ru>
Subject: Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
On Tue, 18 Sep 2007 09:56:06 -0700 Daniel Phillips <phillips@...nq.net>
wrote:
> On Tuesday 18 September 2007 02:58, Peter Zijlstra wrote:
> > On Mon, 17 Sep 2007 22:11:25 -0700 Daniel Phillips wrote:
> > > > I've been using Avi Kivity's patch from some time ago:
> > > > http://lkml.org/lkml/2004/7/26/68
> > >
> > > Yes. Ddsnap includes a bit of code almost identical to that, which
> > > we wrote independently. Seems wild and crazy at first blush,
> > > doesn't it? But this approach has proved robust in practice, and is
> > > to my mind, obviously correct.
> >
> > I'm so not liking this :-(
>
> Why don't you share your specific concerns?
>
> > Can't we just run the user-space part as mlockall and extend netlink
> > to work with PF_MEMALLOC where needed?
> >
> > I did something like that for iSCSI.
>
> Not sure what you mean by extend netlink. We do run the user daemons
> under mlockall of course, this is one of the rules I stated earlier for
> daemons running in the block IO path. The problem is, if this
> userspace daemon allocates even one page, for example in sys_open, it
> can deadlock. Running the daemon in PF_MEMALLOC mode fixes this
> problem robustly, provided that the necessary audit of memory
> allocation patterns and library dependencies has been done.
>
> I suppose you are worried that the userspace code could unexpectedly
> allocate a large amount of memory and exhaust the entire PF_MEMALLOC
> reserve? Kernel code could do that too. This userspace code just
> needs to be checked carefully. Perhaps we could come up with a kernel
> debugging option to verify that a task does in fact stay within some
> bounded number of page allocs while in PF_MEMALLOC mode.

As I said on IRC, my main concern is exposing PF_MEMALLOC to user-space
at all.

I'm sure you have good programmers that write perfect user-space
code. But once the thing is out there, there is little to no control.

Of course, once root, trashing your box isn't hard, but let's not make
it easier.

The iSCSI daemon was mlockall but only communicated with the kernel
using netlink, so by sprinkling pixie dust on the netlink code one can
inject user-space policy stuffs in a safe way.
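
For reference, the mlockall part that both approaches share could look
roughly like the sketch below. The helper name lock_daemon_memory is
made up for illustration and is not taken from the iSCSI daemon or from
ddsnap. Note that this only pins the daemon's own pages; it does nothing
for allocations the kernel makes on the daemon's behalf (the sys_open
case quoted above), which is exactly the part under discussion.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: pin every page the process has mapped now
 * (MCL_CURRENT) and every page it maps later (MCL_FUTURE), so the
 * daemon's own memory is never paged out.
 *
 * Returns 0 on success or a negative errno value on failure, e.g.
 * -EPERM or -ENOMEM when the caller lacks CAP_IPC_LOCK and would
 * exceed RLIMIT_MEMLOCK.
 */
static int lock_daemon_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;
    return 0;
}
```

A daemon would call this once at startup, before entering its main
loop, and refuse to run in the block IO path if it returns non-zero.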