Message-Id: <20071004160941.e0c0c7e5.akpm@linux-foundation.org>
Date: Thu, 4 Oct 2007 16:09:41 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: miklos@...redi.hu, wfg@...l.ustc.edu.cn, a.p.zijlstra@...llo.nl,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] remove throttle_vm_writeout()
On Fri, 05 Oct 2007 00:39:16 +0200
Miklos Szeredi <miklos@...redi.hu> wrote:
> > throttle_vm_writeout() should be a per-zone thing, I guess. Perhaps fixing
> > that would fix your deadlock. That's doubtful, but I don't know anything
> > about your deadlock so I cannot say.
>
> No, doing the throttling per-zone won't in itself fix the deadlock.
>
> Here's a deadlock example:
>
> Total memory = 32M
> /proc/sys/vm/dirty_ratio = 10
> dirty_threshold = 3M
> ratelimit_pages = 1M
>
> Some program dirties 4M (dirty_threshold + ratelimit_pages) of mmap on
> a fuse fs. Page balancing is called which turns all these into
> writeback pages.
>
> Then userspace filesystem gets a write request, and tries to allocate
> memory needed to complete the writeout.
>
> That will possibly trigger direct reclaim, and throttle_vm_writeout()
> will be called. That will block until nr_writeback goes below 3.3M
> (dirty_threshold + 10%). But since all 4M of writeback is from the
> fuse fs, that will never happen.
>
> Does that explain it better?
>
yup, thanks.

This is a somewhat general problem: a userspace process is in the IO path.
Userspace block drivers, for example - pretty much anything which involves
kernel->userspace upcalls for storage applications.

I solved it once in the past by marking the userspace process as
PF_MEMALLOC and I believe that others have implemented the same hack.

I suspect that what we need is a general solution, and that the solution
will involve explicitly telling the kernel that this process is one which
actually cleans memory and needs special treatment.

Because I bet there will be other corner-cases where such a process needs
kernel help, and there might be optimisation opportunities as well.

Problem is, any such mark-me-as-special syscall would need to be
privileged, and FUSE servers presently don't require special perms (do
they?)
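
For reference, the arithmetic in the deadlock example quoted above can be
checked with a toy model. This is only a sketch and not kernel code: the
function names, the KiB units, the fixed step count and the 512 KiB completed
per step are all assumptions made for illustration. The only figures taken
from the example are the 3M dirty_threshold, the 10% margin and the 4M of
writeback pages.

```python
# Toy model of the throttle_vm_writeout() deadlock described in the quoted
# example. All names and the per-step completion size are hypothetical.

def writeout_limit(dirty_threshold_kib):
    """Level nr_writeback must reach: dirty_threshold + 10%."""
    return dirty_threshold_kib + dirty_threshold_kib // 10


def throttled_task_unblocks(nr_writeback_kib, dirty_threshold_kib,
                            server_blocked, max_steps=5, completion_kib=512):
    """Return True if nr_writeback reaches the limit within max_steps.

    server_blocked models the case where the only process able to complete
    the writeback (the userspace filesystem) is itself the throttled task.
    """
    limit = writeout_limit(dirty_threshold_kib)
    for _ in range(max_steps):
        if nr_writeback_kib <= limit:
            return True
        if not server_blocked:
            # Someone else completes part of the writeback in this step.
            nr_writeback_kib -= min(completion_kib, nr_writeback_kib)
    return nr_writeback_kib <= limit


if __name__ == "__main__":
    dirty_threshold = 3 * 1024   # 3M, from the example
    nr_writeback = 4 * 1024      # 4M = dirty_threshold + ratelimit_pages
    print("limit (KiB):", writeout_limit(dirty_threshold))
    print("fuse server blocked:",
          throttled_task_unblocks(nr_writeback, dirty_threshold, True))
    print("independent writeback:",
          throttled_task_unblocks(nr_writeback, dirty_threshold, False))
```

With the server blocked, nr_writeback stays at 4M however long the loop runs,
which is the "that will never happen" case from the example.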