Message-Id: <20071004160941.e0c0c7e5.akpm@linux-foundation.org>
Date:	Thu, 4 Oct 2007 16:09:41 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	miklos@...redi.hu, wfg@...l.ustc.edu.cn, a.p.zijlstra@...llo.nl,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] remove throttle_vm_writeout()

On Fri, 05 Oct 2007 00:39:16 +0200
Miklos Szeredi <miklos@...redi.hu> wrote:

> > throttle_vm_writeout() should be a per-zone thing, I guess.  Perhaps fixing
> > that would fix your deadlock.  That's doubtful, but I don't know anything
> > about your deadlock so I cannot say.
> 
> No, doing the throttling per-zone won't in itself fix the deadlock.
> 
> Here's a deadlock example:
> 
> Total memory = 32M
> /proc/sys/vm/dirty_ratio = 10
> dirty_threshold = 3M
> ratelimit_pages = 1M
> 
> Some program dirties 4M (dirty_threshold + ratelimit_pages) of mmap on
> a fuse fs.  Page balancing is called which turns all these into
> writeback pages.
> 
> Then userspace filesystem gets a write request, and tries to allocate
> memory needed to complete the writeout.
> 
> That will possibly trigger direct reclaim, and throttle_vm_writeout()
> will be called.  That will block until nr_writeback goes below 3.3M
> (dirty_threshold + 10%).  But since all 4M of writeback is from the
> fuse fs, that will never happen.
> 
> Does that explain it better?
> 

yup, thanks.
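The arithmetic in the example above can be checked with a toy model. This is a sketch, not the kernel's throttle_vm_writeout(). Sizes are in KiB. The 3M threshold and the 10% boost come from Miklos's numbers. The helper names throttle_limit_kb() and rounds_until_released() are hypothetical. The model assumes writeback only shrinks when some other, unblocked writer completes I/O. In the FUSE case the only writer is the blocked server itself, so that completion rate is zero.

```c
#include <assert.h>

/* Toy model (not kernel code): sizes in KiB, values from the example. */
#define DIRTY_THRESHOLD_KB 3072L   /* "dirty_threshold = 3M" */

/* Limit the throttle waits for in this model: threshold plus 10%. */
static long throttle_limit_kb(long dirty_threshold_kb)
{
    return dirty_threshold_kb + dirty_threshold_kb / 10;
}

/*
 * Simulate the wait loop. Each round, completed_kb_per_round of writeback
 * finishes on other, unblocked devices. Returns the number of rounds until
 * the caller would be released, or -1 if it is still blocked after
 * max_rounds (the deadlock case).
 */
static long rounds_until_released(long writeback_kb,
                                  long completed_kb_per_round,
                                  long max_rounds)
{
    assert(completed_kb_per_round >= 0);
    long limit = throttle_limit_kb(DIRTY_THRESHOLD_KB);

    for (long round = 0; round < max_rounds; round++) {
        if (writeback_kb <= limit)
            return round;               /* below threshold + 10%: released */
        writeback_kb -= completed_kb_per_round;
        if (writeback_kb < 0)
            writeback_kb = 0;
    }
    return -1;                          /* never dropped below the limit */
}
```

With 4096 KiB under writeback and nothing completing, rounds_until_released(4096, 0, 1000) returns -1. In other words, the FUSE server never gets out of the throttle. If another device retired 256 KiB per round, the same call with 256 returns 3.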

This is a somewhat general problem: a userspace process is in the IO path. 
Userspace block drivers, for example - pretty much anything which involves
kernel->userspace upcalls for storage applications.

I solved it once in the past by marking the userspace process as
PF_MEMALLOC and I believe that others have implemented the same hack.

I suspect that what we need is a general solution, and that the solution
will involve explicitly telling the kernel that this process is one which
actually cleans memory and needs special treatment.

Because I bet there will be other corner-cases where such a process needs
kernel help, and there might be optimisation opportunities as well.

Problem is, any such mark-me-as-special syscall would need to be
privileged, and FUSE servers presently don't require special perms (do
they?)
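Later kernels added an opt-in along these lines. Since Linux 5.6, prctl(2) accepts PR_SET_IO_FLUSHER, which marks the caller as a process in the block-layer or filesystem I/O path so it can get special treatment when allocating memory. It requires CAP_SYS_RESOURCE, which bears on the permission question above. The sketch below assumes a Linux build environment. The helper names mark_as_io_flusher() and io_flusher_state() are hypothetical. The fallback constants are the values from the kernel's UAPI prctl header, for libc headers that predate the option.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Values from include/uapi/linux/prctl.h, for libc headers predating 5.6. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif
#ifndef PR_GET_IO_FLUSHER
#define PR_GET_IO_FLUSHER 58
#endif

/* Hypothetical helper: 1 if the caller is marked IO_FLUSHER, 0 if not, or -errno. */
static int io_flusher_state(void)
{
    int r = prctl(PR_GET_IO_FLUSHER, 0UL, 0UL, 0UL, 0UL);
    return r == -1 ? -errno : r;
}

/*
 * Hypothetical helper: ask the kernel to treat the calling process as one
 * that cleans memory (e.g. a userspace filesystem or block server).
 * Returns 0 on success or -errno: -EPERM without CAP_SYS_RESOURCE,
 * -EINVAL on kernels that do not know the option.
 */
static int mark_as_io_flusher(void)
{
    if (prctl(PR_SET_IO_FLUSHER, 1UL, 0UL, 0UL, 0UL) == -1)
        return -errno;
    /* A successful set should be visible through the matching get. */
    assert(io_flusher_state() == 1);
    return 0;
}
```

A server would call this once at startup, before it serves requests. Without CAP_SYS_RESOURCE, the helper as written only reports -EPERM and the process keeps the default behaviour. An unprivileged FUSE server therefore gains nothing from it.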
