Message-ID: <Pine.LNX.4.64.0701161407530.3545@schroedinger.engr.sgi.com>
Date: Tue, 16 Jan 2007 14:15:56 -0800 (PST)
From: Christoph Lameter <clameter@....com>
To: Andrew Morton <akpm@...l.org>
cc: menage@...gle.com, linux-kernel@...r.kernel.org,
nickpiggin@...oo.com.au, linux-mm@...ck.org, ak@...e.de,
pj@....com, dgc@....com
Subject: Re: [RFC 0/8] Cpuset aware writeback
On Tue, 16 Jan 2007, Andrew Morton wrote:
> > On Mon, 15 Jan 2007 21:47:43 -0800 (PST) Christoph Lameter <clameter@....com> wrote:
> >
> > Currently cpusets are not able to do proper writeback since
> > dirty ratio calculations and writeback are all done for the system
> > as a whole.
>
> We _do_ do proper writeback. But it's less efficient than it might be, and
> there's an NFS problem.
Well yes, we write back during LRU scans when a potentially high percentage
of the memory in a cpuset is dirty.
> > This may result in a large percentage of a cpuset becoming dirty
> > without writeout being triggered. Under NFS this can lead to OOM
> > conditions.
>
> OK, a big question: is this patchset a performance improvement or a
> correctness fix? Given the above, and the lack of benchmark results I'm
> assuming it's for correctness.
It is a correctness fix both for NFS OOM and doing proper cpuset writeout.
> - Why does NFS go oom? Because it allocates potentially-unbounded
> numbers of requests in the writeback path?
>
> It was able to go oom on non-numa machines before dirty-page-tracking
> went in. So a general problem has now become specific to some NUMA
> setups.
Right. The issue is that large portions of memory become dirty or are under
writeback, since no writeback occurs because dirty limits are not checked
per cpuset. Then NFS attempts to write out pages during LRU scans but is
unable to allocate memory.
> So an obvious, equivalent and vastly simpler "fix" would be to teach
> the NFS client to go off-cpuset when trying to allocate these requests.
Yes we can fix these allocations by allowing processes to allocate from
other nodes. But then the container function of cpusets is no longer
there.
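For illustration, that kind of fix would amount to something like the
following in the NFS request allocation path (both helpers here are made up,
this is not the real NFS code); the fallback is exactly what punches a hole
in the containment:

	struct nfs_page *req;

	/* Illustration only: hypothetical helpers, not real NFS code. */
	req = alloc_nfs_request(GFP_NOFS);		/* cpuset-constrained */
	if (!req)
		req = alloc_nfs_request_off_cpuset(GFP_NOFS);	/* escapes the cpuset */
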
> (But is it really bad? What actual problems will it cause once NFS is fixed?)
NFS is okay as far as I can tell. Dirty throttling works fine in non-cpuset
environments because we throttle once 40% of memory becomes dirty or is
under writeback.
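The per-cpuset version of that check would, very roughly, compute the limit
against the memory of the nodes in the cpuset instead of against total
memory. A minimal sketch, where cpuset_memory_pages() and
cpuset_page_state() are hypothetical helpers standing in for what the
patchset actually does:

	static int cpuset_dirty_exceeded(struct cpuset *cs)
	{
		unsigned long limit, nr_dirty;

		/* vm_dirty_ratio (40% by default) of the pages on this
		 * cpuset's nodes, not of all memory */
		limit = cpuset_memory_pages(cs) * vm_dirty_ratio / 100;

		/* dirty + writeback pages on those same nodes */
		nr_dirty = cpuset_page_state(cs, NR_FILE_DIRTY) +
			   cpuset_page_state(cs, NR_WRITEBACK);

		return nr_dirty > limit;
	}
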
> I don't understand why the proposed patches are cpuset-aware at all. This
> is a per-zone problem, and a per-zone fix would seem to be appropriate, and
> more general. For example, i386 machines can presumably get into trouble
> if all of ZONE_DMA or ZONE_NORMAL get dirty. A good implementation would
> address that problem as well. So I think it should all be per-zone?
No. A zone can be completely dirty as long as we are allowed to allocate
from other zones.
> Do we really need those per-inode cpumasks? When page reclaim encounters a
> dirty page on the zone LRU, we automatically know that page->mapping->host
> has at least one dirty page in this zone, yes? We could immediately ask
Yes, but when we enter reclaim most of the pages of a zone may already be
dirty or under writeback, so we fail. Also, when we enter reclaim we may not
have the proper process / cpuset context. There is no use in throttling
kswapd; we need to throttle the process that is dirtying memory.
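In other words the check has to run in the context of the task doing the
writes, roughly where balance_dirty_pages() runs today, not in kswapd. A
hedged sketch, reusing the hypothetical check above (task_cs() and
writeback_inodes_in_cpuset() are likewise stand-ins):

	static void cpuset_balance_dirty_pages(void)
	{
		struct cpuset *cs = task_cs(current);	/* dirtying task's cpuset */

		while (cpuset_dirty_exceeded(cs)) {
			/* push out this cpuset's dirty inodes and wait,
			 * instead of leaving them to kswapd's LRU scan */
			writeback_inodes_in_cpuset(cs);
			congestion_wait(WRITE, HZ/10);
		}
	}
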
> But all of this is, I think, unneeded if NFS is fixed. It's hopefully a
> performance optimisation to permit writeout in a less seeky fashion.
> Unless there's some other problem with excessively dirty zones.
The patchset improves performance because the filesystem can do sequential
writeout. So yes, in some ways this is a performance improvement, but only
because the patch makes dirty throttling for cpusets work the same way as on
a non-NUMA system.