linux-kernel - RE: [PATCH -v2 -mm] add extra free kbytes tunable

Message-ID: <alpine.DEB.2.00.1110111343070.29761@chino.kir.corp.google.com>
Date:	Tue, 11 Oct 2011 14:04:45 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Satoru Moriya <satoru.moriya@....com>
cc:	Rik van Riel <riel@...hat.com>,
	Randy Dunlap <rdunlap@...otime.net>,
	Satoru Moriya <smoriya@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	"lwoodman@...hat.com" <lwoodman@...hat.com>,
	Seiji Aguchi <saguchi@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	"hannes@...xchg.org" <hannes@...xchg.org>
Subject: RE: [PATCH -v2 -mm] add extra free kbytes tunable

On Tue, 11 Oct 2011, Satoru Moriya wrote:

> > I also
> > think that it will cause regressions on other cpu intensive workloads 
> > that don't require this extra freed memory because it works as a 
> > global heuristic and is not tied to any specific application.
> 
> It's yes and no. It may cause regressions on the workloads due to
> less amount of available memory. But it may improve the workloads'
> performance because they can avoid direct reclaim due to extra
> free memory.
> 

There's only a memory-availability regression if background reclaim is
actually triggered in the first place, i.e. extra_free_kbytes doesn't
change the watermarks at which reclaim is started; when set, it causes
reclaim to free more memory than it otherwise would.

That's not really what I was referring to; I was referring to
cpu-intensive workloads that now incur a regression because kswapd is now
doing more work (potentially a significant amount of work since 
extra_free_kbytes is unbounded) on shared machines.  These applications 
may not be allocating memory at all and now they incur a performance 
penalty because kswapd is taking away one of their cores.

In other words, I think it's a fine solution if you're running a single 
application with very bursty memory allocations so you need to reclaim 
more memory when low, but that solution is troublesome if it comes at 
the penalty of other applications and that's a direct consequence of it 
being a global tunable.  I'd much rather identify the memory allocations
in the kernel that are causing the pain here and mitigate it by (i)
sanely rate limiting those allocations, (ii) preallocating at least a
partial amount of those allocations ahead of time to avoid significant
reclaim all at once, or (iii) annotating memory allocations with such
potential so that the page allocator can add this reclaim bonus itself
only in these conditions.
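
As a rough illustration of (i), a token bucket is one way such rate
limiting could look.  This is a user-space sketch under stated
assumptions, not kernel code: struct alloc_ratelimit, its fields, and
both functions are hypothetical names, not an existing kernel API, and
the caller is assumed to supply a monotonic millisecond timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket; one token stands for one page. */
struct alloc_ratelimit {
	uint64_t tokens;       /* pages that may be allocated right now */
	uint64_t burst_pages;  /* bucket capacity (largest allowed burst) */
	uint64_t pages_per_ms; /* refill rate */
	uint64_t last_ms;      /* time of the last refill */
};

static void alloc_ratelimit_init(struct alloc_ratelimit *rl,
				 uint64_t burst_pages,
				 uint64_t pages_per_ms, uint64_t now_ms)
{
	assert(burst_pages > 0);
	rl->tokens = burst_pages;	/* start with a full bucket */
	rl->burst_pages = burst_pages;
	rl->pages_per_ms = pages_per_ms;
	rl->last_ms = now_ms;
}

/* Add tokens for the time elapsed, never exceeding the capacity. */
static void alloc_ratelimit_refill(struct alloc_ratelimit *rl,
				   uint64_t now_ms)
{
	uint64_t elapsed, room;

	if (now_ms <= rl->last_ms)
		return;		/* no time has passed (or clock went back) */

	elapsed = now_ms - rl->last_ms;
	room = rl->burst_pages - rl->tokens;

	/* Compare before multiplying so elapsed * rate cannot overflow. */
	if (rl->pages_per_ms != 0 &&
	    elapsed >= room / rl->pages_per_ms + 1)
		rl->tokens = rl->burst_pages;
	else
		rl->tokens += elapsed * rl->pages_per_ms;

	rl->last_ms = now_ms;
}

/*
 * Returns true and consumes tokens if nr_pages may be allocated now;
 * returns false, consuming nothing, if the caller should back off.
 */
static bool alloc_ratelimit_try(struct alloc_ratelimit *rl,
				uint64_t nr_pages, uint64_t now_ms)
{
	alloc_ratelimit_refill(rl, now_ms);
	if (nr_pages > rl->tokens)
		return false;
	rl->tokens -= nr_pages;
	return true;
}
```

A caller that gets false back would delay or shrink its request, which
spreads the burst out over time instead of asking kswapd to keep more
memory free for everybody.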

> Of course if one doesn't need extra free memory, one can turn it
> off. I think we can add this feature to cgroup if we want to set
> it for any specific process or process group. (Before that we
> need to implement min_free_kbytes for cgroup and the implementation
> of extra free kbytes strongly depends on it.)
> 

That would allow you to only reclaim additional memory when certain
applications trigger it, but it's not actually a solution since another
task can hit a zone's low watermark and kick kswapd, and then the bursty
memory allocations happen immediately following that and the tunable
doesn't actually do anything because kswapd was already running.  So I
disagree, as I did
when per-cgroup watermark tunables were proposed, that watermarks should 
be changed for a subset of applications unless you guarantee memory 
isolation such that that subset of applications has exclusive access to 
the memory zones being tuned.