Message-ID: <20130805050119.GA715@cmpxchg.org>
Date: Mon, 5 Aug 2013 01:01:19 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...riel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Zlatko Calusic <zcalusic@...sync.net>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy
On Mon, Aug 05, 2013 at 01:48:58PM +0900, Minchan Kim wrote:
> On Sun, Aug 04, 2013 at 11:43:04PM -0400, Johannes Weiner wrote:
> > On Mon, Aug 05, 2013 at 10:15:46AM +0900, Minchan Kim wrote:
> > > I really want to give Reviewed-by, but before that I'd like to clear up
> > > a concern that wasn't handled thoroughly in the previous iteration.
> > >
> > > Let's assume a system with a normal zone of 800M and a high zone of 800M,
> > > and two parallel workloads:
> > >
> > > 1. alloc_pages(GFP_KERNEL) : 800M
> > > 2. alloc_pages(GFP_MOVABLE) + mlocked : 800M
> > >
> > > With the old behavior, allocations from both workloads are fulfilled
> > > happily, because most GFP_KERNEL allocations would be served from the
> > > normal zone while most GFP_MOVABLE allocations would be served from the
> > > high zone. There is no OOM kill in this scenario.
> >
> > If you have used ANY cache before, the movable pages will spill into
> > lowmem.
>
> Indeed, my example just depends on luck.
> I just wanted to discuss such a corner case so that at least someone
> takes note of the downsides.
>
> >
> > > With your change, the normal zone would be filled with GFP_KERNEL:400M
> > > and GFP_MOVABLE:400M, while the high zone would have GFP_MOVABLE:400M
> > > plus 400M free. Then someone would be OOM killed.
> > >
> > > Of course, you can argue that whoever runs such a workload should
> > > protect it via lowmem_reserve, but that's rather overkill if we consider
> > > more examples, because then no movable pages could be allocated from the
> > > normal zone, so memory efficiency would be very bad.
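
To make the arithmetic of that scenario concrete, here is a toy
simulation of fair 1:1 interleaving across two equally sized zones
(illustrative numbers and loop only, not the actual code from this
patch):

	#include <stdio.h>

	#define ZONE_MB 800

	int main(void)
	{
		int normal_free = ZONE_MB, high_free = ZONE_MB;
		int movable_in_normal = 0;

		/* 800M of movable, mlocked data, interleaved fairly
		 * across two 800M zones */
		for (int mb = 0; mb < ZONE_MB; mb++) {
			if (mb % 2 == 0 && high_free) {
				high_free--;
			} else {
				normal_free--;	/* pinned by mlock */
				movable_in_normal++;
			}
		}
		printf("movable pinned in normal: %dM\n",
		       movable_in_normal);
		printf("normal left for GFP_KERNEL: %dM (needs %dM)\n",
		       normal_free, ZONE_MB);
		return 0;
	}

This prints 400M pinned and 400M left against an 800M GFP_KERNEL
demand, which is the OOM you describe.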
> >
> > That's exactly what lowmem reserves are for: protect low memory from
> > data that can sit in high memory, so that you have enough for data
> > that can only be in low memory.
> >
> > If we find those reserves to be inadequate, we have to increase them.
> > You can't assume you get more lowmem than the lowmem reserves, period.
>
> Theoretically, true.
>
> >
> > And while I don't mean to break highmem machines, I really can't find
> > it in my heart to care about theoretical performance variations in
> > highmem corner cases (which is already a redundancy).
>
> Yes, as I said, I don't know of such a workload, even in the embedded
> world. But recent mobile phones are starting to use 3G of DRAM, and maybe
> 2G of that would be high memory on a 32-bit machine. That's why I had a
> concern about this patch: I think it's likely to pin lowmem more than the
> old behavior did.
>
> > > As I said, I like your approach because I have no better idea for
> > > handling the unbalanced aging problem, and we gain more from it than we
> > > lose to the above corner case. But at least I'd like to confirm what you
> > > think about the above problem before taking further steps. Maybe we can
> > > introduce "newly allocated or already-mapped mlocked pages can be
> > > migrated to the high memory zone" if someone reports a problem? (We
> > > thought mlocked page migration would be a problem from an RT latency
> > > POV, but Peter confirmed it's not.)
> >
> > And you think increasing lowmem reserves would be overkill? ;-)
>
> If possible, I would like to avoid. ;-)
>
> Peak workload: 800M; average workload: 100M.
> Sizing the reserve for the peak is int foo[800M] vs. int *bar = malloc(800M);
>
> >
> > These patches fix real page aging problems. Making trade-offs to work
>
> Indeed!
>
> > properly on as many setups as possible is one thing, but a highmem
> > configuration where you need exactly 100% of lowmem and mlock 100% of
> > highmem?
>
> Nope. Apparently, I don't know of one.
> I just wanted the record to show that such concerns were already covered
> in the review phase, so that if such a problem happens in the future we
> can easily answer "Just raise your lowmem reserve ratio, because you have
> been depending on luck until now." And I don't want to argue about such a
> solution with other mm guys again in the future.
That's fair enough.
I would definitely suggest increasing the lowmem reserves in that case
but I don't expect anybody to actually rely on the exact placement on
a pristine system. Not for performance, much less for _correctness_.
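
For the record, the knob for that is vm.lowmem_reserve_ratio, where a
smaller ratio means a bigger reserve. A simplified model of how the
ratio translates into a reserve, loosely based on
setup_per_zone_lowmem_reserve() (made-up zone table, not the exact
kernel code):

	#include <stdio.h>

	#define MAX_NR_ZONES 2

	static const char *zone_name[MAX_NR_ZONES] = { "Normal", "HighMem" };
	static unsigned long managed_mb[MAX_NR_ZONES] = { 800, 800 };
	/* default ratio for the normal zone is 32 */
	static unsigned long reserve_ratio[MAX_NR_ZONES] = { 32, 0 };

	int main(void)
	{
		for (int i = 0; i < MAX_NR_ZONES - 1; i++) {
			unsigned long higher_mb = 0;

			/* zone i keeps higher-zone-memory / ratio off
			 * limits to allocations that could have gone
			 * to a higher zone */
			for (int j = i + 1; j < MAX_NR_ZONES; j++) {
				higher_mb += managed_mb[j];
				printf("%s reserves %luM against %s allocations\n",
				       zone_name[i],
				       higher_mb / reserve_ratio[i],
				       zone_name[j]);
			}
		}
		return 0;
	}

With the default ratio of 32, 800M of highmem buys the normal zone only
a 25M reserve, so a setup like the one you describe would want a much
smaller ratio written to /proc/sys/vm/lowmem_reserve_ratio.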
> I think that, as a reviewer, this is enough as it is.
>
> All three patches,
>
> Reviewed-by: Minchan Kim <minchan@...nel.org>
Thank you very much!