linux-kernel - Re: [PATCH 1/3] tree wide: get rid of __GFP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <564C8801.2090202@suse.cz>
Date:	Wed, 18 Nov 2015 15:15:29 +0100
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0
 allocations part I

On 11/10/2015 01:51 PM, Michal Hocko wrote:
> On Mon 09-11-15 23:04:15, Vlastimil Babka wrote:
>> On 5.11.2015 17:15, mhocko@...nel.org wrote:
>> > From: Michal Hocko <mhocko@...e.com>
>> > 
>> > __GFP_REPEAT has a rather weak semantic but since it has been introduced
>> > around 2.6.12 it has been ignored for low order allocations. Yet we have
>> > the full kernel tree with its usage for apparently order-0 allocations.
>> > This is really confusing because __GFP_REPEAT is explicitly documented
>> > to allow allocation failures which is a weaker semantic than the current
>> > order-0 has (basically nofail).
>> > 
>> > Let's simply reap out __GFP_REPEAT from those places. This would allow
>> > to identify place which really need allocator to retry harder and
>> > formulate a more specific semantic for what the flag is supposed to do
>> > actually.
>> 
>> So at first I thought "yeah that's obvious", but then after some more thinking,
>> I'm not so sure anymore.
> 
> Thanks for looking into this! The primary purpose of this patch series was
> to start the discussion. I've only now realized I forgot to add RFC, sorry
> about that.
> 
>> I think we should formulate the semantic first, then do any changes. Also, let's
>> look at the flag description (which comes from pre-git):
> 
> It's rather hard to formulate one without examining the current users...

Sure, but changing existing users is a different thing :)

>>  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
>>  * _might_ fail.  This depends upon the particular VM implementation.
>> 
>> So we say it's implementation detail, and IIRC the same is said about which
>> orders are considered costly and which not, and the associated rules. So, can we
>> blame callers that happen to use __GFP_REPEAT essentially as a no-op in the
>> current implementation? And is it a problem that they do that?
> 
> Well, I think that many users simply copy&pasted the code along with the
> flag. I have failed to find any justification for adding this flag for
> basically all the cases I've checked.
> 
> My understanding is that the overal motivation for the flag was to
> fortify the allocation requests rather than weaken them. But if we were
> literal then __GFP_REPEAT is in fact weaker than GFP_KERNEL for lower
> orders. It is true that the later one is so only implicitly - and as an
> implementation detail.

OK I admit I didn't realize fully that __GFP_REPEAT is supposed to be weaker,
although you did write it quite explicitly in the changelog. It's just
completely counterintuitive given the name of the flag!

> Anyway I think that getting rid of those users which couldn't ever see a
> difference is a good start.
> 
>> So I think we should answer the following questions:
>> 
>> * What is the semantic of __GFP_REPEAT?
>>   - My suggestion would be something like "I would really like this allocation
>> to succeed. I still have some fallback but it's so suboptimal I'd rather wait
>> for this allocation."
> 
> This is very close to the current wording.
> 
>> And then we could e.g. change some heuristics to take that
>> into account - e.g. direct compaction could ignore the deferred state and
>> pageblock skip bits, to make sure it's as thorough as possible. Right now, that
>> sort of happens, but not quite - given enough reclaim/compact attempts, the
>> compact attempts might break out of deferred state. But pages_reclaimed might
>> reach 1 << order before compaction "undefers", and then it breaks out of the
>> loop. Is any such heuristic change possible for reclaim as well?
> 
> I am not familiar with the compaction code enough to comment on this but
> the reclaim part is already having something in should_continue_reclaim.

Yeah in this function having the flag means to try harder. Which is
counterintuitive to the "allow failure" semantics.

> For low order allocations this doesn't make too much of a difference
> because the reclaim is retried anyway from the page allocator path.
> 
>> As part of this question we should also keep in mind/rethink __GFP_NORETRY as
>> that's supposed to be the opposite flag to __GFP_REPEAT.
>> 
>> * Can it ever happen that __GFP_REPEAT could make some difference for order-0?
> 
> Yes, if you want to try hard but eventually allow to fail the request.

Right, I missed the "eventually fail" part.

> Why just not use __GFP_NORETRY for that purpose? Well, that context is
> much weaker. We give up after the first round of the reclaim and we do
> not trigger the OOM killer in that context. __GFP_REPEAT should be about
> finit retrying.
> 
> I am pretty sure there are users who would like to have this semantic.
> None of the current low-order users seem to fall into this cathegory
> AFAICS though.
> 
>>   - Certainly not wrt compaction, how about reclaim?
> 
> We can decide to allow the allocation to fail if reclaim progress was
> not sufficient _and_ the OOM killer haven't helped rather than start
> looping again.

Makes sense.

>>   - If it couldn't possibly affect order-0, then yeah proceed with Patch 1.
> 
> I've split up obviously order-0 from the rest because I think order-0
> are really easy to understand. Patch 1 is a bit harder to grasp but I
> think it should be safe as well. I am open to discussion of course.
> 
>> * Is PAGE_ALLOC_COSTLY_ORDER considered an implementation detail?
> 
> Yes, more or less, but I doubt we can change it much considering all the
> legacy code which might rely on it. I think we should simply remove the
> dependency on the order and act for all orders same semantically.

That's a nice goal, but probably a long-term one? Looks like making failure
possibility the default isn't viable. So that leaves us with making non-failure
the default. Which means checking all currently-costly-oder allocating sites to
pass some of the flags allowing failure.

>>   - I would think so, and if yes, then we probably shouldn't remove
>> __GFP_NORETRY for order-1+ allocations that happen to be not costly in the
>> current implementation?
> 
> I guess you meant __GFP_REPEAT here

Ah, yes.

> but even then we should really think
> about the reason why the flag has been added. Is it to fortify the
> request? If yes it never worked that way so it is hard to justify it
> that way.

Yep, we should definitely think of a less misleading name.

> Thanks!
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/