Date:	Fri, 29 Mar 2013 17:05:17 -0700
From:	John Stultz <john.stultz@...aro.org>
To:	Minchan Kim <minchan@...nel.org>
CC:	Minchan Kim <minchan.kim@....com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Arun Sharma <asharma@...com>, Mel Gorman <mel@....ul.ie>,
	Hugh Dickins <hughd@...gle.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>, Neil Brown <neilb@...e.de>,
	Mike Hommey <mh@...ndium.org>, Taras Glek <tglek@...illa.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Jason Evans <je@...com>, sanjay@...gle.com,
	Paul Turner <pjt@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michel Lespinasse <walken@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC v7 00/11] Support vrange for anonymous page

On 03/27/2013 01:03 AM, Minchan Kim wrote:
> On Tue, Mar 26, 2013 at 05:26:04PM -0700, John Stultz wrote:
>> Sorting out how to handle vrange() calls that cross both anonymous
>> and file vmas will be interesting, and may have some of the
>> drawbacks of the vma based approach, but I think it will still be
> Do you have any specific drawback examples?
> I'd like to solve it if it is critical and I believe we shouldn't
> do that for simpler implementation.

My current thought is that we manage volatile memory on both a per-mm 
(for anonymous memory) and per-address_space (for file memory) basis.

The downside is that if we manage both file and anonymous volatile ranges
with the same interface, we may have problems similar to those of the
per-vma approach you were trying before. Specifically, if a single range
covers both anonymous and file memory, we'll have to iterate over the
different types of ranges, as we did with your earlier vma approach.

This adds some complexity since with the single interval tree method in 
your current patch, we know we only have to allocate one additional 
range per insert/remove. So we can do that right off the bat, and return 
any ENOMEM errors without having made any state changes. This is a nice 
quality to have.
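
To make the "allocate first, then modify" property concrete, here is a
minimal userspace sketch. It is not the code from the patch: it uses a
sorted linked list instead of an interval tree, and the names
vrange_node, node_alloc, fail_next_alloc and vrange_remove are all
hypothetical. It only illustrates that removing a span needs at most one
extra node (for a split), so the allocation can happen before any state
is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one volatile range [start, end). */
struct vrange_node {
	unsigned long start, end;
	struct vrange_node *next;	/* sorted, non-overlapping list */
};

/* Illustration-only hook: when set, the next allocation fails. */
static bool fail_next_alloc;

static struct vrange_node *node_alloc(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return malloc(sizeof(struct vrange_node));
}

/*
 * Remove [start, end) from the list, i.e. mark it non-volatile.
 * At most one extra node is needed, when the span falls strictly
 * inside an existing range and splits it in two. That node is
 * allocated up front, so -ENOMEM is returned before any change.
 */
static int vrange_remove(struct vrange_node **head,
			 unsigned long start, unsigned long end)
{
	struct vrange_node *spare;
	struct vrange_node **pp = head;

	assert(start < end);

	spare = node_alloc();
	if (!spare)
		return -ENOMEM;		/* nothing modified yet */

	while (*pp) {
		struct vrange_node *n = *pp;

		if (n->end <= start) {		/* entirely before the span */
			pp = &n->next;
			continue;
		}
		if (n->start >= end)		/* entirely after the span */
			break;

		if (n->start < start && n->end > end) {
			/* Split: the spare node becomes the tail piece. */
			spare->start = end;
			spare->end = n->end;
			spare->next = n->next;
			n->end = start;
			n->next = spare;
			spare = NULL;
			break;
		}
		if (n->start < start) {		/* trim the tail of n */
			n->end = start;
			pp = &n->next;
		} else if (n->end > end) {	/* trim the head of n */
			n->start = end;
			break;
		} else {			/* n is fully covered */
			*pp = n->next;
			free(n);
		}
	}
	free(spare);	/* unused unless a split happened; free(NULL) is fine */
	return 0;
}
```

With a single range [0, 100) in the list, a failed vrange_remove(&head,
10, 20) leaves the list untouched, and a successful one leaves [0, 10)
and [20, 100).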

Whereas if we're iterating over different types of ranges, with
possibly multiple trees (i.e., different mmapped files), we don't know
how many new ranges we may have to allocate, so we could fail halfway,
which causes ambiguous results when marking ranges non-volatile (since
returning the error leaves the range possibly half-unmarked).
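
A toy model of that failure mode, under heavy assumptions: each segment
stands for the part of the requested range that lives in one tree, each
unmark needs one allocation, and alloc_budget simulates running out of
memory. The names segment, unmark_one, unmark_all and alloc_budget are
hypothetical and do not come from any patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum backing { BACKING_ANON, BACKING_FILE };

/* One piece of the requested range, tracked in its own tree. */
struct segment {
	enum backing type;
	bool is_volatile;
};

/* Illustration-only: allocations left before we simulate ENOMEM. */
static int alloc_budget;

/* Unmarking one segment may need one new range node in its tree. */
static int unmark_one(struct segment *seg)
{
	if (alloc_budget <= 0)
		return -ENOMEM;
	alloc_budget--;
	seg->is_volatile = false;
	return 0;
}

/*
 * Walk every segment covered by the call. We cannot preallocate,
 * because the number of nodes needed is only known as we go, so a
 * failure can leave earlier segments already unmarked.
 */
static int unmark_all(struct segment *segs, size_t n, size_t *done)
{
	assert(done != NULL);

	*done = 0;
	for (size_t i = 0; i < n; i++) {
		int ret = unmark_one(&segs[i]);

		if (ret)
			return ret;	/* segs[0..i-1] already changed */
		(*done)++;
	}
	return 0;
}
```

If the budget runs out on the third of three segments, the caller gets
-ENOMEM while the first two segments are already non-volatile, which is
the half-unmarked state described above.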


I'm still thinking it through, but that's my concern.

Some ways we can avoid this:
1) Require that any vrange() call not cross different types of memory.
2) Provide a different vrange call (fvrange?) to be used with file-backed
memory.
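
Option 1 could look roughly like the check below, done before any range
is touched. This is only a sketch: the mapping struct, the function name
vrange_check_single_type, and the choice of -EINVAL as the error are all
assumptions, not something decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum mem_kind { MEM_ANON, MEM_FILE };

/* Hypothetical stand-in for a vma: [start, end) plus its backing. */
struct mapping {
	unsigned long start, end;
	enum mem_kind kind;
};

/*
 * Return 0 if every mapping overlapping [start, end) has the same
 * kind of backing, or -EINVAL (an assumed error code) if the range
 * crosses both anonymous and file memory.
 */
static int vrange_check_single_type(const struct mapping *maps, size_t n,
				    unsigned long start, unsigned long end)
{
	bool seen = false;
	enum mem_kind first = MEM_ANON;

	assert(start < end);

	for (size_t i = 0; i < n; i++) {
		/* Skip mappings that do not overlap the request. */
		if (maps[i].end <= start || maps[i].start >= end)
			continue;
		if (!seen) {
			first = maps[i].kind;
			seen = true;
		} else if (maps[i].kind != first) {
			return -EINVAL;
		}
	}
	return 0;
}
```

Rejecting mixed ranges up front keeps each call within one kind of
memory, at the cost of making userspace split its requests.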

Any other thoughts?


>> Anyway, that's my current thinking. You can preview my current attempt here:
>> http://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/vrange-minchan
>>
> I saw it roughly and it seems good to me.
> I will review it in detail if you send formal patch. :)
Ok. I'm still working on some changes (been slow this week), but hope to 
have more to send your way next week.

> As you know well, there are several trial to handle memory management
> in userspace. One of example is lowmemory notifier. Kernel just send
> signal and user can free pages. Frankly speaking, I don't like that idea.
> Because there are several factors to limit userspace daemon's bounded
> reaction and could have false-positive alarm if system has streaming data,
> mlocked pages or many dirty pages and so on.

True. However, I think that there are valid use cases for low-memory
notification (Android's low-memory killer is one, where we're not just
freeing pages, but killing processes), and I think both approaches have
valid uses.

> Anyway, my point is that I'd like to control page reclaiming in only
> kernel itself. For it, userspace can register their volatile or
> reclaimable memory ranges to kernel and define to the threshold.
> If kernel find memory is below threshold user defined, kernel can
> reclaim every pages in registered range freely.
>
> It means kernel has a ownership of page freeing. It makes system more
> deterministic and not out-of-control.
>
> So vrange system call's semantic is following as.
>
> 1. vrange for anonymous page -> Discard without swapout
> 2. vrange for file-backed page except shmem/tmpfs -> Discard without sync
> 3. vrange for shmem/tmpfs -> hole punching
I think on non-shmem file-backed pages (case #2) hole punching will be
required as well. Though I'm not totally convinced volatile ranges on
non-tmpfs files actually make sense (I have yet to understand a
use case).


Thanks again for your thoughts here.

thanks
-john