[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0908250947400.2872@sister.anvils>
Date: Tue, 25 Aug 2009 10:03:30 +0100 (BST)
From: Hugh Dickins <hugh.dickins@...cali.co.uk>
To: Hiroaki Wakabayashi <primulaelatior@...il.com>
cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Minchan Kim <minchan.kim@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
Paul Menage <menage@...gle.com>, Ying Han <yinghan@...gle.com>,
Pekka Enberg <penberg@...helsinki.fi>,
Lee Schermerhorn <lee.schermerhorn@...com>
Subject: Re: [PATCH] mm: make munlock fast when mlock is canceled by sigkill
On Tue, 25 Aug 2009, Hiroaki Wakabayashi wrote:
> Thank you for reviews.
>
> >>> > @@ -254,6 +254,7 @@ static inline void
> >>> > mminit_validate_memmodel_limits(unsigned long *start_pfn,
> >>> > #define GUP_FLAGS_FORCE 0x2
> >>> > #define GUP_FLAGS_IGNORE_VMA_PERMISSIONS 0x4
> >>> > #define GUP_FLAGS_IGNORE_SIGKILL 0x8
> >>> > +#define GUP_FLAGS_ALLOW_NULL 0x10
> >>> >
> >>>
> >>> I am worried about adding new flag whenever we need it.
Indeed! See my comments below.
> >>> But I think this case makes sense to me.
> >>> In addition, I guess ZERO page can also use this flag.
> >>>
> >>> Kame. What do you think about it?
> >>>
> >> I do welcome this !
> >> Then, I don't have to take care of mlock/munlock in ZERO_PAGE patch.
I _think_ there's nothing to do for it (the page->mapping checks suit
the ZERO_PAGE); but I've not started testing my version, so may soon
be proved wrong.
> >>
> >> And without this patch, munlock() does copy-on-write just for unpinning memory.
> >> So, this patch shows some right direction, I think.
> >>
> >> One concern is flag name, ALLOW_NULL sounds not very good.
> >>
> >> GUP_FLAGS_NOFAULT ?
> >>
> >> I wonder we can remove a hack of FOLL_ANON for core-dump by this flag, too.
No, the considerations there a different (it can only point to a ZERO_PAGE
where faulting would anyway present a page of zeroes); it should be dealt
with by a coredump-specific flag, rather than sowing confusion elsewhere.
As above, I've done that but not yet tested it.
> >
> > Yeah, GUP_FLAGS_NOFAULT is better.
>
> Me too.
> I will change this flag name.
>...
> When I try to change __get_user_pages(), I got problem.
> If remove NULLs from pages,
> __mlock_vma_pages_range() cannot know how long __get_user_pages() readed.
> So, I have to get the virtual address of the page from vma and page.
> Because __mlock_vma_pages_range() have to call
> __get_user_pages() many times with different `start' argument.
>
> I try to use page_address_in_vma(), but it failed.
> (page_address_in_vma() returned -EFAULT)
> I cannot find way to solve this problem.
> Are there good ideas?
> Please give me some ideas.
I agree that this munlock issue needs to be addressed: it's not just a
matter of speedup, I hit it when testing what happens when mlock takes
you to OOM - which is currently a hanging disaster because munlock'ing
in the exiting OOM-killed process gets stuck trying to fault in all
those pages that couldn't be locked in the first place.
I had intended to fix it by being more careful about splitting/merging
vmas, noting how far the mlock had got, and munlocking just up to there.
However, now that I've got in there, that looks wrong to me, given the
traditional behaviour that mlock does its best, but pretends success
to allow for later instantiation of the pages if necessary.
You ask for ideas. My main idea is that so far we have added
GUP_FLAGS_IGNORE_VMA_PERMISSIONS (Kosaki-san, what was that about?
we already had the force flag),
GUP_FLAGS_IGNORE_SIGKILL, and now you propose
GUP_FLAGS_NOFAULT, all for the sole use of munlock.
How about GUP_FLAGS_MUNLOCK, or more to the point, GUP_FLAGS_DONT_BE_GUP?
By which I mean, don't all these added flags suggest that almost
everything __get_user_pages() does is unsuited to the munlock case?
My advice (but I sure hate giving advice before I've tried it myself)
is to put __mlock_vma_pages_range() back to handling just the mlock
case, and do your own follow_page() loop in munlock_vma_pages_range().
Hugh
Powered by blists - more mailing lists