[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45107ECE.5040603@google.com>
Date: Tue, 19 Sep 2006 16:35:42 -0700
From: Mike Waychison <mikew@...gle.com>
To: Andrew Morton <akpm@...l.org>
CC: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
linux-mm@...ck.org,
Linux Kernel list <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...l.org>
Subject: Re: [RFC] page fault retry with NOPAGE_RETRY
Patch attached.
As Andrew points out, the logic is a bit hacky and using a flag in
current->flags to determine whether we have done the retry or not already.
I too think the right approach to being able to handle these kinds of
retries in a more general fashion is to introduce a struct
pagefault_args along the page faulting path. Within it, we could
introduce a reason for the retry so the higher levels would be able to
better understand what to do.
struct pagefault_args {
u32 reason;
};
#define PAGEFAULT_MAYRETRY_IO 0x1
#define PAGEFAULT_RETRY_IO 0x2
#define PAGEFAULT_RETRY_SIGNALPENDING 0x4
do_page_fault could for example set PAGEFAULT_MAYRETRY_IO, letting a
nopage implementation drop locks for IO and signalling back up to
do_page_fault by also setting PAGEFAULT_RETRY_IO.
PAGEFAULT_RETRY_SIGNALPENDING could be set to tell do_page_fault to
deliver the signal and replay the faulting instruction (as opposed to
the "goto retry" in the patch attached).
Mike Waychison
Andrew Morton wrote:
> On Fri, 15 Sep 2006 08:55:08 +1000
> Benjamin Herrenschmidt <benh@...nel.crashing.org> wrote:
>
>
>>in mm.h:
>>
>> #define NOPAGE_SIGBUS (NULL)
>> #define NOPAGE_OOM ((struct page *) (-1))
>>+#define NOPAGE_RETRY ((struct page *) (-2))
>>
>>and in memory.c, in do_no_page():
>>
>>
>> /* no page was available -- either SIGBUS or OOM */
>> if (new_page == NOPAGE_SIGBUS)
>> return VM_FAULT_SIGBUS;
>> if (new_page == NOPAGE_OOM)
>> return VM_FAULT_OOM;
>>+ if (new_page == NOPAGE_RETRY)
>>+ return VM_FAULT_MINOR;
>
>
> Google are using such a patch (Mike owns it).
>
> It is to reduce mmap_sem contention with threaded apps. If one thread
> takes a major pagefault, it will perform a synchronous disk read while
> holding down_read(mmap_sem). This causes any other thread which wishes to
> perform any mmap/munmap/mprotect/etc (which does down_write(mmap_sem)) to
> block behind that disk IO. As you can understand, that can be pretty bad
> in the right circumstances.
>
> The patch modifies filemap_nopage() to look to see if it needs to block on
> the page coming uptodate and if so, it does up_read(mmap_sem);
> wait_on_page_locked(); return NOPAGE_RETRY. That causes the top-level
> do_page_fault() code to rerun the entire pagefault.
>
> It hasn't been submitted for upstream yet because
>
> a) To avoid livelock possibilities (page reclaim, looping FADV_DONTNEED,
> etc) it only does the retry a single time. After that it falls back to
> the traditional synchronous-read-inside-mmap_sem approach. A flag in
> current->flags is used to detect the second attempt. It keeps the patch
> simple, but is a bit hacky.
>
> To resolve this cleanly I'm thinking we change all the pagefault code
> everywhere: instantiate a new `struct pagefault_args' in do_page_fault()
> and pass that all around the place. So all the pagefault code, all the
> ->nopage handlers etc will take a single argument.
>
> This will, I hope, result in less code, faster code and less stack
> consumption. It could also be used for things like the
> lock-the-page-in-filemap_nopage() proposal: the ->nopage()
> implementation could set a boolean in pagefault_args indicating whether
> the page has been locked.
>
> And, of course, fielmap_nopage could set another boolean in
> pagefault_args to indicate that it has already tried to rerun the
> pagefault once.
>
> b) It could be more efficient. Most of the time, there's no need to
> back all the way out of the pagefault handler and rerun the whole thing.
> Because most of the time, nobody changed anything in the mm_struct. We
> _could_ just retake the mmap_sem after the page comes uptodate and, if
> nothing has changed, proceed. I see two ways of doing this:
>
> - The simple way: look to see if any other processes are sharing
> this mm_struct. If not, just do the synchronous read inside mmap_sem.
>
> - The better way: put a sequence counter in the mm_struct,
> increment that in every place where down_write(mmap_sem) is performed.
> The pagefault code then can re-take the mmap_sem for reading and look
> to see if the sequence counter is unchanged. If it is, proceed. If
> it _has_ changed then drop mmap_sem again and return NOPAGE_RETRY.
>
> otoh, maybe using another bit in page->flags is a good compromise ;)
>
> Mike, could you whip that patch out please?
View attachment "mmap_sem_drop_for_faults-2.6.18-rc7.patch" of type "text/plain" (7743 bytes)
Powered by blists - more mailing lists