[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120416175040.0e33b37f@de.ibm.com>
Date: Mon, 16 Apr 2012 17:50:40 +0200
From: Martin Schwidefsky <schwidefsky@...ibm.com>
To: Mel Gorman <mgorman@...e.de>
Cc: Heiko Carstens <heiko.carstens@...ibm.com>,
Hugh Dickins <hughd@...gle.com>,
Rik van Riel <riel@...hat.com>,
Linux-MM <linux-mm@...ck.org>,
Linux-S390 <linux-s390@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page
under the page lock
On Mon, 16 Apr 2012 15:14:23 +0100
Mel Gorman <mgorman@...e.de> wrote:
> This patch is horribly ugly and there has to be a better way of doing
> it. I'm looking for suggestions on what s390 can do here that is not
> painful or broken.
>
> However, s390 needs a better way of guarding against
> PageSwapCache pages being removed from the radix tree while set_page_dirty()
> is being called. The patch would be marginally better if in the PageSwapCache
> case we simply tried to lock once and in the contended case just fail to
> propogate the storage key. I lack familiarity with the s390 architecture
> to be certain if this is safe or not. Suggestions on a better fix?
One though that crossed my mind is that maybe a better approach would be
to move the page_test_and_clear_dirty check out of page_remove_rmap.
What we need to look out for are code sequences of the form:
if (pte_dirty(pte))
set_page_dirty(page);
...
page_remove_rmap(page);
There are four of those as far as I can see: in try_to_unmap_one,
try_to_unmap_cluster, zap_pte, and zap_pte_range.
A valid implementation for s390 would be to test and clear the changed
bit in the storage key for every of those pte_dirty() calls.
if (pte_dirty(pte) || page_test_and_clear_dirty(page))
set_page_dirty(page);
...
page_remove_rmap(page); /* w/o page_test_clear_dirty */
Trouble is that the ISKE and SSKE instructions are very expensive, that
is why we currently have the operation in page_remove_rmap after the
map counter dropped to zero (which is wrong as we now have learned the
hard way). The additional check for (!PageAnon || PageSwapCache) is
just another variation of avoiding ISKE/SSKE.
Thinking about a function like this:
static inline int page_test_dirty_eco(struct page *page)
{
if (page_mapcount(page) > 1)
return 0;
if (PageAnon(page) && !PageSwapCache(page))
return 0;
return page_test_and_clear_dirty(page);
}
and use it alongside the pte_dirty() check. The worry I have is the
map counter. What guarantees us that the map counter is not decremented
concurrently? Which is probably a problem with the current patch as
well, checking atomic_add_negative(-1, &page->_mapcount) against zero
works, doing (page_mapcount(page) == 1) followed by the decrement
can race. And we better not forget a dirty bit ..
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists