linux-kernel - Re: [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page under the page lock

Message-ID: <20120416175040.0e33b37f@de.ibm.com>
Date:	Mon, 16 Apr 2012 17:50:40 +0200
From:	Martin Schwidefsky <schwidefsky@...ibm.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Heiko Carstens <heiko.carstens@...ibm.com>,
	Hugh Dickins <hughd@...gle.com>,
	Rik van Riel <riel@...hat.com>,
	Linux-MM <linux-mm@...ck.org>,
	Linux-S390 <linux-s390@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] s390: mm: rmap: Transfer storage key to struct page
 under the page lock

On Mon, 16 Apr 2012 15:14:23 +0100
Mel Gorman <mgorman@...e.de> wrote:

> This patch is horribly ugly and there has to be a better way of doing
> it. I'm looking for suggestions on what s390 can do here that is not
> painful or broken. 
> 
> However, s390 needs a better way of guarding against
> PageSwapCache pages being removed from the radix tree while set_page_dirty()
> is being called. The patch would be marginally better if in the PageSwapCache
> case we simply tried to lock once and in the contended case just fail to
> propagate the storage key. I lack familiarity with the s390 architecture
> to be certain if this is safe or not. Suggestions on a better fix?

One thought that crossed my mind is that maybe a better approach would be
to move the page_test_and_clear_dirty check out of page_remove_rmap.
What we need to look out for are code sequences of the form:

	if (pte_dirty(pte))
		set_page_dirty(page);
	...
	page_remove_rmap(page);

There are four of those as far as I can see: in try_to_unmap_one,
try_to_unmap_cluster, zap_pte, and zap_pte_range.

A valid implementation for s390 would be to test and clear the changed
bit in the storage key for each of those pte_dirty() calls:

	if (pte_dirty(pte) || page_test_and_clear_dirty(page))
		set_page_dirty(page);
	...
	page_remove_rmap(page); /* w/o page_test_and_clear_dirty */
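
To make the effect of that sequence concrete, here is a minimal userspace sketch. It is not kernel code: the key_model_page and key_model_pte types and all model_* functions are hypothetical stand-ins, and the storage key changed bit is modeled as a plain bool rather than being read with ISKE/SSKE. It only illustrates that a page with a clean pte is still marked dirty when the modeled storage key says the page was changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's types. */
struct key_model_page {
	bool key_changed;	/* stands in for the changed bit in the storage key */
	bool dirty;		/* stands in for PG_dirty in struct page */
};

struct key_model_pte {
	bool dirty;		/* stands in for the dirty bit in the pte */
};

static bool model_pte_dirty(struct key_model_pte pte)
{
	return pte.dirty;
}

/* Report the modeled changed bit and reset it (ISKE/SSKE on real hardware). */
static bool model_page_test_and_clear_dirty(struct key_model_page *page)
{
	bool changed = page->key_changed;

	page->key_changed = false;
	return changed;
}

static void model_set_page_dirty(struct key_model_page *page)
{
	page->dirty = true;
}

/* The proposed sequence; page_remove_rmap() would follow without a key check. */
void model_unmap_one(struct key_model_page *page, struct key_model_pte pte)
{
	if (model_pte_dirty(pte) || model_page_test_and_clear_dirty(page))
		model_set_page_dirty(page);
}

/* Clean pte, but the page was changed according to the storage key. */
void model_unmap_example(void)
{
	struct key_model_page page = { .key_changed = true, .dirty = false };
	struct key_model_pte pte = { .dirty = false };

	model_unmap_one(&page, pte);
	assert(page.dirty && !page.key_changed);
}
```

Note that because of the short-circuit ||, the modeled storage key is only touched when the pte is clean, which mirrors the sequence above.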

The trouble is that the ISKE and SSKE instructions are very expensive. That
is why we currently do the operation in page_remove_rmap, after the
map counter has dropped to zero (which is wrong, as we have now learned the
hard way). The additional check for (!PageAnon || PageSwapCache) is
just another way of avoiding ISKE/SSKE.

I am thinking about a function like this:

static inline int page_test_dirty_eco(struct page *page)
{
	if (page_mapcount(page) > 1)
		return 0;
	if (PageAnon(page) && !PageSwapCache(page))
		return 0;
	return page_test_and_clear_dirty(page);
}

and using it alongside the pte_dirty() check. The worry I have is the
map counter: what guarantees that it is not decremented
concurrently? That is probably a problem with the current patch as
well. Checking atomic_add_negative(-1, &page->_mapcount) against zero
works, but doing (page_mapcount(page) == 1) followed by the decrement
can race. And we had better not lose a dirty bit.
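
The difference between the two orderings can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not kernel code: count_model_page and the model_* helpers are hypothetical, the model follows the convention that _mapcount is -1 when the page has no mappings, and the "race" is replayed sequentially as one particular interleaving of two unmappers A and B rather than with real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: _mapcount is -1 when there are no mappings. */
struct count_model_page {
	atomic_int _mapcount;
};

/* Model of page_mapcount(): the number of mappings. */
static int model_page_mapcount(struct count_model_page *page)
{
	return atomic_load(&page->_mapcount) + 1;
}

/* Model of atomic_add_negative(): add i, report whether the result is < 0. */
static bool model_atomic_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/*
 * Check-then-decrement, replaying the interleaving in which both
 * unmappers check before either decrements. Returns how many of them
 * concluded that they were removing the last mapping.
 */
int racy_last_unmapper_count(void)
{
	struct count_model_page page;
	bool a_last, b_last;

	atomic_init(&page._mapcount, 1);		/* two mappings */
	a_last = model_page_mapcount(&page) == 1;	/* A checks */
	b_last = model_page_mapcount(&page) == 1;	/* B checks */
	atomic_fetch_sub(&page._mapcount, 1);		/* A decrements */
	atomic_fetch_sub(&page._mapcount, 1);		/* B decrements */
	return a_last + b_last;
}

/* Decrement-and-test in one atomic step, same two unmappers. */
int atomic_last_unmapper_count(void)
{
	struct count_model_page page;
	bool a_last, b_last;

	atomic_init(&page._mapcount, 1);		/* two mappings */
	a_last = model_atomic_add_negative(-1, &page._mapcount);
	b_last = model_atomic_add_negative(-1, &page._mapcount);
	return a_last + b_last;
}

/* Replays both orderings and checks who believed they were last. */
void count_model_example(void)
{
	assert(racy_last_unmapper_count() == 0);
	assert(atomic_last_unmapper_count() == 1);
}
```

In the replayed interleaving neither A nor B sees a map count of one, so in this model nobody would look at the storage key and a changed bit could be lost. With the combined decrement-and-test, exactly one of the two observes the transition.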

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
