linux-kernel - Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Date:	Tue,  8 Dec 2009 15:27:46 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Rik van Riel <riel@...hat.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	Andrea Arcangeli <aarcange@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Larry Woodman <lwoodman@...hat.com>
Subject: Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page

> On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
> >
> > Andrea, Can you please try following patch on your workload?
> >
> >
> >  From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> > From: KOSAKI Motohiro<kosaki.motohiro@...fujitsu.com>
> > Date: Mon, 7 Dec 2009 18:37:20 +0900
> > Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
> >
> > Changelog
> >   o from andrea's original patch
> >     - Rebase topon my patches.
> >     - Use list_cut_position/list_splice_tail pair instead
> >       list_del/list_add to make pte scan fairness.
> >     - Only use max young threshold when soft_try is true.
> >       It avoid wrong OOM sideeffect.
> >     - Return SWAP_AGAIN instead successful result if max
> >       young threshold exceed. It prevent the pages without clear
> >       pte young bit will be deactivated wrongly.
> >     - Add to treat ksm page logic
> 
> I like the concept and your changes, and really only
> have a few small nitpicks :)
> 
> First, the VM uses a mix of "referenced", "accessed" and
> "young".  We should probably avoid adding "active" to that
> mix, and may even want to think about moving to just one
> or two terms :)

Ah yes, certainly.


> > +#define MAX_YOUNG_BIT_CLEARED 64
> > +/*
> > + * if VM pressure is low and the page have too many active mappings, there isn't
> > + * any reason to continue clear young bit of other ptes. Otherwise,
> > + *  - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
> > + *  - Makes lots IPI for pte change and it might cause another sadly lock
> > + *    contention.
> > + */
> 
> If VM pressure is low and the page has lots of active users, we only
> clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time.  Clearing
> accessed bits takes CPU time, needs TLB invalidate IPIs and could
> cause lock contention.  Since a heavily shared page is very likely
> to be used again soon, the cost outweighs the benefit of making such
> a heavily shared page a candidate for eviction.

Thanks. Will fix.


> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index cfda0a0..f4517f3 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
> >   		ret = wipe_page_reference_one(page, refctx, vma, address);
> >   		if (ret != SWAP_SUCCESS)
> >   			break;
> > +		if (too_many_young_bit_found(refctx)) {
> > +			LIST_HEAD(tmp_list);
> > +
> > +			/*
> > +			 * The scanned ptes move to list tail. it help every ptes
> > +			 * on this page will be tested by ptep_clear_young().
> > +			 * Otherwise, this shortcut makes unfair thing.
> > +			 */
> > +			list_cut_position(&tmp_list,
> > +					&vma->anon_vma_node,
> > +					&anon_vma->head);
> > +			list_splice_tail(&tmp_list,&vma->anon_vma_node);
> > +			ret = SWAP_AGAIN;
> > +			break;
> > +		}
> 
> I do not understand the unfairness here, since all a page needs
> to stay on the active list is >64 referenced PTEs.  It does not
> matter which of the PTEs mapping the page were recently referenced.
> 
> However, rotating the anon vmas around may help spread out lock
> pressure in the VM and help things that way, so the code looks
> useful to me.

agreed. I have to rewrite the comment.


> In short, you can give the next version of this patch my
> 
> Reviewed-by: Rik van Riel <riel@...hat.com>
> 
> All I have are comment nitpicks :)

No. It's really worth.

Thank you.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/