lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 16 Nov 2023 19:06:23 +0100
From:   David Hildenbrand <david@...hat.com>
To:     xu <xu.xin.sc@...il.com>
Cc:     akpm@...ux-foundation.org, imbrenda@...ux.ibm.com,
        jiang.xuexin@....com.cn, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, ran.xiaokai@....com.cn, wang.yong12@....com.cn,
        xu.xin16@....com.cn, yang.yang29@....com.cn
Subject: Re: [PATCH] ksm: delay the check of splitting compound pages

On 16.11.23 18:39, David Hildenbrand wrote:
> On 16.11.23 13:17, xu wrote:
>>>>>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
>>>>>>      	tree_rmap_item =
>>>>>>      		unstable_tree_search_insert(rmap_item, page, &tree_page);
>>>>>>      	if (tree_rmap_item) {
>>>>>> -		bool split;
>>>>>> -
>>>>>>      		kpage = try_to_merge_two_pages(rmap_item, page,
>>>>>>      						tree_rmap_item, tree_page);
>>>>>> -		/*
>>>>>> -		 * If both pages we tried to merge belong to the same compound
>>>>>> -		 * page, then we actually ended up increasing the reference
>>>>>> -		 * count of the same compound page twice, and split_huge_page
>>>>>> -		 * failed.
>>>>>> -		 * Here we set a flag if that happened, and we use it later to
>>>>>> -		 * try split_huge_page again. Since we call put_page right
>>>>>> -		 * afterwards, the reference count will be correct and
>>>>>> -		 * split_huge_page should succeed.
>>>>>> -		 */
>>>>>
>>>>> I'm curious, why can't we detect that ahead of time and keep only a
>>>>> single reference? Why do we need the backup code? Anything I am missing?
>>
>> Do you mean like this?
> 
> Let me see if the refcounting here is sane:
> 
> (a) The caller has a reference on "page" that it will put just after the
>       cmp_and_merge_page() call.
> (b) unstable_tree_search_insert() obtained a reference to the
>       "tree_page" using get_mergeable_page()->follow_page(). We will put
>       that reference.
> 
> So indeed, if both references are to the same folio, *we* have two
> references to the folio and can safely drop one of both.
> 
>>
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2229,23 +2229,21 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
>>           tree_rmap_item =
>>                   unstable_tree_search_insert(rmap_item, page, &tree_page);
>>           if (tree_rmap_item) {
>> -               bool split;
>> +               bool SameCompound;
>> +               /*
>> +                * If they belongs to the same compound page, its' reference
>> +                * get twice, so need to put_page once to avoid that
>> +                * split_huge_page fails in try_to_merge_two_pages().
>> +                */
>> +               if (SameCompound = Is_SameCompound(page, tree_page))
>> +                       put_page(tree_page);
>>    
> 
> bool same_folio = page_folio(page) == page_folio(tree_page);
> 
> /*
>    * If both pages belong to the same folio, we are holding two references
>    * to the same large folio: splitting the folio in
>    * try_to_merge_one_page() will fail for that reason. So let's just drop
>    * one reference early.  Note that this is only possible if tree_page is
>    * not a KSM page yet.
>    */
> if (same_folio)
> 	put_page(tree_page);
> 
> [we could use more folio operations here, but lets KIS]
> 

Looking into the details, that doesn't work unfortunately.

try_to_merge_one_page() will call split_huge_page(), but that
will only hold a reference to "page", not to "tree page" after the
split.

Long story short, after split_huge_page() tree_page could get freed
because we don't hold a reference to that one anymore.

So we most probably cannot do better than the following (untested):

diff --git a/mm/ksm.c b/mm/ksm.c
index 7efcc68ccc6e..afb079524585 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2226,25 +2226,30 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
  		if (!err)
  			return;
  	}
+retry:
  	tree_rmap_item =
  		unstable_tree_search_insert(rmap_item, page, &tree_page);
  	if (tree_rmap_item) {
-		bool split;
+		/*
+		 * If both pages belong to the same folio, we are holding two
+		 * references to the same large folio: splitting the folio in
+		 * try_to_merge_one_page() will fail for that reason.
+		 *
+		 * We have to split manually, and lookup the tree_page
+		 * again. Note that we'll only hold a reference to "page" after
+		 * the split.
+		 */
+		if (page_folio(page) == page_folio(tree_page)) {
+			put_page(tree_page);
+
+			if (!trylock_page(page))
+				return;
+			split_huge_page(page);
+			unlock_page(page);
+			goto retry;
+		}
  
  		kpage = try_to_merge_two_pages(rmap_item, page,
  						tree_rmap_item, tree_page);
-		/*
-		 * If both pages we tried to merge belong to the same compound
-		 * page, then we actually ended up increasing the reference
-		 * count of the same compound page twice, and split_huge_page
-		 * failed.
-		 * Here we set a flag if that happened, and we use it later to
-		 * try split_huge_page again. Since we call put_page right
-		 * afterwards, the reference count will be correct and
-		 * split_huge_page should succeed.
-		 */
-		split = PageTransCompound(page)
-			&& compound_head(page) == compound_head(tree_page);
  		put_page(tree_page);
  		if (kpage) {
  			/*
@@ -2271,20 +2276,6 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
  				break_cow(tree_rmap_item);
  				break_cow(rmap_item);
  			}
-		} else if (split) {
-			/*
-			 * We are here if we tried to merge two pages and
-			 * failed because they both belonged to the same
-			 * compound page. We will split the page now, but no
-			 * merging will take place.
-			 * We do not want to add the cost of a full lock; if
-			 * the page is locked, it is better to skip it and
-			 * perhaps try again later.
-			 */
-			if (!trylock_page(page))
-				return;
-			split_huge_page(page);
-			unlock_page(page);
  		}
  	}
  }

-- 
Cheers,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ