linux-kernel - Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <944d990f-3c98-4ade-8176-4e4b25eae0b8@redhat.com>
Date: Mon, 18 Dec 2023 15:04:43 +0100
From: David Hildenbrand <david@...hat.com>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 linux-mm <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Ryan Roberts <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>,
 Hugh Dickins <hughd@...gle.com>, Yin Fengwei <fengwei.yin@...el.com>,
 Yang Shi <shy828301@...il.com>, Ying Huang <ying.huang@...el.com>,
 Zi Yan <ziy@...dia.com>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
 Waiman Long <longman@...hat.com>, "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs
 map a partially-mappable folio

On 17.12.23 20:13, Nadav Amit wrote:
> 
>> On Nov 24, 2023, at 3:26 PM, David Hildenbrand <david@...hat.com> wrote:
>>
>> 5. sub-IDs
>> ==========
>>
>> To achieve (2), we generate sub-IDs that have the following property,
>> assuming that our folio has P=folio_nr_pages() pages.
>>   "2 * sub-ID" cannot be represented by the sum of any other *2* sub-IDs
>>   "3 * sub-ID" cannot be represented by the sum of any other *3* sub-IDs
>>   "4 * sub-ID" cannot be represented by the sum of any other *4* sub-IDs
>>   ...
>>   "P * sub-ID" cannot be represented by the sum of any other *P* sub-IDs
>>
>> The sub-IDs are generated in generations, whereby
>> (1) Generation #0 is the number 0
>> (2) Generation #N takes all numbers from generations #0..#N-1 and adds
>>     (P + 1)^(N - 1), effectively doubling the number of sub-IDs
>>
>> Consequently, the smallest number S in gen #N is:
>>   S[#N] = (P + 1)^(N - 1)
>>
>> The largest number L in gen #N is:
>>   L[#N] = (P + 1)^(N - 1) + (P + 1)^(N - 2) + ... (P + 1)^0 + 0.
>>   -> [geometric sum with "P + 1 != 1"]
>>         = (1 - (P + 1)^N) / (1 - (P + 1))
>>         = (1 - (P + 1)^N) / (-P)
>>         = ((P + 1)^N - 1) / P
> 
> David, as you know it took me a while to understand your impressive work.
> 

Hi Nadav,

thanks a bunch for digging into the details of the solution and trying 
to verify the correctness!

And thanks a lot for highlighting the special case below that I 
intuitively used for the "intuition" :)


> I think that part of what made it hard for me is the presentation and the
> formulation of sub-IDs in arithmetic means instead of bit manipulations.

Yes, that needs more work to make it all easier to understand.

> 
> Basically, IIUC, you want for order-K pages to add K-1 “0” bits between
> each rmap-id bits.
> 
> In this case, in x86-64 (with BMI2) there is the PDEP instruction that can
> generate these values rather easily with little overhead.

Partially, yes.

What you describe here is a special case that is easier to "grasp", 
likely a bit faster to calculate, but ends up being less-optimal in 
space consumption for some cases (especially, order-2 and order-9 folios).

You are describing the special case where "P+1" is a power of two.

Let me explain using the example (P = 3) from the "intuition" further 
down in this patch description, highlighting the mapping:

RMAP-ID |       |Subid |
-----------------------------------
  0       | 0000 | 0    | 0000 0000
  1       | 0001 | 1    | 0000 0001
  2       | 0010 | 4    | 0000 0100
  3       | 0011 | 5    | 0000 0101
  4       | 0100 | 16   | 0001 0000
  5       | 0101 | 17   | 0001 0001
  6       | 0110 | 20   | 0001 0100
  7       | 0111 | 21   | 0001 0101
  8       | 1000 | 64   | 0100 0000
  9       | 1001 | 65   | 0100 0001
  10      | 1010 | 68   | 0100 0100
  11      | 1011 | 69   | 0100 0101
  12      | 1100 | 80   | 0101 0100
  13      | 1101 | 81   | 0101 0001
  14      | 1110 | 84   | 0101 0100
  15      | 1111 | 85   | 0101 0101

So going from e.g., 11 -> 69 means adding one zero for each bit, just 
like you said.

But adding 1 "0" bit is not sufficient for handling order-2 folios (P = 
4), only for handling order-1 folios. So what the current approach does 
is the following (P = 4):

RMAP-ID |       | Subid |
-----------------------------------
  0       | 0000 | 0     | 0000 0000
  1       | 0001 | 1     | 0000 0001
  2       | 0010 | 5     | 0000 0101
  3       | 0011 | 6     | 0000 0110
  4       | 0100 | 25    | 0001 1001
  5       | 0101 | 26    | 0001 1010
  6       | 0110 | 30    | 0001 1110
  7       | 0111 | 31    | 0001 1111
  8       | 1000 | 125   | 0111 1101
  9       | 1001 | 126   | 0111 1110
  10      | 1010 | 130   | 1000 0010
  11      | 1011 | 131   | 1000 0011
  12      | 1100 | 150   | 1001 0110
  13      | 1101 | 151   | 1001 0111
  14      | 1110 | 155   | 1001 1011
  15      | 1111 | 156   | 1001 1100

Which is not just adding "0"s.

To handle it using your simplification we'd have to add one more "0" bit 
to be able to use it for order-2 folios. So I'll refine your statement to:
	for order-K pages to add K “0” bits between each rmap-id bits.

Then, it's easy to see how going from 24 RMAP-ID bits for an order-2 
page would require with that simplification 3*24 = 72bit and would no 
longer fit into a single 64bit value.

To summarize, with the optimized (currently implemented) scheme, we achieve:
  * == order-2:  1x 64bit rmap values
  * <= order-5:  2x 64bit rmap values
  * <= order-9:  3x 64bit rmap values
  * <= order-10: 4x 64bit rmap values
  * <= order-12: 5x 64bit rmap values
  * <= order-15: 6x 64bit rmap values
  * ...

I think, going with the simplification above (to be double-checked), 
we'd achieve [essentially, subtracting 1 from all orders]:
  * == order-1:  1x 64bit rmap values
  * <= order-4:  2x 64bit rmap values
  * <= order-8:  3x 64bit rmap values
  * <= order-9:  4x 64bit rmap values
  * <= order-11: 5x 64bit rmap values
  * <= order-14: 6x 64bit rmap values
  * ...

So especially for order-9 folios we would require 4 instead of 3  64bit 
values.


Now, going with this simplification as first step makes absolute sense, 
because it's much more intuitive and eventually a bit easier to 
implement (although really not significantly IMHO).

One can then easily add the optimization on top later.

> 
> I think that besides the easier generation of sub-ids values in this
> manner, discussing the matter without the “generations" also makes it
> easier to understand the correctness (at least for me).

Yes, that presentation certainly needs improvement. Generating the magic 
numbers recursively makes it easier to proof the correctness using 
induction, that's how I started to use that whole "generation" terminology.

-- 
Cheers,

David / dhildenb