Message-ID: <aa542818337b4157bfdcf262926b9fe3@AcuMS.aculab.com>
Date: Wed, 10 May 2023 11:56:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Kent Overstreet' <kent.overstreet@...ux.dev>,
Lorenzo Stoakes <lstoakes@...il.com>
CC: Christoph Hellwig <hch@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-bcachefs@...r.kernel.org" <linux-bcachefs@...r.kernel.org>,
Kent Overstreet <kent.overstreet@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Uladzislau Rezki <urezki@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: RE: [PATCH 07/32] mm: Bring back vmalloc_exec
From: Kent Overstreet
> Sent: 09 May 2023 22:29
...
> The background is that bcachefs generates a per btree node unpack
> function, based on the packed format for that btree node, for unpacking
> keys within that node. The unpack function is only ~50 bytes, and for
> locality we want it to be located with the btree node's other in-memory
> lookup tables so they can be prefetched all at once.
Loading data into the d-cache isn't going to load code into
the i-cache.
Indeed, you don't want to mix code and data in the same
cache line - it just wastes cache space.
It looks to me like you could have a few different precompiled
unpack functions and pick the correct one based on the packed format.
Quite likely the code would be just as fast (if longer)
once you allow for parallel execution on modern CPUs.
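Something along these lines, say - only a sketch, and the field
widths, names, and struct layout below are invented for illustration
rather than taken from the actual bcachefs key formats:

/*
 * Sketch only: the layouts and names are made up, not the real
 * bcachefs bkey/bkey_packed structures.
 */
#include <stdint.h>
#include <string.h>

struct unpacked_key {
	uint64_t inode;
	uint64_t offset;
};

/* A couple of fixed, precompiled unpackers for common packed layouts. */
static void unpack_32_32(const uint8_t *p, struct unpacked_key *k)
{
	uint32_t lo, hi;

	memcpy(&lo, p, 4);
	memcpy(&hi, p + 4, 4);
	k->inode  = lo;
	k->offset = hi;
}

static void unpack_48_16(const uint8_t *p, struct unpacked_key *k)
{
	uint64_t v;

	memcpy(&v, p, 8);
	k->inode  = v & ((1ULL << 48) - 1);
	k->offset = v >> 48;
}

typedef void (*unpack_fn)(const uint8_t *, struct unpacked_key *);

/*
 * Pick the right precompiled unpacker once per node, based on its
 * packed format, instead of generating code at runtime.
 */
static unpack_fn pick_unpack_fn(unsigned int inode_bits,
				unsigned int offset_bits)
{
	if (inode_bits == 32 && offset_bits == 32)
		return unpack_32_32;
	if (inode_bits == 48 && offset_bits == 16)
		return unpack_48_16;
	return NULL;	/* fall back to a generic (slower) unpacker */
}

The per-node state then only needs to hold a function pointer chosen
once when the node's format is read, and the unpackers themselves
stay in normal kernel text.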
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)