[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAG_fn=WPE9fS=iKiYZB0dwNpdbdDwQWz+3XTECN3r131Hm4sGQ@mail.gmail.com>
Date: Thu, 20 Jul 2023 14:00:42 +0200
From: Alexander Potapenko <glider@...gle.com>
To: Yury Norov <yury.norov@...il.com>
Cc: catalin.marinas@....com, will@...nel.org, pcc@...gle.com,
andreyknvl@...il.com, andriy.shevchenko@...ux.intel.com,
linux@...musvillemoes.dk, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, eugenis@...gle.com,
syednwaris@...il.com, william.gray@...aro.org
Subject: Re: [PATCH v3 3/5] arm64: mte: implement CONFIG_ARM64_MTE_COMP
On Wed, Jul 19, 2023 at 11:06 PM Yury Norov <yury.norov@...il.com> wrote:
>
> > Would it help if I add a comment explaining that iN is an N-bit
> > integer? Or do you prefer something like
> >
> > r_sizes[0..4] : 5 x 7 bits
> >
> > ?
>
> Yes, that would help.
Ok.
> > > 7 : Invalid handler
> >
> > No, 7 means "treat this handle as an out-of-line one". It is still
> > valid, but instead of tag data it contains a pointer.
>
> So, it's invalid combination for _inline_ handler, right?
Yes, that's right.
> Anyways, I'm
> waiting for an updated docs, so it will hopefully bring some light.
It's on the way.
> > > > +
> > > > +/* Transform tag ranges back into tags. */
> > > > +void ea0_ranges_to_tags(u8 *r_tags, short *r_sizes, int r_len, u8 *tags)
> > > > +{
> > > > + int i, j, pos = 0;
> > > > + u8 prev;
> > > > +
> > > > + for (i = 0; i < r_len; i++) {
> > > > + for (j = 0; j < r_sizes[i]; j++) {
> > > > + if (pos % 2)
> > > > + tags[pos / 2] = (prev << 4) | r_tags[i];
> > > > + else
> > > > + prev = r_tags[i];
> > > > + pos++;
>
> This code flushes tags at every 2nd iteration. Is that true that
> there's always an even number of iterations, i.e. rsizes is always
> even, assuming r_len can be 1?
>
> If not, it's possible to loose a tag, consider rlen == 1 and
> rsizes[0] == 1.
By design, ea0_ranges_to_tags() only accepts ranges that sum up to 256
(as produced by ea0_tags_to_ranges())
We could add a check for that, but given that it is an internal
helper, a comment should suffice.
But r_sizes[i] does not have to be even (otherwise we could've used
this fact for a more efficient compression).
For example, if the array of tags is {0xab, 0x0, 0x0, 0x0...}, then
r_len = 3, r_sizes[] = {1, 1, 254}, r_tags = {0x0a, 0x0b, 0x0}.
>
> If yes, you can simplify:
>
> for (i = 0; i < r_len; i++)
> for (j = 0; j < r_sizes[i]; j++)
> tags[pos++] = (r_tags[i] << 4) | r_tags[i];
>
> Anyways, in the test can you run all possible combinations?
I will extract code from compress_range_helper() to generate different
numbers of ranges and test against those.
> > > > +/* Translate allocation size into maximum number of ranges that it can hold. */
> > > > +static int ea0_size_to_ranges(int size)
> > > > +{
> > > > + switch (size) {
> > > > + case 8:
> > > > + return 6;
> > > > + case 16:
> > > > + return 11;
> > > > + case 32:
> > > > + return 23;
> > > > + case 64:
> > > > + return 46;
> > > > + default:
> > > > + return 0;
> > > > + }
> > > > +}
> > >
> > > I wonder if there's a math formula here? Can you explain where from
> > > those numbers come?
> >
> > I'll add a comment there.
> > Basically, for the inline case it is the biggest N such as 4 + 4*N +
> > 7*(N-1) <= 63
> >
> > (not 64, because Xarray steals one bit from us)
> >
> > For the out-of-line case it is the biggest N such as 6+4*N + 7*(N-1)
> > <= array size in bits (128, 256, or 512).
>
> So it's: N <= (BIT_CAPACITY + NUM4 - NUM7) / (TAGSZ + SZ)
>
> Doesn't look like a rocket science...
Note that the size of largest_idx is picked based on the largest N for
64-byte buffers.
And the size of r_sizes[] depends on the number of tags that fit into
128 bytes: if we throw away the largest range, each of the remaining
ones will fit into 7 bits.
But this will not be the case for e.g. 3-bit tags (we'll need 8-bit sizes then).
Changing mte_size_to_ranges() to "(size * 8 + MTE_BITS_PER_SIZE -
largest_bits) / (MTE_BITS_PER_TAG + MTE_BITS_PER_SIZE)" indeed looks
better than hardcoded numbers, but we should keep in mind that the
parameters depend on each other and cannot be easily changed.
>
> Why don't you code the formula instead of results? Are there any
> difficulties? Things may change. For example, in next spec they
> may bring 3- or 5-bit tags, and your compressor will crack loudly
> with hardcoded numbers.
The main reason for the compressor to crack loudly would be the
hardcoded assumption of the tags consisting of 4 bits.
I don't think it is practical to account on that number to change (and
e.g. use bitmap_read/bitmap_write to convert tags into ranges) - we'd
rather have a static assert for that.
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Powered by blists - more mailing lists