lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZF/k0Xo76q60uk0M@moria.home.lan>
Date:   Sat, 13 May 2023 15:28:17 -0400
From:   Kent Overstreet <kent.overstreet@...ux.dev>
To:     Eric Biggers <ebiggers@...nel.org>
Cc:     Lorenzo Stoakes <lstoakes@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-bcachefs@...r.kernel.org,
        Kent Overstreet <kent.overstreet@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Uladzislau Rezki <urezki@...il.com>, linux-mm@...ck.org
Subject: Re: [PATCH 07/32] mm: Bring back vmalloc_exec

On Fri, May 12, 2023 at 06:57:52PM -0700, Eric Biggers wrote:
> What I would suggest instead is preprocessing the list of 6 field lengths to
> create some information that can be used to extract all 6 fields branchlessly
> with no dependencies between different fields.  (And you clearly *can* add a
> preprocessing step, as you already have one -- the dynamic code generator.)
> 
> So, something like the following:
> 
>     const struct field_info *info = &format->fields[0];
> 
>     field0 = (in->u64s[info->word_idx] >> info->shift1) & info->mask;
>     field0 |= in->u64s[info->word_idx - 1] >> info->shift2;
> 
> ... but with the code for all 6 fields interleaved.
> 
> On modern CPUs, I think that would be faster than your current C code.
> 
> You could do better by creating variants that are specialized for specific
> common sets of parameters.  During "preprocessing", you would select a variant
> and set an enum accordingly.  During decoding, you would switch on that enum and
> call the appropriate variant.  (This could also be done with a function pointer,
> of course, but indirect calls are slow these days...)
> 
> For example, you mentioned that 8-byte packed keys is a common case.  In that
> case there is only a single u64 to decode from, so you could create a function
> that just handles that case:
> 
>     field0 = (word >> info->shift) & info->mask;
> 
> You could also create other variants, e.g.:
> 
> - 16-byte packed keys (which you mentioned are common)
> - Some specific set of fields have zero width so don't need to be extracted
>   (which it sounds like is common, or is it different fields each time?)
> - All fields having specific lengths (are there any particularly common cases?)
> 
> Have you considered any of these ideas?

I like that idea.

Gonna hack some code... :)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ