linux-kernel - Re: [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6d33c737-ead2-2e67-573e-936e245a13f3@arm.com>
Date:   Fri, 11 Sep 2020 16:14:13 +0100
From:   Oli Swede <oli.swede@....com>
To:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>
Cc:     Robin Murphy <robin.murphy@....com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and
 string routines

Hi Will and Catalin,

Thank you both for having a look at this, the coupling between the 
usercopy routines and the fixup routines was my main concern as well and 
I understand the desire to take an approach that avoids this dependency 
when it comes to importing new copy routines.

On 9/11/20 12:29 PM, Catalin Marinas wrote:
 >
 > I was a bit more optimistic about this series until I saw the
 > copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
 > only Oli (and maybe Robin) who understands it, so from a maintainer's
 > perspective it doesn't really scale. It's also very fragile with any
 > minor tweak in the actual copy routine potentially breaking the fixup
 > code.
 >

I was wondering if it would be possible to check that an alternative 
expression of the logic involved (e.g. diverted to a C function - if 
technically feasible - that is easier to understand and modify) would 
not really be desirable either?

On 9/11/20 12:29 PM, Catalin Marinas wrote:
 > On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
 >>
 >> I'm inclined to say that cortex-strings is probably not a good basis for
 >> our uaccess routines. The code needs to be adapted in a 
non-straightforward
 >> way so that we lose pretty much all of the benefits we'd usually get 
from
 >> adopted an existing implementation; we can't pull in fixes or 
improvements
 >> without a lot of manual effort, we can't reuse existing testing 
infrastructure
 >> (see below) and we end up being a "second-class" user of the routines
 >> because of the discrepancies in implementation.
 >
 > I was a bit more optimistic about this series until I saw the
 > copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
 > only Oli (and maybe Robin) who understands it, so from a maintainer's
 > perspective it doesn't really scale. It's also very fragile with any
 > minor tweak in the actual copy routine potentially breaking the fixup
 > code.
 >

An alternative approach to the fixup routines came up recently in 
discussions; it could potentially enable the copy template to be 
preserved in its current form and allow the uaccess routines to re-use a 
shared copy algorithm (including cortex-strings and/or future optimized 
out-of-order algorithms), and is based on one of Robin's suggestions 
from a few weeks ago (I found some time to prototype this locally the 
other day). The algorithm-specific backtracking would be dropped in 
favour of first- and second-stage fixups, with each of these being 
relatively short & straightforward, and (together) likely compatible 
with future copy routines.
More specifically, within the fixup path invoked after the initial 
fault, it could jump back N bytes and then begin an in-order byte-wise 
copy from within the first fixup, and then intentionally fault for the 
second time at either the same or another fault address. As the 
intermediate scan-copy should continue right up to the 'true' fault 
address, and ignore any out-of-order techniques used by the optimized 
algorithm, the fixup for the second fault could then contain an 
accurate, conclusive calculation of the return value.

A technique for this which would need to be carried forward from patch 
11 in v4 would be to redirect the PC to a different 'intermediate' fixup 
(though only two choices this time) depending on whether the faulting 
instruction is a load or store, so it can establish if the minimum 
address to jump back to should be src or buf when comparing to 
(faddr-N). Additionally, for this the parameters to the usercopy 
functions would still need to be stored on the stack (patch 10 in v4).
Otherwise, it would involve a slight rewrite of v4 that could provide a 
more maintainable approach; the same fixups would be shared for all copy 
sizes & fault addresses (versus in v4) and the only variable subject to 
re-evaluation with new imports might be N (currently for copy sizes over 
64B it is guaranteed anything below the previous 64B will have been 
copied). It could also remove the need to special-case UAO systems when 
compared to v4.

If it's possible for pages to be swapped out in the middle of a usercopy 
function then this method might not provide the extra implicit knowledge 
used by the fixups in v4 about the stage of the algorithm that has been 
reached (deduced from the fixup that has been routed to, implied by the 
fault-instruction-to-fixup mapping) to compensate for this, so it could 
under-report the bytes copied if it faults at an earlier address during 
the in-order (secondary) copy. This could be outweighed by the potential 
advantages mentioned above, and I mainly include this as v4 should 
account for this scenario (but perhaps unnecessarily if the scenario 
isn't possible or too unlikely :) ), but as you rightly point out it 
would be tiresome to re-write these routines.

On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
 >
 > So why don't we use cortex-strings as a basis for the in-kernel routines
 > only, preferably in a form where the code can be used directly and 
updated
 > with a script (e.g. similar to how we pull in arch/arm64/crypto routines
 > from OpenSSL). We can then roll our own uaccess routines, using a 
slightly
 > more straight-forward implementation which is more amenable to handling
 > user faults and doesn't do things like over copying.

I was wondering if the proposal described above is something that you 
would want to look into, or if the use of separate copy routines 
(script-based or otherwise) for in-kernel/uaccess is the preferred approach?

On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
 >
 > Can we put this test module into the kernel source tree, please, maybe as
 > part of lkdtm? Given the control flow of these optimised functions, I 
think
 > we absolutely need targeted testing to make sure we're getting complete
 > coverage.

I will post the standalone module in its current state later today/early 
next week, the tests are exhaustive but if it would be useful to include 
a mechanism for specifying certain ranges of faults/copy sizes or 
improve the error reporting, I'd be happy to add these features (and/or 
integrate it into lkdtm).

Many thanks,
Oli


On 9/11/20 12:29 PM, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 11:10:03AM +0100, Will Deacon wrote:
>> On Wed, Jul 01, 2020 at 09:12:49AM +0100, Oli Swede wrote:
>>>> Version 3 addressed this but I later found some issues with the fixup
>>>> correctness after further testing, and have partially re-written them
>>>> here, and addressed some other behaviours of the copy algorithm.
>>
>> [...]
>>
>>> I am waiting on access to the relevant machine before posting the benchmark
>>> results for this optimized memcpy, but Sam reported the following with the
>>> similar (but now slightly older) cortex-strings version:
>>>    * copy_from_user: 13.17%
>>>    * copy_to_user: 4.8%
>>>    * memcpy: 27.88%
>>>    * copy_in_user: Didn't appear in the test results.
>>> This machine will also be used to check the fixups are accurate on a system
>>> with UAO - they appear to be exact on a non-UAO system with PAN that I've
>>> been working on locally.
>>
>> I'm inclined to say that cortex-strings is probably not a good basis for
>> our uaccess routines. The code needs to be adapted in a non-straightforward
>> way so that we lose pretty much all of the benefits we'd usually get from
>> adopted an existing implementation; we can't pull in fixes or improvements
>> without a lot of manual effort, we can't reuse existing testing infrastructure
>> (see below) and we end up being a "second-class" user of the routines
>> because of the discrepancies in implementation.
> 
> I was a bit more optimistic about this series until I saw the
> copy_user_fixup.S changes (patches 12 to 14). I have a suspicion it's
> only Oli (and maybe Robin) who understands it, so from a maintainer's
> perspective it doesn't really scale. It's also very fragile with any
> minor tweak in the actual copy routine potentially breaking the fixup
> code.
> 
>> So why don't we use cortex-strings as a basis for the in-kernel routines
>> only, preferably in a form where the code can be used directly and updated
>> with a script (e.g. similar to how we pull in arch/arm64/crypto routines
>> from OpenSSL). We can then roll our own uaccess routines, using a slightly
>> more straight-forward implementation which is more amenable to handling
>> user faults and doesn't do things like over copying.
> 
> I think that's probably the best option. I wouldn't mind replacing the
> in-kernel memcpy/strcpy etc. with these patches since the work was done
> already but definitely not for the uaccess and fixup routines (we still
> have the original implementation in the git log).
> 
> A script would work even better. Do we have any issue with licensing
> though? Cortex Strings is BSD (3-clause IIRC) and copyright owned by
> Linaro. I got them to officially agree with relicensing (dual license)
> the latest copy under GPLv2 so that we can contribute it to the kernel.
> But since the project license is still BSD, any future updates in there
> are BSD-only.
> 
> Maybe someone who understands this stuff can confirm that it's ok to
> regularly grab the Cortex Strings files into a GPLv2 codebase without
> asking for Linaro's permission.
> 
> BTW, you could pick the kprobes patch in here, that explicit fixup call
> is not necessary.
>