
Message-ID: <CAPcyv4jPZny8uraVtO8gMfs8W9EJWfgSAo1zOnwqe2VBSLgaDQ@mail.gmail.com>
Date:   Wed, 20 May 2020 14:57:02 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Vivek Goyal <vgoyal@...hat.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, X86 ML <x86@...nel.org>,
        stable <stable@...r.kernel.org>, Borislav Petkov <bp@...en8.de>,
        Tony Luck <tony.luck@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Erwin Tsaur <erwin.tsaur@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>
Subject: Re: [PATCH v3 2/2] x86/copy_mc: Introduce copy_mc_generic()

On Wed, May 20, 2020 at 12:13 PM Vivek Goyal <vgoyal@...hat.com> wrote:
>
> On Tue, May 19, 2020 at 03:12:42PM -0700, Dan Williams wrote:
> > The original copy_mc_fragile() implementation had negative performance
> > implications since it did not use the fast-string instruction sequence
> > to perform copies. For this reason copy_mc_to_kernel() fell back to
> > plain memcpy() to preserve performance on platforms that did not indicate
> > the capability to recover from machine check exceptions. However, that
> > capability detection was not architectural and now that some platforms
> > can recover from fast-string consumption of memory errors the memcpy()
> > fallback now causes these more capable platforms to fail.
> >
> > Introduce copy_mc_generic() as the fast default implementation of
> > copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to
> > be a platform quirk to indicate 'fragility'. With this in place
> > copy_mc_to_kernel() is fast and recovery-ready by default regardless of
> > hardware capability.
> >
> > Thanks to Vivek for identifying that copy_user_generic() is not suitable
> > as the copy_mc_to_user() backend since the #MC handler explicitly checks
> > ex_has_fault_handler().
>
> /me is curious to know why #MC handler mandates use of _ASM_EXTABLE_FAULT().

Even though we could try to handle all faults and exceptions
generically, I think it makes sense to enforce type safety here, if
only to support architectures that can satisfy no more than the
minimum contract of copy_mc_to_user(). For example, the contract
implied by copy_mc_to_user() is that a destination exception other
than #PF is not permissible in this path. See:

00c42373d397 ("x86-64: add warning for non-canonical user access address dereferences")
75045f77f7a7 ("x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups")

...for examples of other justifications for being explicit in these paths.
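
To make the "type safety" point concrete, here is a minimal user-space sketch of how a typed exception table lets the #MC path refuse fixups that were not registered for faults. This is a model only, not the kernel code: the enum, struct layout, and function names (fixup_type, extable_entry, search_extable, mc_recoverable) are hypothetical, and the real check lives behind ex_has_fault_handler().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup classes, loosely modeled on the extable macros. */
enum fixup_type {
	FIXUP_DEFAULT,	/* models _ASM_EXTABLE: generic fixup */
	FIXUP_FAULT,	/* models _ASM_EXTABLE_FAULT: fault-aware fixup */
	FIXUP_UACCESS,	/* models _ASM_EXTABLE_UA: user access fixup */
};

/* Hypothetical table entry: faulting instruction, landing pad, class. */
struct extable_entry {
	uintptr_t insn;
	uintptr_t fixup;
	enum fixup_type type;
};

/* Linear search for the entry covering instruction pointer 'ip'. */
const struct extable_entry *search_extable(const struct extable_entry *tbl,
					   size_t n, uintptr_t ip)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return &tbl[i];
	return NULL;
}

/*
 * Models the #MC-side check: the exception is only treated as
 * recoverable when the instruction was explicitly annotated with a
 * fault-type fixup. Any other fixup class is rejected.
 */
bool mc_recoverable(const struct extable_entry *tbl, size_t n, uintptr_t ip)
{
	const struct extable_entry *e = search_extable(tbl, n, ip);

	return e != NULL && e->type == FIXUP_FAULT;
}
```

In this model, an instruction annotated only with the uaccess class is not treated as #MC-recoverable, which is the situation copy_user_generic() was in.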

>
> [..]
> > +/*
> > + * copy_mc_generic - memory copy with exception handling
> > + *
> > + * Fast string copy + fault / exception handling. If the CPU does
> > + * support machine check exception recovery, but does not support
> > + * recovering from fast-string exceptions then this CPU needs to be
> > + * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any
> > + * machine check recovery support this version should be no slower than
> > + * standard memcpy.
> > + */
> > +SYM_FUNC_START(copy_mc_generic)
> > +     ALTERNATIVE "jmp copy_mc_fragile", "", X86_FEATURE_ERMS
> > +     movq %rdi, %rax
> > +     movq %rdx, %rcx
> > +.L_copy:
> > +     rep movsb
> > +     /* Copy successful. Return zero */
> > +     xorl %eax, %eax
> > +     ret
> > +SYM_FUNC_END(copy_mc_generic)
> > +EXPORT_SYMBOL_GPL(copy_mc_generic)
> > +
> > +     .section .fixup, "ax"
> > +.E_copy:
> > +     /*
> > +      * On fault %rcx is updated such that the copy instruction could
> > +      * optionally be restarted at the fault position, i.e. it
> > +      * contains 'bytes remaining'. A non-zero return indicates error
> > +      * to copy_safe() users, or indicate short transfers to
>
> copy_safe() is vestige of terminology of previous patches?

Thanks, yes, I missed this one.

>
> > +      * user-copy routines.
> > +      */
> > +     movq %rcx, %rax
> > +     ret
> > +
> > +     .previous
> > +
> > +     _ASM_EXTABLE_FAULT(.L_copy, .E_copy)
>
> A question for my education purposes.
>
> So copy_mc_generic() can handle MCE both on source and destination
> addresses? (Assuming some device can generate MCE on stores too).

There's no such thing as #MC on write. #MC is only signaled on consumed poison.

In this case what is specifically being handled is #MC with RIP
pointing at the rep movsb instruction at .L_copy. The fault handler
does not actually know anything about source or destination; it only
knows the fault / exception type and the register state.
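
As an illustration of the "register state" part, the fixup in the patch simply hands back %rcx, i.e. bytes remaining. A rough user-space model of that return contract follows. The names model_copy_mc and bytes_copied, and the poison_off parameter, are hypothetical stand-ins, since real poison consumption cannot be simulated this way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_mc_generic(): copy 'len' bytes, but
 * pretend the source byte at offset 'poison_off' is poisoned. Returns
 * the number of bytes NOT copied, mirroring the fixup path that moves
 * %rcx into %rax. Zero means the whole copy succeeded.
 */
size_t model_copy_mc(void *dst, const void *src, size_t len,
		     size_t poison_off)
{
	assert(dst != NULL && src != NULL);

	/* Bytes before the poisoned offset are transferred normally. */
	size_t ok = poison_off < len ? poison_off : len;

	memcpy(dst, src, ok);
	return len - ok;
}

/* Caller-side view: turn 'bytes remaining' into 'bytes copied'. */
size_t bytes_copied(size_t len, size_t remaining)
{
	return len - remaining;
}
```

A caller that only cares about success tests the return value against zero, while a user-copy style caller reports bytes_copied() as a short transfer.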

> On the other hand copy_mc_fragile() handles MCE recovery only on
> source and non-MCE recovery on destination.

No, there's no difference in capability. #MC can only be raised on a
poison-read in both cases.