linux-kernel - Re: [PATCH v4] x86/traps: Enable UBSAN traps on x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240711184937.GE27299@noisy.programming.kicks-ass.net>
Date: Thu, 11 Jul 2024 20:49:37 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Kees Cook <kees@...nel.org>
Cc: Marco Elver <elver@...gle.com>,
	Gatlin Newhouse <gatlin.newhouse@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrey Konovalov <andreyknvl@...il.com>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>,
	Nathan Chancellor <nathan@...nel.org>,
	Nick Desaulniers <ndesaulniers@...gle.com>,
	Bill Wendling <morbo@...gle.com>,
	Justin Stitt <justinstitt@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Baoquan He <bhe@...hat.com>,
	Rick Edgecombe <rick.p.edgecombe@...el.com>,
	Pengfei Xu <pengfei.xu@...el.com>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	Changbin Du <changbin.du@...wei.com>, Xin Li <xin3.li@...el.com>,
	Jason Gunthorpe <jgg@...pe.ca>, Arnd Bergmann <arnd@...db.de>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	linux-kernel@...r.kernel.org, kasan-dev@...glegroups.com,
	linux-hardening@...r.kernel.org, llvm@...ts.linux.dev,
	t.p.northover@...il.com, Fangrui Song <maskray@...gle.com>
Subject: Re: [PATCH v4] x86/traps: Enable UBSAN traps on x86

On Thu, Jul 11, 2024 at 09:35:24AM -0700, Kees Cook wrote:
> On Thu, Jul 11, 2024 at 11:06:12AM +0200, Marco Elver wrote:
> > On Thu, 11 Jul 2024 at 10:10, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Wed, Jul 10, 2024 at 08:32:38PM +0000, Gatlin Newhouse wrote:
> > > > Currently ARM architectures extract which specific sanitizer
> > > > has caused a trap via encoded data in the trap instruction.
> > > > Clang on x86 currently encodes the same data in ud1 instructions
> > > > but the x86 handle_bug() and is_valid_bugaddr() functions
> > > > currently only look at ud2s.
> > > >
> > > > Bring x86 to parity with arm64, similar to commit 25b84002afb9
> > > > ("arm64: Support Clang UBSAN trap codes for better reporting").
> > > > Enable the reporting of UBSAN sanitizer detail on x86 architectures
> > > > compiled with clang when CONFIG_UBSAN_TRAP=y.
> > >
> > > Can we please get some actual words on what code clang will generate for
> > > this? This doesn't even refer to the clang commit.
> > >
> > > How am I supposed to know if the below patch matches what clang will
> > > generate etc..
> > 
> > I got curious what the history of this is - I think it was introduced
> > in https://github.com/llvm/llvm-project/commit/c5978f42ec8e9, which
> > was reviewed here: https://reviews.llvm.org/D89959
> 
> Sorry, I should have suggested this commit be mentioned in the commit
> log. The details are in llvm/lib/Target/X86/X86MCInstLower.cpp:
> https://github.com/llvm/llvm-project/commit/c5978f42ec8e9#diff-bb68d7cd885f41cfc35843998b0f9f534adb60b415f647109e597ce448e92d9f
> 
>   case X86::UBSAN_UD1:
>     EmitAndCountInstruction(MCInstBuilder(X86::UD1Lm)
>                                 .addReg(X86::EAX)
>                                 .addReg(X86::EAX)
>                                 .addImm(1)
>                                 .addReg(X86::NoRegister)
>                                 .addImm(MI->getOperand(0).getImm())
>                                 .addReg(X86::NoRegister));
> 
> Which is using the UD1Lm template from
> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InstrSystem.td#L27
> 
>   def UD1Lm   : I<0xB9, MRMSrcMem, (outs), (ins GR32:$src1, i32mem:$src2),
>                   "ud1{l}\t{$src2, $src1|$src1, $src2}", []>, TB, OpSize32;
> 
> It uses OpSize32, distinct from UD1Wm (16) and UD1Qm (64).
> 
> > But besides that, there's very little documentation. Either Gatlin or
> > one of the other LLVM folks might have more background, but we might
> > be out of luck if that 1 commit is all there is.
> > 
> > [+Cc Tim, author of the LLVM commit]
> > 
> > > > diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
> > > > index a3ec87d198ac..ccd573d58edb 100644
> > > > --- a/arch/x86/include/asm/bug.h
> > > > +++ b/arch/x86/include/asm/bug.h
> > > > @@ -13,6 +13,17 @@
> > > >  #define INSN_UD2     0x0b0f
> > > >  #define LEN_UD2              2
> > > >
> > > > +/*
> > > > + * In clang we have UD1s reporting UBSAN failures on X86, 64 and 32bit.
> > > > + */
> > > > +#define INSN_ASOP    0x67
> > >
> > > I asked, but did not receive answer, *WHY* does clang add this silly
> > > prefix? AFAICT this is entirely spurious and things would be simpler if
> > > we don't have to deal with it.
> 
> Even if we change LLVM, I'd still like to support the older versions, so
> we'll need to handle this regardless.

Is it (LLVM) allowed to do prefix stuffing for 'random' instructions in
order to achieve alignment goals? That is, are we ever expecting more
prefixes here?

Anyway, as proposed the 'decoder' also accepts ASOP UD2, should we be
complete and also return an instruction length? Just in case we want to
be non fatal (WARN like) and skip over the instruction.

> > >
> > > > +#define OPCODE_PREFIX        0x0f
> > >
> > > This is *NOT* a prefix, it is an escape, please see the SDM Vol 2
> > > Chapter 'Instruction Format'. That ASOP thing above is a prefix.
> > >
> > > > +#define OPCODE_UD1   0xb9
> > > > +#define OPCODE_UD2   0x0b
> > >
> > > These are second byte opcodes. The actual (single byte opcodes) of those
> > > value exist and are something entirely different (0xB0+r is MOV, and
> > > 0x0B is OR).
> 
> What would be your preferred names for all of these defines?

SDM calls 0x0f the escape opcode and these others secondary opcode
bytes, so something along those lines would be clear I suppose.	

Vol2 2.1.2 (todays edition)