Date:	Wed, 25 Feb 2015 23:40:55 +0100
From:	Sabrina Dubroca <sd@...asysnail.net>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Denys Vlasenko <dvlasenk@...hat.com>,
	Andrey Wagin <avagin@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	X86 ML <x86@...nel.org>, Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/3 v3] x86: entry_64.S: always allocate complete "struct pt_regs"

2015-02-25, 13:59:06 -0800, Andy Lutomirski wrote:
> On Wed, Feb 25, 2015 at 1:28 PM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
> > On 02/25/2015 09:10 PM, Andy Lutomirski wrote:
> >> On Wed, Feb 25, 2015 at 11:59 AM, Andrey Wagin <avagin@...il.com> wrote:
> >>> 2015-02-25 21:42 GMT+03:00 Denys Vlasenko <dvlasenk@...hat.com>:
> >>>> On 02/25/2015 01:37 PM, Andrey Wagin wrote:
> >>>>> 2015-02-13 0:54 GMT+03:00 Denys Vlasenko <dvlasenk@...hat.com>:
> >>>>>> 64-bit code was using six stack slots less by not saving/restoring
> >>>>>> registers which are callee-preserved according to C ABI,
> >>>>>> and not allocating space for them.
> >>>>>> Only when syscall needed a complete "struct pt_regs",
> >>>>>> the complete area was allocated and filled in.
> >>>>>> As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
> >>>>>> trick is used, to make nested interrupt stacks easier to unwind.
> >>>>>>
> >>>>>> This proved to be a source of significant obfuscation and subtle bugs.
> >>>>>> For example, stub_fork had to pop the return address,
> >>>>>> extend the struct, save registers, and push return address back. Ugly.
> >>>>>> ia32_ptregs_common pops return address and "returns" via jmp insn,
> >>>>>> throwing a wrench into CPU return stack cache.
> >>>>>>
> >>>>>> This patch changes code to always allocate a complete "struct pt_regs".
> >>>>>> The saving of registers is still done lazily.
> >>>>>>
> >>>>>> "Partial pt_regs" trick on interrupt stack is retained.
> >>>>>>
> >>>>>> Macros which manipulate "struct pt_regs" on stack are reworked:
> >>>>>> ALLOC_PT_GPREGS_ON_STACK allocates the structure.
> >>>>>> SAVE_C_REGS saves to it those registers which are clobbered by C code.
> >>>>>> SAVE_EXTRA_REGS saves to it all other registers.
> >>>>>> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it.
> >>>>>>
> >>>>>> ia32_ptregs_common, stub_fork and friends lost their ugly dance with
> >>>>>> return pointer.
> >>>>>>
> >>>>>> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
> >>>>>> instead of magic numbers.
> >>>>>>
> >>>>>> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
> >>>>>> instead of having it open-coded yet again.
> >>>>>>
> >>>>>> Patch was run-tested: 64-bit executables, 32-bit executables,
> >>>>>> strace works.
> >>>>>> Timing tests did not show measurable difference in 32-bit
> >>>>>> and 64-bit syscalls.
> >>>>>
> >>>>> Hello Denys,
> >>>>>
> >>>>> My test vm doesn't boot with this patch. Could you help to investigate
> >>>>> this issue?
> >>>>
> >>>> I think I found it. This part of my patch is possibly wrong:
> >>>>
> >>>> @@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void)
> >>>>  #define ARCH_LOCKDEP_SYS_EXIT_IRQ      \
> >>>>         TRACE_IRQS_ON; \
> >>>>         sti; \
> >>>> -       SAVE_REST; \
> >>>> +       SAVE_EXTRA_REGS; \
> >>>>         LOCKDEP_SYS_EXIT; \
> >>>> -       RESTORE_REST; \
> >>>> +       RESTORE_EXTRA_REGS; \
> >>>>         cli; \
> >>>>         TRACE_IRQS_OFF;
> >>>>
> >>>> The "SAVE_REST" here is intended to really *push* extra regs on stack,
> >>>> but the patch changed it so that they are written to existing stack
> >>>> slots above.
> >>>>
> >>>> From code inspection it should work in almost all cases, but some
> >>>> locations where it is used are really obscure.
> >>>>
> >>>> If there are places where *pushing* regs is really necessary,
> >>>> this can corrupt rbp,rbx,r12-15 registers.
> >>>>
> >>>> Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug
> >>>> was here.
> >>>> Please find updated patch attached. Can you try it?
> >>>
> >>> It doesn't work
> >
> > Thanks for testing it anyway.
> >
> >
> >>> [    3.016262] traps: systemd-cgroups[390] general protection ip:7f456f7b6028 sp:7fffdc059718 error:0 in ld-2.18.so[7f456f79e000+20000]
> >
> > This is what I know about these crashes. The SEGV itself is caused by
> > HLT instruction executed by dynamic loader, ld-2.NN.so.
> > The instruction is in _exit function, and is only reachable if
> > exit_group and exit syscalls fail to terminate the process.
> > So it seems that syscall execution is getting badly broken somehow
> > at some point.
> >
> > This happens to both reporters.
> >
> > My theory that it is related to lockdep seems to be wrong, because
> > Sabrina's kernel is not lockdep-enabled, yet it sees the same failure.
> >
> > Both kernels are paravirtualized, both are booted under KVM,
> > Andrey runs it with four virtual CPUs, Sabrina runs with two.
> >
> > My next theory is that I missed something related to paravirt.
> > I am looking at that code, so far I don't see anything suspicious.
> >
> > Unfortunately, it doesn't happen to me: I have Sabrina's bzImage,
> > I run it under "qemu-system-x86_64 -enable-kvm -smp 2",
> > I see in dmesg that kernel does detect that it is being run under KVM,
> > but it works for me. No mysterious segfaults.
> >
> > Andrey, can you send me your bzImage? Maybe it will trigger
> > the problem for me.
> >
> >
> >> The change to stub_\func looks wrong to me.  It saves and restores
> >> regs, but those regs might already have been saved if we're on the
> >> slow path.  (Yes, all that code is quite buggy even without all these
> >> patches.)  So is execve.
> >>
> >> This means that, for example, execve called in the slow path will
> >> save/restore regs twice.  If the values in the regs after the first
> >> save and before the second save are different, then we corrupt user
> >> state.
> >
> > This part?
> >
> >         .macro FORK_LIKE func
> >  ENTRY(stub_\func)
> >         CFI_STARTPROC
> > -       popq    %r11                    /* save return address */
> > -       PARTIAL_FRAME 0
> > -       SAVE_REST
> > -       pushq   %r11                    /* put it back on stack */
> > +       DEFAULT_FRAME 0, 8              /* offset 8: return address */
> > +       SAVE_EXTRA_REGS 8
> >         FIXUP_TOP_OF_STACK %r11, 8
> > -       DEFAULT_FRAME 0 8               /* offset 8: return address */
> >         call sys_\func
> >         RESTORE_TOP_OF_STACK %r11, 8
> > -       ret $REST_SKIP          /* pop extended registers */
> > +       ret
> >         CFI_ENDPROC
> >  END(stub_\func)
> >         .endm
> >
> >         FORK_LIKE  clone
> >         FORK_LIKE  fork
> >         FORK_LIKE  vfork
> >
> > But the old code (SAVE_REST thing) was also saving registers here.
> > It had to jump through hoops (pop return address, SAVE_REST,
> > push return address) to do that.
> > After the patch, "SAVE_EXTRA_REGS 8" does the same, just without
> > pop/push pair.
> >
> > I just don't see what's wrong with it. Can you elaborate?
> 
> SAVE_REST pushed the regs onto the stack, whereas SAVE_EXTRA_REGS just
> writes them in place.  It's possible for this to be called when the
> regs have already been saved.
> 
> >
> > And this area of code has no paravirt gunk, so if the bug is here,
> > why doesn't it fail for people running this natively?
> 
> I don't know whether paravirt is involved.  It could be something else.
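
The push vs. write-in-place difference Andy describes can be modelled
in plain C.  This is only a sketch, not kernel code: struct cpu,
struct frame, save_in_place(), save_by_push() and double_save() are
made-up names, the values 0x1111/0x2222 are arbitrary, and the
assumption that the live register changes between the two saves is
exactly the hypothetical case raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_EXTRA 6              /* rbx, rbp, r12-r15 */

/* Live callee-preserved registers (hypothetical model). */
struct cpu {
	uint64_t extra[NR_EXTRA];
};

/* Simplified stack: fixed slots inside the always-allocated pt_regs,
 * plus room below it for copies that a push-style save would create. */
struct frame {
	uint64_t saved[NR_EXTRA];
	uint64_t pushed[4][NR_EXTRA];
	int depth;
};

/* Models SAVE_EXTRA_REGS: overwrite the existing pt_regs slots. */
static void save_in_place(struct frame *f, const struct cpu *c)
{
	memcpy(f->saved, c->extra, sizeof(f->saved));
}

/* Models the old SAVE_REST: push a fresh copy, leave other slots alone. */
static void save_by_push(struct frame *f, const struct cpu *c)
{
	memcpy(f->pushed[f->depth++], c->extra, sizeof(c->extra));
}

/* Save twice and return what the pt_regs slot for rbx holds afterwards.
 * push_second selects how the second save is done. */
static uint64_t double_save(int push_second)
{
	struct frame f = { 0 };
	struct cpu c = { .extra = { 0x1111 } };

	save_in_place(&f, &c);      /* first save: slow path records user regs */
	c.extra[0] = 0x2222;        /* assumption: live rbx differs by now */

	if (push_second)
		save_by_push(&f, &c);
	else
		save_in_place(&f, &c);

	return f.saved[0];
}
```

In this model double_save(1) still returns the user value 0x1111,
while double_save(0) returns 0x2222: the second in-place save has
replaced the user's rbx in pt_regs.  Whether the real slow path ever
hits this case is what is still open in this thread.
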

After reading Denys's last mail, I tried booting the same VM
 - with 1 CPU
 - without CONFIG_PARAVIRT
 - with x86_64_defconfig

and I still get the same traps in all 3 cases.

I can run some userspace programs, but I don't know which ones would
be helpful.
I can also try booting a real machine with Arch Linux/systemd tomorrow.

-- 
Sabrina