lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK1hOcMrNb=RSqB92LzXRBRVZBkOdwggERSS3FLKak1EZi9gkA@mail.gmail.com>
Date:	Sun, 8 Mar 2015 20:13:28 +0100
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Fengguang Wu <fengguang.wu@...el.com>, X86 ML <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>, LKP <lkp@...org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [x86/asm/entry] BUG: unable to handle kernel paging request

On Sat, Mar 7, 2015 at 1:33 AM, Andy Lutomirski <luto@...capital.net> wrote:
> On Fri, Mar 6, 2015 at 3:30 PM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>> Greetings,
>>
>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
>>
>> commit 75182b1632a89f12540baa1806a7c5c180db620c
>> Author:     Andy Lutomirski <luto@...capital.net>
>> AuthorDate: Thu Mar 5 19:19:03 2015 -0800
>> Commit:     Ingo Molnar <mingo@...nel.org>
>> CommitDate: Fri Mar 6 08:32:57 2015 +0100
>>
>>     x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()
>
> This problem only happens on 32-bit kernels, and the culprit is, sort of:
>
> /*
>  * The below -8 is to reserve 8 bytes on top of the ring0 stack.
>  * This is necessary to guarantee that the entire "struct pt_regs"
>  * is accessible even if the CPU haven't stored the SS/ESP registers
>  * on the stack (interrupt gate does not save these registers
>  * when switching to the same priv ring).
>  * Therefore beware: accessing the ss/esp fields of the
>  * "struct pt_regs" is possible, but they may contain the
>  * completely wrong values.
>  */
> #define task_pt_regs(task)                                             \
> ({                                                                     \
>        struct pt_regs *__regs__;                                       \
>        __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task))-8); \
>        __regs__ - 1;                                                   \
> })
>
> I'm confused about multiple things:
>
> 1. I don't understand this comment.

Comment says that in 32-bit x86, interrupts and exceptions
in ring 0 do not push SS,ESP - they only save EFLAGS,CS,EIP
in iret frame. (This happens because CPL doesn't
change, not beacuse ot is zero).

IRET insn likewise does not restore SS,ESP if it detects
that RPL(stack_CS) = RPL(CS).

This was changed for 64-bit more for two reasons:
(1) it's simpler, and
(2) it makes it possible for CPU to automatically align stack
to 16-byte boundary (you need to undo that on iretq,
therefore you need to always save old RSP).

>From AMD docs:

"8.9.3 Interrupt Stack Frame
In long mode, the return-program stack pointer (SS:RSP) is always
pushed onto the interrupt-handler
stack, regardless of whether or not a privilege change occurs.
Although the SS register is not used in
64-bit mode, SS is pushed to allow returns into compatibility mode.
Pushing SS:RSP unconditionally
presents operating systems with a consistent interrupt-stack-frame
size for all interrupts, except for
error codes. Interrupt service-routine entry points that handle
interrupts generated by non-error-code
interrupts can push an error code on the stack for consistency.
In long mode, when a control transfer to an interrupt handler occurs,
the processor performs the
following:
1. Aligns the new interrupt-stack frame by masking RSP with
FFFF_FFFF_FFFF_FFF0h.
..."

> 2. copy_thread thinks that sp0 should be task_pt_regs + 1, which is 8
> bytes below the top of the stack.  cpu_tss, formerly INIT_TSS, thinks
> that sp0 should be the top of the stack.  This is inconsistent.  What
> gives?

I ran a small test: made "umask" syscall print where  pt_regs is,
and where it's end is:

SYSCALL_DEFINE1(umask, int, mask)
{
        struct pt_regs *regs = task_pt_regs(current);
        unsigned long ptregs = (unsigned long)regs;

        pr_err("[%d] current:%lx pt_regs:%lx pt_regs_end:%lx\n",
                current->pid, (long)current, ptregs, ptregs + sizeof(*regs));
        ....

Result:

/ # umask
[    8.393450] [910] current:c777ce00 pt_regs:c7567fb4 pt_regs_end:c7567ff8
[    8.396601] [910] current:c777ce00 pt_regs:c7567fb4 pt_regs_end:c7567ff8

It seems to be true that 32-bit x86 always has two unused words
beyond "struct pt_regs" on stack during syscalls.

IOW: tss.sp0 is *not* set at the very end of the stack on 32-bit.

> 3. vm86 does something terrible to sp0, which might mean that we can't
> use sp0 to find the top of the stack in general on 32-bit kernels,
> which is sad.  We might need a partial revert or to introduce a
> per-cpu variable "real_sp0" on 32-bit.  Ugh.  Could someone who
> understands vm86 mode better than me give me a hint?
>
> 4. Not directly related, but the 32-bit tss_struct contains this gem:
>
>     /*
>      * .. and then another 0x100 bytes for the emergency kernel stack:
>      */
>     unsigned long        stack[64];
>
> Last I checked, 0x100 != 64.  Also, wow, this is kind of disgusting. :)


Seems to be unused: I commented it out on "defconfig" build
and got no build errors.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ