linux-kernel - Re: Linux 6.11-rc1

Message-ID: <CAHk-=wj2BYPvYQAQa-pyT3hERcd2pVw+rL5kw7Y=-8PA3JTDAg@mail.gmail.com>
Date: Tue, 30 Jul 2024 11:53:31 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Guenter Roeck <linux@...ck-us.net>, Andy Lutomirski <luto@...nel.org>, Ingo Molnar <mingo@...hat.com>, 
	Peter Anvin <hpa@...or.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Jens Axboe <axboe@...nel.dk>, 
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: Linux 6.11-rc1

[ Adding x86-32 entry code people, more context at the thread in:

  https://lore.kernel.org/all/3f65bfad-bd04-4651-bbe3-e2b1925f1a13@kernel.dk/

  for people who were dragged in late ]

On Tue, 30 Jul 2024 at 10:04, Guenter Roeck <linux@...ck-us.net> wrote:
>
> From the crash log:

The full log is more informative, at

  http://server.roeck-us.net/qemu/x86-nosmp/

which has that config too.

> [    3.605247] sr 2:0:0:0: Attached scsi generic sg0 type 5
> [    3.764508] sched_clock: Marking stable (3740032902, 23766486)->(3766853760, -3054372)
> [    3.768164] registered taskstats version 1
> [    3.768271] Loading compiled-in X.509 certificates
> [    3.990683] Btrfs loaded, zoned=no, fsverity=no
> [    4.005012] cryptomgr_test (68) used greatest stack depth: 6136 bytes left
> [    4.029889] traps: PANIC: double fault, error_code: 0x0

Double faults are bad bad juju.  Nasty to debug, because it means
something went wrong at a horribly bad time.

> [    4.030613] EIP: asm_exc_page_fault+0x0/0x10

Sadly, this mainly says that taking a page fault was part of the
horribly bad time.

> [    4.031389]  <ENTRY_TRAMPOLINE>
> [    4.031392]  ? asm_exc_int3+0x10/0x10
> ...
> [    4.033360]  ? asm_exc_int3+0x10/0x10
> [    4.033368]  ? restore_all_switch_stack+0x65/0xe6
> [    4.033386]  </ENTRY_TRAMPOLINE>

Yeah "restore_all_switch_stack" is also part of "horribly bad time".

And from the full log, I see that the "..." is a *lot* of asm_exc_int3+0x10.

Which makes me think it's asm_exc_int3 just recursively failing.

Which will cause a stack overflow, and then - after a time - a double fault.

[ Time passes, I build the i386 kernel image with your config just to
get an image that looks like yours ]

Hmm. I think the stack dump output confused me. Because
"asm_exc_int3+0x10/0x10" doesn't end up making much sense, but it
turns out that "asm_exc_int3+0x10" is actually the same as
'asm_exc_page_fault'.
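
[ The aliasing above can be sketched in a few lines. This is only an illustration, not the kernel's symbol lookup code: the addresses are made up, and it assumes the backtrace printer resolves "address - 1" for stack entries, which is why an address at the very start of one symbol can print as the previous symbol plus its full size. The two 0x10 sizes come from the log lines quoted above. ]

```python
import bisect

# Hypothetical addresses; only the 0x10 sizes come from the crash log.
SYMBOLS = [
    (0x1000, "asm_exc_int3"),
    (0x1010, "asm_exc_page_fault"),
    (0x1020, "end_of_table"),  # sentinel so every real symbol has a size
]
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr, backtrace=False):
    """Format addr as name+offset/size.

    With backtrace=True the lookup uses addr - 1 (assumed behaviour for
    return addresses), but the offset is still measured from the symbol
    start, so offset == size is possible.
    """
    lookup = addr - 1 if backtrace else addr
    i = bisect.bisect_right(STARTS, lookup) - 1
    if i < 0 or i >= len(SYMBOLS) - 1:
        raise ValueError("address outside the symbol table")
    start, name = SYMBOLS[i]
    size = SYMBOLS[i + 1][0] - start
    return f"{name}+{addr - start:#x}/{size:#x}"


# The same address, printed two ways.
print(resolve(0x1010))                  # as the faulting EIP
print(resolve(0x1010, backtrace=True))  # as a stack trace entry
```

[ Under that assumption, every "asm_exc_int3+0x10/0x10" line in the trace is really an address at the first byte of asm_exc_page_fault. ]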

So it smells like we're taking a page fault, but somehow the page
fault text address has been unmapped, so taking a page fault causes a
page fault and then we end up finally in that same "no more stack,
double fault" situation.

Either page table corruption, or some issue with the page table mitigation.

The fact that it started happening with the block merge may be because
the block code causes some major corruption, or may just be random bad
luck and it just changed some alignment somewhere, and exposed a
hidden but pre-existing issue.

Jens separately said that he can see it with gcc-11, but not his
regular compiler, so regardless it seems to be compiler-dependent.

Let's see if x86 people have some idea, but looking at that

   restore_all_switch_stack+0x65/0xe6

and doing an objdump to see the code generation, it is literally here:

        0f 20 d8                mov    %cr3,%eax
        0d 00 10 00 00          or     $0x1000,%eax
        0f 22 d8                mov    %eax,%cr3
        eb 16                   jmp    <restore_all_switch_stack+0x7d>

with that "jmp" instruction being the restore_all_switch_stack+0x65 address.
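
[ The offsets can be cross-checked from the instruction bytes alone. The sketch below uses only the bytes shown in the objdump output and the +0x65 offset of the jmp; the helper names are made up for illustration. It walks backwards to place the three preceding instructions and decodes the short jump (opcode 0xeb followed by a signed 8-bit displacement relative to the next instruction). ]

```python
# Instruction bytes copied from the objdump output above.
INSNS = [
    ("mov %cr3,%eax", bytes.fromhex("0f20d8")),
    ("or $0x1000,%eax", bytes.fromhex("0d00100000")),
    ("mov %eax,%cr3", bytes.fromhex("0f22d8")),
    ("jmp", bytes.fromhex("eb16")),
]
JMP_OFFSET = 0x65  # restore_all_switch_stack+0x65, from the stack trace


def layout(insns, last_offset):
    """Return (offset, mnemonic) pairs, given the offset of the last insn."""
    offset = last_offset - sum(len(raw) for _, raw in insns[:-1])
    placed = []
    for name, raw in insns:
        placed.append((offset, name))
        offset += len(raw)
    return placed


def short_jmp_target(offset, raw):
    """Target of a 2-byte 'jmp rel8' located at the given offset."""
    if raw[0] != 0xEB or len(raw) != 2:
        raise ValueError("not a short jmp")
    rel = int.from_bytes(raw[1:], "little", signed=True)
    return offset + len(raw) + rel


for off, name in layout(INSNS, JMP_OFFSET):
    print(f"+{off:#x}: {name}")
print(f"jmp target: +{short_jmp_target(JMP_OFFSET, INSNS[-1][1]):#x}")
```

[ The decoded target is +0x7d, which agrees with the objdump annotation, and the write to %cr3 sits at +0x62, immediately before the address in the trace. ]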

So the infinite page faults seem to literally happen right after the
"mov %eax,%cr3".
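
[ For readers new to this code path: the "or $0x1000" sets bit 12 of the page-table base before it is written back to %cr3. The sketch below assumes this is the page-table-isolation switch to a second page directory that sits one 4 KiB page above the kernel one; the message itself only says "page table mitigation", and the CR3 value used here is invented. ]

```python
PAGE_SHIFT = 12
SWITCH_MASK = 1 << PAGE_SHIFT  # 0x1000, the immediate in the 'or' above


def switch_to_other_cr3(cr3):
    """Mirror the mov/or/mov sequence: set bit 12 of the page-table base."""
    return cr3 | SWITCH_MASK


def switch_back(cr3):
    """Inverse operation (assumed): clear bit 12 again."""
    return cr3 & ~SWITCH_MASK


kernel_cr3 = 0x02A4E000  # hypothetical value with bit 12 clear
other_cr3 = switch_to_other_cr3(kernel_cr3)
print(f"{kernel_cr3:#010x} -> {other_cr3:#010x}")
```

[ If the tables selected after that write do not map the entry text (including asm_exc_page_fault), the very next instruction fetch faults, and so does every attempt to enter the fault handler, which would match the repeated page faults described above. ]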

Definitely something wrong with the page tables. But where that
wrongness comes from, I have no idea.

            Linus