linux-kernel - Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu

Message-ID: <CAHk-=wgSsfo89ESHcngvPCkQSh_YAJG-0g7fupb+Uv0E1d_EcQ@mail.gmail.com>
Date:   Mon, 16 Oct 2023 12:24:37 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Uros Bizjak <ubizjak@...il.com>
Cc:     Nadav Amit <namit@...are.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

On Mon, 16 Oct 2023 at 11:53, Uros Bizjak <ubizjak@...il.com> wrote:
>
> Unfortunately, it does not work and dies early in the boot with:

Side note: build the kernel with debug info (the limited form is
sufficient), and then run oopses through

  ./scripts/decode_stacktrace.sh

to get much nicer oops information that has line numbers and inlining
information in the backtrace.

> [    4.939358] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [    4.940090] RIP: 0010:begin_new_exec+0x8f2/0xa30
> [    4.940090] Code: 31 f6 e8 c1 49 f9 ff e9 3c fa ff ff 31 f6 4c 89 ef e8 b2 4a f9 ff e9 19 fa ff ff 31 f6 4c 89 ef e8 23 4a f9 ff e9 ea fa ff ff <f0> 41 ff 0c 24 0f 85 55 fb ff ff 4c 89 e7 e8 4b 02 df ff e9 48 fb

That decodes to

   0: 31 f6                xor    %esi,%esi
   2: e8 c1 49 f9 ff        call   0xfffffffffff949c8
   7: e9 3c fa ff ff        jmp    0xfffffffffffffa48
   c: 31 f6                xor    %esi,%esi
   e: 4c 89 ef              mov    %r13,%rdi
  11: e8 b2 4a f9 ff        call   0xfffffffffff94ac8
  16: e9 19 fa ff ff        jmp    0xfffffffffffffa34
  1b: 31 f6                xor    %esi,%esi
  1d: 4c 89 ef              mov    %r13,%rdi
  20: e8 23 4a f9 ff        call   0xfffffffffff94a48
  25: e9 ea fa ff ff        jmp    0xfffffffffffffb14
  2a:* f0 41 ff 0c 24        lock decl (%r12) <-- trapping instruction
  2f: 0f 85 55 fb ff ff    jne    0xfffffffffffffb8a
  35: 4c 89 e7              mov    %r12,%rdi
  38: e8 4b 02 df ff        call   0xffffffffffdf0288

but without a nicer backtrace it's nasty to guess where this is.

The "lock decl ; jne" is a good hint, though - that sequence is most
definitely "atomic_dec_and_test()".

And that in turn means that it's almost certainly mmdrop(), which is

        if (unlikely(atomic_dec_and_test(&mm->mm_count)))
                __mmdrop(mm);

where that

  35: 4c 89 e7              mov    %r12,%rdi
  38: e8 4b 02 df ff        call   0xffffffffffdf0288

is exactly the unlikely "__mmdrop(mm)" part (and gcc decided to make
the likely path the branch-out for some reason: presumably, with the
inlining, the code around it made that the better layout, or maybe
this was all inside another "unlikely()" branch).

And if I read that right, this has all been inlined from
begin_new_exec() -> exec_mmap() -> mmdrop_lazy_tlb().

Now, how and why 'mm' would be NULL in that path, and why any
'current' reloading optimization would matter in all this, I very much
can't see. The call site in begin_new_exec() is

        /*
         * Release all of the old mmap stuff
         */
        acct_arg_size(bprm, 0);
        retval = exec_mmap(bprm->mm);
        if (retval)
                goto out;

        bprm->mm = NULL;

and "bprm->mm" is most definitely non-NULL there because we earlier did

        retval = set_mm_exe_file(bprm->mm, bprm->file);

using it, and that would have oopsed had bprm->mm been NULL then.

So I suspect the problem happened much earlier and caused some nasty
internal corruption, and the odd 'mm is NULL' is just a symptom. From
the oops itself I can't tell the source, though. I guess if we get
'current' wrong anywhere, all bets are off.

             Linus