linux-kernel - Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu

Message-ID: <CAFULd4YAFTFqon3ojv7N6h=G_1pAjSH3T6YvX0G=g7Fwh7j1jQ@mail.gmail.com>
Date:   Wed, 18 Oct 2023 11:04:30 +0200
From:   Uros Bizjak <ubizjak@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Nadav Amit <namit@...are.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

On Wed, Oct 18, 2023 at 9:46 AM Uros Bizjak <ubizjak@...il.com> wrote:
>
> On Tue, Oct 17, 2023 at 11:53 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Tue, 17 Oct 2023 at 14:06, Uros Bizjak <ubizjak@...il.com> wrote:
> > >
> > > But adding the attached patch on top of both patches boots OK.
> >
> > Funky.
> >
> > Mind adding a
> >
> >         WARN_ON_ONCE(!active_mm);
> >
> > to there to give a nice backtrace for the odd NULL case.
>
> [    4.907840] Call Trace:
> [    4.908909]  <TASK>
> [    4.909858]  ? __warn+0x7b/0x120
> [    4.911108]  ? begin_new_exec+0x90f/0xa30
> [    4.912602]  ? report_bug+0x164/0x190
> [    4.913929]  ? handle_bug+0x3c/0x70
> [    4.915179]  ? exc_invalid_op+0x17/0x70
> [    4.916569]  ? asm_exc_invalid_op+0x1a/0x20
> [    4.917969]  ? begin_new_exec+0x90f/0xa30
> [    4.919303]  ? begin_new_exec+0x3ce/0xa30
> [    4.920667]  ? load_elf_phdrs+0x67/0xb0
> [    4.921935]  load_elf_binary+0x2bb/0x1770
> [    4.923262]  ? __kernel_read+0x136/0x2d0
> [    4.924563]  bprm_execve+0x277/0x630
> [    4.925703]  kernel_execve+0x145/0x1a0
> [    4.926890]  call_usermodehelper_exec_async+0xcb/0x180
> [    4.928408]  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
> [    4.930515]  ret_from_fork+0x2f/0x50
> [    4.931894]  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
> [    4.933941]  ret_from_fork_asm+0x1b/0x30
> [    4.935371]  </TASK>
> [    4.936212] ---[ end trace 0000000000000000 ]---
>
> >
> > That code *is* related to 'current', in how we do
> >
> >         tsk = current;
> > ...
> >         local_irq_disable();
> >         active_mm = tsk->active_mm;
> >         tsk->active_mm = mm;
> >         tsk->mm = mm;
> > ...
> >         activate_mm(active_mm, mm);
> > ...
> >         mmdrop_lazy_tlb(active_mm);
> >
> > but I don't see how 'active_mm' could *possibly* be validly NULL
> > here, and why caching 'current' would matter and change it.
>
> I have also added "__attribute__((optimize(0)))" to exec_mmap() to
> weed out compiler bugs. The result was the same oops in
> mmdrop_lazy_tlb.
>
> Also, when using WARN_ON instead of WARN_ON_ONCE, it triggers only
> once during the whole boot, with the above trace.
>
> Another observation: adding WARN_ON to the top of exec_mmap:
>
>     WARN_ON(!current->active_mm);
>     /* Notify parent that we're no longer interested in the old VM */
>     tsk = current;
>     old_mm = current->mm;
>
> also triggers WARN, suggesting that current does not have active_mm
> set on the entry to the function.
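The sequence Linus quotes can be modeled in userspace C to show why a NULL active_mm on entry ends in the mmdrop_lazy_tlb() oops for a task with no mm of its own, such as the usermode-helper kernel thread in the trace. This is a minimal sketch only. The struct layouts, the refcount field and the model_* helpers are hypothetical simplifications, not the kernel's definitions, and the model returns -1 where the kernel would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for struct mm_struct and
 * struct task_struct: only the fields the quoted sequence touches. */
struct mm {
    int refcount;           /* stands in for the mm's reference count */
};

struct task {
    struct mm *mm;          /* NULL for a kernel thread */
    struct mm *active_mm;   /* mm borrowed lazily while running */
};

/* Model of mmdrop_lazy_tlb(): drop a reference on the borrowed mm.
 * Returns -1 where the kernel would dereference NULL and oops. */
static int model_mmdrop_lazy_tlb(struct mm *mm)
{
    if (mm == NULL)
        return -1;
    mm->refcount--;
    return 0;
}

/* Model of the quoted part of exec_mmap() for a task with no mm of its
 * own: install the new mm, then drop the lazily borrowed one. */
static int model_exec_mmap(struct task *tsk, struct mm *new_mm)
{
    assert(new_mm != NULL);

    /* Counterpart of the WARN_ON(!current->active_mm) added at entry. */
    if (tsk->active_mm == NULL)
        fprintf(stderr, "WARN: active_mm is NULL on entry\n");

    /* The kernel does this part with interrupts disabled. */
    struct mm *active_mm = tsk->active_mm;
    tsk->active_mm = new_mm;
    tsk->mm = new_mm;
    /* activate_mm(active_mm, new_mm) would switch page tables here. */
    return model_mmdrop_lazy_tlb(active_mm);
}
```

Calling model_exec_mmap() on a task whose active_mm points at an init_mm-like object returns 0 and drops that object's count; on a task whose active_mm was never set, it prints the warning and returns -1, the model's analogue of the oops.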

Solved.

All that is needed is to patch cpu_init() from
arch/x86/kernel/cpu/common.c with:

--cut here--
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b14fc8c1c953..61b6fcdf6937 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2228,7 +2232,7 @@ void cpu_init_exception_handling(void)
 */
 void cpu_init(void)
 {
-       struct task_struct *cur = current;
+       struct task_struct *cur = this_cpu_read_stable(pcpu_hot.current_task);
        int cpu = raw_smp_processor_id();

 #ifdef CONFIG_NUMA
--cut here--

This is effectively the old get_current(). Since we declare and export

+DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
+                       const_pcpu_hot) __attribute__((alias("pcpu_hot")));
+EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);

in the same file, and the "new" current boils down to just

    return const_pcpu_hot.current_task;

GCC apparently assumes something it should not and over-optimizes, so the

    cur->active_mm = &init_mm;

assignment further down in cpu_init() seemingly does not fully take effect.
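One hypothesis consistent with these symptoms (an assumption, not something established in this message): a const-qualified object may not be modified, so a compiler that can see pcpu_hot's static initializer through the const alias may fold the load of const_pcpu_hot.current_task to that initializer, assumed here to point at the boot task. cpu_init() on a secondary CPU would then write active_mm into the boot task instead of that CPU's idle task, leaving the idle task's active_mm NULL, which would be consistent with the NULL active_mm seen earlier in the thread. The sketch below models only that data flow in plain C; boot_task, idle_task1, current_task[] and both accessors are hypothetical stand-ins, and no real miscompilation is reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
struct mm { int id; };
struct task { struct mm *active_mm; };

#define NR_CPUS 2

static struct mm init_mm = { .id = 0 };
static struct task boot_task;   /* plays the role of init_task */
static struct task idle_task1;  /* idle task of the secondary CPU */

/* Per-CPU "current" slots, all starting from the same static initializer. */
static struct task *current_task[NR_CPUS] = { &boot_task, &boot_task };

/* Live accessor: loads this CPU's slot every time, like the per-CPU
 * read the old get_current() used. */
static struct task *current_live(int cpu)
{
    return current_task[cpu];
}

/* Hypothesized folded accessor: the load has been replaced by the
 * static initializer, so the CPU number no longer matters. */
static struct task *current_folded(int cpu)
{
    (void)cpu;
    return &boot_task;
}

/* Model of the relevant line of cpu_init(). */
static void model_cpu_init(int cpu, struct task *(*get_current)(int))
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct task *cur = get_current(cpu);
    cur->active_mm = &init_mm;
}

static void reset_model(void)
{
    boot_task.active_mm = NULL;
    idle_task1.active_mm = NULL;
    current_task[0] = &boot_task;
    current_task[1] = &boot_task;
}

/* Bring up CPU 1: its slot names its idle task before cpu_init() runs. */
static void bring_up_cpu1(struct task *(*get_current)(int))
{
    reset_model();
    current_task[1] = &idle_task1;
    model_cpu_init(1, get_current);
}
```

In the model, passing current_live instead of current_folded is the analogue of the cpu_init() patch above: the read goes back to the live per-CPU slot, so CPU 1's idle task receives init_mm and the boot task is left alone.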

Have to run now, but this will be easy to fix.

Uros.