Message-ID: <CAHk-=whK=G1o6RtS9DS3wEGF1KU7WLgLL1+6Se86bj8m7wwqrQ@mail.gmail.com>
Date: Fri, 15 Mar 2024 09:42:04 -0700
From: Linus Torvalds <torvalds@...uxfoundation.org>
To: Guenter Roeck <linux@...ck-us.net>
Cc: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>, x86@...nel.org, 
	Uros Bizjak <ubizjak@...il.com>, linux-sparse@...r.kernel.org, lkp@...el.com, 
	oe-kbuild-all@...ts.linux.dev
Subject: Re: [patch 5/9] x86: Cure per CPU madness on UP

On Fri, 15 Mar 2024 at 09:17, Guenter Roeck <linux@...ck-us.net> wrote:
>
> [    3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
> [    3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff

The code is

  mov    %rax,0x60(%rbx)
  call   0x125f5f
  mov    0x26184b4(%rip),%edx
  mov    0x3078f01(%rip),%rax
  movq   $0xffffffffb68084e0,0x90(%rbx)
  mov    %rbx,0x138(%rax,%rdx,8)                <-- trapping instruction
  jmp    <backwards>

with %rdx being some index having the value 0xffffffed (-19).

That's ENODEV.
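A small user-space sketch of the arithmetic behind the fault. It assumes, as the 32-bit `mov` into %edx suggests, that the die ID lives in an unsigned 32-bit field and is zero-extended before being used as an index. It also assumes that 0x138 is the offset of the pmus[] array inside rapl_pmus. The helper names are hypothetical. The sketch shows how -ENODEV turns into index 0xffffffed, which puts the store roughly 32 GiB past the array base in %rax:

```c
#include <assert.h>
#include <errno.h>   /* ENODEV is 19 on Linux */
#include <stdint.h>

/* Hypothetical: the die ID as it ends up in %edx. Storing -ENODEV in an
 * unsigned 32-bit field wraps it to 0xffffffed (two's complement). */
static uint32_t die_id_as_u32(int logical_die_id)
{
    return (uint32_t)logical_die_id;
}

/* Byte offset from %rax targeted by  mov %rbx,0x138(%rax,%rdx,8).
 * A 32-bit "mov ...,%edx" zero-extends, so %rdx holds 0xffffffed, not -19. */
static uint64_t faulting_offset(uint32_t edx)
{
    uint64_t rdx = edx;          /* zero-extension, as the CPU does */
    return 0x138 + rdx * 8;      /* displacement + index * scale    */
}
```

With these definitions, `faulting_offset(die_id_as_u32(-ENODEV))` evaluates to 0x8000000a0. That address is nowhere near anything mapped for the array, so the store traps.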

Without line numbers (if you have debug info for that kernel, it's
good to run "scripts/decode_stacktrace.sh" on stack traces) it's hard
to really know what's up, but I strongly suspect that it's this:

        rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;

because we have

   topology_logical_die_id(cpu) ->
       (cpu_data(cpu).topo.logical_die_id)

and we have

    c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);

and topology_get_logical_id() does this:

        if (lvlid >= MAX_LOCAL_APIC)
                return -ERANGE;
        if (!test_bit(lvlid, apic_maps[at_level].map))
                return -ENODEV;

so that -ENODEV is not entirely unlikely for a UP run.
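A user-space model of topology_get_logical_id() can show why a UP boot lands in the -ENODEV branch. Only the two guard clauses come from the code quoted above. The bitmap, MODEL_MAX_LOCAL_APIC, and the final "count the registered IDs below lvlid" step are hypothetical simplifications. If nothing ever sets the bit for the boot CPU's APIC ID, which is plausibly the case when there is no local APIC, every lookup returns -ENODEV:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_LOCAL_APIC 64u  /* hypothetical; small enough for one word */

/* Hypothetical stand-in for apic_maps[at_level].map: bit N is set once
 * APIC ID N has been registered at this topology level. */
static uint64_t model_map;

static bool model_test_bit(unsigned int bit, uint64_t map)
{
    return (map >> bit) & 1u;
}

/* Model of topology_get_logical_id() for an already level-shifted ID. */
static int model_get_logical_id(unsigned int lvlid)
{
    if (lvlid >= MODEL_MAX_LOCAL_APIC)
        return -ERANGE;
    if (!model_test_bit(lvlid, model_map))
        return -ENODEV;          /* ID never registered: the UP case */

    /* Assumed: logical ID = number of registered IDs below lvlid. */
    int logical = 0;
    for (unsigned int i = 0; i < lvlid; i++)
        logical += model_test_bit(i, model_map);
    return logical;
}
```

With an empty map, `model_get_logical_id(0)` returns -ENODEV. That is the -19 seen in %rdx, and nothing between this function and the array store checks for it.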

This also explains why it *used* to work - that whole thing is new to
the current merge window and came in through commit ca7e91776912
("Merge tag 'x86-apic-2024-03-10' of ...").

Thomas, over to you. I wonder if maybe all those topology macros
should just return 0 on a UP build, but that
topology_get_logical_id() thing looks a bit wrong regardless.

It really shouldn't depend on local apic data for configs that may not
*have* a local apic.
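One possible shape for the suggestion above, as a hypothetical user-space model rather than the actual kernel change. MODEL_CONFIG_SMP stands in for the real config option, and on a UP build the die ID is simply 0. The sketch also adds a call-site guard that the mail does not propose: it refuses to use a negative error code, or an out-of-range value, as an array index. MODEL_NR_DIES and all model_* names are made up for illustration:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for the kernel's CONFIG_SMP. */
#ifndef MODEL_CONFIG_SMP
#define MODEL_CONFIG_SMP 0
#endif

#define MODEL_NR_DIES 4   /* hypothetical size of the pmus[] array */

struct model_pmu { int id; };

static struct model_pmu *model_pmus[MODEL_NR_DIES];

/* Topology macro idea: on a UP build, report die 0 instead of
 * consulting local-APIC bookkeeping that may not exist. */
static int model_logical_die_id(int raw_id_from_apic_maps)
{
    if (!MODEL_CONFIG_SMP)
        return 0;
    return raw_id_from_apic_maps;
}

/* Extra defensive check at the call site: never index with an error code. */
static int model_register_pmu(int die_id, struct model_pmu *pmu)
{
    if (die_id < 0)
        return die_id;           /* propagate -ENODEV / -ERANGE */
    if (die_id >= MODEL_NR_DIES)
        return -EINVAL;
    model_pmus[die_id] = pmu;
    return 0;
}
```

Either guard alone would have avoided the wild store. The first addresses the root cause on UP. The second turns any remaining bad ID into an error return instead of a crash.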

                 Linus
