Message-ID: <bebcaffc-d485-912d-0c42-c0781f9c7603@toxicpanda.com>
Date:   Wed, 17 Mar 2021 11:19:21 -0400
From:   Josef Bacik <josef@...icpanda.com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        "kernel-team@...com" <kernel-team@...com>
Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On 3/16/21 10:50 PM, Kai-Heng Feng wrote:
> Hi,
> 
> On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik <josef@...icpanda.com> wrote:
>>
>> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
>>
>> This patch causes a panic when rebooting my Dell Poweredge r440.  I do
>> not have the full panic log as it's lost at that stage of the reboot and
>> I do not have a serial console.  Reverting this patch makes my system
>> able to reboot again.
> 
> But this patch also helps many HP laptops, so maybe we should figure
> out what's going on on Poweredge r440.
> Does it also panic on shutdown?
> 

Sure, I'll test whatever it takes to get it fixed, but I just wasted 3 days
bisecting and lost a weekend of performance testing on btrfs because of this
regression.  Until you figure out how it broke, it needs to be reverted so
people don't have to figure out why reboot suddenly isn't working.

Running "halt" has the same effect with and without your patch: it gets to
"system halted" and just sits there without powering off.  Not entirely sure why
that is, but there's no panic.

The panic itself is lost, but I see there's an NMI and I have the RIP:

(gdb) list *('mwait_idle_with_hints.constprop.0'+0x4b)
0xffffffff816dabdb is in mwait_idle_with_hints (./arch/x86/include/asm/current.h:15).
10
11	DECLARE_PER_CPU(struct task_struct *, current_task);
12
13	static __always_inline struct task_struct *get_current(void)
14	{
15		return this_cpu_read_stable(current_task);
16	}
17
18	#define current get_current()
19

<mwait_idle_with_hints.constprop.0>:    jmp    0xffffffff936dac02 <mwait_idle_with_hints.constprop.0+0x72>
<mwait_idle_with_hints.constprop.0+0x2>:        nopl   (%rax)
<mwait_idle_with_hints.constprop.0+0x5>:        jmp    0xffffffff936dabac <mwait_idle_with_hints.constprop.0+0x1c>
<mwait_idle_with_hints.constprop.0+0x7>:        nopl   (%rax)
<mwait_idle_with_hints.constprop.0+0xa>:        mfence
<mwait_idle_with_hints.constprop.0+0xd>:        mov    %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x16>:       clflush (%rax)
<mwait_idle_with_hints.constprop.0+0x19>:       mfence
<mwait_idle_with_hints.constprop.0+0x1c>:       xor    %edx,%edx
<mwait_idle_with_hints.constprop.0+0x1e>:       mov    %rdx,%rcx
<mwait_idle_with_hints.constprop.0+0x21>:       mov    %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x2a>:       monitor %rax,%rcx,%rdx
<mwait_idle_with_hints.constprop.0+0x2d>:       mov    (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x30>:       test   $0x8,%al
<mwait_idle_with_hints.constprop.0+0x32>:       jne    0xffffffff936dabdb <mwait_idle_with_hints.constprop.0+0x4b>
<mwait_idle_with_hints.constprop.0+0x34>:       jmpq   0xffffffff936dabd0 <mwait_idle_with_hints.constprop.0+0x40>
<mwait_idle_with_hints.constprop.0+0x39>:       verw   0x9f9fec(%rip)        # 0xffffffff940d4bbc
<mwait_idle_with_hints.constprop.0+0x40>:       mov    $0x1,%ecx
<mwait_idle_with_hints.constprop.0+0x45>:       mov    %rdi,%rax
<mwait_idle_with_hints.constprop.0+0x48>:       mwait  %rax,%rcx
<mwait_idle_with_hints.constprop.0+0x4b>:       mov    %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x54>:       lock andb $0xdf,0x2(%rax)
<mwait_idle_with_hints.constprop.0+0x59>:       lock addl $0x0,-0x4(%rsp)
<mwait_idle_with_hints.constprop.0+0x5f>:       mov    (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x62>:       test   $0x8,%al
<mwait_idle_with_hints.constprop.0+0x64>:       je     0xffffffff936dac01 <mwait_idle_with_hints.constprop.0+0x71>
<mwait_idle_with_hints.constprop.0+0x66>:       andl   $0x7fffffff,%gs:0x6c93cf7f(%rip)        # 0x17b80
<mwait_idle_with_hints.constprop.0+0x71>:       retq
<mwait_idle_with_hints.constprop.0+0x72>:       mov    %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x7b>:       lock orb $0x20,0x2(%rax)
<mwait_idle_with_hints.constprop.0+0x80>:       mov    (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x83>:       test   $0x8,%al
<mwait_idle_with_hints.constprop.0+0x85>:       jne    0xffffffff936dabdb <mwait_idle_with_hints.constprop.0+0x4b>
<mwait_idle_with_hints.constprop.0+0x87>:       jmpq   0xffffffff936dab95 <mwait_idle_with_hints.constprop.0+0x5>
<mwait_idle_with_hints.constprop.0+0x8c>:       nopl   0x0(%rax)

0x4b is the instruction right after the mwait, which means we're panicking in
current_clr_polling(), where we do clear_thread_flag(TIF_POLLING_NRFLAG).  Thanks,

Josef
