lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210602204357.fq2yahccehf6cqjh@treble>
Date:   Wed, 2 Jun 2021 15:43:57 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Lukasz Majczak <lma@...ihalf.com>
Cc:     Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        jgross@...e.com, mbenes@...e.com, linux-kernel@...r.kernel.org,
        upstream@...ihalf.com,
        Radosław Biernacki <rad@...ihalf.com>,
        Łukasz Bartosik <lb@...ihalf.com>,
        Guenter Roeck <groeck@...gle.com>
Subject: Re: [PATCH v3 16/16] objtool,x86: Rewrite retpoline thunk calls

On Wed, Jun 02, 2021 at 05:51:01PM +0200, Lukasz Majczak wrote:
> Hi Peter,
> 
> This patch seems to crash on Tigerlake platform (Chromebook delbin), I
> got the following error:
> 
> [    2.103054] pcieport 0000:00:1c.0: PME: Signaling with IRQ 122
> [    2.110148] pcieport 0000:00:1c.0: pciehp: Slot #7 AttnBtn-
> PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+
> IbPresDis- LLActRep+
> [    2.126754] pcieport 0000:00:1d.0: PME: Signaling with IRQ 123
> [    2.133946] ACPI: \_SB_.CP00: Found 3 idle states
> [    2.139708] BUG: kernel NULL pointer dereference, address: 000000000000012b
> [    2.140704] #PF: supervisor read access in kernel mode
> [    2.140704] #PF: error_code(0x0000) - not-present page
> [    2.140704] PGD 0 P4D 0
> [    2.140704] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [    2.140704] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>   5.13.0-rc1 #31
> [    2.140704] Hardware name: Google Delbin/Delbin, BIOS
> Google_Delbin.13672.156.3 05/14/2021
> [    2.140704] RIP: 0010:cpuidle_poll_time+0x9/0x6a
> [    2.140704] Code: 44 00 00 85 f6 78 19 55 48 89 e5 48 8b 05 16 44
> 44 01 4c 8b 58 40 4d 85 db 5d 41 ff d3 66 90 00 c3 0f 1f 44 00 00 55
> 48 89 e5 <48> 8b 46 20 48 85 c0 75 56 4c 63 87 28 04 00 00 b8 24 f49
> [    2.140704] RSP: 0000:ffffffff9cc03ea8 EFLAGS: 00010282
> [    2.140704] RAX: 0000000000008e7d RBX: ffffffff9cc1c5fd RCX: 000000007f894e5a
> [    2.140704] RDX: 000000007f894d4f RSI: 000000000000010b RDI: 0000000002fa1cf6
> [    2.140704] RBP: ffffffff9cc03ea8 R08: 0000000000000000 R09: 00000000ca948246
> [    2.140704] R10: 0000000000000000 R11: ffffffff9bf132cb R12: 0000000000000003
> [    2.140704] R13: ffffbbfdffc21960 R14: 0000000000000000 R15: ffffffff9cdba638
> [    2.140704] FS:  0000000000000000(0000) GS:ffff928280000000(0000)
> knlGS:0000000000000000
> [    2.140704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.140704] CR2: 000000000000012b CR3: 000000027e414001 CR4: 0000000000770ef0
> [    2.140704] PKRU: 55555554
> [    2.140704] Call Trace:
> [    2.140704]  do_idle+0x175/0x1f6
> [    2.140704]  cpu_startup_entry+0x1d/0x1f
> [    2.140704]  start_kernel+0x3be/0x420
> [    2.140704]  secondary_startup_64_no_verify+0xb0/0xbb

Assuming I'm looking at the right code, this is weird.

cpuidle_poll_time()'s only caller is poll_idle(), which isn't even
listed in the stack trace.  Maybe the function before
cpuidle_poll_time() fell through into it somehow.  Or execution got
otherwise hosed.  That would also explain the bad function argument.

In addition to the data Peter requested, it would also be interesting to
see the disassembly of do_idle() with objdump -dr, to see which function
got called before it went off the rails.

-- 
Josh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ