linux-kernel - Re: [PATCH 4.4 00/37] 4.4.110-stable review

Message-ID: <CA+55aFwgqH4=fFPxQT8O08pZwa14Fsbn6-_Oj1vAmwnE88roqw@mail.gmail.com>
Date:   Thu, 4 Jan 2018 12:11:56 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Pavel Tatashin <soleen@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Guenter Roeck <linux@...ck-us.net>,
        Shuah Khan <shuahkh@....samsung.com>, patches@...nelci.org,
        Ben Hutchings <ben.hutchings@...ethink.co.uk>,
        lkft-triage@...ts.linaro.org, stable <stable@...r.kernel.org>
Subject: Re: [PATCH 4.4 00/37] 4.4.110-stable review

On Thu, Jan 4, 2018 at 8:38 AM, Pavel Tatashin <soleen@...il.com> wrote:
> I am getting the following panic when trying to boot 4.4.110rc1 on
> Intel(R) Xeon(R) CPU E5-2630:
>
> [    5.923489] BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
> [    5.932259] IP: [<ffffffff810e70d2>] dyntick_save_progress_counter+0x12/0x50

Hmm. You don't have the "Code:" line in this oops anywhere, do you?

> [    5.977905] RIP: dyntick_save_progress_counter+0x12/0x50
> [    5.988505] RSP: 0000:ffff881ff2f27dc0  EFLAGS: 00010046
> [    5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> [    6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> [    6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> [    6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> [    6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> [    6.034262] FS:  0000000000000000(0000) GS:ffff881fff940000(0000) knlGS:0000000000000000
> [    6.043293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> [    6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.073603] Stack:
> [    6.075847]  ffff881ff2f27e18 ffffffff810e8fac 0000000000000202 ffff881ff2f27e60
> [    6.084158]  ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140 ffffffff81b127a0
> [    6.092465]  0000000000000001 0000000000000000 0000000000000003 ffff881ff2f27eb8
> [    6.100768] Call Trace:
> [    6.103501]  [<ffffffff810e8fac>] force_qs_rnp+0xdc/0x150

The oops looks like it *might* be this:

        lock xadd %edx,0xc(%rax)

which is from the

        int snap = atomic_add_return(0, &rdtp->dynticks);

in rcu_dynticks_snap() because %rax is 1 and that would give you the
invalid page fault and the right faulting address.
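
The address arithmetic behind that guess can be checked with a small
sketch. The struct below is a hypothetical stand-in, not copied from the
4.4 tree: its layout is assumed only so that the dynticks field lands at
the 0xc displacement seen in the instruction. The names
fake_rcu_dynticks and field_address are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU dynticks structure. The layout
 * is an assumption, picked so that `dynticks` sits at offset 0xc like
 * the displacement in "lock xadd %edx,0xc(%rax)". */
struct fake_rcu_dynticks {
    int64_t nesting;      /* bytes 0x0..0x7 */
    int32_t nmi_nesting;  /* bytes 0x8..0xb */
    int32_t dynticks;     /* bytes 0xc..0xf */
};

/* Fail the build if the assumed layout does not hold on this ABI. */
_Static_assert(offsetof(struct fake_rcu_dynticks, dynticks) == 0xc,
               "dynticks is expected at offset 0xc in this sketch");

/* Address the CPU would touch for &rdtp->dynticks, given the raw value
 * held in the pointer register (%rax in the oops). */
static uintptr_t field_address(uintptr_t rdtp_value)
{
    return rdtp_value + offsetof(struct fake_rcu_dynticks, dynticks);
}
```

Under these assumptions, field_address(1) evaluates to 0xd, which is the
value reported in CR2 and in the "NULL pointer dereference at" line.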

But that would be complete rcu data structure corruption (that rdtp
pointer comes from

        per_cpu_ptr(rsp->rda, cpu)

in force_qs_rnp(), afaik).
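
As a rough user-space model of why a bad rdtp would point at percpu
state: a per-CPU pointer is, conceptually, a base value plus a per-CPU
offset. The sketch below is not the kernel implementation;
model_cpu_offset, model_per_cpu_ptr and MODEL_NR_CPUS are hypothetical
names used only to show the arithmetic.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4  /* arbitrary size for this sketch */

/* Hypothetical per-CPU offset table, standing in for whatever the
 * kernel uses to locate each CPU's copy of a percpu area. */
static uintptr_t model_cpu_offset[MODEL_NR_CPUS];

/* Conceptual per_cpu_ptr(): the shared base plus that CPU's offset.
 * If either input is corrupt, the result is a bogus pointer. */
static uintptr_t model_per_cpu_ptr(uintptr_t base, int cpu)
{
    return base + model_cpu_offset[cpu];
}
```

In this model a result as small as 1 would need both the base and the
offset to be essentially wrong, which fits the "complete corruption"
reading rather than a slightly stale pointer.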

The PTI patches obviously change percpu stuff, but this looks like an
odd place for that to manifest.

                 Linus