Message-ID: <270e7e43-d3f9-0305-4764-7e23b2d515a2@oracle.com>
Date: Fri, 5 Jan 2018 20:16:09 -0500
From: Pavel Tatashin <pasha.tatashin@...cle.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andy Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Voegtle <tv@...96.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Guenter Roeck <linux@...ck-us.net>,
Shuah Khan <shuahkh@....samsung.com>, patches@...nelci.org,
Ben Hutchings <ben.hutchings@...ethink.co.uk>,
lkft-triage@...ts.linaro.org, stable <stable@...r.kernel.org>
Subject: Re: [PATCH 4.4 00/37] 4.4.110-stable review
Hi Hugh,
Thank you very much for your very thoughtful input.
I am quite positive this problem is a PTI regression, because I see
exactly the same problem with kernel 4.1, to which I back-ported all the
necessary PTI patches from 4.4.110. I will provide this thread with more
information as I collect it. I will also try to root-cause the problem.
The bug behaves like memory corruption, but with both the 4.1 and 4.4
kernels the problem goes away when I boot with the noefi parameter. So
EFI + PTI is the culprit for this memory corruption.
Thank you,
Pavel
On 01/05/2018 06:15 PM, Hugh Dickins wrote:
> On Fri, Jan 5, 2018 at 1:03 PM, Pavel Tatashin
> <pasha.tatashin@...cle.com> wrote:
>> The hardware works :) I meant that before the patch linked in
>> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
>> with that patch applied, I was able to boot it at least once, but it could
>> be accidental. The hang/panic does not happen at the same time on every
>> boot.
>
> I get the feeling that it was accidental: it seems to me that you have
> a memory corruption problem, that gets shifted around by the different
> patches (or "noefi" or "nopti").
>
> Because yesterday your boots were able to get way beyond the "EFI
> Variables Facility" message, and I can't imagine why the EFI issue
> would not have been equally debilitating on yesterday's 110-rc, if it
> were in play.
>
> I did intend to ask you to send your System.map, for us to scan
> through: maybe some variable is marked __init and should not be, then
> the "Freeing unused kernel memory" frees it for random reuse.
>
> But today you didn't get anywhere near the "Freeing unused kernel
> memory", so that can't be it - or do you sometimes get that far today?
>
> You mention that the hang/panic does not happen at the same time on
> every boot: I think all I can ask is for you to keep supplying us with
> different examples (console messages) of where it occurs, in the hope
> that one of them will point us in the right direction.
>
> And it even seems possible that this has nothing to do with the
> 4.4.110 changes - that 4.4.109 plus some other random patches would
> unleash similar corruption. Though on balance that does seem unlikely.
>
> Hugh