Message-ID: <e9712f15-21fd-79a1-0292-8ca88d710c92@lwfinger.net>
Date: Fri, 29 Jun 2018 21:38:07 -0500
From: Larry Finger <Larry.Finger@...inger.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Matthew Wilcox <willy@...radead.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Christoph Lameter <cl@...ux.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Jerome Glisse <jglisse@...hat.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Pekka Enberg <penberg@...nel.org>,
Randy Dunlap <rdunlap@...radead.org>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot -
bisected to commit 1d40a5ea01d5
On 06/29/2018 04:01 PM, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@...inger.net> wrote:
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfffffbff, which would have been expected for a page table. The second call had
>> 0xffffffff, which led to the BUG.
>
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
>
> Then it took an exception (that "interrupt: 700") and that causes
> do_exit() again, and it tries to free the same page table - and now
> it's no longer marked as a page table, because it already went through
> the __ClearPageTable() dance once.
>
> So on the second path through, it catches that "the bit already said
> it wasn't a page table" and does the BUG.
>
> But the real question is what the problem was the *first* time around.
> I assume that has scrolled off the screen? This part:
>
> _exception_pkey+0x58/0x128
> ret_from_except_full+0x0/0x4
> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
> LR = free_pgd_range+0x19c/0x30c
> free_pgtables+0xa/0xb
> exit_mmap+0xf4/0x16c
> mmput+0x64/0xf0
>
> Does reverting that commit 1d40a5ea01d5 make everything work for you?
> Because if so, judging by the deafening silence on this so far, I
> think that's what we should do.
>
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?
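
To make the "bits are reversed" point concrete, here is a rough user-space
sketch of the check as I understand it. It is not the kernel code. The
PG_TABLE value is simply the difference between the two page_type values I
logged (0xffffffff ^ 0xfffffbff), PAGE_TYPE_BASE is my assumption about the
reserved upper bits, and struct fake_page and the three helper names are
made up for illustration.

```c
#include <stdint.h>

/* Assumed constants for illustration; not quoted from page-flags.h. */
#define PAGE_TYPE_BASE 0xf0000000u  /* assumption: upper bits stay set */
#define PG_TABLE       0x00000400u  /* 0xffffffff ^ 0xfffffbff */

/* Hypothetical stand-in for the one struct page field that matters here. */
struct fake_page {
    uint32_t page_type;
};

/* A page counts as a page table when the type bit is CLEAR (inverted logic). */
static int page_is_table(const struct fake_page *p)
{
    return (p->page_type & (PAGE_TYPE_BASE | PG_TABLE)) == PAGE_TYPE_BASE;
}

/* Marking a page as a page table clears the bit. */
static void set_page_table(struct fake_page *p)
{
    p->page_type &= ~PG_TABLE;
}

/*
 * Unmarking sets the bit again. Returns 0 on success, or -1 in the case
 * where the kernel's VM_BUG_ON_PAGE() would fire because the page is not
 * currently marked as a page table.
 */
static int clear_page_table(struct fake_page *p)
{
    if (!page_is_table(p))
        return -1;
    p->page_type |= PG_TABLE;
    return 0;
}
```

Starting from 0xffffffff, set_page_table() gives 0xfffffbff, the first
clear_page_table() succeeds and restores 0xffffffff, and a second
clear_page_table() on the same page fails, which matches the two calls I
logged.
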
The deafening silence may be due to my having an old Microsoft address for
Matthew Wilcox in my first posting. He should now have received the BUG report,
and he may have some suggestions. Yes, reverting commit 1d40a5ea01d5 does permit
the box to boot.

Kirill's patch also works, which seems like a better solution. If any other
architecture bugs on boot, at least we will know where to look. :)

@Kirill: You may add a Reported-by: and Tested-by: Larry Finger
<Larry.Finger@...inger.net> to the patch.

Thanks for the help,

Larry