Message-Id: <c6461a35-c66d-6c62-a255-175917d8881b@linux.ibm.com>
Date: Sat, 30 Jun 2018 11:53:32 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Larry Finger <Larry.Finger@...inger.net>,
Matthew Wilcox <willy@...radead.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Christoph Lameter <cl@...ux.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Jerome Glisse <jglisse@...hat.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Pekka Enberg <penberg@...nel.org>,
Randy Dunlap <rdunlap@...radead.org>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5

On 06/30/2018 03:16 AM, Kirill A. Shutemov wrote:
> On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
>> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@...inger.net> wrote:
>>>
>>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>>> page->page_type. The routine was called twice. The first had page_type of
>>> 0xfffffbff, which would have been expected for a page table. The second call had
>>> 0xffffffff, which led to the BUG.
>>
>> So it looks to me like the tear-down of the page tables first found a
>> page that is indeed a page table, and cleared the page table bit
>> (well, it set it - the bits are reversed).
>>
>> Then it took an exception (that "interrupt: 700") and that causes
>> do_exit() again, and it tries to free the same page table - and now
>> it's no longer marked as a page table, because it already went through
>> the __ClearPageTable() dance once.
>>
>> So on the second path through, it catches that "the bit already said
>> it wasn't a page table" and does the BUG.
>>
>> But the real question is what the problem was the *first* time around.
>
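The "reversed bits" behaviour described above can be sketched in userspace C. This is a simplified model, not the kernel's code: the struct and function names are hypothetical, and the constants are assumptions inferred from the values in the report (0xfffffbff is 0xffffffff with bit 0x400 cleared).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for this model; PG_TABLE is inferred from the
 * reported value 0xfffffbff == ~0x400. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_TABLE       0x00000400u

/* Hypothetical stand-in for the page_type field of struct page. */
struct page_model {
    uint32_t page_type;
};

/* A page has the type when the base bits are set and the type bit is CLEAR. */
static int page_is_table(const struct page_model *p)
{
    return (p->page_type & (PAGE_TYPE_BASE | PG_TABLE)) == PAGE_TYPE_BASE;
}

/* "Setting" the type clears the bit, because the encoding is inverted. */
static void set_page_table(struct page_model *p)
{
    p->page_type &= ~PG_TABLE;
}

/* "Clearing" the type sets the bit again. Returns -1 at the point where
 * the kernel would trip VM_BUG_ON_PAGE(!PageTable(page), page). */
static int clear_page_table(struct page_model *p)
{
    if (!page_is_table(p))
        return -1;
    p->page_type |= PG_TABLE;
    return 0;
}

/* Replays the sequence from the report: mark, clear once, clear again. */
static int second_clear_result(void)
{
    struct page_model p = { .page_type = 0xffffffffu };

    set_page_table(&p);              /* page_type is now 0xfffffbff */
    (void)clear_page_table(&p);      /* first call succeeds, back to 0xffffffff */
    return clear_page_table(&p);     /* second call fails the check */
}
```

In this model the first clear sees 0xfffffbff and succeeds, and the second sees 0xffffffff and fails, which matches the two logged values.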
> +Aneesh.
>
> Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
> Once in __pte_free_tlb() itself and the second time in pgtable_free().
>
> Would this help?
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> index 6a6673907e45..e7a2f0e6b695 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> @@ -137,7 +137,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
>  				  unsigned long address)
>  {
> -	pgtable_page_dtor(table);
>  	pgtable_free_tlb(tlb, page_address(table), 0);
>  }
>  #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> index 1707781d2f20..30a13b80fd58 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> @@ -139,7 +139,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
>  				  unsigned long address)
>  {
>  	tlb_flush_pgtable(tlb, address);
> -	pgtable_page_dtor(table);
>  	pgtable_free_tlb(tlb, page_address(table), 0);
>  }
>  #endif /* _ASM_POWERPC_PGALLOC_32_H */
>
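The double-destructor path that the quoted patch removes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, the constants are inferred from the values reported earlier in the thread, and the real pgtable_free_tlb() path may defer the free rather than run it immediately as this model does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants; PG_TABLE inferred from the reported 0xfffffbff. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_TABLE       0x00000400u

/* Hypothetical model of a page-table page. */
struct page_model {
    uint32_t page_type;
    int bug_hits;   /* times the kernel would have hit VM_BUG_ON_PAGE */
};

/* Models pgtable_page_dtor(): drops the "page table" type once. */
static void model_dtor(struct page_model *p)
{
    if ((p->page_type & (PAGE_TYPE_BASE | PG_TABLE)) != PAGE_TYPE_BASE) {
        p->bug_hits++;          /* already not marked as a page table */
        return;
    }
    p->page_type |= PG_TABLE;   /* inverted encoding: set bit = type cleared */
}

/* Models the free path, which runs the destructor itself. */
static void model_pgtable_free(struct page_model *p)
{
    model_dtor(p);
}

/* Before the patch: destructor runs here and again in the free path. */
static void model_pte_free_tlb_before(struct page_model *p)
{
    model_dtor(p);
    model_pgtable_free(p);
}

/* After the patch: only the free path runs the destructor. */
static void model_pte_free_tlb_after(struct page_model *p)
{
    model_pgtable_free(p);
}

/* Frees one page-table page and reports how often the check fired. */
static int count_bug_hits(int patched)
{
    struct page_model p = { .page_type = 0xffffffffu & ~PG_TABLE, .bug_hits = 0 };

    if (patched)
        model_pte_free_tlb_after(&p);
    else
        model_pte_free_tlb_before(&p);
    return p.bug_hits;
}
```

Calling `count_bug_hits(0)` exercises the unpatched path, where the second destructor call finds the type already cleared; `count_bug_hits(1)` exercises the patched path with a single destructor call.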
See https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/175015.html
It is also part of the pull request from Michael Ellerman.
-aneesh