Message-ID: <78816aa1-299c-6894-f426-f0dff4a41cee@codeaurora.org>
Date: Wed, 27 Sep 2017 21:31:59 -0600
From: Richard Ruigrok <rruigrok@...eaurora.org>
To: Will Deacon <will.deacon@....com>
Cc: Yury Norov <ynorov@...iumnetworks.com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: ARM64: kernel panics in DABT in sys_msync path
On 9/27/2017 12:00 PM, Richard Ruigrok wrote:
>
> On 9/27/2017 9:50 AM, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, it
>>>>>> reproduces only with 4K pages and Transparent Huge Pages. With 64K pages I
>>>>>> was not able to reproduce it. RH also reported it here:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1491504
>>>>>> Linaro reported it on the RPK kernel (4.12) on Centriq2400 and ThunderX:
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068
>>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>>> currently working on). I don't have permission to look at the redhat one,
>>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>>
>>>>>> I was able to bisect down to a specific commit.
>>>>> I think we're chasing two different things here, so not sure I trust the
>>>>> bisect!
>>>>>
>>>> The RCU stall is a side effect. The issue I'm seeing has the same stack
>>>> trace and the same stimulus (rwtest). Details follow.
>>> FWIW, I think I've worked out what's going on here and I should have a patch
>>> tomorrow.
>> Diff below. I'm going to follow up with a separate thread about this,
>> because the proper fix is going to be invasive. I'll keep you on cc.
>>
>> Out of curiosity: what version of GCC are you using to compile the kernel?
> I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu
> Thanks for the patch, test results to follow.
> Richard
With this change applied on v4.13, the LTP rwtest passed 50 iterations, so it appears to solve the issue I was seeing.
This kernel was built with GCC 5.2.1; I've also started using 6.3.1. If you think it makes a difference, I can also test with 6.3.1.
Linux version 4.13.0-00002-g8540910-dirty (rruigrok@...igrok-lnx) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-1)) #55 SMP PREEMPT Wed Sep 27 13:37:25 MDT 2017
Richard
>> Will
>>
>> --->8
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index bc4e92337d16..b46e54c2399b 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>> /* Find an entry in the third-level page table. */
>> #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>>
>> -#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
>> +#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>> #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
>>
>> #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
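
For readers who don't know READ_ONCE(), here is a minimal userspace sketch of what the one-line change above does. The thread does not state the root cause, so this assumes the concern is the compiler loading *(dir) more than once while another CPU may be changing the entry. Every sketch_* name, the address mask, and the constants are hypothetical stand-ins, not the kernel's definitions; the real READ_ONCE() is a macro in the kernel headers, approximated here by a single volatile load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's pmd_t is a struct wrapper. */
typedef uint64_t sketch_pmd_t;
typedef uint64_t sketch_pte_t;

#define SKETCH_PAGE_SHIFT   12                     /* assumes 4K pages */
#define SKETCH_PTRS_PER_PTE 512                    /* assumes 512 entries per table */
#define SKETCH_ADDR_MASK    0x0000fffffffff000ULL  /* assumed output-address bits */

/*
 * Approximation of READ_ONCE(): the volatile access tells the compiler to
 * emit exactly one load here and not to re-fetch the value later.
 */
static inline sketch_pmd_t sketch_read_pmd_once(const sketch_pmd_t *p)
{
    return *(const volatile sketch_pmd_t *)p;
}

/* Same shape as pte_index() in the diff above. */
static inline uint64_t sketch_pte_index(uint64_t addr)
{
    return (addr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}

/*
 * Same shape as the patched pte_offset_phys(): take one snapshot of the
 * entry, then derive the table address from that snapshot only.
 */
static inline uint64_t sketch_pte_offset_phys(const sketch_pmd_t *dir,
                                              uint64_t addr)
{
    assert(dir != NULL);
    sketch_pmd_t pmd = sketch_read_pmd_once(dir);  /* single load */
    return (pmd & SKETCH_ADDR_MASK)
           + sketch_pte_index(addr) * sizeof(sketch_pte_t);
}
```

With an entry value of 0x80001003 (table at 0x80001000 plus low attribute bits) and an address of 0x403000, sketch_pte_index() gives 3 and sketch_pte_offset_phys() gives 0x80001018. The sketch shows only the single-load pattern; it says nothing about the more invasive fix mentioned earlier in the thread.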
--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.