Message-ID: <6c417ab1-a808-72ea-9618-3d76ec203684@arm.com>
Date: Thu, 24 May 2018 09:44:16 +0100
From: Suzuki K Poulose <Suzuki.Poulose@....com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Jia He <hejianet@...il.com>
Cc: Andrea Arcangeli <aarcange@...hat.com>,
Minchan Kim <minchan@...nel.org>,
Claudio Imbrenda <imbrenda@...ux.vnet.ibm.com>,
Arvind Yadav <arvind.yadav.cs@...il.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, jia.he@...-semitech.com,
Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2] mm/ksm: ignore STABLE_FLAG of rmap_item->address in
rmap_walk_ksm
On 14/05/18 10:45, Suzuki K Poulose wrote:
> On 10/05/18 00:31, Andrew Morton wrote:
>> On Fri, 4 May 2018 11:11:46 +0800 Jia He <hejianet@...il.com> wrote:
>>
>>> On our armv8a server (QDF2400), I noticed lots of WARN_ONs caused by
>>> rmap_item->address not being PAGE_SIZE aligned under memory pressure
>>> tests (start 20 guests and run memhog in the host).
>>>
>>> ...
>>>
>>> In rmap_walk_ksm, the rmap_item->address might still have the STABLE_FLAG,
>>> then the start and end in handle_hva_to_gpa might not be PAGE_SIZE aligned.
>>> Thus it will cause exceptions in handle_hva_to_gpa on arm64.
>>>
>>> This patch fixes it by ignoring (not removing) the low bits of the
>>> address when doing rmap_walk_ksm.
>>>
>>> Signed-off-by: jia.he@...-semitech.com
>>
>> I assumed you wanted this patch to be committed as
>> From:jia.he@...-semitech.com rather than From:hejianet@...il.com, so I
>> made that change. Please let me know if this was inappropriate.
>>
>> You can do this yourself by adding an explicit From: line to the very
>> start of the patch's email text.
>>
>> Also, a storm of WARN_ONs is pretty poor behaviour. Is that the only
>> misbehaviour which this bug causes? Do you think the fix should be
>> backported into earlier kernels?
>>
Jia, Andrew,
What is the status of this patch?
Suzuki
>
> I think it's not just the WARN_ON(). With an unaligned address we do
> more than was probably intended, i.e., we could be modifying the
> flags of other pages that were not affected.
>
> e.g.:
>
> In the original report [0], the trace looked like:
>
>
> [ 800.511498] [<ffff0000080b4f2c>] kvm_age_hva_handler+0xcc/0xd4
> [ 800.517324] [<ffff0000080b4838>] handle_hva_to_gpa+0xec/0x15c
> [ 800.523063] [<ffff0000080b6c5c>] kvm_age_hva+0x5c/0xcc
> [ 800.528194] [<ffff0000080a7c3c>] kvm_mmu_notifier_clear_flush_young+0x54/0x90
> [ 800.535324] [<ffff00000827a0e8>] __mmu_notifier_clear_flush_young+0x6c/0xa8
> [ 800.542279] [<ffff00000825a644>] page_referenced_one+0x1e0/0x1fc
> [ 800.548279] [<ffff00000827e8f8>] rmap_walk_ksm+0x124/0x1a0
> [ 800.553759] [<ffff00000825c974>] rmap_walk+0x94/0x98
> [ 800.558717] [<ffff00000825ca98>] page_referenced+0x120/0x180
> [ 800.564369] [<ffff000008228c58>] shrink_active_list+0x218/0x4a4
> [ 800.570281] [<ffff000008229470>] shrink_node_memcg+0x58c/0x6fc
> [ 800.576107] [<ffff0000082296c4>] shrink_node+0xe4/0x328
> [ 800.581325] [<ffff000008229c9c>] do_try_to_free_pages+0xe4/0x3b8
> [ 800.587324] [<ffff00000822a094>] try_to_free_pages+0x124/0x234
> [ 800.593150] [<ffff000008216aa0>] __alloc_pages_nodemask+0x564/0xf7c
> [ 800.599412] [<ffff000008292814>] khugepaged_alloc_page+0x38/0xb8
> [ 800.605411] [<ffff0000082933bc>] collapse_huge_page+0x74/0xd70
> [ 800.611238] [<ffff00000829470c>] khugepaged_scan_mm_slot+0x654/0xa98
> [ 800.617585] [<ffff000008294e0c>] khugepaged+0x2bc/0x49c
> [ 800.622803] [<ffff0000080ffb70>] kthread+0x124/0x150
> [ 800.627762] [<ffff0000080849f0>] ret_from_fork+0x10/0x1c
> [ 800.633066] ---[ end trace 944c130b5252fb01 ]---
>
> Here, ksm wants to mark *a page* as referenced via page_referenced_one(),
> but passes it an unaligned address. This could eventually turn into
> one of:
>
> ptep_clear_flush_young_notify(address, address + PAGE_SIZE)
>
> or
>
> pmdp_clear_flush_young_notify(address, address + PMD_SIZE)
>
> Either range now spans two pages/pmds, so the notifier consumer might
> act on the second page as well, which is not intended. So I do think
> the old behavior is wrong and has side effects beyond the WARN_ON(),
> as described above.
>
> [0] https://lkml.kernel.org/r/1525244911-5519-1-git-send-email-hejianet@gmail.com
>
> Suzuki
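
To illustrate the point about the range spanning two pages, here is a
minimal userspace sketch, not the actual kernel patch. It assumes 4K pages
(PAGE_SHIFT of 12) and a STABLE_FLAG value of 0x200 as defined in mm/ksm.c
around that time; pages_spanned() and strip_flags() are hypothetical helper
names introduced only for this example.

```c
#include <stdint.h>

/* Assumed values: 4K pages, and STABLE_FLAG as in mm/ksm.c of that era. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK   (~(PAGE_SIZE - 1))
#define STABLE_FLAG UINT64_C(0x200)

/* Hypothetical helper: number of pages touched by the range [start, end). */
static uint64_t pages_spanned(uint64_t start, uint64_t end)
{
	uint64_t first = start >> PAGE_SHIFT;
	uint64_t last = (end - 1) >> PAGE_SHIFT;

	return last - first + 1;
}

/*
 * Hypothetical helper: ignore the low flag bits when using the address,
 * without clearing them in the stored rmap_item->address.
 */
static uint64_t strip_flags(uint64_t rmap_address)
{
	return rmap_address & PAGE_MASK;
}
```

With an rmap address of 0x7f0000001000 | STABLE_FLAG, the range
[address, address + PAGE_SIZE) touches two pages, while the same range
computed from strip_flags(address) touches exactly one.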