Message-ID: <2cd6b39b-1496-bbd5-9e31-5e3dcb31feda@arm.com>
Date:   Mon, 14 May 2018 10:45:02 +0100
From:   Suzuki K Poulose <Suzuki.Poulose@....com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Jia He <hejianet@...il.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Minchan Kim <minchan@...nel.org>,
        Claudio Imbrenda <imbrenda@...ux.vnet.ibm.com>,
        Arvind Yadav <arvind.yadav.cs@...il.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, jia.he@...-semitech.com,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2] mm/ksm: ignore STABLE_FLAG of rmap_item->address in
 rmap_walk_ksm
On 10/05/18 00:31, Andrew Morton wrote:
> On Fri,  4 May 2018 11:11:46 +0800 Jia He <hejianet@...il.com> wrote:
> 
>> In our armv8a server(QDF2400), I noticed lots of WARN_ON caused by PAGE_SIZE
>> unaligned for rmap_item->address under memory pressure tests(start 20 guests
>> and run memhog in the host).
>>
>> ...
>>
>> In rmap_walk_ksm, the rmap_item->address might still have the STABLE_FLAG,
>> then the start and end in handle_hva_to_gpa might not be PAGE_SIZE aligned.
>> Thus it will cause exceptions in handle_hva_to_gpa on arm64.
>>
>> This patch fixes it by ignoring(not removing) the low bits of address when
>> doing rmap_walk_ksm.
>>
>> Signed-off-by: jia.he@...-semitech.com
> 
> I assumed you wanted this patch to be committed as
> From:jia.he@...-semitech.com rather than From:hejianet@...il.com, so I
> made that change.  Please let me know if this was inappropriate.
> 
> You can do this yourself by adding an explicit From: line to the very
> start of the patch's email text.
> 
> Also, a storm of WARN_ONs is pretty poor behaviour.  Is that the only
> misbehaviour which this bug causes?  Do you think the fix should be
> backported into earlier kernels?
> 
I think it's not just the WARN_ON(). With an unaligned address we do more
than is probably intended, i.e., we could be modifying the flags of other
pages that were not affected.
For example, in the original report [0], the trace looked like:
[  800.511498] [<ffff0000080b4f2c>] kvm_age_hva_handler+0xcc/0xd4
[  800.517324] [<ffff0000080b4838>] handle_hva_to_gpa+0xec/0x15c
[  800.523063] [<ffff0000080b6c5c>] kvm_age_hva+0x5c/0xcc
[  800.528194] [<ffff0000080a7c3c>] kvm_mmu_notifier_clear_flush_young+0x54/0x90
[  800.535324] [<ffff00000827a0e8>] __mmu_notifier_clear_flush_young+0x6c/0xa8
[  800.542279] [<ffff00000825a644>] page_referenced_one+0x1e0/0x1fc
[  800.548279] [<ffff00000827e8f8>] rmap_walk_ksm+0x124/0x1a0
[  800.553759] [<ffff00000825c974>] rmap_walk+0x94/0x98
[  800.558717] [<ffff00000825ca98>] page_referenced+0x120/0x180
[  800.564369] [<ffff000008228c58>] shrink_active_list+0x218/0x4a4
[  800.570281] [<ffff000008229470>] shrink_node_memcg+0x58c/0x6fc
[  800.576107] [<ffff0000082296c4>] shrink_node+0xe4/0x328
[  800.581325] [<ffff000008229c9c>] do_try_to_free_pages+0xe4/0x3b8
[  800.587324] [<ffff00000822a094>] try_to_free_pages+0x124/0x234
[  800.593150] [<ffff000008216aa0>] __alloc_pages_nodemask+0x564/0xf7c
[  800.599412] [<ffff000008292814>] khugepaged_alloc_page+0x38/0xb8
[  800.605411] [<ffff0000082933bc>] collapse_huge_page+0x74/0xd70
[  800.611238] [<ffff00000829470c>] khugepaged_scan_mm_slot+0x654/0xa98
[  800.617585] [<ffff000008294e0c>] khugepaged+0x2bc/0x49c
[  800.622803] [<ffff0000080ffb70>] kthread+0x124/0x150
[  800.627762] [<ffff0000080849f0>] ret_from_fork+0x10/0x1c
[  800.633066] ---[ end trace 944c130b5252fb01 ]---
Now, ksm wants to mark *a page* as referenced via page_referenced_one(),
passing it an unaligned address. This could eventually turn into one of:
ptep_clear_flush_young_notify(address, address + PAGE_SIZE)
or
pmdp_clear_flush_young_notify(address, address + PMD_SIZE)
Either range now spans two pages/pmds, so the notifier consumer might
act on the second page as well, which is not intended. So I do think
the old behavior is wrong and has other side effects, as mentioned above.
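To make the arithmetic concrete, here is a rough user-space sketch of why a flag left in the low bits makes the range cover two pages. It is only an illustration: the 4 KiB PAGE_SIZE, the STABLE_FLAG value of 0x200, and the helper names pages_spanned(), span_with_flags() and span_masked() are assumptions made for this example, and masking with PAGE_MASK stands in for "ignoring the low bits" rather than reproducing the exact expression used in the patch.

```c
#include <assert.h>

/* Assumed values for illustration only: 4 KiB pages and a flag bit kept
 * in the low bits of rmap_item->address. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))
#define STABLE_FLAG  0x200UL

/* Number of pages overlapped by the half-open range [start, end). */
static unsigned long pages_spanned(unsigned long start, unsigned long end)
{
    assert(end > start);
    unsigned long first = start >> PAGE_SHIFT;
    unsigned long last = (end - 1) >> PAGE_SHIFT;
    return last - first + 1;
}

/* Range a notifier consumer would see if the flag bits are left in place. */
static unsigned long span_with_flags(unsigned long rmap_address)
{
    return pages_spanned(rmap_address, rmap_address + PAGE_SIZE);
}

/* Same range after ignoring the low bits (hypothetical stand-in for the fix). */
static unsigned long span_masked(unsigned long rmap_address)
{
    unsigned long addr = rmap_address & PAGE_MASK;
    return pages_spanned(addr, addr + PAGE_SIZE);
}
```

With a page-aligned address that has STABLE_FLAG set, span_with_flags() reports two pages and span_masked() reports one, which matches the concern that the consumer may act on a neighbouring page.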
[0] https://lkml.kernel.org/r/1525244911-5519-1-git-send-email-hejianet@gmail.com
Suzuki