Message-ID: <e151cfd9-c0a5-4536-a490-77dc47aa3ed6@redhat.com>
Date: Fri, 14 Jun 2024 10:51:10 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, chrisl@...nel.org,
 linux-kernel@...r.kernel.org, mhocko@...e.com, ryan.roberts@....com,
 baolin.wang@...ux.alibaba.com, yosryahmed@...gle.com, shy828301@...il.com,
 surenb@...gle.com, v-songbaohua@...o.com, willy@...radead.org,
 ying.huang@...el.com, yuzhao@...gle.com
Subject: Re: [PATCH RFC 3/3] mm: remove folio_test_anon(folio)==false path in
 __folio_add_anon_rmap()

On 14.06.24 09:56, Barry Song wrote:
> On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 13.06.24 11:06, Barry Song wrote:
>>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@...hat.com> wrote:
>>>>
>>>> On 13.06.24 10:46, Barry Song wrote:
>>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@...il.com> wrote:
>>>>>>
>>>>>> From: Barry Song <v-songbaohua@...o.com>
>>>>>>
>>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
>>>>>> relocated to folio_add_new_anon_rmap(). Additionally, the two other callers
>>>>>> consistently pass anonymous folios:
>>>>>>
>>>>>> stack 1:
>>>>>> remove_migration_pmd
>>>>>>       -> folio_add_anon_rmap_pmd
>>>>>>         -> __folio_add_anon_rmap
>>>>>>
>>>>>> stack 2:
>>>>>> __split_huge_pmd_locked
>>>>>>       -> folio_add_anon_rmap_ptes
>>>>>>          -> __folio_add_anon_rmap
>>>>>>
>>>>>> __folio_add_anon_rmap() only needs to handle the case
>>>>>> folio_test_anon(folio)==true now.
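
[ For context, a minimal sketch of what __folio_add_anon_rmap() would then
  reduce to; the signature matches current mm/rmap.c, the body is elided and
  only illustrative:

static __always_inline void __folio_add_anon_rmap(struct folio *folio,
		struct page *page, int nr_pages, struct vm_area_struct *vma,
		unsigned long address, rmap_t flags, enum rmap_level level)
{
	/* Every remaining caller is expected to pass an already-anon folio. */
	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
	...
	/* No __folio_set_anon()/first-mapping setup is needed here anymore. */
}
]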
>>>>>
>>>>> My team reported a case where swapoff() calls folio_add_anon_rmap_pte()
>>>>> -- *not* folio_add_anon_rmap_ptes() -- with a new anon folio
>>>>> (!folio_test_anon(folio)).
>>>>>
>>>>> I will double check all folio_add_anon_rmap_pte() cases.
>>>>
>>>> Right, the swapoff() path is a bit special (e.g., we don't do any kind of
>>>> batching on the swapoff() path).
>>>>
>>>> But we should never get a new large anon folio there, or could that now
>>>> happen?
>>>
>>> My team encountered the following issue while testing this RFC
>>> series on real hardware. Let me take a look to identify the
>>> problem.
>>>
>>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1 mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
>>> [  261.214213][T11285] memcg:ffffff8003678000
>>> [  261.214217][T11285] aops:swap_aops
>>> [  261.214233][T11285] flags: 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
>>> [  261.214241][T11285] page_type: 0x0()
>>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000 dead000000000122 0000000000000000
>>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3 0000000400000000 ffffff8003678000
>>> [  261.214254][T11285] page dumped because: VM_WARN_ON_FOLIO(!folio_test_anon(folio))
>>> [  261.214257][T11285] page_owner tracks the page as allocated
>>> [  261.214260][T11285] page last allocated via order 0, migratetype Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
>>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
>>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
>>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
>>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
>>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
>>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
>>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
>>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
>>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
>>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
>>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
>>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
>>> [  261.214366][T11285]  el0_svc+0x38/0x68
>>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
>>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
>>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
>>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
>>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
>>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
>>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
>>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
>>> [  261.214422][T11285]  shrink_one+0xa8/0x234
>>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
>>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
>>> [  261.214437][T11285]  kswapd+0x290/0x4e4
>>> [  261.214442][T11285]  kthread+0x114/0x1bc
>>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
>>> [  261.214477][T11285] ------------[ cut here ]------------
>>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305 folio_add_anon_rmap_ptes+0x2b4/0x330
>>>
>>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
>>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08 x27: ffffff80db480000
>>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120 x24: 0000007b44941000
>>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001 x21: fffffffe04cc2980
>>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980 x18: ffffffed011dae80
>>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
>>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003 x12: 0000000000000003
>>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9 : 0d46c0889b468e00
>>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6 : 322e31363220205b
>>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3 : 0000000000000000
>>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0 : ffffff8068c15c80
>>> [  261.215501][T11285] Call trace:
>>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
>>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
>>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
>>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
>>> [  261.215539][T11285]  el0_svc+0x38/0x68
>>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
>>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
>>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
>>
>> Ah, yes. In unuse_pte(), you'll have to do the right thing if
>> !folio_test_anon(folio) before doing the folio_add_anon_rmap_pte().
>>
>> You might have a fresh anon folio in the swapcache that was never mapped
>> (hopefully order-0, otherwise we'd likely be in trouble).
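
[ Roughly what I have in mind in unuse_pte(), assuming the extended
  folio_add_new_anon_rmap() that takes rmap flags from earlier in this
  series, and the existing local variables (page, addr, rmap_flags);
  completely untested, sketch only:

	if (!folio_test_anon(folio)) {
		/* A fresh anon folio in the swapcache that was never mapped. */
		folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
	} else {
		folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
	}
]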
> 
> Yes. It is order-0
> 
> [  261.214260][T11285] page last allocated via order 0, migratetype
> 
> Otherwise, we would have hit this VM_WARN_ON_FOLIO, right?
> 
> __folio_add_anon_rmap()
> {
> 	...
> 	VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> 			 level != RMAP_LEVEL_PMD, folio);
> 	...
> }
> 
> Given that nobody has ever reported this warning, I assume all callers
> using folio_add_anon_rmap_pte(s) are safe to move over right now?

Yes, and we should likely add a VM_WARN_ON_ONCE() here to document that we 
only expect a small folio if !anon. If that ever changes, we can assess the 
situation.
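
Something like this in the !folio_test_anon() branch sketched further up 
(again, only a sketch):

	if (!folio_test_anon(folio)) {
		/* We only expect small (order-0) folios here for now. */
		VM_WARN_ON_ONCE(folio_test_large(folio));
		folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
	}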

So far, only the swap code created "new" anon folios without properly calling 
the right function; all other code handles that correctly.

-- 
Cheers,

David / dhildenb

