linux-kernel - Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices

Message-ID: <CAJD7tka+ZONNFKw=1FM22b-JTPkiKZaKuM3Upyu6pf4=vN_CRg@mail.gmail.com>
Date: Tue, 3 Sep 2024 11:38:37 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Kairui Song <ryncsn@...il.com>
Cc: hanchuanhua@...o.com, Usama Arif <usamaarif642@...il.com>, akpm@...ux-foundation.org, 
	linux-mm@...ck.org, baolin.wang@...ux.alibaba.com, chrisl@...nel.org, 
	david@...hat.com, hannes@...xchg.org, hughd@...gle.com, 
	kaleshsingh@...gle.com, linux-kernel@...r.kernel.org, mhocko@...e.com, 
	minchan@...nel.org, nphamcs@...il.com, ryan.roberts@....com, 
	senozhatsky@...omium.org, shakeel.butt@...ux.dev, shy828301@...il.com, 
	surenb@...gle.com, v-songbaohua@...o.com, willy@...radead.org, 
	xiang@...nel.org, ying.huang@...el.com, hch@...radead.org
Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices

[..]
>
> With latest mm-unstable, I'm seeing following WARN followed by user
> space segfaults (multiple mTHP enabled):
>
> [   39.145686] ------------[ cut here ]------------
> [   39.145969] WARNING: CPU: 24 PID: 11159 at mm/page_io.c:535
> swap_read_folio+0x4db/0x520
> [   39.146307] Modules linked in:
> [   39.146507] CPU: 24 UID: 1000 PID: 11159 Comm: sh Kdump: loaded Not
> tainted 6.11.0-rc6.orig+ #131
> [   39.146887] Hardware name: Tencent Cloud CVM, BIOS
> seabios-1.9.1-qemu-project.org 04/01/2014
> [   39.147206] RIP: 0010:swap_read_folio+0x4db/0x520
> [   39.147430] Code: 00 e0 ff ff 09 c1 83 f8 08 0f 42 d1 e9 c4 fe ff
> ff 48 63 85 34 02 00 00 48 03 45 08 49 39 c4 0f 85 63 fe ff ff e9 db
> fe ff ff <0f> 0b e9 91 fd ff ff 31 d2 e9 9d fe ff ff 48 c7 c6 38 b6 4e
> 82 48
> [   39.148079] RSP: 0000:ffffc900045c3ce0 EFLAGS: 00010202
> [   39.148390] RAX: 0017ffffd0020061 RBX: ffffea00064d4c00 RCX: 03ffffffffffffff
> [   39.148737] RDX: ffffea00064d4c00 RSI: 0000000000000000 RDI: ffffea00064d4c00
> [   39.149102] RBP: 0000000000000001 R08: ffffea00064d4c00 R09: 0000000000000078
> [   39.149482] R10: 00000000000000f0 R11: 0000000000000004 R12: 0000000000001000
> [   39.149832] R13: ffff888102df5c00 R14: ffff888102df5c00 R15: 0000000000000003
> [   39.150177] FS:  00007f51a56c9540(0000) GS:ffff888fffc00000(0000)
> knlGS:0000000000000000
> [   39.150623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   39.150950] CR2: 000055627b13fda0 CR3: 00000001083e2000 CR4: 00000000003506b0
> [   39.151317] Call Trace:
> [   39.151565]  <TASK>
> [   39.151778]  ? __warn+0x84/0x130
> [   39.152044]  ? swap_read_folio+0x4db/0x520
> [   39.152345]  ? report_bug+0xfc/0x1e0
> [   39.152614]  ? handle_bug+0x3f/0x70
> [   39.152891]  ? exc_invalid_op+0x17/0x70
> [   39.153178]  ? asm_exc_invalid_op+0x1a/0x20
> [   39.153467]  ? swap_read_folio+0x4db/0x520
> [   39.153753]  do_swap_page+0xc6d/0x14f0
> [   39.154054]  ? srso_return_thunk+0x5/0x5f
> [   39.154361]  __handle_mm_fault+0x758/0x850
> [   39.154645]  handle_mm_fault+0x134/0x340
> [   39.154945]  do_user_addr_fault+0x2e5/0x760
> [   39.155245]  exc_page_fault+0x6a/0x140
> [   39.155546]  asm_exc_page_fault+0x26/0x30
> [   39.155847] RIP: 0033:0x55627b071446
> [   39.156124] Code: f6 7e 19 83 e3 01 74 14 41 83 ee 01 44 89 35 25
> 72 0c 00 45 85 ed 0f 88 73 02 00 00 8b 05 ea 74 0c 00 85 c0 0f 85 da
> 03 00 00 <44> 8b 15 53 e9 0c 00 45 85 d2 74 2e 44 8b 0d 37 e3 0c 00 45
> 85 c9
> [   39.156944] RSP: 002b:00007ffd619d54f0 EFLAGS: 00010246
> [   39.157237] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f51a44f968b
> [   39.157594] RDX: 0000000000000000 RSI: 00007ffd619d5518 RDI: 00000000ffffffff
> [   39.157954] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007
> [   39.158288] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [   39.158634] R13: 0000000000002b9a R14: 0000000000000000 R15: 00007ffd619d5518
> [   39.158998]  </TASK>
> [   39.159226] ---[ end trace 0000000000000000 ]---
>
> After reverting this or Usama's "mm: store zero pages to be swapped
> out in a bitmap", the problem is gone. I think these two patches may
> have some conflict that needs to be resolved.

Yup. I saw this conflict coming and specifically asked for this
warning to be added in Usama's patch to catch it [1]. It served its
purpose.

Usama's patch does not handle large folio swapin, because at the time
it was written we didn't have it. We expected Usama's series to land
sooner than this one, so the warning was to make sure that this series
handles large folio swapin in the zeromap code. Now that they are both
in mm-unstable, we are gonna have to figure this out.
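
To illustrate the kind of check a large folio swapin path would need before
trusting the zeromap, here is a rough userspace sketch. It is not the kernel
code and not necessarily how this will be fixed: the bitmap layout and the
names zeromap_test(), zeromap_set() and zeromap_classify() are made up for
illustration. The idea is that a range of swap slots can only be treated as
one unit if every slot agrees on its zeromap bit; a mixed range would have
to be handled some other way, for example by falling back to order-0 swapin.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for a per-swap-device zeromap: bit i set means
 * slot i held an all-zero page at swap-out, so nothing was written for it.
 */
static bool zeromap_test(const unsigned long *map, size_t slot)
{
	return (map[slot / BITS_PER_WORD] >> (slot % BITS_PER_WORD)) & 1UL;
}

static void zeromap_set(unsigned long *map, size_t slot)
{
	map[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
}

enum range_kind {
	RANGE_ALL_ZERO,	/* every slot is zero-filled: no I/O needed */
	RANGE_ALL_DATA,	/* no slot is zero-filled: read the whole range */
	RANGE_MIXED,	/* slots disagree: cannot be handled as one unit */
};

/* Classify nr consecutive slots starting at start. */
static enum range_kind zeromap_classify(const unsigned long *map,
					size_t start, size_t nr)
{
	assert(nr > 0);

	bool first = zeromap_test(map, start);

	for (size_t i = 1; i < nr; i++) {
		if (zeromap_test(map, start + i) != first)
			return RANGE_MIXED;
	}
	return first ? RANGE_ALL_ZERO : RANGE_ALL_DATA;
}
```

In this sketch a caller about to swap in nr slots as one folio would call
zeromap_classify() first and only proceed with the large folio when the
result is not RANGE_MIXED.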

I suspect Usama's patches are closer to landing, so it's better to handle
this in this series, but I will leave it up to Usama and
Chuanhua/Barry to figure this out :)

[1] https://lore.kernel.org/lkml/CAJD7tkbpXjg00CRSrXU_pbaHwEaW1b3k8AQgu8y2PAh7EkTOug@mail.gmail.com/