[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20241112141601.34540-1-pjy@amazon.com>
Date: Tue, 12 Nov 2024 14:16:01 +0000
From: Puranjay Mohan <pjy@...zon.com>
To: Mike Kravetz <mike.kravetz@...cle.com>, Andrew Morton
<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: [RFC PATCH 5.10.y] mm: hugetlb: call huge_pte_offset with i_mmap_rwsem held
huge_pte_offset() walks the page table to the find the pte associated
with the huge page. the PMD page can be shared with another process and
pmd unsharing is possible (so the PUD-ranged pgtable page can go away
while the page table walk is under way. It can be done by a pmd unshare
with a follow up munmap() on the other process).
Protect against this race by taking i_mmap_rwsem while the page table
walk is going on and till the pointer to the PMD is being used.
The upstream kernel has a new lock [1] for fixing this issue and
backporting the whole series is not trivial. This patch is my attempt at
fixing this issue and I am sending this as an RFC to receive feedback if we
can fix it using another method.
Once I receive the feedback and we have a path forward, I will send that
patch to all stable branches that have this issue.
[1] https://lwn.net/Articles/908092/
Here is an example kernel panic due to the issue being fixed in this
patch:
Unable to handle kernel paging request at virtual address ffffffffc0000698
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000729ec000
[ffffffffc0000698] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: xt_tcpmss ip6table_filter tcp_diag [.....]
CPU: 3 PID: 62456 Comm: postgres Not tainted 5.10.184-175.731.amzn2.aarch64 #1
Hardware name: Amazon EC2 caspianr1g.16xlarge/, BIOS 1.0 11/1/2018
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
pc : huge_pte_offset+0x88/0x118
lr : hugetlb_fault+0x60/0x5f0
sp : ffff80001c6d3d00
x29: ffff80001c6d3d00 x28: ffff0003cdfa0000
x27: 0000000000000000 x26: ffff0003ce90d6a8
x25: 0000000000000007 x24: 00000a60da660000
x23: ffff0003ce90d640 x22: ffff800012256ed8
x21: 00000a60da600000 x20: ffff00040388d130
x19: ffff00040388d130 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : ffff800010345880 x8 : 0000000000000000
x7 : 0000000040000000 x6 : 0000000040000000
x5 : 0000000000000183 x4 : 00000000000000d3
x3 : ffffffffc0000000 x2 : 0000000000200000
x1 : 00000a60da600000 x0 : ffffffffc0000698
Call trace:
huge_pte_offset+0x88/0x118
handle_mm_fault+0x1b0/0x240
do_page_fault+0x150/0x420
do_translation_fault+0xb8/0xf4
do_mem_abort+0x48/0xa8
el0_da+0x44/0x80
el0_sync_handler+0xe0/0x120
Code: eb00005f 54000380 d3557424 8b040c60 (f8647863)
---[ end trace 6cffaf3375de3ad9 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0804800e,7a00a238
Memory Limit: 2048 MB
---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
Signed-off-by: Puranjay Mohan <pjy@...zon.com>
---
mm/hugetlb.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 02b7c8f9b0e87..a991b62afac4e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4545,7 +4545,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
struct address_space *mapping;
int need_wait_lock = 0;
unsigned long haddr = address & huge_page_mask(h);
+ mapping = vma->vm_file->f_mapping;
+ i_mmap_lock_read(mapping);
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (ptep) {
/*
@@ -4556,10 +4558,13 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
migration_entry_wait_huge(vma, mm, ptep);
+ i_mmap_unlock_read(mapping);
return 0;
- } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
+ } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
+ i_mmap_unlock_read(mapping);
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
+ }
}
/*
@@ -4573,8 +4578,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
* is OK, as huge_pte_alloc will return the same value unless
* something has changed.
*/
- mapping = vma->vm_file->f_mapping;
- i_mmap_lock_read(mapping);
ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
if (!ptep) {
i_mmap_unlock_read(mapping);
--
2.40.1
Powered by blists - more mailing lists