Message-ID: <20210129113334.344377ac.yaoaili@kingsoft.com>
Date: Fri, 29 Jan 2021 11:33:34 +0800
From: Aili Yao <yaoaili@...gsoft.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: <x86@...nel.org>, <naoya.horiguchi@....com>,
<linux-kernel@...r.kernel.org>, <yangfeng1@...gsoft.com>
Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for
hwpoison page access.
On Thu, 28 Jan 2021 09:43:52 -0800
"Luck, Tony" <tony.luck@...el.com> wrote:
> On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> > when one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> > mode it may be fixed by fixup_exception, current code will just return
> > error code to user process.
>
> Shouldn't the AO action that poisoned the page have also unmapped it?
Yes, the page has been unmapped by this code in mm/rmap.c:
    if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
        pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
        if (PageHuge(page)) {
            hugetlb_count_sub(compound_nr(page), mm);
            set_huge_swap_pte_at(mm, address,
                                 pvmw.pte, pteval,
                                 vma_mmu_pagesize(vma));
        } else {
            dec_mm_counter(mm, mm_counter(page));
            set_pte_at(mm, address, pvmw.pte, pteval);
        }

    }
The PTE for this page in each process mapping it should be marked with SWP_HWPOISON.
But the process may not be informed and may keep using the address that has been
unmapped. Accessing the content of the page will then trigger a page fault.
Normally, the fault hits this code in arch/x86/mm/fault.c:
#ifdef CONFIG_MEMORY_FAILURE
    if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
        struct task_struct *tsk = current;
        unsigned lsb = 0;

        pr_err(
    "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
            tsk->comm, tsk->pid, address);
        if (fault & VM_FAULT_HWPOISON_LARGE)
            lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
        if (fault & VM_FAULT_HWPOISON)
            lsb = PAGE_SHIFT;
        force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
        return;
    }
#endif
    force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
}
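For reference, here is a minimal user-space sketch of what the receiving
process can do with the signal sent by force_sig_mceerr() above. It is only an
illustration, not part of the patch or of the attached test. The names
poison_range_from, last_poison and install_sigbus_handler are hypothetical.
The sketch relies on si_addr_lsb giving the least significant bit of the
reported address (PAGE_SHIFT for a normal page, as in the kernel code above),
so the affected range is the address rounded down to that granularity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper type: the span of memory a SIGBUS report covers. */
struct poison_range {
    uintptr_t start;
    size_t    len;
};

/* Round the reported address down to the granularity given by lsb. */
static struct poison_range poison_range_from(uintptr_t addr, unsigned lsb)
{
    struct poison_range r;

    assert(lsb < sizeof(uintptr_t) * CHAR_BIT);
    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1);
    return r;
}

/* Last range reported to the handler (illustration only). */
static struct poison_range last_poison;

static void sigbus_handler(int sig, siginfo_t *si, void *ucontext)
{
    static const char msg[] = "SIGBUS: hardware memory error reported\n";
    ssize_t n;

    (void)sig;
    (void)ucontext;
    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
        last_poison = poison_range_from((uintptr_t)si->si_addr,
                                        (unsigned)si->si_addr_lsb);
        /* write(2) is async-signal-safe; printf(3) is not. */
        n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)n;
    }
    /*
     * A real application might drop or refill last_poison here and
     * continue; this sketch simply exits.
     */
    _exit(EXIT_FAILURE);
}

/* Install the handler with SA_SIGINFO so si_code and si_addr_lsb are set. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_sigbus_handler() once at startup. With a fault
address inside a 4 KiB page and lsb = 12, poison_range_from() returns the
start of that page and a length of 4096.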
But when a user process makes a syscall and the kernel does a copyin on its behalf,
the page fault triggered by that copy does not reach the code above. It actually
goes to this code in arch/x86/mm/fault.c:
    if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
        [...]
        /*
         * Barring that, we can do the fixup and be happy.
         */
        return;
    }
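To show what the caller sees when a kernel-mode fault on a user buffer is
resolved by the exception fixup, here is a rough user-space sketch. It is an
assumption-laden stand-in: it uses a PROT_NONE mapping rather than a poisoned
page, so it does not exercise the hwpoison path itself and needs no error
injection. The helper name errno_of_write_from_unreadable_buffer is
hypothetical. The point is only that write() fails with an error code and no
signal is delivered, which is the behavior described for the copyin case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write() with a source buffer the kernel cannot
 * read and return the errno that write() reports, 0 if write() succeeded,
 * or -1 if the setup failed.
 */
static int errno_of_write_from_unreadable_buffer(void)
{
    int fds[2];
    long page = sysconf(_SC_PAGESIZE);
    void *buf;
    ssize_t n;
    int err;

    assert(page > 0);
    if (pipe(fds) != 0)
        return -1;

    /* Mapped, but with no access rights: any copy from it faults. */
    buf = mmap(NULL, (size_t)page, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The kernel faults while copying in; the fixup turns that into an error. */
    errno = 0;
    n = write(fds[1], buf, 512);
    err = (n < 0) ? errno : 0;

    munmap(buf, (size_t)page);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux this is expected to return EFAULT. A caller that ignores the return
value of write() never learns that the copy failed.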
> >
> > This is not suffient, we should send a SIGBUS to the process and log the
> > info to console, as we can't trust the process will handle the error
> > correctly.
>
> I agree with this part ... few apps check for -EFAULT and do the right
> thing. But I'm not sure how this happens. Can you provide a bit more
> detail on the steps
>
Attached is a simple program to test this. You can run the test with:
./einj_mem_uc -f copyin2
In my environment, the resulting log and stack trace are:
[ 1583.063050] Memory failure: 0x1030254: recovery action for dirty LRU page: Recovered
[ 1583.071724] MCE: Killing einj_mem_uc:11139 due to hardware memory corruption fault at 7f4d59032000
[ 1583.081732] CPU: 38 PID: 11139 Comm: einj_mem_uc Kdump: loaded Not tainted 5.11.0-rc2+ #43
[ 1583.102607] Call Trace:
[ 1583.105338] dump_stack+0x57/0x6a
[ 1583.109041] no_context.cold+0xf6/0x284
[ 1583.113315] mm_fault_error+0xc3/0x1b0
[ 1583.117503] exc_page_fault+0x57/0x110
[ 1583.121690] asm_exc_page_fault+0x1e/0x30
[ 1583.126159] RIP: 0010:__get_user_nocheck_8+0x10/0x13
[ 1583.131704] Code: 0f b7 10 31 c0 0f 01 ca c3 90 0f 01 cb 0f ae e8 8b 10 31 c0 0f 01 ca c3 66 90 0f 01 cb 0f ae e8 48 8b 10 31 c0 0f 01 ca c3 90 <0f> 01 ca 31 d2 48 c7 c0 f2 ff ff ff c3 cc cc cc 0f 1f 44 00 00 40
[ 1583.152659] RSP: 0018:ffffb9e462917d90 EFLAGS: 00050293
[ 1583.158490] RAX: 00007f4d59032000 RBX: 0000000000000000 RCX: 00007f4d59032000
[ 1583.166453] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 00007f4d590321ff
[ 1583.174418] RBP: 0000000000000200 R08: 0000000000000200 R09: ffffb9e462917e50
[ 1583.182382] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb9e462917e60
[ 1583.190345] R13: ffff941470e93058 R14: 0000000000001000 R15: ffffffffc0626540
[ 1583.198310] iov_iter_fault_in_readable+0x4f/0x120
[ 1583.203657] generic_perform_write+0x83/0x1c0
[ 1583.208520] ext4_buffered_write_iter+0x93/0x150 [ext4]
[ 1583.214378] new_sync_write+0x11f/0x1b0
[ 1583.218661] vfs_write+0x1c0/0x280
[ 1583.222455] ksys_write+0x5f/0xe0
[ 1583.226153] do_syscall_64+0x33/0x40
[ 1583.230142] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1583.235778] RIP: 0033:0x7f4d58b35cd0
> P.S. Typo: s/suffient/sufficient/
Thanks for the correction!
--
Best Regards!
Aili Yao
View attachment "einj_mem_uc.c" of type "text/x-c++src" (15235 bytes)