linux-kernel - Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

Message-ID: <20210129113334.344377ac.yaoaili@kingsoft.com>
Date:   Fri, 29 Jan 2021 11:33:34 +0800
From:   Aili Yao <yaoaili@...gsoft.com>
To:     "Luck, Tony" <tony.luck@...el.com>
CC:     <x86@...nel.org>, <naoya.horiguchi@....com>,
        <linux-kernel@...r.kernel.org>, <yangfeng1@...gsoft.com>
Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for
 hwpoison page access.

On Thu, 28 Jan 2021 09:43:52 -0800
"Luck, Tony" <tony.luck@...el.com> wrote:

> On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> > when one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> > mode it may be fixed by fixup_exception, current code will just return
> > error code to user process.  
> 
> Shouldn't the AO action that poisoned the page have also unmapped it?

Yes, the page has been unmapped by this code in mm/rmap.c:

        if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
                pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
                if (PageHuge(page)) {
                        hugetlb_count_sub(compound_nr(page), mm);
                        set_huge_swap_pte_at(mm, address,
                                             pvmw.pte, pteval,
                                             vma_mmu_pagesize(vma));
                } else {
                        dec_mm_counter(mm, mm_counter(page));
                        set_pte_at(mm, address, pvmw.pte, pteval);
                }

        }

The pte for this page in every process mapping it should then be marked with
SWP_HWPOISON. But the process may not be informed and may continue to use the
address that has been unmapped; accessing the content of the page will then
trigger a page fault.

Normally, it will hit the code in arch/x86/mm/fault.c:

#ifdef CONFIG_MEMORY_FAILURE
        if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
                struct task_struct *tsk = current;
                unsigned lsb = 0;

                pr_err(
        "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
                        tsk->comm, tsk->pid, address);
                if (fault & VM_FAULT_HWPOISON_LARGE)
                        lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
                if (fault & VM_FAULT_HWPOISON)
                        lsb = PAGE_SHIFT;
                force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
                return;
        }
#endif
        force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
}
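
For reference, this is roughly what a user-space SIGBUS handler sees when the
path above is taken. It is only a sketch: struct mce_info and decode_sigbus()
are hypothetical names made up for illustration and are not part of
einj_mem_uc.c or of any kernel or libc API. The si_code values and the
si_addr_lsb field are the standard Linux siginfo_t ones.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper type: what a handler can learn from the siginfo. */
struct mce_info {
        int is_mce;           /* 1 if SIGBUS was raised for a memory error */
        int action_required;  /* 1 for BUS_MCEERR_AR, 0 for BUS_MCEERR_AO */
        void *addr;           /* faulting user address (si_addr) */
        size_t len;           /* size of the affected range, 1 << si_addr_lsb */
};

/* Decode a SIGBUS siginfo_t; non-MCE SIGBUS (e.g. BUS_ADRERR) gives is_mce == 0. */
static struct mce_info decode_sigbus(const siginfo_t *si)
{
        struct mce_info info = { 0, 0, si->si_addr, 0 };

        if (si->si_signo != SIGBUS)
                return info;

        if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
                info.is_mce = 1;
                info.action_required = (si->si_code == BUS_MCEERR_AR);
                /* si_addr_lsb is the lsb passed to force_sig_mceerr() */
                info.len = (size_t)1 << si->si_addr_lsb;
        }
        return info;
}
```

A handler installed with sigaction() and SA_SIGINFO could call decode_sigbus()
on its siginfo_t argument; for a 4K page the reported len would be 4096.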

But when the user process makes a syscall and the kernel does a copyin from
this page, the page fault triggered by that access will not get to the above
code; it actually goes to this code in arch/x86/mm/fault.c:

        if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
                [...]
                /*
                 * Barring that, we can do the fixup and be happy.
                 */
                return;
        }
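
To show what the process sees after such a fixup, here is a minimal user-space
sketch. It does not poison any memory: a PROT_NONE mapping stands in for the
unmapped hwpoison page, and a pipe stands in for the ext4 file, so the kernel
path is not identical to the one in the trace further down. The function name
errno_for_unreadable_source() is made up for this example. The point is only
that a fault taken in kernel mode during copyin is fixed up and turned into an
error return instead of a signal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to write() from a buffer the kernel cannot read.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the setup (pipe/mmap) failed.
 */
static int errno_for_unreadable_source(void)
{
        int fds[2];
        int result = -1;
        long page = sysconf(_SC_PAGESIZE);

        if (page <= 0 || pipe(fds) != 0)
                return -1;

        /* PROT_NONE: the range is mapped, but every access to it faults. */
        void *buf = mmap(NULL, (size_t)page, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf != MAP_FAILED) {
                errno = 0;
                /* The copyin faults in kernel mode and is fixed up. */
                ssize_t n = write(fds[1], buf, 16);
                result = (n < 0) ? errno : 0;
                munmap(buf, (size_t)page);
        }

        close(fds[0]);
        close(fds[1]);
        return result;
}
```

On Linux this should return EFAULT, and no signal is delivered. A caller that
ignores the return value of write() never notices the failure.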

> > 
> > This is not suffient, we should send a SIGBUS to the process and log the
> > info to console, as we can't trust the process will handle the error
> > correctly.  
> 
> I agree with this part ... few apps check for -EFAULT and do the right
> thing.  But I'm not sure how this happens. Can you provide a bit more
> detail on the steps
> 

Attached is a simple program to test this; you can run the test with:
./einj_mem_uc -f copyin2

In my environment, the stack will be:

[ 1583.063050] Memory failure: 0x1030254: recovery action for dirty LRU page: Recovered
[ 1583.071724] MCE: Killing einj_mem_uc:11139 due to hardware memory corruption fault at 7f4d59032000
[ 1583.081732] CPU: 38 PID: 11139 Comm: einj_mem_uc Kdump: loaded Not tainted 5.11.0-rc2+ #43
[ 1583.102607] Call Trace:
[ 1583.105338]  dump_stack+0x57/0x6a
[ 1583.109041]  no_context.cold+0xf6/0x284
[ 1583.113315]  mm_fault_error+0xc3/0x1b0
[ 1583.117503]  exc_page_fault+0x57/0x110
[ 1583.121690]  asm_exc_page_fault+0x1e/0x30
[ 1583.126159] RIP: 0010:__get_user_nocheck_8+0x10/0x13
[ 1583.131704] Code: 0f b7 10 31 c0 0f 01 ca c3 90 0f 01 cb 0f ae e8 8b 10 31 c0 0f 01 ca c3 66 90 0f 01 cb 0f ae e8 48 8b 10 31 c0 0f 01 ca c3 90 <0f> 01 ca 31 d2 48 c7 c0 f2 ff ff ff c3 cc cc cc 0f 1f 44 00 00 40
[ 1583.152659] RSP: 0018:ffffb9e462917d90 EFLAGS: 00050293
[ 1583.158490] RAX: 00007f4d59032000 RBX: 0000000000000000 RCX: 00007f4d59032000
[ 1583.166453] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 00007f4d590321ff
[ 1583.174418] RBP: 0000000000000200 R08: 0000000000000200 R09: ffffb9e462917e50
[ 1583.182382] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb9e462917e60
[ 1583.190345] R13: ffff941470e93058 R14: 0000000000001000 R15: ffffffffc0626540
[ 1583.198310]  iov_iter_fault_in_readable+0x4f/0x120
[ 1583.203657]  generic_perform_write+0x83/0x1c0
[ 1583.208520]  ext4_buffered_write_iter+0x93/0x150 [ext4]
[ 1583.214378]  new_sync_write+0x11f/0x1b0
[ 1583.218661]  vfs_write+0x1c0/0x280
[ 1583.222455]  ksys_write+0x5f/0xe0
[ 1583.226153]  do_syscall_64+0x33/0x40
[ 1583.230142]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1583.235778] RIP: 0033:0x7f4d58b35cd0

> P.S. Typo: s/suffient/sufficient/

Thanks for the correction!

-- 
Best Regards!

Aili Yao

View attachment "einj_mem_uc.c" of type "text/x-c++src" (15235 bytes)