linux-kernel - Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user

Message-ID: <6dc1b117-020e-be9e-7e5e-a349ffb7d00a@huawei.com>
Date:   Thu, 20 Apr 2023 10:59:54 +0800
From:   Kefeng Wang <wangkefeng.wang@...wei.com>
To:     Jane Chu <jane.chu@...cle.com>,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>, Thomas Gleixner <tglx@...utronix.de>
CC:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tong Tiangen <tongtiangen@...wei.com>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v2] mm: hwpoison: coredump: support recovery from
 dump_user_range()



On 2023/4/20 10:03, Jane Chu wrote:
> 
> On 4/19/2023 5:03 AM, Kefeng Wang wrote:
>>
>>
>> On 2023/4/19 15:25, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Tue, Apr 18, 2023 at 05:45:06PM +0800, Kefeng Wang wrote:
>>>>
>>>>
...
>>>>>> @@ -371,6 +372,14 @@ size_t _copy_mc_to_iter(const void *addr, 
>>>>>> size_t bytes, struct iov_iter *i)
>>>>>>    EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
>>>>>>    #endif /* CONFIG_ARCH_HAS_COPY_MC */
>>>>>> +static void *memcpy_from_iter(struct iov_iter *i, void *to, const 
>>>>>> void *from,
>>>>>> +                 size_t size)
>>>>>> +{
>>>>>> +    if (iov_iter_is_copy_mc(i))
>>>>>> +        return (void *)copy_mc_to_kernel(to, from, size);
>>>>>
>>>>> Is it helpful to call memory_failure_queue() if copy_mc_to_kernel() 
>>>>> fails
>>>>> due to a memory error?
>>>>
>>>> For dump_user_range(), the task is dying. If the copy is incomplete,
>>>> the coredump fails and the task exits; memory_failure() is also
>>>> called via kill_me_maybe():
>>>>
>>>>   CPU: 0 PID: 1418 Comm: test Tainted: G   M               6.3.0-rc5 #29
>>>>   Call Trace:
>>>>    <TASK>
>>>>    dump_stack_lvl+0x37/0x50
>>>>    memory_failure+0x51/0x970
>>>>    kill_me_maybe+0x5b/0xc0
>>>>    task_work_run+0x5a/0x90
>>>>    exit_to_user_mode_prepare+0x194/0x1a0
>>>>    irqentry_exit_to_user_mode+0x9/0x30
>>>>    noist_exc_machine_check+0x40/0x80
>>>>    asm_exc_machine_check+0x33/0x40
>>>
>>> Is this call trace printed out when copy_mc_to_kernel() failed by 
>>> finding
>>> a memory error (or in some testcase using error injection)?
>>
>> I added dump_stack() to memory_failure() to check whether it is called
>> for the poisoned memory. The call trace shows that memory_failure() is
>> called, but the test results confused me.
>>
>>> In my understanding, an MCE should not be triggered when MC-safe copy 
>>> tries
>>> to access to a memory error.  So I feel that we might be talking about
>>> different scenarios.
>>>
>>> When I questioned previously, I thought about the following scenario:
>>>
>>>    - a process terminates abnormally for any reason like segmentation 
>>> fault,
>>>    - then, kernel tries to create a coredump,
>>>    - during this, the copying routine accesses to corrupted page to 
>>> read.
>>>
>> Yes, we tested the scenario you described:
>>
>> 1) inject a memory error into a process
>> 2) send SIGABRT/SIGBUS to the process to trigger the coredump
>>
>> Without the patch the system panics; with the patch only the process exits.
>>
>>> In this case the corrupted page should not have been handled by 
>>> memory_failure()
>>> yet (otherwise a properly handled hwpoisoned page would be 
>>> ignored
>>> by the coredump process).  The coredump process would exit with failure with
>>> your patch, but then the corrupted page is still left unhandled and can
>>> be reused, so any other thread can easily access it again.
>>
>> As shown above, the corrupted page will be handled by 
>> memory_failure(), but I am wondering about two things:
>> 1) memory_failure() is not always called
>> 2) judging from the call trace above, it appears to come from an
>>     asynchronous interrupt, not from a synchronous exception, right?
>>
>>>
>>> You can find a few other places (like __wp_page_copy_user and 
>>> ksm_might_need_to_copy)
>>> to call memory_failure_queue() to cope with such unhandled error pages.
>>> So does memcpy_from_iter() do the same?
>>
>> I added some debug prints to do_machine_check() on x86:
>>
>> 1) COW,
>>    m.kflags: MCE_IN_KERNEL_RECOV
>>    fixup_type: EX_TYPE_DEFAULT_MCE_SAFE
>>
>>    CPU: 11 PID: 2038 Comm: einj_mem_uc
>>    Call Trace:
>>     <#MC>
>>     dump_stack_lvl+0x37/0x50
>>     do_machine_check+0x7ad/0x840
>>     exc_machine_check+0x5a/0x90
>>     asm_exc_machine_check+0x1e/0x40
>>    RIP: 0010:copy_mc_fragile+0x35/0x62
>>
>>    if (m.kflags & MCE_IN_KERNEL_RECOV) {
>>            if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
>>                    mce_panic("Failed kernel mode recovery", &m, msg);
>>    }
>>
>>    if (m.kflags & MCE_IN_KERNEL_COPYIN)
>>            queue_task_work(&m, msg, kill_me_never);
>>
>> memory_failure() is not called for EX_TYPE_DEFAULT_MCE_SAFE, nor for
>> EX_TYPE_FAULT_MCE_SAFE, so we manually add a memory_failure_queue()
>> call to handle the poisoned page.
>>
>> 2) Coredump: nothing is printed for m.kflags or fixup_type.
>> Given the check above, adding memory_failure_queue() or memory_failure()
>> seems to be needed for memcpy_from_iter(), but this is totally different
>> from the COW scenario.
>>
>>
>> Another question: other copy_mc_to_kernel() callers, e.g.
>> nvdimm/dm-writecache/dax, do not call memory_failure_queue().
>> Do they need a memory_failure_queue() call? If so, why not add it to
>> do_machine_check()?
> 

What I mean is that EX_TYPE_DEFAULT_MCE_SAFE/EX_TYPE_FAULT_MCE_SAFE
are designed to identify fixups which allow in-kernel #MC recovery;
that is, the caller of copy_mc_to_kernel() must know the source
is a user address, so we could set MCE_IN_KERNEL_COPYIN for
the MCE_SAFE types.

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..63e94484c5d6 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,12 +293,11 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
         case EX_TYPE_COPY:
                 if (!copy_user)
                         return IN_KERNEL;
-               m->kflags |= MCE_IN_KERNEL_COPYIN;
                 fallthrough;

         case EX_TYPE_FAULT_MCE_SAFE:
         case EX_TYPE_DEFAULT_MCE_SAFE:
-               m->kflags |= MCE_IN_KERNEL_RECOV;
+               m->kflags |= MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
                 return IN_KERNEL_RECOV;

         default:

Then we could drop memory_failure_queue(pfn, flags) from the COW/KSM copy
paths; otherwise every machine-check-safe memory copy will need its own
memory_failure_xx() call.
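
To make the effect of the diff above easier to see, here is a small
userspace model of the classification before and after the change. It is
only a sketch: the MODEL_* names, the flag bit positions, and both
functions are hypothetical stand-ins rather than the kernel's definitions,
and the copy_user check is reduced to a boolean argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the bit positions are placeholders, not kernel values. */
#define MODEL_IN_KERNEL_RECOV   (1u << 0)
#define MODEL_IN_KERNEL_COPYIN  (1u << 1)

enum model_fixup_type {
	MODEL_EX_TYPE_COPY,
	MODEL_EX_TYPE_FAULT_MCE_SAFE,
	MODEL_EX_TYPE_DEFAULT_MCE_SAFE,
	MODEL_EX_TYPE_OTHER,
};

/* Models the '-' side of the diff: only EX_TYPE_COPY sets COPYIN. */
uint32_t model_kflags_before(enum model_fixup_type t, bool copy_user)
{
	uint32_t kflags = 0;

	switch (t) {
	case MODEL_EX_TYPE_COPY:
		if (!copy_user)
			return 0;
		kflags |= MODEL_IN_KERNEL_COPYIN;
		/* fall through */
	case MODEL_EX_TYPE_FAULT_MCE_SAFE:
	case MODEL_EX_TYPE_DEFAULT_MCE_SAFE:
		kflags |= MODEL_IN_KERNEL_RECOV;
		return kflags;
	default:
		return 0;
	}
}

/* Models the '+' side of the diff: the MCE_SAFE types also set COPYIN. */
uint32_t model_kflags_after(enum model_fixup_type t, bool copy_user)
{
	switch (t) {
	case MODEL_EX_TYPE_COPY:
		if (!copy_user)
			return 0;
		/* fall through */
	case MODEL_EX_TYPE_FAULT_MCE_SAFE:
	case MODEL_EX_TYPE_DEFAULT_MCE_SAFE:
		return MODEL_IN_KERNEL_RECOV | MODEL_IN_KERNEL_COPYIN;
	default:
		return 0;
	}
}
```

In this model the only observable difference is for the two MCE_SAFE
types, which gain the COPYIN bit; the EX_TYPE_COPY result is unchanged.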

+Thomas, who added the two types: could you share some comments on 
this? Thanks.

> In the dax case, if the source address is poisoned, and we do follow up 
> with memory_failure_queue(pfn, flags), what should the value of the 
> 'flags' be ?


I think flags = 0 is enough for all copy_mc_xxx callers to isolate the 
poisoned page.
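
As a rough illustration of the caller-side alternative, where each
machine-check-safe copy queues the source page itself with flags = 0,
here is a userspace sketch. Everything prefixed model_ is a hypothetical
stub: model_copy_mc() only imitates a copy that reports the bytes it
could not copy, and model_memory_failure_queue() merely records its
arguments instead of isolating a page.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model state: last pfn/flags queued, or -1 if nothing was queued. */
long model_queued_pfn = -1;
int model_queued_flags = -1;

/* Stub for memory_failure_queue(pfn, flags): records the request only. */
void model_memory_failure_queue(unsigned long pfn, int flags)
{
	model_queued_pfn = (long)pfn;
	model_queued_flags = flags;
}

/*
 * Stub for a machine-check-safe copy. Bytes at or beyond poison_off are
 * treated as poisoned; the return value is the number of bytes NOT copied.
 */
size_t model_copy_mc(void *dst, const void *src, size_t len, size_t poison_off)
{
	size_t ok = poison_off < len ? poison_off : len;

	memcpy(dst, src, ok);
	return len - ok;
}

/* Caller pattern: on a short copy, queue the source pfn with flags = 0. */
int model_copy_page(void *dst, const void *src, size_t len,
		    size_t poison_off, unsigned long src_pfn)
{
	if (model_copy_mc(dst, src, len, poison_off)) {
		model_memory_failure_queue(src_pfn, 0);
		return -1;	/* the copy failed; the caller must bail out */
	}
	return 0;
}
```

With the error_context() change proposed above, this per-caller queueing
would no longer be needed, which is the point of the proposal.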

Thanks.