linux-kernel - Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7d0c38a9-ed2a-a221-0c67-4a2f3945d48b@oracle.com>
Date:   Wed, 19 Apr 2023 19:03:05 -0700
From:   Jane Chu <jane.chu@...cle.com>
To:     Kefeng Wang <wangkefeng.wang@...wei.com>,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tong Tiangen <tongtiangen@...wei.com>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v2] mm: hwpoison: coredump: support recovery from
 dump_user_range()


On 4/19/2023 5:03 AM, Kefeng Wang wrote:
> 
> 
> On 2023/4/19 15:25, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Tue, Apr 18, 2023 at 05:45:06PM +0800, Kefeng Wang wrote:
>>>
>>>
>>> On 2023/4/18 11:13, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>> On Mon, Apr 17, 2023 at 12:53:23PM +0800, Kefeng Wang wrote:
>>>>> The dump_user_range() is used to copy the user page to a coredump 
>>>>> file,
>>>>> but if a hardware memory error occurred during copy, which called from
>>>>> __kernel_write_iter() in dump_user_range(), it crashes,
>>>>>
>>>>>     CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425
>>>>>     pc : __memcpy+0x110/0x260
>>>>>     lr : _copy_from_iter+0x3bc/0x4c8
>>>>>     ...
>>>>>     Call trace:
>>>>>      __memcpy+0x110/0x260
>>>>>      copy_page_from_iter+0xcc/0x130
>>>>>      pipe_write+0x164/0x6d8
>>>>>      __kernel_write_iter+0x9c/0x210
>>>>>      dump_user_range+0xc8/0x1d8
>>>>>      elf_core_dump+0x308/0x368
>>>>>      do_coredump+0x2e8/0xa40
>>>>>      get_signal+0x59c/0x788
>>>>>      do_signal+0x118/0x1f8
>>>>>      do_notify_resume+0xf0/0x280
>>>>>      el0_da+0x130/0x138
>>>>>      el0t_64_sync_handler+0x68/0xc0
>>>>>      el0t_64_sync+0x188/0x190
>>>>>
>>>>> Generally, the '->write_iter' of file ops will use 
>>>>> copy_page_from_iter()
>>>>> and copy_page_from_iter_atomic(), change memcpy() to 
>>>>> copy_mc_to_kernel()
>>>>> in both of them to handle #MC during source read, which stop coredump
>>>>> processing and kill the task instead of kernel panic, but the source
>>>>> address may not always a user address, so introduce a new copy_mc 
>>>>> flag in
>>>>> struct iov_iter{} to indicate that the iter could do a safe memory 
>>>>> copy,
>>>>> also introduce the helpers to set/cleck the flag, for now, it's only
>>>>> used in coredump's dump_user_range(), but it could expand to any other
>>>>> scenarios to fix the similar issue.
>>>>>
>>>>> Cc: Alexander Viro <viro@...iv.linux.org.uk>
>>>>> Cc: Christian Brauner <brauner@...nel.org>
>>>>> Cc: Miaohe Lin <linmiaohe@...wei.com>
>>>>> Cc: Naoya Horiguchi <naoya.horiguchi@....com>
>>>>> Cc: Tong Tiangen <tongtiangen@...wei.com>
>>>>> Cc: Jens Axboe <axboe@...nel.dk>
>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@...wei.com>
>>>>> ---
>>>>> v2:
>>>>> - move the helper functions under pre-existing CONFIG_ARCH_HAS_COPY_MC
>>>>> - reposition the copy_mc in struct iov_iter for easy merge, suggested
>>>>>     by Andrew Morton
>>>>> - drop unnecessary clear flag helper
>>>>> - fix checkpatch warning
>>>>>    fs/coredump.c       |  1 +
>>>>>    include/linux/uio.h | 16 ++++++++++++++++
>>>>>    lib/iov_iter.c      | 17 +++++++++++++++--
>>>>>    3 files changed, 32 insertions(+), 2 deletions(-)
>>>>>
>>>> ...
>>>>> @@ -371,6 +372,14 @@ size_t _copy_mc_to_iter(const void *addr, 
>>>>> size_t bytes, struct iov_iter *i)
>>>>>    EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
>>>>>    #endif /* CONFIG_ARCH_HAS_COPY_MC */
>>>>> +static void *memcpy_from_iter(struct iov_iter *i, void *to, const 
>>>>> void *from,
>>>>> +                 size_t size)
>>>>> +{
>>>>> +    if (iov_iter_is_copy_mc(i))
>>>>> +        return (void *)copy_mc_to_kernel(to, from, size);
>>>>
>>>> Is it helpful to call memory_failure_queue() if copy_mc_to_kernel() 
>>>> fails
>>>> due to a memory error?
>>>
>>> For dump_user_range(), the task is dying, if copy incomplete size, the
>>> coredump will fail and task will exit, also memory_failure will
>>> be called by kill_me_maybe(),
>>>
>>>   CPU: 0 PID: 1418 Comm: test Tainted: G   M               6.3.0-rc5 #29
>>>   Call Trace:
>>>    <TASK>
>>>    dump_stack_lvl+0x37/0x50
>>>    memory_failure+0x51/0x970
>>>    kill_me_maybe+0x5b/0xc0
>>>    task_work_run+0x5a/0x90
>>>    exit_to_user_mode_prepare+0x194/0x1a0
>>>    irqentry_exit_to_user_mode+0x9/0x30
>>>    noist_exc_machine_check+0x40/0x80
>>>    asm_exc_machine_check+0x33/0x40
>>
>> Is this call trace printed out when copy_mc_to_kernel() failed by finding
>> a memory error (or in some testcase using error injection)?
> 
> I add dump_stack() into memory_failure() to check whether the poisoned
> memory is called or not, and the call trace shows it do call
> memory_failure()， but I get confused when do the test.
> 
>> In my understanding, an MCE should not be triggered when MC-safe copy 
>> tries
>> to access to a memory error.  So I feel that we might be talking about
>> different scenarios.
>>
>> When I questioned previously, I thought about the following scenario:
>>
>>    - a process terminates abnormally for any reason like segmentation 
>> fault,
>>    - then, kernel tries to create a coredump,
>>    - during this, the copying routine accesses to corrupted page to read.
>>
> Yes, we tested like your described,
> 
> 1) inject memory error into a process
> 2) send a SIGABT/SIGBUS to process to trigger the coredump
> 
> Without patch, the system panic, and with patch only process exits.
> 
>> In this case the corrupted page should not be handled by memory_failure()
>> yet (because otherwise properly handled hwpoisoned page should be ignored
>> by coredump process).  The coredump process would exit with failure with
>> your patch, but then, the corrupted page is still left unhandled and can
>> be reused, so any other thread can easily access to it again.
> 
> As shown above, the corrupted page will be handled by memory_failure(), 
> but what I'm wondering,
> 1) memory_failure() is not always called
> 2) look at the above call trace, it looks like from asynchronous
>     interrupt, not from synchronous exception, right?
> 
>>
>> You can find a few other places (like __wp_page_copy_user and 
>> ksm_might_need_to_copy)
>> to call memory_failure_queue() to cope with such unhandled error pages.
>> So does memcpy_from_iter() do the same?
> 
> I add some debug print in do_machine_check() on x86:
> 
> 1) COW,
>    m.kflags: MCE_IN_KERNEL_RECOV
>    fixup_type: EX_TYPE_DEFAULT_MCE_SAFE
> 
>    CPU: 11 PID: 2038 Comm: einj_mem_uc
>    Call Trace:
>     <#MC>
>     dump_stack_lvl+0x37/0x50
>     do_machine_check+0x7ad/0x840
>     exc_machine_check+0x5a/0x90
>     asm_exc_machine_check+0x1e/0x40
>    RIP: 0010:copy_mc_fragile+0x35/0x62
> 
>    if (m.kflags & MCE_IN_KERNEL_RECOV) {
>            if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
>                    mce_panic("Failed kernel mode recovery", &m, msg);
>    }
> 
>    if (m.kflags & MCE_IN_KERNEL_COPYIN)
>            queue_task_work(&m, msg, kill_me_never);
> 
> There is no memory_failure() called when
> EX_TYPE_DEFAULT_MCE_SAFE, also EX_TYPE_FAULT_MCE_SAFE too,
> so we manually add a memory_failure_queue() to handle with
> the poisoned page.
> 
> 2） Coredump,  nothing print about m.kflags and fixup_type,
> with above check, add a memory_failure_queue() or memory_failure() seems
> to be needed for memcpy_from_iter(), but it is totally different from
> the COW scenario
> 
> 
> Another question, other copy_mc_to_kernel() callers, eg,
> nvdimm/dm-writecache/dax, there are not call memory_failure_queue(),
> should they need a memory_failure_queue(), if so, why not add it into
> do_machine_check() ?

In the dax case, if the source address is poisoned, and we do follow up 
with memory_failure_queue(pfn, flags), what should the value of the 
'flags' be ?

thanks,
-jane

> 
> Thanks.
> 
> 
> 
>>
>> Thanks,
>> Naoya Horiguchi
>