lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 30 Sep 2022 10:52:44 +0800
From:   Shuai Xue <xueshuai@...ux.alibaba.com>
To:     "Luck, Tony" <tony.luck@...el.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        James Morse <james.morse@....com>,
        baicar@...amperecomputing.com
Cc:     Len Brown <lenb@...nel.org>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Jarkko Sakkinen <jarkko@...nel.org>,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>,
        "linmiaohe@...wei.com" <linmiaohe@...wei.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Stable <stable@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "cuibixuan@...ux.alibaba.com" <cuibixuan@...ux.alibaba.com>,
        "baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
        "zhuo.song@...ux.alibaba.com" <zhuo.song@...ux.alibaba.com>
Subject: Re: [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to
 avoid memory leak



在 2022/9/30 AM4:52, Luck, Tony 写道:
> Thanks for your patient explanations.

You are welcome :)

> 
>> STEP2: In IRQ context, ghes_proc/_in_irq() queues memory failure work on current CPU
>> in workqueue and add task work to sync with the workqueue.
> 
> Why is there a difference if the interrupted task was a user task vs. a kernel thread?
> 
> It seems arbitrary. If the error can be handled in the kernel thread case without
> a task_work_add() to the current process, can't all errors be handled this way?

I'm afraid not. The kworker in workqueue is asynchronous with ret_to_user() of the
interrupted task. If we return to user-space before the queued memory_failure() work
is processed, we will take the fault again when the error is signal by synchronous
external abort. This loop may cause platform firmware to exceed some threshold and
reboot.

When a user task consuming poison data, a synchronous external abort will be signaled,
for example "einj_mem_uc single" in ras-tools. In such case, the handling flow will
be like bellow:

----------------------------------STEP 0-------------------------------------------
[ghes_sdei_critical_callback: current einj_mem_uc, local cpu]
ghes_sdei_critical_callback
    => __ghes_sdei_callback
        => ghes_in_nmi_queue_one_entry: peak and read estatus
        => irq_work_queue(&ghes_proc_irq_work) // ghes_proc_in_irq - irq_work
[ghes_sdei_critical_callback: return]
-----------------------------------STEP 1------------------------------------------
[ghes_proc_in_irq: current einj_mem_uc, local cpu]
            => ghes_do_proc
                => ghes_handle_memory_failure
                    => ghes_do_memory_failure
                        => memory_failure_queue	- put work task on a specific cpu
                            => if (kfifo_put(&mf_cpu->fifo, entry))
                                  schedule_work_on(smp_processor_id(), &mf_cpu->work);
            => task_work_add(current, &estatus_node->task_work, TWA_RESUME);
[ghes_proc_in_irq: return]
-----------------------------------STEP 3------------------------------------------
// kworker preempts einj_mem_uc on local cpu due to RESCHED flag
[memory_failure_work_func: current kworker, local cpu]	
     => memory_failure_work_func(&mf_cpu->work)
        => while kfifo_get(&mf_cpu->fifo, &entry);	// until get no work
            => soft/hard offline
------------------------------------STEP 4-----------------------------------------
[ghes_kick_task_work: current einj_mem_uc, other cpu]
                => memory_failure_queue_kick
                    => cancel_work_sync //wait memory_failure_work_func finish
                    => memory_failure_work_func(&mf_cpu->work)
                        => kfifo_get(&mf_cpu->fifo, &entry); // no work here
------------------------------------STEP 5-----------------------------------------
[current einj_mem_uc returned to userspace]
                => Killed by SIGBUS

STEP 4 add a task work to ensure the queued memory_failure() work is processed before
returning to user-space. And the interrupted user will be killed by SIGBUS signal.

If we delete STEP 4, the interrupted user task will return to user space synchronously
and consume the poison data again.


> 
> The current thread likely has nothing to do with the error. Just a matter of chance
> on what is running when the NMI is delivered, right?

Yes, the error is actually handled in workqueue. I think the point is that the
synchronous exception signaled by synchronous external abort must be handled
synchronously, otherwise, it will be signaled again.

Best Regards,
Shuai

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ