linux-kernel - Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC

Message-ID: <20210706164451.GA1289248@agluck-desk2.amr.corp.intel.com>
Date:   Tue, 6 Jul 2021 09:44:51 -0700
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Ding Hui <dinghui@...gfor.com.cn>
Cc:     bp@...en8.de, bp@...e.de, naoya.horiguchi@....com,
        osalvador@...e.de, peterz@...radead.org,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, mingo@...hat.com, x86@...nel.org,
        hpa@...or.com, youquan.song@...el.com, huangcun@...gfor.com.cn,
        stable@...r.kernel.org
Subject: Re: [PATCH v2] x86/mce: Fix endless loop when run task works after
 #MC

On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
> Recently we encountered multiple #MCs on the same task before its
> task_work_run() had been called. current->mce_kill_me was added to
> the task_works list more than once, which made the task_works list
> circular, so task_work_run() loops endlessly.

I saw the same and posted a similar fix a while back:

https://www.spinics.net/lists/linux-mm/msg251006.html

It didn't get merged because some validation tests began failing
around the same time.  I'm now pretty sure I understand what happened
with those other tests.

I'll post my updated version (second patch in a three part series)
later today.

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c

> +	if (!cmpxchg(&current->mce_kill_me.func, NULL, ch.func)) {
> +		current->mce_addr = m->addr;
> +		current->mce_kflags = m->kflags;
> +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> +		current->mce_whole_page = whole_page(m);

You don't need an atomic cmpxchg here (nor the WRITE_ONCE() to clear it).
The task is operating on its own task_struct. Nobody else should touch
the mce_kill_me field.
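
To illustrate both points, here is a minimal user-space sketch, not the
kernel code. It assumes a simplified LIFO work list; struct work,
struct task, push(), queue_once(), run_works() and has_cycle() are
hypothetical stand-ins invented for this example, and only the
single-owner case (one task touching its own entry) is modelled.
Pushing the same entry twice makes it point at itself, while a plain
NULL check on the func field is enough to prevent that:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct callback_head. */
struct work {
	struct work *next;
	void (*func)(struct work *);
};

/* Hypothetical model of one task: a LIFO list of pending works plus
 * the single embedded entry that plays the role of mce_kill_me. */
struct task {
	struct work *works;
	struct work kill_me;
};

/* Dummy callback used only for the demonstration. */
static void kill_me_fn(struct work *w) { (void)w; }

/* Unguarded push onto the head of the list. Pushing an entry that is
 * already the head sets w->next = w, i.e. a one-element cycle. */
static void push(struct task *t, struct work *w)
{
	w->next = t->works;
	t->works = w;
}

/* Guarded queueing with a plain load and store. This is only safe
 * under the assumption that the owning task is the sole writer. */
static bool queue_once(struct task *t, void (*fn)(struct work *))
{
	if (t->kill_me.func)
		return false;		/* already queued, do not add again */
	t->kill_me.func = fn;
	push(t, &t->kill_me);
	return true;
}

/* Run and empty the list; clear func so the entry can be queued
 * again later. Returns the number of works that were run. */
static int run_works(struct task *t)
{
	struct work *w = t->works;
	int n = 0;

	t->works = NULL;
	while (w) {
		struct work *next = w->next;
		void (*fn)(struct work *) = w->func;

		w->func = NULL;		/* plain store, no WRITE_ONCE() */
		fn(w);
		w = next;
		n++;
	}
	return n;
}

/* Floyd cycle detection, used to show the list corruption without
 * actually spinning forever. */
static bool has_cycle(const struct work *head)
{
	const struct work *slow = head, *fast = head;

	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return true;
	}
	return false;
}
```

In this model, calling push() twice on the same entry produces a list
for which has_cycle() returns true, whereas a second queue_once()
before run_works() is simply refused.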

-Tony