Message-ID: <CA+WzARkyhLrntJfZ2cCB+Z5kiiLAB=OzhERgWQ66bVKr++Yk-A@mail.gmail.com>
Date: Mon, 12 Jul 2021 11:45:00 +0800
From: zhenguo yao <yaozhenguo1@...il.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: oleg@...hat.com, linux-kernel@...r.kernel.org, yaozhenguo@...com
Subject: Re: [PATCH] task_work: return -EBUSY when adding same work
This issue happens in a stress test of memory UE injection. More than
one UE is reported to the OS at the same moment in the test, so
do_machine_check --> queue_task_work is called many times and the
mce_kill_me work is added to the list many times. When mce_kill_me is
added to the list, it becomes the list head; then another mce_kill_me
is added to the list before task_work_run is called. The list becomes
a dead loop: task->task_works = mce_kill_me, mce_kill_me->next =
mce_kill_me. When the task wants to return to user mode and runs
task_work_run, it loops forever: it never returns to user mode and
never processes the SIGBUS signal that mce_kill_me sent to it. I fixed
this with the following patch:
--
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aa..9333696 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1299,7 +1299,9 @@ static void queue_task_work(struct mce *m, int kill_current_task)
 	else
 		current->mce_kill_me.func = kill_me_maybe;
-	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+	/* Avoid endless loops when task_work_run is running */
+	if (READ_ONCE(current->task_works) != &current->mce_kill_me)
+		task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
--
But I think it is better to return an error from task_work_add when the
same work is added to the list again. A similar problem may happen in
other places, and it is hard to debug when it occurs only rarely.
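
To make the failure mode concrete, here is a minimal user-space sketch
of the list behaviour described above. It is only a model: the real
task_work_add() pushes with a cmpxchg loop and also handles
notification, while this version is single-threaded. All names here
(struct work, push, push_unless_head, has_cycle, and the three scenario
functions) are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued work item: only the link matters here. */
struct work {
    struct work *next;
};

/* Simplified, single-threaded model of a push onto the head of the list. */
static void push(struct work **head, struct work *w)
{
    w->next = *head;
    *head = w;
}

/* Models a guard that only compares the new item against the list head. */
static void push_unless_head(struct work **head, struct work *w)
{
    if (*head != w)
        push(head, w);
}

/* Floyd's cycle detection: true if following ->next never reaches NULL. */
static bool has_cycle(const struct work *head)
{
    const struct work *slow = head;
    const struct work *fast = head;

    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

/* Adding the item once leaves a normal, NULL-terminated list. */
bool single_add_makes_cycle(void)
{
    struct work kill_me = { NULL };
    struct work *list = NULL;

    push(&list, &kill_me);
    return has_cycle(list);
}

/* Adding the same item twice in a row makes it point at itself. */
bool double_add_makes_cycle(void)
{
    struct work kill_me = { NULL };
    struct work *list = NULL;

    push(&list, &kill_me);
    push(&list, &kill_me);      /* kill_me.next == &kill_me */
    return has_cycle(list);
}

/* A head-only guard does not help once another item sits in front. */
bool head_check_misses_non_head_duplicate(void)
{
    struct work kill_me = { NULL };
    struct work other = { NULL };
    struct work *list = NULL;

    push_unless_head(&list, &kill_me);
    push(&list, &other);                /* list: other -> kill_me */
    push_unless_head(&list, &kill_me);  /* kill_me -> other -> kill_me */
    return has_cycle(list);
}
```

In this model, single_add_makes_cycle() returns false while the other
two scenario functions return true. The third scenario shows why a
guard that only looks at the list head is not sufficient once a
different work item has been queued in between.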
On Mon, Jul 12, 2021 at 10:44 AM Jens Axboe <axboe@...nel.dk> wrote:
>
> On 7/11/21 8:13 PM, zhenguo yao wrote:
> > Yes I hit this condition. The caller is queue_task_work in
> > arch/x86/kernel/cpu/mce/core.c.
> > It is really a BUG. I have submitted another patch to fix it:
> > https://lkml.org/lkml/2021/7/9/186.
>
> That patch seems broken, what happens if mce_kill_me is added already,
> but it isn't the first work item in the list?
>
> --
> Jens Axboe
>