Message-ID: <CAHk-=wjCs7XPtNHwzVHK=0D=tZgtdyMGLtoomZB5JeUm7D3JEg@mail.gmail.com>
Date: Tue, 18 Jan 2022 06:23:51 +0200
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Olivier Langlois <olivier@...llion01.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"<linux-arch@...r.kernel.org>" <linux-arch@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
Alexey Gladkov <legion@...nel.org>,
Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
Kees Cook <keescook@...omium.org>,
Al Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>
Subject: Re: io_uring truncating coredumps

On Mon, Jan 17, 2022 at 8:47 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> Thinking about it from the perspective of not delivering the wake-ups
> fixing io_uring and coredumps in a non-hacky way looks comparatively
> simple. The function task_work_add just needs to not wake anything up
> after a process has started dying.
>
> Something like the patch below.

Hmm. Yes, I think this is the right direction.

That said, I think it should not add the work at all, and return
-ESRCH, the exact same way that it does for that work_exited
condition.

Because it's basically the same thing: the task is dead and shouldn't
do more work. In fact, task_work_run() is the thing that sets it to
&work_exited as it sees PF_EXITING, so it feels to me that THAT is
actually the issue here - we react to PF_EXITING too late. We react to
it *after* we've already added the work, and then we do that "no more
work" logic only after we've accepted those late work entries?

So my gut feel is that task_work_add() should just also test PF_EXITING.
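
A minimal user-space sketch of that idea, not the actual kernel code. It assumes a single-threaded model (the real task_work code updates the list locklessly), and the names task_model, task_work_add_model(), task_work_run_model(), count_run() and ran_count are hypothetical stand-ins invented for illustration. The flag value is illustrative as well. The point is only the ordering: the add path refuses work as soon as PF_EXITING is visible, rather than waiting for the run path to park the work_exited sentinel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bit for this model; only "set or not set" matters here. */
#define PF_EXITING 0x00000004u

/* Simplified stand-in for the kernel's callback list node. */
struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *);
};

/* Hypothetical, stripped-down task: just flags and a pending-work list. */
struct task_model {
    unsigned int flags;
    struct callback_head *task_works;
};

/* Sentinel meaning "this task accepts no more work". */
struct callback_head work_exited;

/* Demo callback: counts how many work items actually ran. */
int ran_count;
void count_run(struct callback_head *work)
{
    (void)work;
    ran_count++;
}

/* Add path with the early PF_EXITING test suggested in the mail. */
int task_work_add_model(struct task_model *task, struct callback_head *work)
{
    assert(task != NULL && work != NULL);

    /* Proposed check: an exiting task gets no new work at all. */
    if (task->flags & PF_EXITING)
        return -ESRCH;

    /* Existing-style check: the run path already parked the sentinel. */
    if (task->task_works == &work_exited)
        return -ESRCH;

    work->next = task->task_works;
    task->task_works = work;
    return 0;
}

/* Run path: drain pending work; once exiting, leave the sentinel behind. */
void task_work_run_model(struct task_model *task)
{
    struct callback_head *work = task->task_works;

    task->task_works = (task->flags & PF_EXITING) ? &work_exited : NULL;

    while (work != NULL && work != &work_exited) {
        struct callback_head *next = work->next; /* read before running */
        work->func(work);
        work = next;
    }
}
```

In this model, work queued before the flag is set still runs once, while anything offered afterwards is rejected with -ESRCH on the add side, before it ever lands on the list.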

And in fact, my gut feel is that PF_EXITING is too late anyway (it
happens after core-dumping, no?)

But I guess that thing may be on purpose, and maybe the act of dumping
core itself wants to do more work, and so that isn't an option?

So I don't think your patch is "right" as-is, and it all worries me,
but yes, I think this area is very much the questionable one.

I think that work stopping and the io_uring shutdown should probably
move earlier in the exit queue, but as mentioned above, maybe the work
addition boundary in particular really wants to be late because the
exit process itself still uses task works? ;(

               Linus