[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a171373-e901-ae97-41fd-29445c3fe89e@samba.org>
Date:   Fri, 26 Mar 2021 16:08:44 +0100
From:   Stefan Metzmacher <metze@...ba.org>
To:     Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org
Cc:     torvalds@...ux-foundation.org, ebiederm@...ssion.com,
        oleg@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/6] Allow signals for IO threads
Am 26.03.21 um 15:55 schrieb Jens Axboe:
> On 3/26/21 8:53 AM, Jens Axboe wrote:
>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>
>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>
>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>
>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>> pushed), fwiw.
>>>>>>
>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>
>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>> and never return. So we might need a special case in there to deal with
>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>> correctly for IO threads.
>>>>
>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>
>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>
>> Yes exactly, we're waiting in there being stopped. So we either need to
>> check to something ala:
>>
>> relock:
>> +	if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>> +		return false;
>>
>> to catch it upfront and from the relock case, or add:
>>
>> 	fatal:
>> +		if (current->flags & PF_IO_WORKER)
>> +			return false;
>>
>> to catch it in the fatal section.
> 
> Can you try this? Not crazy about adding a special case, but I don't
> think there's any way around this one. And should be pretty cheap, as
> we're already pulling in ->flags right above anyway.
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 5ad8566534e7..5b75fbe3d2d6 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>  		 */
>  		current->flags |= PF_SIGNALED;
>  
> +		/*
> +		 * PF_IO_WORKER threads will catch and exit on fatal signals
> +		 * themselves. They have cleanup that must be performed, so
> +		 * we cannot call do_exit() on their behalf. coredumps also
> +		 * do not apply to them.
> +		 */
> +		if (current->flags & PF_IO_WORKER)
> +			return false;
> +
>  		if (sig_kernel_coredump(signr)) {
>  			if (print_fatal_signals)
>  				print_fatal_signal(ksig->info.si_signo);
> 
I guess not before next week, but if it resolves the problem for you,
I guess it would be good to get this into rc5.
metze
Powered by blists - more mailing lists
 
