lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230614044921.7wxs7w3uncjns33i@zlang-mailbox>
Date:   Wed, 14 Jun 2023 12:49:21 +0800
From:   Zorro Lang <zlang@...hat.com>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     io-uring <io-uring@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] io_uring/io-wq: don't clear PF_IO_WORKER on exit

On Tue, Jun 13, 2023 at 07:14:25PM -0600, Jens Axboe wrote:
> On 6/13/23 6:54?PM, Zorro Lang wrote:
> > On Mon, Jun 12, 2023 at 12:11:57PM -0600, Jens Axboe wrote:
> >> A recent commit gated the core dumping task exit logic on current->flags
> >> remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
> >> This exposed a problem with the io-wq handling of that, which explicitly
> >> clears PF_IO_WORKER before calling do_exit().
> >>
> >> The reasons for this manual clear of PF_IO_WORKER is historical, where
> >> io-wq used to potentially trigger a sleep on exit. As the io-wq thread
> >> is exiting, it should not participate any further accounting. But these
> >> days we don't need to rely on current->flags anymore, so we can safely
> >> remove the PF_IO_WORKER clearing.
> >>
> >> Reported-by: Zorro Lang <zlang@...hat.com>
> >> Reported-by: Dave Chinner <david@...morbit.com>
> >> Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
> >> Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
> >> Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >> Signed-off-by: Jens Axboe <axboe@...nel.dk>
> >>
> >> ---
> > 
> > Hi,
> > 
> > This patch fix the issue I reported. The bug can be reproduced on v6.4-rc6,
> > then test passed on v6.4-rc6 with this patch.
> > 
> > But I found another KASAN bug [1] on aarch64 machine, by running generic/388.
> > I hit that 3 times. And hit a panic [2] (once after that kasan bug) on a x86_64
> > with pmem device (mount with dax=never), by running geneirc/388 too.
> 
> Can you try with this? I suspect the preempt dance isn't really
> necessary, but I can't quite convince myself that it isn't. In any case,
> I think this should fix it and this was exactly what I was worried about
> but apparently not able to easily trigger or prove...

Hi Jens,

Sure, I'm going to submit several new testing jobs to test this patch. BTW,
besides the kasan warning, I'm sure that mount with "dax=never" can crash the
kernel as [2], I just reproduced that. So it's worse than a kasan bug output.

Thanks,
Zorro

> 
> 
> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> index fe38eb0cbc82..878ec3feeba9 100644
> --- a/io_uring/io-wq.c
> +++ b/io_uring/io-wq.c
> @@ -220,7 +220,9 @@ static void io_worker_exit(struct io_worker *worker)
>  	list_del_rcu(&worker->all_list);
>  	raw_spin_unlock(&wq->lock);
>  	io_wq_dec_running(worker);
> -	worker->flags = 0;
> +	preempt_disable();
> +	current->worker_private = NULL;
> +	preempt_enable();
>  
>  	kfree_rcu(worker, rcu);
>  	io_worker_ref_put(wq);
> 
> -- 
> Jens Axboe
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ