Message-ID: <20230612093051.c5tkj3jwitwehyxd@zlang-mailbox>
Date: Mon, 12 Jun 2023 17:30:51 +0800
From: Zorro Lang <zlang@...nel.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Zorro Lang <zlang@...hat.com>, linux-xfs@...r.kernel.org,
Mike Christie <michael.christie@...cle.com>,
"Michael S. Tsirkin" <mst@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [6.5-rc5 regression] core dump hangs (was Re: [Bug report]
fstests generic/051 (on xfs) hang on latest linux v6.5-rc5+)
On Mon, Jun 12, 2023 at 03:45:12AM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@...ux-foundation.org> writes:
>
> > On Sun, Jun 11, 2023 at 10:49 PM Dave Chinner <david@...morbit.com> wrote:
> >>
> >> On Sun, Jun 11, 2023 at 10:34:29PM -0700, Linus Torvalds wrote:
> >> >
> >> > So that "!=" should obviously have been a "==".
> >>
> >> Same as without the condition - all the fsstress tasks hang in
> >> do_coredump().
> >
> > Ok, that at least makes sense. Your "it made things worse" made me go
> > "What?" until I noticed the stupid backwards test.
> >
> > I'm not seeing anything else that looks odd in that commit
> > f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps
> > regression").
> >
> > Let's see if somebody else goes "Ahh" when they wake up tomorrow...
>
> It feels like there have been about half a dozen bugs pointed out in
> that version of the patch. I am going to have to sleep before I can get
> as far as "Ahh"
>
> One thing that really stands out for me is.
>
>     if (test_if_loop_should_continue) {
>             set_current_state(TASK_INTERRUPTIBLE);
>             schedule();
>     }
>
>     /* elsewhere */
>     llist_add(...);
>     wake_up_process()
>
> So it is possible that the code can sleep indefinitely waiting for a
> wake-up that has already come, because set_current_state and the test
> are in the wrong order.
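
To make the ordering problem above concrete, here is a toy, single-threaded
model of it in plain C. It is only a sketch and not kernel code: every name
in it (model_task, model_waker, model_schedule_blocks and the two
*_blocks() functions) is made up for illustration, and the "waker" is
called by hand at the worst possible point instead of running concurrently.
It assumes that a wake-up simply marks the task runnable and that
schedule() only blocks a task that is still marked as sleeping.

```c
#include <stdbool.h>

/* Toy model, not kernel code: every name here is hypothetical. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
	enum model_state state;
	bool work_queued;
};

/* Stands in for llist_add() followed by wake_up_process(). */
void model_waker(struct model_task *t)
{
	t->work_queued = true;
	t->state = MODEL_RUNNING;	/* no effect if already running */
}

/* Stands in for schedule(): blocks only if still marked as sleeping. */
bool model_schedule_blocks(const struct model_task *t)
{
	return t->state == MODEL_INTERRUPTIBLE;
}

/* Order from the quoted snippet: test first, set the state afterwards. */
bool test_then_set_state_blocks(void)
{
	struct model_task t = { MODEL_RUNNING, false };
	bool should_sleep = !t.work_queued;	/* sees no work yet */

	model_waker(&t);	/* wake-up lands in the gap and is lost */
	if (should_sleep) {
		t.state = MODEL_INTERRUPTIBLE;
		return model_schedule_blocks(&t);
	}
	return false;
}

/* Usual order: set the state first, then test. */
bool set_state_then_test_blocks(void)
{
	struct model_task t = { MODEL_RUNNING, false };
	bool should_sleep;

	t.state = MODEL_INTERRUPTIBLE;
	should_sleep = !t.work_queued;	/* still sees no work */

	model_waker(&t);	/* same gap, but the wake-up now resets the state */
	if (should_sleep)
		return model_schedule_blocks(&t);
	t.state = MODEL_RUNNING;
	return false;
}
```

In this model test_then_set_state_blocks() ends up blocked with work already
queued, while set_state_then_test_blocks() falls straight through
schedule(). The real code would also need the usual memory ordering between
the state store and the test, which this model ignores.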
>
> Unfortunately I don't see what would affect a coredump on a process that
> does not trigger the vhost_worker code.
>
>
>
> About the only thing I can imagine is if io_uring is involved. Some of
> the PF_IO_WORKER code was changed, and the test
> "((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)"
> was sprinkled around.
>
> That is the only code outside of vhost specific code that was changed.
>
>
> Is io_uring involved in the cases that hang?
Oh, right, I have io_uring enabled in fstests' fsstress.c, and I built the
kernel with CONFIG_IO_URING=y. If Darrick (who said he didn't hit this issue)
didn't enable io_uring, that might mean it's io_uring related.
>
>
> Eric
>