Message-ID: <YzGXgbfRngNfDhoC@hirez.programming.kicks-ass.net>
Date: Mon, 26 Sep 2022 14:13:53 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Christian Borntraeger <borntraeger@...ux.ibm.com>
Cc: bigeasy@...utronix.de, dietmar.eggemann@....com,
ebiederm@...ssion.com, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org, mgorman@...e.de, mingo@...nel.org,
oleg@...hat.com, rjw@...ysocki.net, rostedt@...dmis.org,
tj@...nel.org, vincent.guittot@...aro.org, will@...nel.org,
Marc Hartmayer <mhartmay@...ux.ibm.com>, amit@...nel.org,
virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v3 6/6] freezer,sched: Rewrite core freezer logic
On Mon, Sep 26, 2022 at 12:55:21PM +0200, Christian Borntraeger wrote:
>
>
> Am 26.09.22 um 10:06 schrieb Christian Borntraeger:
> >
> >
> > Am 23.09.22 um 09:53 schrieb Christian Borntraeger:
> > > Am 23.09.22 um 09:21 schrieb Christian Borntraeger:
> > > > Peter,
> > > >
> > > > as a heads-up. This commit (bisected and verified) triggers a
> > > > regression in our KVM on s390x CI. The symptom is that a specific
> > > > testcase (start a guest with next kernel and a poky ramdisk,
> > > > then ssh via vsock into the guest and run the reboot command) now
> > > > takes much longer (300 instead of 20 seconds). From a first look
> > > > it seems that the sshd takes very long to end during shutdown
> > > > but I have not looked into that yet.
> > > > Any quick idea?
> > > >
> > > > Christian
> > >
> > > the sshd seems to hang in virtio-serial (not vsock).
> >
> > FWIW, sshd does not seem to hang, instead it seems to busy loop in
> > wait_port_writable calling into the scheduler over and over again.
>
> -#define TASK_FREEZABLE 0x00002000
> +#define TASK_FREEZABLE 0x00000000
>
> "Fixes" the issue. Just have to find out which of the users is responsible.

Since it's not the wait_port_writable() one -- we already tested that by
virtue of 's/wait_event_freezable/wait_event/' there -- it must be on the
producing side of that port. But I'm having a wee bit of trouble
following that code.

Is there a task stuck in FROZEN state? Then again, I thought you said
there was no actual suspend involved, so that should not be it either.

I'm curious though -- how far does it get into the scheduler? It should
call schedule() with __state == TASK_INTERRUPTIBLE|TASK_FREEZABLE, which
is quite sufficient to get it off the runqueue. So who then puts it back?
Or is it bailing early in the wait_event loop?
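
To make the state arithmetic above concrete, here is a minimal userspace
sketch, not kernel code. TASK_FREEZABLE is taken from the quoted diff, and
TASK_RUNNING and TASK_INTERRUPTIBLE carry their usual sched.h values. The two
helper functions are hypothetical names invented for illustration, and the
dequeue rule is a simplified assumption: a non-zero state leaves the runqueue
unless a signal is pending for an interruptible sleeper.

```c
#include <stdbool.h>

/* State bits: TASK_FREEZABLE is from the quoted diff, the other two
 * are the usual include/linux/sched.h values. */
#define TASK_RUNNING        0x00000000u
#define TASK_INTERRUPTIBLE  0x00000001u
#define TASK_FREEZABLE      0x00002000u

/* Hypothetical helper: the state a freezable interruptible sleeper
 * would hand to schedule(). Passing 0 models the "fix" from the diff,
 * where TASK_FREEZABLE is defined away. */
static unsigned int freezable_sleep_state(unsigned int freezable_bit)
{
    return TASK_INTERRUPTIBLE | freezable_bit;
}

/* Hypothetical helper, simplified assumption about schedule(): a task
 * with a non-zero state is taken off the runqueue, unless it sleeps
 * interruptibly and a signal is already pending. */
static bool would_leave_runqueue(unsigned int state, bool signal_pending)
{
    if (state == TASK_RUNNING)
        return false;
    if ((state & TASK_INTERRUPTIBLE) && signal_pending)
        return false;
    return true;
}
```

Under these assumptions both 0x2001 and plain TASK_INTERRUPTIBLE are non-zero,
so either one should get the task dequeued. The extra bit alone does not
explain a task that keeps coming back, which is why the question is who
wakes it.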