Date:   Mon, 26 Sep 2022 14:13:53 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Christian Borntraeger <borntraeger@...ux.ibm.com>
Cc:     bigeasy@...utronix.de, dietmar.eggemann@....com,
        ebiederm@...ssion.com, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org, mgorman@...e.de, mingo@...nel.org,
        oleg@...hat.com, rjw@...ysocki.net, rostedt@...dmis.org,
        tj@...nel.org, vincent.guittot@...aro.org, will@...nel.org,
        Marc Hartmayer <mhartmay@...ux.ibm.com>, amit@...nel.org,
        virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v3 6/6] freezer,sched: Rewrite core freezer logic

On Mon, Sep 26, 2022 at 12:55:21PM +0200, Christian Borntraeger wrote:
> 
> 
> Am 26.09.22 um 10:06 schrieb Christian Borntraeger:
> > 
> > 
> > Am 23.09.22 um 09:53 schrieb Christian Borntraeger:
> > > Am 23.09.22 um 09:21 schrieb Christian Borntraeger:
> > > > Peter,
> > > > 
> > > > as a heads-up. This commit (bisected and verified) triggers a
> > > > regression in our KVM on s390x CI. The symptom is that a specific
> > > > testcase (start a guest with next kernel and a poky ramdisk,
> > > > then ssh via vsock into the guest and run the reboot command) now
> > > > takes much longer (300 instead of 20 seconds). From a first look
> > > > it seems that the sshd takes very long to end during shutdown
> > > > but I have not looked into that yet.
> > > > Any quick idea?
> > > > 
> > > > Christian
> > > 
> > > the sshd seems to hang in virtio-serial (not vsock).
> > 
> > FWIW, sshd does not seem to hang, instead it seems to busy loop in
> > wait_port_writable calling into the scheduler over and over again.
> 
> -#define TASK_FREEZABLE                 0x00002000
> +#define TASK_FREEZABLE                 0x00000000
> 
> "Fixes" the issue. Just have to find out which of the users is responsible.
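
One way that zeroing the define could change behaviour is plain bit arithmetic. The sketch below is a hypothetical userspace illustration, not kernel code, and it does not identify which user is responsible. It only shows that with TASK_FREEZABLE at 0x2000 (the value from the quoted diff) a sleep state of TASK_INTERRUPTIBLE|TASK_FREEZABLE no longer equals TASK_INTERRUPTIBLE, so code that tests the state by exact equality rather than by mask sees a different answer. With the bit defined as 0, the two states become identical again. The helper names and the two *_PATCHED/*_HACKED macros are invented for illustration; TASK_INTERRUPTIBLE is 0x0001 as in include/linux/sched.h.

```c
#include <stdbool.h>

/* TASK_INTERRUPTIBLE matches include/linux/sched.h; the two TASK_FREEZABLE
 * values are the ones from the quoted diff (hypothetical macro names). */
#define TASK_INTERRUPTIBLE     0x00000001u
#define TASK_FREEZABLE_PATCHED 0x00002000u /* value in the patch under test */
#define TASK_FREEZABLE_HACKED  0x00000000u /* value in the quoted experiment */

/* Hypothetical helper: the state a freezable interruptible sleep would use. */
static unsigned int freezable_sleep_state(unsigned int freezable_bit)
{
    return TASK_INTERRUPTIBLE | freezable_bit;
}

/* Exact-equality test: recognises only the plain TASK_INTERRUPTIBLE value. */
static bool exact_is_interruptible(unsigned int state)
{
    return state == TASK_INTERRUPTIBLE;
}

/* Mask test: ignores modifier bits such as TASK_FREEZABLE. */
static bool mask_is_interruptible(unsigned int state)
{
    return (state & TASK_INTERRUPTIBLE) != 0;
}
```

With TASK_FREEZABLE_PATCHED, exact_is_interruptible() returns false for the freezable state while mask_is_interruptible() returns true; with TASK_FREEZABLE_HACKED both return true, which is consistent with the define change masking the problem rather than fixing it.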

Since it's not the wait_port_writable() one -- we already tested that by
virtue of 's/wait_event_freezable/wait_event/' there, it must be on the
producing side of that port. But I'm having a wee bit of trouble
following that code.

Is there a task stuck in FROZEN state? -- then again, I thought you said
there was no actual suspend involved, so that should not be it either.

I'm curious though -- how far does it get into the scheduler? It should
call schedule() with __state == TASK_INTERRUPTIBLE|TASK_FREEZABLE, which
is quite sufficient to get it off the runqueue; so who then puts it back?
Or is it bailing early in the wait_event loop?
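
The sleep/wake sequence in question can be modelled as a toy single-threaded loop. Everything here is hypothetical: toy_task, toy_prepare(), toy_schedule() and toy_wait() are invented stand-ins loosely shaped after the kernel's wait_event loop (set the sleep state, check for a pending signal, call schedule()), and the signal check is simplified to a TASK_INTERRUPTIBLE mask. The model shows one way a waiter can keep calling into the scheduler without ever sleeping: if a signal is pending, the prepare step leaves the task runnable, so schedule() returns at once, and whether the loop then bails out or spins depends on how it decides the state is interruptible. It is not a diagnosis of this report.

```c
#include <stdbool.h>

/* TASK_RUNNING is 0 in the kernel; TASK_FREEZABLE is from the quoted diff. */
#define TASK_RUNNING       0x00000000u
#define TASK_INTERRUPTIBLE 0x00000001u
#define TASK_FREEZABLE     0x00002000u
#define TOY_ERESTARTSYS    512 /* same value as the kernel-internal ERESTARTSYS */

/* Hypothetical toy task, not struct task_struct. */
struct toy_task {
    unsigned int state;
    bool signal_pending;
};

/* Loosely modelled on prepare_to_wait_event(): with a signal pending for an
 * interruptible sleep, report -TOY_ERESTARTSYS and leave the state untouched
 * (i.e. still runnable); otherwise set the requested sleep state. */
static long toy_prepare(struct toy_task *t, unsigned int state)
{
    if ((state & TASK_INTERRUPTIBLE) && t->signal_pending)
        return -TOY_ERESTARTSYS;
    t->state = state;
    return 0;
}

/* Toy schedule(): true if the task would be taken off the runqueue. */
static bool toy_schedule(const struct toy_task *t)
{
    return t->state != TASK_RUNNING;
}

/* Toy wait loop. 'exact' selects an exact-equality interruptibility test
 * instead of a mask test. Returns how many times schedule() came straight
 * back without sleeping, bounded by 'limit' so the demo terminates. */
static int toy_wait(struct toy_task *t, unsigned int state, bool exact,
                    int limit, long *ret)
{
    bool interruptible = exact ? (state == TASK_INTERRUPTIBLE)
                               : (state & TASK_INTERRUPTIBLE) != 0;
    int spins = 0;

    *ret = 0;
    while (spins < limit) {
        long err = toy_prepare(t, state);
        if (interruptible && err) {
            *ret = err; /* bail out early with the error */
            break;
        }
        if (toy_schedule(t))
            break;  /* would really sleep; a waker must put it back */
        spins++;    /* schedule() returned at once: go round again */
    }
    t->state = TASK_RUNNING;
    return spins;
}
```

With a signal pending and state TASK_INTERRUPTIBLE|TASK_FREEZABLE, the exact-equality variant never bails and returns 'limit' spins, while the mask variant returns immediately with -TOY_ERESTARTSYS; with no signal pending, both put the task to sleep on the first pass.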
