linux-kernel - Re: [Query] Preemption (hogging) of the work handler

Message-ID: <20160713231858.GG4695@ubuntu>
Date:	Wed, 13 Jul 2016 16:18:58 -0700
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	"Rafael J. Wysocki" <rafael@...nel.org>
Cc:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Jan Kara <jack@...e.cz>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>, Tejun Heo <tj@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	vlevenetz@...sol.com,
	Vaibhav Hiremath <vaibhav.hiremath@...aro.org>,
	Alex Elder <alex.elder@...aro.org>, johan@...nel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linux PM <linux-pm@...r.kernel.org>,
	Petr Mladek <pmladek@...e.com>
Subject: Re: [Query] Preemption (hogging) of the work handler

On 14-07-16, 01:08, Rafael J. Wysocki wrote:
> On Wed, Jul 13, 2016 at 5:39 PM, Viresh Kumar <viresh.kumar@...aro.org> wrote:
> > Maybe not, as this can still lead to the original bug we were all
> > chasing. This may hog some other CPU if we are doing excessive
> > printing in suspend :(
> 
> How can it hog that CPU, exactly?

Not *that* CPU, but any of the CPUs. Because we are moving back to
synchronous printing, any CPU which is doing a lot of printing may
end up spending all its time in the print loop (as in the original
problem we had).
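
As a rough illustration of that print-loop hog (a toy user-space
model, not kernel code), the sketch below assumes a single CPU that
must drain a shared log buffer synchronously while other contexts keep
appending to it. The function name, the tick-based timing and the
drain rate are all hypothetical and exist only for this example.

```python
from collections import deque


def simulate_sync_printing(arrivals, drain_per_tick=1):
    """Toy model of one CPU stuck draining a shared log buffer.

    arrivals[t] is the number of messages that other contexts append
    at tick t (hypothetical workload). Returns how many ticks the
    printing CPU spends in the print loop before the buffer is empty.
    """
    buf = deque(["first message"])  # the one message this CPU wanted to print
    ticks = 0
    while buf:
        # Other contexts keep appending while this CPU is still printing.
        new_msgs = arrivals[ticks] if ticks < len(arrivals) else 0
        for i in range(new_msgs):
            buf.append(f"msg {ticks}.{i}")
        # The console is slow: only a few messages leave per tick.
        for _ in range(min(drain_per_tick, len(buf))):
            buf.popleft()
        ticks += 1
    return ticks
```

With no other printers the CPU leaves the loop after one tick. If
others add messages as fast as the console drains them, the CPU stays
in the loop for as long as the printing continues.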

> > suspend_console() is called quite early, so for example in my case we
> > do lots of printing during suspend (not from the suspend thread, but
> > an IRQ handled by the USB subsystem, which removes a bus with help of
> > some other thread probably).
> 
> Why doing a lot of printing from an IRQ is not regarded as a bug?

We aren't doing it in interrupt context or with interrupts disabled,
but perhaps in the kthread managed by the USB hub core.

But I am not only talking about my platform's printing issues, but
also about the idea behind the patches that Sergey and Jan are working
on. If we move back to synchronous printing before starting to suspend
the devices, we may again have the same problem that we were trying to
solve.

> Are all of those messages printed actually useful?

Hmm, maybe not. But that's not the point I was trying to raise, as I
earlier mentioned :)

We have a problem with asynchronous printing after disabling
interrupts on the last running CPU, and we are trying to disable
asynchronous printing from suspend_console() because it is an
existing function that we can do this from.
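
To make the ordering argument concrete, here is a minimal sketch that
compares the two switch points discussed in this thread. Apart from
suspend_console, the step names are simplified labels chosen for this
example and not real kernel function names, and the list is an
assumed, coarse view of the suspend sequence rather than the exact
code path.

```python
# Coarse, assumed view of the suspend sequence. Only "suspend_console"
# is a real function name; the other entries are illustrative labels.
SUSPEND_STEPS = [
    "suspend_console",
    "suspend_devices",
    "disable_irqs_on_last_cpu",
    "suspend_syscore",  # timekeeping is one of the syscore users
]


def steps_with_sync_printing(switch_at):
    """Return the steps that run after printing went synchronous."""
    idx = SUSPEND_STEPS.index(switch_at)
    return SUSPEND_STEPS[idx + 1:]


# Switching in suspend_console(): device suspend, where lots of
# printing may still happen, runs with synchronous printing.
early = steps_with_sync_printing("suspend_console")

# Switching after interrupts are disabled on the last running CPU:
# only the syscore step is left.
late = steps_with_sync_printing("disable_irqs_on_last_cpu")
```

In this model the early switch leaves device suspend inside the
synchronous window, which is where the CPU hog could theoretically
come back, while the late switch leaves only the syscore step.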

> > That is why my Hacky patch tried to do it after devices are removed
> > and irqs are disabled, but before syscore users are suspended (and
> > timekeeping is one of them). And so it fixes it for me completely.
> >
> > IOW, we should switch back to synchronous printing after disabling
> > interrupts on the last running CPU.
> >
> > And I of course agree with Rafael that we would need something similar
> > in Hibernation code path as well, if we choose to fix it my way.
> 
> Well, the patch proposed by Sergey is sufficient to fix the deadlock
> issue and it is not clear that anything more needs to be done.
> 
> My suggestion, then, would be to use this patch to start with and see
> if things really go worse then.

Sure, I am just saying that theoretically, we can still have the CPU
hog problem that we all started with :)

-- 
viresh