linux-kernel - Re: [Query] Preemption (hogging) of the work handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJZ5v0jNgDkkBAX0teTo8MEHa2f+5+TAmiyFWwvEUjTkEez9Jg@mail.gmail.com>
Date:	Tue, 12 Jul 2016 21:59:52 +0200
From:	"Rafael J. Wysocki" <rafael@...nel.org>
To:	Viresh Kumar <viresh.kumar@...aro.org>
Cc:	Petr Mladek <pmladek@...e.com>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Jan Kara <jack@...e.cz>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Tejun Heo <tj@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	vlevenetz@...sol.com, vaibhav.hiremath@...aro.org,
	alex.elder@...aro.org, johan@...nel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [Query] Preemption (hogging) of the work handler

On Tue, Jul 12, 2016 at 7:11 PM, Viresh Kumar <viresh.kumar@...aro.org> wrote:
> On 12-07-16, 06:12, Viresh Kumar wrote:
>
>> Yeah, so I tried debugging this more and I am able to get printing
>> done to just before arch_suspend_disable_irqs() in suspend.c and then
>> it stops because of the async nature.
>>
>> I get to this point for both successful suspend/resume (where system
>> resumes back successfully) and in the bad case (where the system just
>> hangs/crashes).
>>
>> FWIW, I also tried commenting out following in suspend_enter():
>>
>>         error = suspend_ops->enter(state);
>>
>> so that the system doesn't go into suspend at all, and just resume
>> back immediately (similar to TEST_CORE) and I saw the hang/crash then
>> as well one of the times.
>
> So I tried it cleanly without any local hacks using:
>
> echo core > /sys/power/pm_test
>
> and I still see the problem, so whatever happens, happens before
> putting the system into complete suspend.
>
> FWIW, I also tried this hacky thing:
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index bc71478fac26..045ebc88fe08 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -170,6 +170,7 @@ void __attribute__ ((weak)) arch_suspend_enable_irqs(void)
>   *
>   * This function should be called after devices have been suspended.
>   */
> +extern bool printk_sync_suspended;
>  static int suspend_enter(suspend_state_t state, bool *wakeup)
>  {
>         char suspend_abort[MAX_SUSPEND_ABORT_LEN];
> @@ -218,6 +219,7 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>         }
>
>         arch_suspend_disable_irqs();
> +       printk_sync_suspended = true;
>         BUG_ON(!irqs_disabled());
>
>         error = syscore_suspend();
> @@ -237,6 +239,7 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>                 syscore_resume();
>         }
>
> +       printk_sync_suspended = false;
>         arch_suspend_enable_irqs();
>         BUG_ON(irqs_disabled());
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 46bb017ac2c9..187054074b96 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -293,6 +293,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
>
>  /* Control whether printing to console must be synchronous. */
>  static bool __read_mostly printk_sync = false;
> +bool printk_sync_suspended = false;
>  /* Printing kthread for async printk */
>  static struct task_struct *printk_kthread;
>  /* When `true' printing thread has messages to print */
> @@ -300,7 +301,7 @@ static bool printk_kthread_need_flush_console;
>
>  static inline bool can_printk_async(void)
>  {
> -       return !printk_sync && printk_kthread;
> +       return !printk_sync && !printk_sync_suspended && printk_kthread;
>  }
>
>  /* Return log buffer address */
>
>
> i.e. I disabled async-printk after interrupts are disabled on the last
> running CPU (0) and enabled it again before enabling interrupts back.
>
> This FIXES the hangs for me :)
>
> I don't think its a crash but some sort of deadlock in async printk
> thread because of the state it was left in before we offlined all
> other CPUs and disabled interrupts on the local one.

It looks like a new printk() waits for a previous one to make progress
and since progress cannot be made under the suspend conditions, it
waits forever.