Message-Id: <201111201124.17528.rjw@sisk.pl>
Date:	Sun, 20 Nov 2011 11:24:17 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	pavel@....cz, lenb@...nel.org, ak@...ux.intel.com, tj@...nel.org,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH v3] PM/Memory-hotplug: Avoid task freezing failures

On Sunday, November 20, 2011, Srivatsa S. Bhat wrote:
> On 11/20/2011 03:27 AM, Rafael J. Wysocki wrote:
> > On Thursday, November 17, 2011, Srivatsa S. Bhat wrote:
> >> The lock_system_sleep() function is used in the memory hotplug code at
> >> several places in order to implement mutual exclusion with hibernation.
> >> However, this function tries to acquire the 'pm_mutex' lock using
> >> mutex_lock() and hence blocks in TASK_UNINTERRUPTIBLE state if it doesn't
> >> get the lock. This can lead to task freezing failures and, as a
> >> consequence, to hibernation failure, even though the hibernation call
> >> path had successfully acquired the lock.
> >>
> >> This patch fixes this issue by modifying lock_system_sleep() to use
> >> mutex_trylock() in a loop until the lock is acquired, instead of using
> >> mutex_lock(), in order to avoid going to uninterruptible sleep.
> >> Also, we use msleep() to avoid busy-looping, which would break the
> >> expectation that we go to sleep when we fail to acquire the lock.
> >> And we also call try_to_freeze() in order to cooperate with the freezer,
> >> without which we would end up in almost the same situation as mutex_lock(),
> >> due to uninterruptible sleep caused by msleep().
> >>
> >> v3: Tejun suggested avoiding busy-looping by adding an msleep() since
> >>     it is not guaranteed that we will get frozen immediately.
> >>
> >> v2: Tejun pointed out problems with using mutex_lock_interruptible() in a
> >>     while loop, when signals not related to freezing are involved.
> >>     So, replaced it with mutex_trylock().
> >>
> >> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
> >> ---
> >>
> >>  include/linux/suspend.h |   37 ++++++++++++++++++++++++++++++++++++-
> >>  1 files changed, 36 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> >> index 57a6924..0af3048 100644
> >> --- a/include/linux/suspend.h
> >> +++ b/include/linux/suspend.h
> >> @@ -5,6 +5,8 @@
> >>  #include <linux/notifier.h>
> >>  #include <linux/init.h>
> >>  #include <linux/pm.h>
> >> +#include <linux/freezer.h>
> >> +#include <linux/delay.h>
> >>  #include <linux/mm.h>
> >>  #include <asm/errno.h>
> >>  
> >> @@ -380,7 +382,40 @@ static inline void unlock_system_sleep(void) {}
> >>  
> >>  static inline void lock_system_sleep(void)
> >>  {
> >> -	mutex_lock(&pm_mutex);
> >> +	/*
> >> +	 * "To sleep, or not to sleep, that is the question!"
> >> +	 *
> >> +	 * We should not use mutex_lock() here because, in case we fail to
> >> +	 * acquire the lock, it would put us to sleep in TASK_UNINTERRUPTIBLE
> >> +	 * state, which would lead to task freezing failures. As a
> >> +	 * consequence, hibernation would fail (even though it had acquired
> >> +	 * the 'pm_mutex' lock).
> >> +	 * Using mutex_lock_interruptible() in a loop is not a good idea,
> >> +	 * because we could end up treating non-freezing signals badly.
> >> +	 * So we use mutex_trylock() in a loop instead.
> >> +	 *
> >> +	 * Also, we add try_to_freeze() to the loop, to co-operate with the
> >> +	 * freezer, to avoid task freezing failures due to busy-looping.
> >> +	 *
> >> +	 * But then, since it is not guaranteed that we will get frozen
> >> +	 * right away, we could keep spinning for some time, breaking the
> >> +	 * expectation that we go to sleep when we fail to acquire the lock.
> >> +	 * So we add an msleep() to the loop, to dampen the spin (but we are
> >> +	 * careful enough not to sleep for too long at a stretch, lest the
> >> +	 * freezer whine and give up again!).
> >> +	 *
> >> +	 * Now that we no longer busy-loop, try_to_freeze() becomes all the
> >> +	 * more important, due to a subtle reason: if we don't cooperate with
> >> +	 * the freezer at this point, we could end up in a situation very
> >> +	 * similar to mutex_lock() due to the usage of msleep() (which sleeps
> >> +	 * uninterruptibly).
> >> +	 *
> >> +	 * Phew! What a delicate balance!
> >> +	 */
> >> +	while (!mutex_trylock(&pm_mutex)) {
> >> +		try_to_freeze();
> >> +		msleep(10);
> > 
> > The number here seems to be somewhat arbitrary.  Is there any reason not to
> > use 100 or any other number?
> > 
> 
> Short answer:
> 
> The number is not arbitrary. It is chosen to match the interval at which
> the freezer retries freezing tasks; the freezer keeps retrying in a loop
> for 20 seconds, after which it gives up.

So that should be documented too, right?  Perhaps there should be a #define
for that number.
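
Something like, say (the name is just made up):

	/* Matches the interval at which the freezer retries freezing tasks. */
	#define FREEZER_RETRY_INTERVAL_MS	10

and then msleep(FREEZER_RETRY_INTERVAL_MS) instead of the bare msleep(10).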

> Long answer:
> 
> Let us define 'time-to-freeze-this-task' as the time between the setting
> of the TIF_FREEZE flag for this task (after the task enters the while
> loop in this patch) and the moment at which this task is considered
> frozen by the freezer.
> 
> There are two constraints we are trying to handle here:
> 
> [To start with, let us look at the extreme-case solutions for each constraint.]
> 
> 1. We want task freezing to be fast, to make hibernation fast.
> Hence, we don't want to use msleep() here at all; just the
> try_to_freeze() within the while loop would fit well.
> 
> 2. As Tejun suggested, considering that we might not get frozen immediately,
> we don't want to hurt CPU power management during that time. So, we want
> to sleep when possible, which means we could sleep for up to ~20 seconds
> at a stretch and still manage to prevent freezing failures.
> 
> But obviously we need to strike a balance between these two conflicting
> requirements. Hence, we observe that the freezer tries to freeze tasks in
> a loop, waiting 10ms before each retry (and continues this procedure for
> 20 seconds before giving up).
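> 
> To illustrate, the freezer's retry logic looks roughly like this (a
> simplified sketch, not the actual kernel code; count_unfrozen_tasks()
> here is just a stand-in for the real walk over all tasks):
> 
> 	/* Simplified sketch of the freezer's retry loop */
> 	end_time = jiffies + 20 * HZ;	/* it gives up after 20 seconds */
> 	while (1) {
> 		/* 'todo' counts the tasks that are not yet frozen */
> 		todo = count_unfrozen_tasks();	/* hypothetical helper */
> 		if (!todo || time_after(jiffies, end_time))
> 			break;	/* everyone is frozen, or we timed out */
> 		msleep(10);	/* the retry interval the patch's msleep() matches */
> 	}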
> 
> Since we want time-to-freeze-this-task to be as small as possible, we have
> to minimize the number of iterations the freezer spends waiting for us.
> Hence we choose to sleep for 10ms, which means that, in the worst case,
> our time-to-freeze-this-task will be one iteration of the freezer, IOW 10ms.
> [By that logic, sleeping for 9ms would actually do best, but we probably
> don't want to get that specific here, or should we?]
> 
> I think I have given a slight hint about this issue in the comment as well...
> 
> I prefer not to #define the 10 and use it in both the freezer's loop and
> in the msleep() above, because good design means "keep the freezer
> internals internal to the freezer!". But we all agree that this patch is
> only a temporary hack (which unfortunately needs to know about the
> freezer's internal workings), and also that we need to fix this whole
> issue at the design level sooner or later.
> So having the 10ms msleep(), as well as hard-coding this value here, are
> both justified, IMHO.
> 
> As for the comment, I don't know if I should be expanding that "slight hint"
> into a full-fledged explanation, since Tejun is already complaining about
> its verbosity ;-)
> 
> By the way, for somebody who is looking from a purely memory-hotplug point
> of view and is not that familiar with the freezer, the "slight hint" in
> the comment "careful enough not to sleep for too long at a stretch...
> freezing failure..." is supposed to be interpreted as: "Oh, when does
> freezing fail? Let me look up the freezer code.. ah, 20 seconds.
> By the way, I spot a 10ms sleep in the freezer loop as well..
> Oh yes, *now* it all makes sense!" :-)
> 
> Or perhaps, adding the same justification I gave above (about the 10ms
> sleep) to the changelog should do, right?
> 
> >> +	}
> >>  }
> >>  
> >>  static inline void unlock_system_sleep(void)
> > 

Well, let's not add temporary hacks, especially ones as fragile as this one.

Why don't we simply open-code what we need, using a proper wait queue
and waiting for an event?

So, have something like transition_in_progress (I believe there already is
something like that somewhere), set it to 'true' under pm_mutex in
lock_system_sleep(), possibly waiting for it to become 'false' properly
(using interruptible sleep with try_to_freeze()), and reset it to 'false'
(under pm_mutex) in unlock_system_sleep()?
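
Something along these lines, perhaps (only a rough sketch to show the
idea; the wait queue name is made up and this is not meant to be the
final code):

	static bool transition_in_progress;
	static DECLARE_WAIT_QUEUE_HEAD(transition_wait);

	void lock_system_sleep(void)
	{
		for (;;) {
			mutex_lock(&pm_mutex);
			if (!transition_in_progress) {
				/* Mark the transition as started. */
				transition_in_progress = true;
				mutex_unlock(&pm_mutex);
				return;
			}
			mutex_unlock(&pm_mutex);
			/*
			 * Wait interruptibly for the transition to end and
			 * cooperate with the freezer while we are at it.
			 */
			wait_event_interruptible(transition_wait,
						 !transition_in_progress);
			try_to_freeze();
		}
	}

	void unlock_system_sleep(void)
	{
		mutex_lock(&pm_mutex);
		transition_in_progress = false;
		mutex_unlock(&pm_mutex);
		wake_up_all(&transition_wait);
	}

pm_mutex is then only held briefly around the flag updates and the waiting
itself is done in interruptible sleep, so a task waiting for the lock can
be frozen without any trouble.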

Rafael
