lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 10 Feb 2017 10:49:25 -0800
From:   John Stultz <john.stultz@...aro.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Gabriel Beddingfield <gabe@...tlabs.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Harpreet Sangha <eliptus@...gle.com>,
        Andrew LeCain <alecain@...gle.com>,
        John Thompson <jthomp@...tlabs.com>,
        Paul Trautrim <paultrautrim@...gle.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM list <linux-pm@...r.kernel.org>,
        Eric Caruso <ejcaruso@...gle.com>,
        Greg Hackmann <ghackmann@...gle.com>,
        Todd Poynor <toddpoynor@...gle.com>,
        Rom Lemarchand <romlem@...gle.com>
Subject: Re: alarm timer/timerfd expiration does not abort suspend operation

On Fri, Feb 10, 2017 at 1:43 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Fri, 3 Feb 2017, Gabriel Beddingfield wrote:
>> Hi Thomas and John,
>>
>> TL;DR: if an alarmtimer-backed timerfd expires just prior to
>> alarmtimer_suspend() begin called, the system will continue to go into
>> suspend (with possibly no future wakeups scheduled). The expected behavior
>> is that the timer expiration would cause the suspend operation to abort. I
>> see several ways to fix it and want to know your preference.
>>
>> When using an alarmtimer-backed timerfd (i.e. CLOCK_BOOTTIME_ALARM or
>> CLOCK_REALTIME_ALARM) we have observed the following race condition:
>>
>> 1. Userspace commands the system to go into suspend (echo mem >
>> /sys/power/state)
>> 2. The alarmtimer for a timerfd expires, making the timer inactive until
>> someone reads from the file descriptor.
>> 3. alarmtimer_suspend() does not find any pending timers, and therefore
>> does not schedule a wakeup.
>> 4. device goes into suspend.
>>
>> However, if steps 2 and 3 are swapped, alarmtimer_suspend() would have seen
>> that an expiration was "soon" and cause an abort of the suspend. This can
>> be reproduced on an idle system by having a process aggressively doing
>> `echo mem > /sys/power/state' while another process sets a 4-sec repeating
>> timerfd backed by CLOCK_BOOTTIME_ALARM. Eventually the system will go to
>> sleep and not wake up.
>>
>> I see a few ways to fix it:
>>
>> 1. Create a wakeup_source for each timerfd, and if it's an alarm timer call
>> __pm_stay_awake() in timerfd_triggered() and __pm_relax() in timerfd_read().
>> 2. call pm_system_wakeup() in alarmtimer_fired()
>> 3. call `if (isalarm(ctx)) pm_system_wakeup();' in timerfd_triggered()
>> 4. call __pm_wakeup_event(ws, 2 * MSECS_PER_SEC) in alarmtimer_fired()
>> 5. call `if (isalarm(ctc)) __pm_wakeup_event(ws, 2 * MSECS_PER_SEC);' in
>> timerfd_triggered() (using a static struct wakeup_source).
>>
>> I think #1 is right, followed by #2. They all have pros/cons:
>>
>> * #1 Can eliminate race conditions (rather than an arbitrary 2-sec
>> timeout)... but is effectively holding a hard-to-trace block on all PM
>> operations (e.g. read/write of /sys/power/wakeup_count blocks until someone
>> reads from the file descriptor).
>> * #2 Matches the current behavior of the "happy case"... but bypasses the
>> userspace policy system provided by wakeup.
>> * #3 same pro/con as #2... but solution is specific to timerfd's.
>> * #4 Matches the current behavior of the "happy case" if and only if
>> userspace is using the 'wakeup' system, otherwise doesn't change any
>> behavior. But, I wonder how many people think the current behavior is a bug.
>> * #5 Same pro/con as #4... but solution is specific to timerfd's.
>>
>> I've attached a patch that implements #1.

Sorry for not getting back to you earlier (been traveling this week).

I'm surprised this issue hasn't bitten any of the android folks, as I
believe they have been making use of this for some time now. CC'ing a
few just to make sure we're all on the same page.

The approach you took in your patch looks basically ok to me, though I
think the __pm_wakeup_event() method in #4 sounds safer, just to avoid
the problematic issue if no one is waiting on the fd.

Though I worry I'm not quite understanding the con for that case
properly, so maybe you can clarify what concerns you there?

thanks
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ