lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 4 Jul 2017 22:12:13 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Krzysztof Kozlowski <krzk@...nel.org>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Kevin Hilman <khilman@...nel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux PM list <linux-pm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linux-Renesas <linux-renesas-soc@...r.kernel.org>
Subject: Re: [PATCH v3 2/8] PM / Domains: Handle safely genpd_syscore_switch()
 call on non-genpd device

On Tue, Jul 4, 2017 at 10:05 PM, Krzysztof Kozlowski <krzk@...nel.org> wrote:
> On Tue, Jul 04, 2017 at 09:54:10PM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 4, 2017 at 8:36 PM, Krzysztof Kozlowski <krzk@...nel.org> wrote:
>> > On Tue, Jul 04, 2017 at 08:19:47PM +0200, Geert Uytterhoeven wrote:
>> >> Hi Krzysztof,
>> >>
>> >> On Tue, Jul 4, 2017 at 8:10 PM, Krzysztof Kozlowski <krzk@...nel.org> wrote:
>> >> > On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote:
>> >> >> On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@...nel.org> wrote:
>> >> >> > genpd_syscore_switch() had two problems:
>> >> >> > 1. It silently assumed that device, it is being called for, belongs to
>> >> >> >    generic power domain and used container_of() on its power domain
>> >> >> >    pointer.  Such assumption might not be true always.
>> >> >> >
>> >> >> > 2. It iterated over list of generic power domains without holding
>> >> >> >    gpd_list_lock mutex thus list could have been modified in the same
>> >> >> >    time.
>> >> >> >
>> >> >> > Usage of genpd_lookup_dev() solves both problems as it is safe a call
>> >> >> > for non-generic power domains and uses mutex when iterating.
>> >> >> >
>> >> >> > Reported-by: Ulf Hansson <ulf.hansson@...aro.org>
>> >> >> > Signed-off-by: Krzysztof Kozlowski <krzk@...nel.org>
>> >> >> > Acked-by: Ulf Hansson <ulf.hansson@...aro.org>
>> >> >>
>> >> >> This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull
>> >> >> request "[GIT PULL] Power management updates for v4.13-rc1".
>> >> >>
>> >> >> > Not tested on real hardware.
>> >> >>
>> >> >> This causes the following BUG during s2ram on all my Renesas arm32 boards,
>> >> >> where the system timer is an IRQ safe device:
>> >> >>
>> >> >> PM: Syncing filesystems ... done.
>> >> >> PM: Preparing system for sleep (mem)
>> >> >> Freezing user space processes ... (elapsed 0.001 seconds) done.
>> >> >> OOM killer disabled.
>> >> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> >> >> PM: Suspending system (mem)
>> >> >> PM: suspend of devices complete after 122.841 msecs
>> >> >> PM: suspend devices took 0.130 seconds
>> >> >> PM: late suspend of devices complete after 2.682 msecs
>> >> >> PM: noirq suspend of devices complete after 29.951 msecs
>> >> >> Disabling non-boot CPUs ...
>> >> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
>> >> >
>> >> > Thanks for report!
>> >> >
>> >> > Damn it, although I couldn't find this in the code, but I was fearing
>> >> > that this ends up in atomic section. That would kind of explain why
>> >> > mutex was not there [1].
>> >> >
>> >> > Anyway, the buggy code was there already. Instead of "sleeping in atomic
>> >> > section" there was no locking at all... In this context this was
>> >> > probably safe because it was executed *after* disabling non-boot CPUs
>> >> > but then the function cannot be called in other contexts.
>> >> >
>> >> > I am not sure I will be capable of developing the proper fix as I do not
>> >> > have the hardware and I do not know all stuff happening in sh suspend.
>> >> > Probably reverting this and living with non-locked path would be the
>> >> > safest choice.
>> >> >
>> >> > [1] https://patchwork.kernel.org/patch/9778903/
>> >>
>> >> AFAIU, all syscore stuff runs in atomic context.
>> >
>> > Indeed... The confusing part is that this code is syscore only from
>> > the name, it is not hooked in to syscore_ops. Although going by call
>> > chain (through sh clocksource drivers) we end up in
>> > timekeeping_suspend() which is a syscore op.
>> >
>> > I wonder whether it would be useful - after reverting my commit - to add
>> > an assert (which is a stronger API requirement than only documentation "may
>> > only be called during the system core (syscore) suspend") like:
>> >         WARN_ON(num_online_cpus() > 1));
>> > as without mutexes this should not be executed with more than one online
>> > CPU.
>>
>> Or maybe WARN_ON_ONCE(!in_atomic())?
>
> You could be in atomic section on this CPU and still have other CPUs
> online playing with gpd_list (without any protection from locking).
> This function is safe only on non-SMP case.

Well, not quite.

It is safe if you can guarantee that no other CPUs will touch the data
structure in question concurrently, which pretty much is the case for
timekeeping_suspend() even though it may be invoked without taking the
other CPUs offline (from the suspend-to-idle core path).

Thanks,
Rafael

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ