netdev - Re: Suspend/resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <BANLkTi=Gq5thZk2f1nWTz3dDA5Non8ANZg@mail.gmail.com>
Date:	Fri, 15 Apr 2011 09:14:17 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ciprian Docan <docan@...n.rutgers.edu>, netdev@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc:	Francois Romieu <romieu@...zoreil.com>,
	Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
	"Rafael, J. Wysocki" <rjw@...k.pl>
Subject: Re: Suspend/resume - slow resume

On Fri, Apr 15, 2011 at 8:33 AM, Ciprian Docan <docan@...n.rutgers.edu> wrote:
>
> I enabled CONFIG_PRINTK_TIME in the config file and rebuild the kernel; and
> after a fresh reboot I saved the dmesg output before and after
> suspend/resume. Attached is the diff file containing the resume sequence. At
> linew 133 - 134 there is a longer gap, which I think is related to the r8169
> driver.
>
> I have unloaded the module and did a new suspend/resume sequence, but this
> time with the r8169 module removed it went much faster (3-4s to resume
> completely). Can you please help here ?

Ok, this seems to be yet another case of the ridiculously common error
of "let's load the firmware early, before the machine is even up". It
sometimes happens at boot-time (driver writers only test the module
case, not the built-in case), but it happens very often at
suspend/resume time (which driver writers seem to not test at all).

It is not valid to try to load the firmware at resume time. User space
isn't running, so all the firmware loading helpers are frozen.
Firmware that needs to be reloaded after a suspend needs to be saved
to memory!

I added netdev to the recipient list, because this is definitely not
just a r8169 issue - we've had this bug happen over and over again.

Looking at the code, it looks like r8169 _tries_ to save the firmware
(it has had this SAME bug before), but in case no firmware was ever
loaded successfully at all (either because there was no firmware to
begin with, or because the device had never been opened, so it had
never tried to load it before), it _still_ tries to load the firmware.

WTF? At resume time, you should _resume_, not "load firmware even if
it wasn't loaded before".

I also wonder if the generic power management layer code could do
something smarter about this. Print a big warning when somebody wants
to load firmware when the machine isn't fully operational, instead of
the "wait 60 seconds in the totally futile hope that things will
change". Adding greg to the cc for that case, he's marked as firmware
loader maintainer.

So Francois, can we please not load the firmware at resume time when
it wasn't loaded when suspended!

Greg/Rafael/whoever - could we please replace the "sleep for a minute
if you can't load the firmware" with a big immediate WARN_ON() if
somebody tries to load it at early boot time or at resume time? That
way we don't have the machine dead for sixty seconds (most people just
assume it's dead and will just reboot - I've done that myself when
I've been hit by this), and we'd get a nice "here's the offender"
printout.

And netdev/lkml just cc'd for information, just in case some driver
writer is lurking and suddenly realizes that their driver is broken
too.

                             Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html