linux-kernel - Re: [BUG] Strange 1-second pauses during Resume-from-RAM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20071116112317.GA12724@elte.hu>
Date:	Fri, 16 Nov 2007 12:23:17 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Mark Lord <lkml@....ca>
Cc:	Pavel Machek <pavel@....cz>, Mark Lord <liml@....ca>,
	Thomas Gleixner <tglx@...utronix.de>, len.brown@...el.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-pm@...ts.linux-foundation.org,
	rjw@...k.pl, Len Brown <lenb@...nel.org>
Subject: Re: [BUG] Strange 1-second pauses during Resume-from-RAM


* Ingo Molnar <mingo@...e.hu> wrote:

> once that tracer bug was fixed, the best method to generate a trace 
> was to do this:
> 
>    echo 1 > /proc/sys/kernel/stackframe_tracing
>    echo 1 > /proc/sys/kernel/syscall_tracing
>    ./trace-cmd bash -c "echo mem > /sys/power/state" > trace.txt

so here's an UP suspend+resume trace i did:

  http://redhat.com/~mingo/latency-tracing-patches/misc/trace-suspend-long.txt.bz2

tons of detail - which might be interesting to other folks as well. Fact 
is, our suspend-to-RAM+resume cycle is very, very slow, even on fast 
hardware - and this trace shows all the reasons why.

This was a fully cached system - i.e. i've done a suspend+resume before 
to warm up the caches. (not that suspend+resume does much IO normally.)

The trace shows that a suspend+resume cycle is 7.95 seconds long 
(without counting the time the box spent suspended) - ouch! This was a 
T60 with Core2Duo 1.83GHz.

For example here is where freezing starts:

    bash-2397  0.... 31686us : remove_wait_queue (vt_waitactive)
    bash-2397  0.... 31688us : freeze_processes (enter_state)
    bash-2397  0.... 31689us : printk (freeze_processes)

here is where the ACPI code triggers the suspend:

    bash-2397  0D... 1904138us : acpi_hw_low_level_write (acpi_hw_register_write)

but this is a whopping 1.9 seconds into the trace already!

first sign of life after i opened the laptop lid again:

    bash-2397  0D... 1904138us : __restore_processor_state (restore_processor_state)
    bash-2397  0D... 1904138us : enable_sep_cpu (__restore_processor_state)

(in the trace there's no delay visible - the period of time spent 
suspended is not visible to the tracer.)

One good way to start looking at such traces is to filter out 
rescheduling events alone:

  grep ': schedule <'  trace-suspend-long.txt

that gives a rough outline of what's going on:

   <idle>-0     0D... 1776566us : schedule <bash-2397> (0 20)
     bash-2397  0D... 1786748us : schedule <<idle>-0> (20 0)
 scsi_eh_-419   0D... 1786814us : schedule <bash-2397> (0 -5)
     bash-2397  0D... 1786960us : schedule <scsi_eh_-419> (-5 0)
 scsi_eh_-421   0D... 1787020us : schedule <bash-2397> (0 -5)
     bash-2397  0D... 1787125us : schedule <scsi_eh_-421> (-5 0)

so you can zoom in on the real area of interest by searching for the 
timestamp.

Hope this helps,

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/