linux-kernel - Re: 2.6.20-rc6-mm3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1170369202.29240.339.camel@localhost.localdomain>
Date:	Thu, 01 Feb 2007 23:33:22 +0100
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Mattia Dongili <malattia@...ux.it>
Cc:	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org,
	mingo@...hat.com
Subject: Re: 2.6.20-rc6-mm3

Mattia,

On Thu, 2007-02-01 at 22:11 +0100, Mattia Dongili wrote:
> > It might be helpful if you could try with your original config again.
> > Please enable printk timestamps and SysRq. Once the slowness kicks in
> > please issue a SysRq-Q, so we can look at the internal state of the tick
> > code.
> 
> dmesg is below. I need to say that the printk times are bogus wrt the
> actual time passing and at one point I was sick waiting and killed all
> tasks. Ah, I have Ingo's resume-fix patch applied here.

Ok, does not affect your problem. 

Thanks for testing.

> [   13.117670] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> [   13.391218] APIC calibration not consistent with PM Timer: 228ms instead of 100ms
> [   13.391315] APIC delta adjusted to PM-Timer: 1041737 (2375299)

Sigh. This APIC calibration madness seems to be spreading (especially on
Sony laptops). But this is not the cause of the problem.

> [   14.448611] ACPI: (supports S0 S3 S4 S5)
> [   14.449145] Time: acpi_pm clocksource has been installed.
> [   14.449254] Switched to NOHz mode on CPU #1
> [   14.449258] Switched to NOHz mode on CPU #0

> [   20.937884] usb-storage: device scan complete
> [   22.644599] SysRq : Show Pending Timers
> [   22.644753] Timer List Version: v0.3
> [   22.644833] HRTIMER_MAX_CLOCK_BASES: 2
> [   22.644913] now at 21927904003 nsecs

> [   22.646105] active timers:
> [   22.646183]   .nohz_mode      : 1
> [   22.646343]   .tick_stopped   : 1
> [   22.646823]   .last_jiffies   : 4294897777
> [   22.646903]   .next_jiffies   : 4294897996
> [   22.646984]   .idle_expires   : 22800000000 nsecs
> [   22.647065] jiffies: 4294897777

Looks consistent.

> [   22.647144] cpu: 1
> [   22.648252] active timers:
> [   22.648330]  #0: <c037fd9c>, hrtimer_wakeup, S:01
> [   22.648738]  # expires at 16066999784 nsecs [in -5860904219 nsecs]
> [   22.648821]  #1: <c037fd9c>, hrtimer_wakeup, S:01
> [   22.649218]  # expires at 16325154918 nsecs [in -5602749085 nsecs]

ouch. that's 5 seconds over time

> [   22.649301]   .nohz_mode      : 1
> [   22.649380]   .idle_tick      : 15324783000 nsecs
> [   22.649461]   .tick_stopped   : 1
> [   22.649540]   .idle_jiffies   : 4294896127
> [   22.649619]   .idle_calls     : 2080
> [   22.649698]   .idle_sleeps    : 363
> [   22.649778]   .idle_entrytime : 15325159667 nsecs
> [   22.649858]   .idle_sleeptime : 10015150422 nsecs
> [   22.649940]   .last_jiffies   : 4294896127
> [   22.650020]   .next_jiffies   : 4294896189

delta is 62 jiffies = 62 * 4ms which is consistent with the idle_expires
time below:

> [   22.650100]   .idle_expires   : 15572000000 nsecs
> [   22.650181] jiffies: 4294897777

CPU 1 got no updates since it went idle.

> [   22.650336] Tick Device: mode:     1
> [   22.650488] Clock Event Device: pit
> [   22.655191]  next_event:     21940398000 nsecs

> [   22.655894] tick_broadcast_mask: 00000003
> [   22.655974] tick_broadcast_oneshot_mask: 00000003

Though the broadcast mode is active for both cpus

> [   22.656131] Tick Device: mode:     1
> [   22.656284] Clock Event Device: lapic
> [   22.656821]  next_event:     22800000000 nsecs

> [   22.657521] Tick Device: mode:     1
> [   22.657675] Clock Event Device: lapic
> [   22.658211]  next_event:     21940398000 nsecs

And the broadcast event is set for the next CPU#1 event, but the expiry
time is far away from the idle_expires time above.

I'm a bit puzzled and too tired to spot the bug right now.

May I ask you for another test ? Please turn on high resolution timers
and check, if the same strange behaviour is happening.

Thanks. 

	tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/