Date:	09 Jul 2008 15:50:18 -0700
From:	Philippe Troin <phil@...i.org>
To:	john stultz <johnstul@...ibm.com>
Cc:	linux-kernel@...r.kernel.org, macro@...ux-mips.org
Subject: Re: 2.6.25.9: system clocks works normally then speeds up 4x...

john stultz <johnstul@...ibm.com> writes:

> On Wed, 2008-07-09 at 13:53 -0700, Philippe Troin wrote:
> > john stultz <johnstul@...ibm.com> writes:
> > 
> > > When you're seeing the issue, can you do the following:
> > >   cat /proc/interrupts > interrupts
> > > 
> > >   <wait 10 seconds by your wristwatch> 
> > > 
> > >   cat /proc/interrupts >> interrupts
> > > 
> > > And send the results?
> > 
> > There you are:
> > 
> >            CPU0       CPU1
> >   0:        353          0   IO-APIC-edge      timer
> > LOC:  546305845   33155722   Local timer interrupts
> > Roughly 10 seconds later:
> >   0:        353          0   IO-APIC-edge      timer
> > LOC:  546361653   33156517   Local timer interrupts
> 
> Huh. So that's a diff of:
> LOCdiff   55808        795  
> 
> So that's 55 seconds worth of ticks on cpu0 and not one on cpu1. So yea,
> something seems off with your timer interrupts.

On the still-wedged system, if I use 'tsc' as my clocksource (and the
time flows "normally"), I still see the same kind of diff (same order
of magnitude).
 
> > > Could you also try booting with noapic to see if that changes anything?
> > 
> > Sure.  This will mean I will lose the "wedged" system.  Is there
> > anything else that needs to be checked on it before I lose the broken
> > state?
> > Also keep in mind that the symptoms take a while to manifest
> > themselves (a few days typically).
 
> I can't think of anything right off. But maybe we should give some
> others a chance to look.
> 
> I would like to see the same /proc/interrupt data when the system is
> properly functioning as well. So whenever you do reboot, that would be
> interesting to me.

So I just rebooted.

Now I see:

Wed Jul  9 15:47:59 PDT 2008: LOC:    2050354    2050438   Local timer interrupts
Wed Jul  9 15:48:09 PDT 2008: LOC:    2060368    2060452   Local timer interrupts

So about 10000 timer interrupts for 10 seconds, which sounds good with
HZ=1000.
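
For reference, the arithmetic behind both comparisons can be sketched in a
few lines of Python. This is only an illustration: the counters are copied
from this thread, the 10-second interval was timed by hand and is therefore
approximate, and the helper name tick_report is made up for this example.

```python
HZ = 1000  # tick rate stated in this message (HZ=1000)

def tick_report(before, after, wall_seconds, hz=HZ):
    """Return a (delta, ratio) pair per CPU for two LOC samples.

    ratio is the observed tick count divided by the expected count
    (hz * wall_seconds), so 1.0 means the local timer ran at nominal speed.
    """
    expected = hz * wall_seconds
    return [(b2 - b1, (b2 - b1) / expected) for b1, b2 in zip(before, after)]

# LOC counters (CPU0, CPU1) copied from the thread; interval is approximate.
wedged = tick_report([546305845, 33155722], [546361653, 33156517], 10)
healthy = tick_report([2050354, 2050438], [2060368, 2060452], 10)

for label, rows in (("wedged", wedged), ("healthy", healthy)):
    for cpu, (delta, ratio) in enumerate(rows):
        print(f"{label} CPU{cpu}: {delta} ticks, {ratio:.2f}x expected")
```

On the wedged sample this gives 55808 ticks on CPU0 and 795 on CPU1; after
the reboot both CPUs show 10014 ticks, close to the 10000 expected.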

I've rebooted without noapic, and I will monitor and log these numbers
and see how it goes.
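
A minimal sketch of such a logger, assuming Python is available on the box.
The function names, the log file name loc.log and the 10-second interval
are hypothetical choices for illustration, not what is actually running
here; the parser only assumes the "LOC:" row layout shown earlier in this
thread.

```python
import time

def loc_counts(interrupts_text):
    """Return the per-CPU counters from the LOC: row of /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = []
            # The counters are the integer fields that directly follow the
            # label; the trailing description ("Local timer ...") ends them.
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    raise ValueError("no LOC: row found")

def log_loc(path="/proc/interrupts", logfile="loc.log", interval=10, samples=None):
    """Append timestamped LOC counters to logfile; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        with open(path) as src:
            counts = loc_counts(src.read())
        with open(logfile, "a") as out:
            out.write(f"{time.strftime('%c')}: LOC: {counts}\n")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling log_loc() with no arguments would keep appending one line every
10 seconds until interrupted, which should be enough to spot the moment
the per-interval delta starts drifting away from HZ * interval.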

I'm not sure noapic could help here as obviously the interrupts are
routed correctly, at least initially.

Phil.
