Message-Id: <200903121805.48041.elendil@planet.nl>
Date: Thu, 12 Mar 2009 18:05:46 +0100
From: Frans Pop <elendil@...net.nl>
To: john stultz <johnstul@...ibm.com>
Cc: linux-s390@...r.kernel.org, Roman Zippel <zippel@...ux-m68k.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [BUG,2.6.28,s390] Fails to boot in Hercules S/390 emulator - hang traced
On Thursday 12 March 2009, john stultz wrote:
> On Wed, 2009-03-11 at 17:30 -0700, john stultz wrote:
> > On Wed, 2009-03-11 at 17:03 +0100, Frans Pop wrote:
> > > Even with the cast you're just papering over the issue that we're
> > > moving a negative value into a field that is defined as unsigned:
> > > include/linux/clocksource.h: u64 xtime_nsec;
> >
> > Probably agreed here, xtime_nsec probably should be converted to a
> > s64 as negative values are possible.
> >
> > However, Its unclear to me if my patch worked or not?
> > Did you try it alone?
Not exactly, but I did try a patch that was effectively the same (same
calculations, but using a few intermediate variables). It did not work.
> For a cleaner version, could you try the following, against 2.6.29-git
> with no other modification?
I've applied this patch against 2.6.28.7 and it makes no difference.
I'll try later against current mainline, but I don't expect it to work
there either.
BTW, with that patch there are two other casts that can be removed:
- if (unlikely((s64)clock->xtime_nsec < 0)) {
- s64 neg = -(s64)clock->xtime_nsec;
+ if (unlikely(clock->xtime_nsec < 0)) {
+ s64 neg = -clock->xtime_nsec;
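To illustrate why those casts only become redundant once the field is
signed, here is a minimal userspace sketch. It assumes the patch changes
xtime_nsec from u64 to s64; the typedefs and helper names below are made up
for the example and are not kernel code.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel types (assumption for this sketch). */
typedef uint64_t u64;
typedef int64_t s64;

/* Field still u64, no cast: an unsigned value is never below zero,
 * so this check can never fire (compilers may warn about it). */
static int neg_u64_nocast(u64 xtime_nsec)
{
	return xtime_nsec < 0;
}

/* Field still u64, with the cast: the bit pattern is reinterpreted as
 * signed, so a "negative" value stored in the field is detected.
 * (Relies on the usual two's complement conversion, as gcc provides.) */
static int neg_u64_cast(u64 xtime_nsec)
{
	return (s64)xtime_nsec < 0;
}

/* Field converted to s64: the plain comparison works, no cast needed. */
static int neg_s64(s64 xtime_nsec)
{
	return xtime_nsec < 0;
}
```

Passing (u64)(s64)-5 to the first helper gives 0 and to the second gives 1,
while neg_s64(-5) gives 1 without any cast. So the casts must stay for as
long as the field is u64.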
Some other observations:
* when the hang occurs we're definitely still using jiffies
* the hang is not in the clock accumulation loop (see below)
* changing the TIMERINT value does *not* make any difference, I tried a
few different values (incl. the default 50 and 300)
* changing the nr of emulated CPUs from 2 to 1 makes no difference
I have now been able to trace the hang (full log attached). Where I added
the tracing printks should be fairly obvious from the excerpt below and
from the attachment.
No idea what to make of the result.
0.150337! init: calling smp_prepare_cpus
0.183775! CPUs: 2 configured, 0 standby
0.183947! s390_smp: smp_detect_cpus calling get_online_cpus
0.184162! s390_smp: smp_detect_cpus calling __smp_rescan_cpus
0.184381! s390_smp: smp_rescan_cpus_sigp starting loop
[!!! Hang is here !!!]
[... With 5cd1c9c5 and 6c9bacb4 reverted, the boot continues as follows ...]
[... Next two messages are from Hercules ...]
CPU0000: SIGP Set prefix (0D) CPU0001, PARM 0FEC5000: CC 0
CPU0000: SIGP Restart (06) CPU0001, PARM 00000000: CC 0
[... Time of previous message is similar for failed and good boot ...]
0.525049! s390_smp: smp_rescan_cpus_sigp loop done
0.525310! s390_smp: smp_detect_cpus calling put_online_cpus
0.525555! s390_smp: smp_detect_cpus done
0.526037! cpu 0 phys_idx=0 vers=00 ident=002623 machine=3090 unused=0000
0.526408! s390_smp: start loop smp_create_idle
0.531784! s390_smp: loop smp_create_idle done
The problem loop is in smp_rescan_cpus_sigp() from arch/s390/kernel/smp.c.
I tried adding printks inside the loop, but that resulted in the boot
also failing with the two patches reverted!
So it looks like that loop is somehow very sensitive to timing issues.
Note the relatively long delay between the start and end of the loop.
View attachment "herc_2.6.28.7_hang.trace" of type "text/plain" (5407 bytes)