Date:	Tue, 14 Apr 2009 08:59:01 -0600
From:	Bjorn Helgaas <bjorn.helgaas@...com>
To:	Alan Jenkins <alan-jenkins@...fmail.co.uk>
Cc:	linux-acpi@...r.kernel.org,
	"linux-kernel" <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: [BISECTED] EEE PC hangs when booting off battery

On Tuesday 14 April 2009 03:26:13 am Alan Jenkins wrote:
> Alan Jenkins wrote:
> > Bjorn Helgaas wrote:
> >> On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote:
> >>> Bjorn Helgaas wrote:
> >>>> On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote:
> >>>>   
> >>>> You mention that this occurs when booting off battery.  So I
> >>>> assume everything works fine when the EEE is plugged in to the
> >>>> wall socket?
> >>>>         
> >>> When I tested it before, that was what I found.
> >>>
> >>> However, I now find that's not quite right.  It only works (i.e. doesn't
> >>> hang) if I remove the battery as well as plugging it into the wall.  If
> >>> I have the battery in, it hangs.
> >
> > ... and right now, I can only reproduce it by booting with it plugged
> > into the wall and the battery present.  If I unplug it from the wall, it
> > boots fine.
> >
> > It must be affected by something else as well, maybe battery level or
> > charging / discharging status.
> >   
> >>>>>>>> Magic SysRQ keys work though.  ...
> >>>>>> I was able to run SysRq-P, and found the following backtrace -
> >>>>>>
> >>>>>> Pid: 0
> >>>>>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor]
> >>>>>>             
> >>>> Can you figure out where this is in acpi_idle_enter_bm() or
> >>>> maybe just email me your processor.ko module?
> >>>>
> >>>> Does it always happen at the same point?
> >>>>         
> >>> Yes, it always happens at the same point.
> >>>
> >>> It turns out I can read the runes, but I don't understand what they're
> >>> saying :-).
> >>>       
> >> I'm not much good with x86 assembly either :-)
> >>
> >> I think that in both cases below, you're right after enabling
> >> interrupts and about to exit the idle routine.  My guess is the
> >> system is not really hung; it just doesn't think it has anything
> >> to do and is spending all its time in the idle loop.
> >>     
> >>> 00001bd0 <acpi_idle_enter_bm>:
> >>> ...
> >>> 00001bd0 + 0x1df = 00001daf
> >>> ...
> >>>     1d70:       b8 03 00 00 00          mov    $0x3,%eax
> >>>     1d75:       e8 90 f3 ff ff          call   110a <tsc_halts_in_c>
> >>>     1d7a:       85 c0                   test   %eax,%eax
> >>>     1d7c:       74 0a                   je     1d88 <acpi_idle_enter_bm+0x1b8>
> >>>     1d7e:       b8 0e 09 00 00          mov    $0x90e,%eax
> >>>                         1d7f: R_386_32  .rodata.str1.1
> >>>     1d83:       e8 fc ff ff ff          call   1d84 <acpi_idle_enter_bm+0x1b4>
> >>>                         1d84: R_386_PC32        mark_tsc_unstable
> >>>     1d88:       8b 45 e8                mov    -0x18(%ebp),%eax
> >>>     1d8b:       8b 55 ec                mov    -0x14(%ebp),%edx
> >>>     1d8e:       e8 ab fd ff ff          call   1b3e <us_to_pm_timer_ticks>
> >>>     1d93:       89 c3                   mov    %eax,%ebx
> >>>     1d95:       b8 17 01 00 00          mov    $0x117,%eax
> >>>     1d9a:       69 ca 17 01 00 00       imul   $0x117,%edx,%ecx
> >>>     1da0:       89 d6                   mov    %edx,%esi
> >>>     1da2:       f7 e3                   mul    %ebx
> >>>     1da4:       8d 14 11                lea    (%ecx,%edx,1),%edx
> >>>     1da7:       e8 fc ff ff ff          call   1da8 <acpi_idle_enter_bm+0x1d8>
> >>>                         1da8: R_386_PC32        sched_clock_idle_wakeup_event
> >>>     1dac:       fb                      sti
> >>>     1dad:       89 e0                   mov    %esp,%eax
> >>> ->  1daf:       31 c9                   xor    %ecx,%ecx              <---------
> >>>     1db1:       25 00 e0 ff ff          and    $0xffffe000,%eax
> >>>     1db6:       89 fa                   mov    %edi,%edx
> >>>     1db8:       83 48 0c 04             orl    $0x4,0xc(%eax)
> >>>     1dbc:       ff 47 18                incl   0x18(%edi)
> >>>     1dbf:       8b 45 e4                mov    -0x1c(%ebp),%eax
> >>>     1dc2:       e8 a4 f5 ff ff          call   136b <acpi_state_timer_broadcast>
> >>>     1dc7:       01 5f 1c                add    %ebx,0x1c(%edi)
> >>>     1dca:       11 77 20                adc    %esi,0x20(%edi)
> >>>     1dcd:       8b 45 e8                mov    -0x18(%ebp),%eax
> >>>     1dd0:       83 c4 10                add    $0x10,%esp
> >>>     1dd3:       5b                      pop    %ebx
> >>>     1dd4:       5e                      pop    %esi
> >>>     1dd5:       5f                      pop    %edi
> >>>     1dd6:       5d                      pop    %ebp
> >>>     1dd7:       c3                      ret
> >>>
> >>>> If you blacklist or rename the processor module to prevent it
> >>>> from loading, does that keep the hang from occurring?
> >>>>         
> >>> No.  In that case I get the hang in default_idle+0x59/0x95
> >>>
> >>> 0000007a <default_idle>:
> >>>   7a:   55                      push   %ebp
> >>>   7b:   89 e5                   mov    %esp,%ebp
> >>>   7d:   56                      push   %esi
> >>>   7e:   53                      push   %ebx
> >>>   7f:   83 ec 18                sub    $0x18,%esp
> >>>   82:   83 3d 18 00 00 00 00    cmpl   $0x0,0x18
> >>>                         84: R_386_32    .bss
> >>>   89:   75 7a                   jne    105 <default_idle+0x8b>
> >>>   8b:   80 3d 05 00 00 00 00    cmpb   $0x0,0x5
> >>>                         8d: R_386_32    boot_cpu_data
> >>>   92:   74 71                   je     105 <default_idle+0x8b>
> >>>   94:   83 3d 04 00 00 00 00    cmpl   $0x0,0x4
> >>>                         96: R_386_32    __tracepoint_power_start
> >>>   9b:   74 23                   je     c0 <default_idle+0x46>
> >>>   9d:   8b 1d 08 00 00 00       mov    0x8,%ebx
> >>>                         9f: R_386_32    __tracepoint_power_start
> >>>   a3:   85 db                   test   %ebx,%ebx
> >>>   a5:   74 19                   je     c0 <default_idle+0x46>
> >>>   a7:   8d 75 e0                lea    -0x20(%ebp),%esi
> >>>   aa:   b9 01 00 00 00          mov    $0x1,%ecx
> >>>   af:   ba 01 00 00 00          mov    $0x1,%edx
> >>>   b4:   89 f0                   mov    %esi,%eax
> >>>   b6:   ff 13                   call   *(%ebx)
> >>>   b8:   83 c3 04                add    $0x4,%ebx
> >>>   bb:   83 3b 00                cmpl   $0x0,(%ebx)
> >>>   be:   75 ea                   jne    aa <default_idle+0x30>
> >>>   c0:   89 e0                   mov    %esp,%eax
> >>>   c2:   25 00 e0 ff ff          and    $0xffffe000,%eax
> >>>   c7:   83 60 0c fb             andl   $0xfffffffb,0xc(%eax)
> >>>   cb:   f6 40 08 08             testb  $0x8,0x8(%eax)
> >>>   cf:   75 04                   jne    d5 <default_idle+0x5b>
> >>>   d1:   fb                      sti
> >>>   d2:   f4                      hlt
> >>> -->  d3:   eb 01                   jmp    d6 <default_idle+0x5c>    <--------
> >>>   d5:   fb                      sti
> >>>   d6:   89 e0                   mov    %esp,%eax
> >>>   d8:   25 00 e0 ff ff          and    $0xffffe000,%eax
> >>>   dd:   83 48 0c 04             orl    $0x4,0xc(%eax)
> >>>   e1:   83 3d 04 00 00 00 00    cmpl   $0x0,0x4
> >>>                         e3: R_386_32    __tracepoint_power_end
> >>>   e8:   74 1e                   je     108 <default_idle+0x8e>
> >>>
> >>>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit
> >>>>> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f
> >>>>> Author: Bjorn Helgaas <bjorn.helgaas@...com>
> >>>>> Date:   Mon Mar 30 17:48:24 2009 +0000
> >>>>>           
> >>>> Ouch, sorry about that.  Thanks for doing all the bisection work.
> >>>>   
> >>>>>     ACPI: processor: use .notify method instead of installing handler
> >>>>> directly
> >>>>>
> >>>>>     This patch adds a .notify() method.  The presence of .notify() causes
> >>>>>     Linux/ACPI to manage event handlers and notify handlers on our behalf,
> >>>>>     so we don't have to install and remove them ourselves.
> >>>>>
> >>>>>     Signed-off-by: Bjorn Helgaas <bjorn.helgaas@...com>
> >>>>>     CC: Zhang Rui <rui.zhang@...el.com>
> >>>>>     CC: Zhao Yakui <yakui.zhao@...el.com>
> >>>>>     CC: Venki Pallipadi <venkatesh.pallipadi@...el.com>
> >>>>>     CC: Anil S Keshavamurthy <anil.s.keshavamurthy@...el.com>
> >>>>>     Signed-off-by: Len Brown <len.brown@...el.com>
> >>>>>
> >>>>> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang.
> >>>>>           
> >>>> I don't see the problem in that commit yet, and if there is a problem
> >>>> with it, I would think that reverting it from 2.6.30-rc1 would solve
> >>>> it.  But maybe it'd be useful to revert the whole .notify series to
> >>>> make sure.  From 2.6.30-rc1, you should be able to revert these:
> >>>>
> >>>>   7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor
> >>>>   373cfc360ec773be2f7615e59a19f3313255db7c button
> >>>>   46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure
> >>>>
> >>>> What happens with those commits reverted?
> >>>>         
> >>> I'll find out tomorrow.
> >>>       
> >> The fact that it still hangs even when you don't load the processor
> >> driver at all suggests that the 7ec0a729079 commit identified by the
> >> bisection is not the real problem.  That commit only touches
> >> drivers/acpi/processor_core.c.
> >
> > Yah.
> >   
> >> I think it's more likely some kind of race or missed wakeup.
> >>
> >> Since it seems to be sensitive to whether the battery is present,
> >> I guess you could try blacklisting the battery.ko driver.  There
> >> have been a few changes to it since 2.6.29-rc8.  If things work
> >> without battery.ko, we can look through those changes.
> >
> > Good guess :-).  I tried a couple of times either way, and blacklisting
> > "battery" definitely avoids the hang.
> 
> Ok, I tried reverting
> 
> 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init"
> 
> and that fixed it.

I can't help with the real problem of why the asynchronous battery
init causes the hang.

But I do object to the magic Makefile ordering change in that commit.
Nobody reading the Makefile can tell why battery is listed at the end,
and moving it elsewhere apparently slows down boot significantly.  So
the ordering change just feels like a band-aid that covers up a place
where ACPI could be improved.

I don't see anything unusual in what the battery init is doing, so
it's probably just that some ACPI methods take a long time to
execute.  Other drivers could easily have similar problems.

Bjorn