Date:	Fri, 13 Apr 2012 10:04:51 +0200
From:	Jiri Slaby <jslaby@...e.cz>
To:	Jiri Slaby <jirislaby@...il.com>
CC:	Michael Neuling <mikey@...ling.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
	linux-next@...r.kernel.org
Subject: Re: linux-next: boot failures with next-20120411

On 04/13/2012 10:02 AM, Jiri Slaby wrote:
> On 04/13/2012 04:30 AM, Michael Neuling wrote:
>> Stephen Rothwell <sfr@...b.auug.org.au> wrote:
>>
>>> Hi all,
>>>
>>> Some (not all) of my PowerPC boot tests have failed like this after
>>> getting into user mode (this one was just after udev started, but others
>>> are after other processes getting going):
>>>
>>> Unable to handle kernel paging request for data at address 0xc0000003f9d550
>>> Faulting instruction address: 0xc0000000001b7f40
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> SMP NR_CPUS=32 NUMA pSeries
>>> Modules linked in: ehea
>>> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
>>> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
>>> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
>>> SOFTE: 1
>>> CFAR: 000000000000562c
>>> DAR: 00c0000003f9d550, DSISR: 40000000
>>> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
>>> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
>>> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
>>> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
>>> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
>>> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
>>> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
>>> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
>>> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
>>> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
>>> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
>>> Call Trace:
>>> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
>>> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
>>> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
>>> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
>>> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
>>> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
>>> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
>>> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
>>> Instruction dump:
>>> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
>>> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
>>> ---[ end trace 366fe6c7ced3bfb0 ]---
>>>
>>> This did not happen yesterday.  Just wondering if anyone can think of
>>> anything obvious.  Full console log at
>>> http://ozlabs.org/~sfr/next-20120411.log.bz2
>>
>> I managed to bisect this down using pseries_defconfig with next-20120412
>> to this patch:
>>
>>   commit 85bbc003b24335e253a392f6a9874103b77abb36
>>   Author: Jiri Slaby <jslaby@...e.cz>
>>   Date:   Mon Apr 2 13:54:22 2012 +0200
>>
>>       TTY: HVC, use tty from tty_port
>>
>>       The driver already used refcounting. So we just switch it to tty_port
>>       helpers. And switch to tty_port->lock for tty.
>>
>>       Signed-off-by: Jiri Slaby <jslaby@...e.cz>
>>       Cc: linuxppc-dev@...ts.ozlabs.org
>>       Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>>
>> Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
>> otherwise git barfs) fixes the problem on next-20120412.  
>>
>> I'm assuming we got the ref count changes wrong somewhere in the patch
>> but the tty code is beyond me.  Jiri, can you take a look?
> 
> Yeah, I see. I forgot to remove a couple of tty reference drops. The
> reference is dropped by tty_port_tty_set in open/close/hangup now. Does
> the attached patch help?

And the patch is incomplete. Now we have a leak. This one should work.

> thanks,
-- 
js
suse labs


View attachment "0001-HVC-fix-refcounting.patch" of type "text/x-patch" (1497 bytes)
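
The attachment is not reproduced in this archive, so the following is not the fix. It is only a toy userspace sketch, in Python, of the kind of refcount imbalance discussed in the thread. It assumes that a setter in the style of tty_port_tty_set drops the port's reference on the old tty and takes one on the new tty. All class and function names here (Tty, Port, tty_get, tty_put, port_tty_set, run_close) are hypothetical stand-ins, not kernel APIs, and the "leak" case is one plausible reading of the remark above rather than what the first patch actually did.

```python
# Toy model of kref-style reference counting. Not kernel code.

class Tty:
    def __init__(self):
        self.refs = 1        # the opener holds the initial reference
        self.freed = False


def tty_get(tty):
    """Take a reference (no-op for None) and return the object."""
    if tty is not None:
        tty.refs += 1
    return tty


def tty_put(tty):
    """Drop a reference; mark the object freed when the count hits zero."""
    if tty is None:
        return
    if tty.freed:
        raise RuntimeError("put on an already freed object")
    tty.refs -= 1
    if tty.refs == 0:
        tty.freed = True


class Port:
    def __init__(self):
        self.tty = None


def port_tty_set(port, tty):
    """Assumed setter behaviour: drop the old reference, take a new one."""
    tty_put(port.tty)
    port.tty = tty_get(tty)


def run_close(extra_put, clear_port):
    """Open a tty on a port, then close it in one of three ways."""
    port, tty = Port(), Tty()
    port_tty_set(port, tty)          # open: the port takes its own reference
    if clear_port:
        port_tty_set(port, None)     # close: the setter drops that reference
    if extra_put:
        tty_put(tty)                 # stale manual drop left in the driver
    return tty


balanced = run_close(extra_put=False, clear_port=True)
double_drop = run_close(extra_put=True, clear_port=True)
leaked = run_close(extra_put=False, clear_port=False)
tty_put(leaked)                      # the opener releases its own reference

print(balanced.refs, balanced.freed)
print(double_drop.refs, double_drop.freed)
print(leaked.refs, leaked.freed)
```

In this model the balanced case leaves only the opener's reference. The double drop frees the object while the opener still believes it holds a reference, which in real kernel code could show up later as memory corruption far from the driver. The third case never reaches zero, so the object is never freed.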
