linux-kernel - RE: PROBLEM: kernel oops with g

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <001c01c9f3cd$ad02b1d0$07081570$%szyprowski@samsung.com>
Date:	Tue, 23 Jun 2009 08:41:40 +0200
From:	Marek Szyprowski <m.szyprowski@...sung.com>
To:	'David Brownell' <david-b@...bell.net>
Cc:	'Alan Stern' <stern@...land.harvard.edu>,
	'Peter Korsgaard' <jacmet@...site.dk>,
	'USB list' <linux-usb@...r.kernel.org>,
	'Kernel development list' <linux-kernel@...r.kernel.org>,
	kyungmin.park@...sung.com,
	'Marek Szyprowski' <m.szyprowski@...sung.com>
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 5:27 AM, David Brownell wrote:

> > > This is just a guess...  But there's a good possibility that the
> oops
> > > was caused by recent changes to the serial layer which have not
> been
> > > propagated through to the g_serial driver.
> >
> > How recent these changes are? I did a test on another ARM-based Linux
> > platform with old 2.6.28 kernel and the result was exactly the same
> as
> > above...
> 
> Just for the record, the reworked g_serial code merged in 2.6.27
> and was mostly developed on 2.6.25 and 2.6.26 ... and it included
> a lot of stress testing.  No such mutex_lock() in_irq() bug showed
> up at that time.  And that was running with all practical kernel
> debug options, so it should have showed up if it were that easy.
> 
> I do however recall turning up several regressions in how "sparse"
> lock checking behaved.  As in, it broke when faced with common
> idioms like needing to temporarily drop a lock deep in a call stack.
> 
> Now, the serial layer has been getting a *LONG* overdue incremental
> overhaul since before that started.  So there's been plenty of time
> for incompatible changes to sneak in; I believe Alan Cox focuses on
> host side things, out of defensive necessity.
> 
> Like, oh, changing a spinlock to a mutex.   You might change the
> low_latency setting and review how that's now supposed to behave.

In in gs_open() function in drivers/usb/gadget/u_serial.c, after the 
line 780 I've added:

tty->low_latency = 1;

This helped a bit, but after a few serial transfers I got the same 
bug again.

With mutex debugging and preemptive kernel turned off I got the
following crash:

# ls >/dev/ttyGS0
[   80.070000] BUG: scheduling while atomic: ls/497/0x00000100
[   80.070000] Modules linked in:
[   80.070000]
[   80.070000] Pid: 497, comm:                   ls
[   80.070000] CPU: 0    Not tainted  (2.6.30 #340)
[   80.080000] PC is at n_tty_write+0x1fc/0x39c
[   80.080000] LR is at gs_write_room+0x38/0x54
[   80.090000] pc : [<c013d7d4>]    lr : [<c0184fb0>]    psr: 60000013
[   80.090000] sp : c6573ee0  ip : 00000020  fp : c6505400
[   80.100000] r10: c6505380  r9 : c6f13f00  r8 : c6572000
[   80.100000] r7 : 0000005a  r6 : c6505400  r5 : c6505000  r4 : 00001fff
[   80.110000] r3 : 000000a0  r2 : 00000015  r1 : c6e6eb40  r0 : 0000005a
[   80.120000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment
user
[   80.120000] Control: 00c5387d  Table: 56580008  DAC: 00000015

With preemptive kernel and mutex debuging:

[   26.800000] BUG: sleeping function called from invalid context at
kernel/mutex.c:280
[   26.800000] in_atomic(): 1, irqs_disabled(): 0, pid: 500, name: ls
[   26.810000] no locks held by ls/500.
[   26.810000] [<c0025c30>] (unwind_backtrace+0x0/0xdc) from [<c0272570>]
(mutex_lock_nested+0x28/0x304)
[   26.820000] [<c0272570>] (mutex_lock_nested+0x28/0x304) from [<c0153b0c>]
(echo_set_canon_col+0x20/0x48)
[   26.830000] [<c0153b0c>] (echo_set_canon_col+0x20/0x48) from [<c0155b2c>]
(n_tty_receive_buf+0xbf8/0xecc)
[   26.840000] [<c0155b2c>] (n_tty_receive_buf+0xbf8/0xecc) from
[<c0158054>] (flush_to_ldisc+0x104/0x1b0)
[   26.850000] [<c0158054>] (flush_to_ldisc+0x104/0x1b0) from [<c019e0b8>]
(gs_rx_push+0x118/0x1cc)
[   26.860000] [<c019e0b8>] (gs_rx_push+0x118/0x1cc) from [<c003d794>]
(tasklet_action+0x78/0xc8)
[   26.870000] [<c003d794>] (tasklet_action+0x78/0xc8) from [<c003dc28>]
(__do_softirq+0x6c/0xf4)
[   26.880000] [<c003dc28>] (__do_softirq+0x6c/0xf4) from [<c003dcf4>]
(irq_exit+0x44/0x5c)
[   26.890000] [<c003dcf4>] (irq_exit+0x44/0x5c) from [<c001f054>]
(_text+0x54/0x6c)
[   26.890000] [<c001f054>] (_text+0x54/0x6c) from [<c001fc04>]
(__irq_usr+0x44/0xa0)
[   26.900000] Exception stack(0xc65a9fb0 to 0xc65a9ff8)
[   26.910000] 9fa0:                                     4022e68c 000007b2
40215e98 00000001
[   26.910000] 9fc0: 4022e68c 4022fef8 4022d000 00002f28 00000001 00001688
beb16a34 00000028
[   26.920000] 9fe0: 4022d000 beb1693c 401904e0 4018f620 20000010 ffffffff
[   26.930000] ------------[ cut here ]------------
[   26.930000] WARNING: at kernel/mutex.c:207 mutex_lock_nested+0xa4/0x304()
[   26.940000] Modules linked in:
[   26.940000] [<c0025c30>] (unwind_backtrace+0x0/0xdc) from [<c0038e4c>]
(warn_slowpath_common+0x4c/0x80)
[   26.950000] [<c0038e4c>] (warn_slowpath_common+0x4c/0x80) from
[<c02725ec>] (mutex_lock_nested+0xa4/0x304)
[   26.960000] [<c02725ec>] (mutex_lock_nested+0xa4/0x304) from [<c0153b0c>]
(echo_set_canon_col+0x20/0x48)
[   26.970000] [<c0153b0c>] (echo_set_canon_col+0x20/0x48) from [<c0155b2c>]
(n_tty_receive_buf+0xbf8/0xecc)
[   26.980000] [<c0155b2c>] (n_tty_receive_buf+0xbf8/0xecc) from
[<c0158054>] (flush_to_ldisc+0x104/0x1b0)
[   26.990000] [<c0158054>] (flush_to_ldisc+0x104/0x1b0) from [<c019e0b8>]
(gs_rx_push+0x118/0x1cc)
[   27.000000] [<c019e0b8>] (gs_rx_push+0x118/0x1cc) from [<c003d794>]
(tasklet_action+0x78/0xc8)
[   27.010000] [<c003d794>] (tasklet_action+0x78/0xc8) from [<c003dc28>]
(__do_softirq+0x6c/0xf4)
[   27.020000] [<c003dc28>] (__do_softirq+0x6c/0xf4) from [<c003dcf4>]
(irq_exit+0x44/0x5c)
[   27.030000] [<c003dcf4>] (irq_exit+0x44/0x5c) from [<c001f054>]
(_text+0x54/0x6c)
[   27.030000] [<c001f054>] (_text+0x54/0x6c) from [<c001fc04>]
(__irq_usr+0x44/0xa0)
[   27.040000] Exception stack(0xc65a9fb0 to 0xc65a9ff8)
[   27.050000] 9fa0:                                     4022e68c 000007b2
40215e98 00000001
[   27.050000] 9fc0: 4022e68c 4022fef8 4022d000 00002f28 00000001 00001688
beb16a34 00000028
[   27.060000] 9fe0: 4022d000 beb1693c 401904e0 4018f620 20000010 ffffffff
[   27.070000] ---[ end trace 675aa93c2a01b9b1 ]---

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/