linux-kernel - Re: chipidea: udc: kernel panic in isr_setup_status

Message-ID: <20160824081102.GA27233@shlinux2>
Date:   Wed, 24 Aug 2016 16:11:02 +0800
From:   Peter Chen <hzpeterchen@...il.com>
To:     Clemens Gruber <clemens.gruber@...ruber.com>
Cc:     linux-usb@...r.kernel.org, Peter Chen <Peter.Chen@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org
Subject: Re: chipidea: udc: kernel panic in isr_setup_status_phase

On Tue, Aug 23, 2016 at 02:36:30AM +0200, Clemens Gruber wrote:
> Hi,
> 
> I am using an i.MX6Q embedded board, acting as an (Ethernet) gadget with
> RNDIS function, connected over a USB OTG cable to a PC.
> Most of the time it works fine, but in some mysterious circumstances,
> a kernel panic occurs, just after attaching the OTG cable, connecting it
> to the other machine:
> 
> [   54.012989] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [   54.021099] pgd = 80004000
> [   54.023816] [00000020] *pgd=00000000
> [   54.027422] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
> [   54.032915] Modules linked in:
> [   54.035998] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3-00017-g336bc4a #315
> [   54.043662] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [   54.050196] task: 80b05f80 task.stack: 80b00000
> [   54.054744] PC is at isr_setup_status_phase+0x1c/0x40
> [   54.059805] LR is at 0xbe570890
> [   54.062957] pc : [<804ac464>]    lr : [<be570890>]    psr: 200e0193
> [   54.062957] sp : 80b01e10  ip : be570570  fp : be570890
> [   54.074442] r10: be5eeebc  r9 : be570010  r8 : be5eeebc
> [   54.079673] r7 : be5708d0  r6 : be5eee80  r5 : be7fcf40  r4 : 00000001
> [   54.086206] r3 : be571010  r2 : 804ab368  r1 : 00000000  r0 : be570010
> [   54.092742] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   54.099972] Control: 10c5387d  Table: 4e34404a  DAC: 00000051
> [   54.105723] Process swapper/0 (pid: 0, stack limit = 0x80b00210)
> (snip)
> [   54.247100] [<804ac464>] (isr_setup_status_phase) from [<804acbbc>] (isr_tr_complete_handler+0x734/0x98c)
> [   54.256680] [<804acbbc>] (isr_tr_complete_handler) from [<804acfc0>] (udc_irq+0x1ac/0x318)
> [   54.264964] [<804acfc0>] (udc_irq) from [<8018ba28>] (__handle_irq_event_percpu+0x9c/0x128)
> [   54.273330] [<8018ba28>] (__handle_irq_event_percpu) from [<8018bae0>] (handle_irq_event_percpu+0x2c/0x7c)
> [   54.282995] [<8018bae0>] (handle_irq_event_percpu) from [<8018bb68>] (handle_irq_event+0x38/0x5c)
> [   54.291880] [<8018bb68>] (handle_irq_event) from [<8018f2cc>] (handle_fasteoi_irq+0xd0/0x1bc)
> [   54.300418] [<8018f2cc>] (handle_fasteoi_irq) from [<8018afb0>] (generic_handle_irq+0x24/0x34)
> [   54.309042] [<8018afb0>] (generic_handle_irq) from [<8018b2dc>] (__handle_domain_irq+0x7c/0xec)
> [   54.317754] [<8018b2dc>] (__handle_domain_irq) from [<80101524>] (gic_handle_irq+0x38/0x74)
> [   54.326119] [<80101524>] (gic_handle_irq) from [<8010ccb0>] (__irq_svc+0x70/0xb0)
> (snip)
> 
> After looking through the isr_setup_status_phase disassembly, I found
> that ci->status must have been NULL and dereferencing it in
> ci->status->context = ci; triggered the panic.
> 
> The interrupt was a USBINT (UI bit was set) and isr_tr_complete_handler
> was called from udc_irq.
> In the IMX6DQRM I read about the UI bit: "This bit is also set by the
> Host/Device Controller when a short packet is detected." and about
> USBERRINT / UEI bit: "This bit is set along with the USBINT bit, if the
> TD on which the error interrupt occurred also had its interrupt on
> complete (IOC) bit set." (page 5494)
> 
> However, we do not check for UEI in udc_irq.
> Could this be the cause of this error?

UEI is an error interrupt, and the software does not handle it, so it
will not affect ci->status.

> Should we only call isr_tr_complete_handler if UI && !UEI ?
> 
> Or would adding a check for ci->status == NULL in isr_setup_status_phase
> and returning an error code also be a good idea?

I agree with that.
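
A minimal sketch of what such a check could look like, written as a
user-space model rather than as the actual driver patch. The struct
layouts, the stub completion callback, and the name
isr_setup_status_phase_checked are hypothetical stand-ins; only
ci->status and the assignment ci->status->context = ci come from the
report above. The choice of -EINVAL as the error code is also an
assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the driver's structures (hypothetical). */
struct ci_hdrc;

struct usb_request_stub {
	void *context;
	void (*complete)(struct ci_hdrc *ci);
};

struct ci_hdrc {
	/* Modeled as NULL when no status request is available. */
	struct usb_request_stub *status;
};

/* Placeholder completion callback; does nothing in this model. */
static void status_complete_stub(struct ci_hdrc *ci)
{
	(void)ci;
}

/*
 * Model of the status-phase setup with the proposed guard:
 * return an error instead of dereferencing a NULL ci->status.
 */
static int isr_setup_status_phase_checked(struct ci_hdrc *ci)
{
	if (ci == NULL || ci->status == NULL)
		return -EINVAL;	/* assumed error code */

	ci->status->context = ci;
	ci->status->complete = status_complete_stub;
	return 0;
}
```

With a guard like this, the caller would see an error return instead of
an oops at the context assignment, although it would not explain why
ci->status was NULL in the first place.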

> 
> Do you have an idea what's going on there and why ci->status is NULL?
> 

I can't explain it. The only possibility I can see is that the last
disconnect event (see ci_udc_vbus_session->_gadget_stop_activity) was
scheduled very late because vbus dropped very slowly.
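
The ordering in this hypothesis can be illustrated with a small
user-space model. This is only a sketch under assumptions: the model_*
names and structs are hypothetical, and the premise that the stop
activity path leaves ci->status NULL is assumed here for illustration,
not taken from the trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ep0 status request. */
struct req_stub {
	void *context;
};

struct udc_model {
	struct req_stub *status;	/* models ci->status */
	struct req_stub storage;	/* backing object for the model */
};

/* Models a new session making the status request available. */
static void model_connect(struct udc_model *u)
{
	u->status = &u->storage;
}

/* Models the disconnect handling (assumed to clear the pointer). */
static void model_stop_activity(struct udc_model *u)
{
	u->status = NULL;
}

/*
 * Models the status-phase setup. Returns false where the unchecked
 * driver code would dereference NULL.
 */
static bool model_setup_status(struct udc_model *u)
{
	if (u->status == NULL)
		return false;
	u->status->context = u;
	return true;
}
```

In the expected order (stop activity for the old session, then connect,
then setup) model_setup_status() succeeds. If the stop activity for the
previous session runs only after the new connect, the next setup finds
the pointer NULL, which matches the address seen in the oops.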

-- 

Best Regards,
Peter Chen