linux-kernel - Re: [PATCH v2] usb: xhci: Check endpoint is valid before dereferencing it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <379b395f-b65c-96fe-7ecc-f18e3740b990@linux.intel.com>
Date:   Thu, 22 Dec 2022 12:52:33 +0200
From:   Mathias Nyman <mathias.nyman@...ux.intel.com>
To:     Jimmy Hu <hhhuuu@...gle.com>, mathias.nyman@...el.com,
        gregkh@...uxfoundation.org
Cc:     badhri@...gle.com, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] usb: xhci: Check endpoint is valid before
 dereferencing it

On 22.12.2022 9.29, Jimmy Hu wrote:
> When the host controller is not responding, all URBs queued to all
> endpoints need to be killed. This can cause a kernel panic if we
> dereference an invalid endpoint.
> 
> Fix this by using xhci_get_virt_ep() helper to find the endpoint and
> checking if the endpoint is valid before dereferencing it.
> 
> [233311.853271] xhci-hcd xhci-hcd.1.auto: xHCI host controller not responding, assume dead
> [233311.853393] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
> 
> [233311.853964] pc : xhci_hc_died+0x10c/0x270
> [233311.853971] lr : xhci_hc_died+0x1ac/0x270
> 
> [233311.854077] Call trace:
> [233311.854085]  xhci_hc_died+0x10c/0x270
> [233311.854093]  xhci_stop_endpoint_command_watchdog+0x100/0x1a4
> [233311.854105]  call_timer_fn+0x50/0x2d4
> [233311.854112]  expire_timers+0xac/0x2e4
> [233311.854118]  run_timer_softirq+0x300/0xabc
> [233311.854127]  __do_softirq+0x148/0x528
> [233311.854135]  irq_exit+0x194/0x1a8
> [233311.854143]  __handle_domain_irq+0x164/0x1d0
> [233311.854149]  gic_handle_irq.22273+0x10c/0x188
> [233311.854156]  el1_irq+0xfc/0x1a8
> [233311.854175]  lpm_cpuidle_enter+0x25c/0x418 [msm_pm]
> [233311.854185]  cpuidle_enter_state+0x1f0/0x764
> [233311.854194]  do_idle+0x594/0x6ac
> [233311.854201]  cpu_startup_entry+0x7c/0x80
> [233311.854209]  secondary_start_kernel+0x170/0x198
> 
> Fixes: 50e8725e7c42 ("xhci: Refactor command watchdog and fix split string.")
> Cc: stable@...r.kernel.org
> Signed-off-by: Jimmy Hu <hhhuuu@...gle.com>
> ---
>   drivers/usb/host/xhci-ring.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index ddc30037f9ce..f5b0e1ce22af 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1169,7 +1169,10 @@ static void xhci_kill_endpoint_urbs(struct xhci_hcd *xhci,
>   	struct xhci_virt_ep *ep;
>   	struct xhci_ring *ring;
>   
> -	ep = &xhci->devs[slot_id]->eps[ep_index];
> +	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
> +	if (!ep)
> +		return;
> +


This is a good generic change that should be added anyway, but but might not fix
the rootcause. It will reduce the risk of the race drastically.

We do check that xhci-devs[slot_id] exists once before calling xhci_kill_endpoint_urbs()
for the endpoints of that virt_dev in xhci_hc_died().
And xhci_hc_died() should always be calling with lock held (need to doublecheck this is true).

Older kernels used release and re-aquire the lock while giving back urbs for an endpoint,
so older kernels really need this change.

But newer kernels don't release the lock anymore when giving back URBs.

Looks like real issue is that we don't take the lock when we free the virt_dev.

-Mathias