netdev - Re: NULL pointer dereference panic in stable (2.6.33.2), amd64


Message-ID: <OF802D1BE7.C2709B61-ON65257703.00235EF5-65257703.002B5F38@in.ibm.com>
Date:	Mon, 12 Apr 2010 13:24:11 +0530
From:	Krishna Kumar2 <krkumar2@...ibm.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	Denys Fedorysychenko <nuclearcat@...learcat.com>
Subject: Re: NULL pointer dereference panic in stable (2.6.33.2), amd64

Hi Eric,

Thanks for your patch. Just one question on it, though:

Eric Dumazet <eric.dumazet@...il.com> wrote on 04/12/2010 11:31:51 AM:

> > When route changes, I think my patch had reset sk->sk_tx_queue_mapping
> > by calling sk_tx_queue_clear. I don't know if I missed any path where
> > the route changes and sk_dst_reset() was not called.
> >
>
> Problem is when you reset sk->sk_tx_queue_mapping at the very moment
> route (or destination) changes, we might have old packets queued in tx
> queues, of the old ethernet device (eth0 : multi queue compatible)

> 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
> is rebuilt, it points to a dst_entry referring to a new device (eth1 : non
> multiqueue)
>
> 3) When one old packet finally is transmitted, we do :
>
>    queue_index = 1; // any value > 0
>
>    if (sk && sk->sk_dst_cache)
>       sk_tx_queue_set(sk, queue_index); // remember a >0 value
>
> 4) application does a sendmsg(), enqueues a new skb on eth1
>
> 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid.
>    we pick a non existent txq for eth1 device.
>
> 6) We crash.
>
> > The following might be better to prove the panic is due to this, since
> > your suggestion will hide a panic that happens somewhat rarely (according
> > to Denys):
> >
> >       if (sk_tx_queue_recorded(sk)) {
> >             queue_index = sk_tx_queue_get(sk);
> > +           queue_index = dev_cap_txqueue(dev, queue_index);
> >       } else {
> >
>
> Sure, but I thought I was clear enough to prove this commit was wrong,
> and we have to find a fix.

If the dst changed between the call to vlan_dev_hwaccel_hard_start_xmit
and its call to dev_queue_xmit, that change should have reset
sk_tx_queue_mapping to -1 by calling sk_tx_queue_clear (assuming I have
covered all paths that change the dst, e.g. __sk_dst_reset), and thus
resulted in a new mapping in dev_pick_tx. Would the patch hide the actual
bug, where some path does not clear sk_tx_queue_mapping? E.g., does
__sk_dst_set do it? I agree the patch will fix the panic, but this check
could be removed if the code that changes the dst is fixed to clear the
mapping. I could check that if you think this assumption is correct.
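
For illustration, here is a minimal userspace sketch of the sequence
discussed above. It is not kernel code: the model_* names, the struct
layouts, and the queue counts (4 for eth0, 1 for eth1) are hypothetical
stand-ins, and the queue pick is reduced to a plain modulo instead of
skb_tx_hash. It assumes the route change does clear the mapping, and
shows that an old packet transmitted afterwards can still re-record a
stale index, which only the cap in the "recorded" branch catches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for net_device and sock. */
struct model_dev {
    int real_num_tx_queues;
};

struct model_sock {
    int tx_queue_mapping;            /* -1 means no queue recorded */
    const struct model_dev *dst_dev; /* device of the cached route */
};

static void model_tx_queue_clear(struct model_sock *sk)
{
    sk->tx_queue_mapping = -1;
}

static int model_tx_queue_recorded(const struct model_sock *sk)
{
    return sk->tx_queue_mapping >= 0;
}

/* Route change: clear the cached queue, then install the new route. */
static void model_dst_set(struct model_sock *sk, const struct model_dev *dev)
{
    model_tx_queue_clear(sk);
    sk->dst_dev = dev;
}

/* Same idea as dev_cap_txqueue: fall back to queue 0 if out of range. */
static int model_cap_txqueue(const struct model_dev *dev, int queue_index)
{
    return queue_index >= dev->real_num_tx_queues ? 0 : queue_index;
}

/* Reduced model of dev_pick_tx; 'cap' toggles the proposed check. */
static int model_pick_tx(struct model_sock *sk, const struct model_dev *dev,
                         int hashed_index, int cap)
{
    int queue_index;

    if (model_tx_queue_recorded(sk)) {
        queue_index = sk->tx_queue_mapping;
        if (cap)
            queue_index = model_cap_txqueue(dev, queue_index);
    } else {
        queue_index = hashed_index % dev->real_num_tx_queues;
        if (sk->dst_dev)
            sk->tx_queue_mapping = queue_index; /* remember for next time */
    }
    return queue_index;
}

/* Replays the scenario; returns the queue picked for the new eth1 packet. */
static int model_replay(int cap)
{
    struct model_dev eth0 = { 4 }; /* multiqueue */
    struct model_dev eth1 = { 1 }; /* single queue */
    struct model_sock sk = { -1, NULL };

    model_dst_set(&sk, &eth0);
    model_dst_set(&sk, &eth1);          /* route changes, mapping cleared */
    assert(!model_tx_queue_recorded(&sk));

    model_pick_tx(&sk, &eth0, 1, cap);  /* old eth0 packet re-records 1 */
    return model_pick_tx(&sk, &eth1, 0, cap); /* new packet on eth1 */
}
```

In this model, model_replay(0) returns 1, an index eth1 does not have,
even though the route change cleared the mapping; model_replay(1)
returns 0. Whether the real code paths behave this way is exactly the
open question above.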

Thanks,

- KK
