netdev - Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces

Message-ID: <e6d1cecd0910190619t3e009e1by49cc8f7307eb7cdb@mail.gmail.com>
Date:	Mon, 19 Oct 2009 08:19:23 -0500
From:	Michal Ostrowski <mostrows@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Denys Fedoryschenko <denys@...p.net.lb>,
	netdev <netdev@...r.kernel.org>, linux-ppp@...r.kernel.org,
	paulus@...ba.org, mostrows@...thlink.net,
	Cyrill Gorcunov <gorcunov@...il.com>
Subject: Re: kernel panic in latest vanilla stable, while using nameif with 
	"alive" pppoe interfaces

The entire scheme for managing net namespaces seems unsafe.  We depend
on synchronization via pn->hash_lock, but have no guarantee of the
existence of the "net" object -- hence no way to ensure the existence
of the lock itself.  This should be relatively easy to fix, though, as
we should be able to get/put the net namespace as we add/remove
objects to/from the pppoe hash.
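
To make the get/put idea concrete, here is a rough user-space model, not
kernel code.  Every name in it (model_net, model_hash_add, and so on) is
made up for illustration; in the driver the reference would presumably be
taken with get_net() and dropped with put_net(), under pn->hash_lock.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: hypothetical stand-ins for struct net and the
 * per-namespace pppoe state.  No locking is modelled here. */
#define MODEL_HASH_SIZE 16

struct model_net {
    int refcount;               /* models the namespace reference count */
};

struct model_sock {
    struct model_sock *next;
    struct model_net *net;      /* namespace pinned while the entry is hashed */
    unsigned int key;
};

struct model_pppoe_net {
    struct model_net *net;
    struct model_sock *hash_table[MODEL_HASH_SIZE];
};

static void model_get_net(struct model_net *n)
{
    n->refcount++;
}

static void model_put_net(struct model_net *n)
{
    assert(n->refcount > 0);
    n->refcount--;
}

/* Insert: take a namespace reference so that "pn" (and the lock that
 * lives in it) cannot go away while the entry is still hashed. */
static void model_hash_add(struct model_pppoe_net *pn, struct model_sock *po)
{
    unsigned int i = po->key % MODEL_HASH_SIZE;

    model_get_net(pn->net);
    po->net = pn->net;
    po->next = pn->hash_table[i];
    pn->hash_table[i] = po;
}

/* Remove: unlink first, then drop the reference taken at insert time.
 * Returns 1 if the entry was found and removed, 0 otherwise. */
static int model_hash_del(struct model_pppoe_net *pn, struct model_sock *po)
{
    struct model_sock **pp = &pn->hash_table[po->key % MODEL_HASH_SIZE];

    for (; *pp; pp = &(*pp)->next) {
        if (*pp == po) {
            *pp = po->next;
            po->next = NULL;
            model_put_net(po->net);
            po->net = NULL;
            return 1;
        }
    }
    return 0;
}
```

The point of the model is only the pairing: one reference per hashed
entry, dropped exactly once when that entry is unlinked.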

Once you solve this existence issue, the flush_lock can be eliminated
altogether since all of the relevant code paths already depend on a
write_lock_bh(&pn->hash_lock), and that's the lock that should be used
to protect the pppoe_dev field.
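
As a sketch of what "one lock protects pppoe_dev" could look like, here is
a user-space model in which a pthread mutex stands in for pn->hash_lock.
The helper names are hypothetical and this is not the patch itself; it
only shows the claim-under-lock pattern that keeps two paths from both
calling dev_put() on the same reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* User-space model; all names are hypothetical stand-ins. */
struct model_dev {
    int refcount;   /* models dev_hold()/dev_put(); the kernel's count is atomic */
};

struct model_pppox_sock {
    struct model_dev *pppoe_dev;
};

/* Stands in for pn->hash_lock; kernel code would use write_lock_bh(). */
static pthread_mutex_t model_hash_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_dev_put(struct model_dev *dev)
{
    assert(dev->refcount > 0);
    dev->refcount--;
}

/* Claim the device pointer under the lock: whoever sees a non-NULL
 * value also clears it, so at most one caller gets the device. */
static struct model_dev *model_detach_dev(struct model_pppox_sock *po)
{
    struct model_dev *dev;

    pthread_mutex_lock(&model_hash_lock);
    dev = po->pppoe_dev;
    po->pppoe_dev = NULL;
    pthread_mutex_unlock(&model_hash_lock);
    return dev;
}

/* Usable from both the connect error path and the flush path: drop the
 * reference only if this caller is the one that claimed the pointer.
 * Returns 1 if a reference was dropped, 0 if there was nothing to drop. */
static int model_release_dev(struct model_pppox_sock *po)
{
    struct model_dev *dev = model_detach_dev(po);

    if (!dev)
        return 0;
    model_dev_put(dev);
    return 1;
}
```

With this shape, a second caller finds pppoe_dev already NULL and skips
the put, so the reference is dropped once regardless of which path wins.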

Another patch to follow later...

--
Michal Ostrowski
mostrows@...il.com



On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> Michal Ostrowski wrote:
>> Here's my theory on this after an initial look...
>>
>> Looking at the oops report and disassembly of the actual module binary
>> that caused the oops, one can deduce that:
>>
>> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
>> pppox_sock *po", which is what we faulted on, executing "cmp %eax, 0x190(%ebx)".
>> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
>>
>> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
>> of 0xffffffff in pn->hash_table[i].
>>
>> From this I'd hypothesize that the combination of dev_put() and release_sock()
>> may have allowed us to free "pn".  At the bottom of the loop we already
>> recognize that since locks are dropped we're responsible for handling
>> invalidation of objects, and perhaps that should be extended to "pn" as well.
>> --
>> Michal Ostrowski
>> mostrows@...il.com
>>
>>
>
> Looking at this stuff, I do believe flush_lock protection is not
> properly done.
>
> At the end of pppoe_connect() for example we can find :
>
> err_put:
>        if (po->pppoe_dev) {
>                dev_put(po->pppoe_dev);
>                po->pppoe_dev = NULL;
>        }
>
> This is done without any protection, and can therefore clash with
> pppoe_flush_dev() :
>
>        spin_lock(&flush_lock);
>        po->pppoe_dev = NULL; /* pppoe_dev can already be NULL before this point */
>        spin_unlock(&flush_lock);
>
>        dev_put(dev);    /* oops */
>