netdev - Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200910191501.15009.denys@visp.net.lb>
Date:	Mon, 19 Oct 2009 15:01:14 +0300
From:	Denys Fedoryschenko <denys@...p.net.lb>
To:	Michal Ostrowski <mostrows@...il.com>
Cc:	netdev <netdev@...r.kernel.org>, linux-ppp@...r.kernel.org,
	paulus@...ba.org, mostrows@...thlink.net
Subject: Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces

Applied patch manually, still panic (maybe different now):

[   42.596904]
[   42.596904] Pid: 0, comm: swapper Not tainted (2.6.31.4-build-0047 #12)
[   42.596904] EIP: 0060:[<f886865e>] EFLAGS: 00010286 CPU: 0
[   42.596904] EIP is at pppoe_rcv+0x153/0x1be [pppoe]
[   42.596904] EAX: 00003100 EBX: ffffffff ECX: 00000002 EDX: f74007cb
[   42.596904] ESI: f79b4b00 EDI: f6332f00 EBP: c1806edc ESP: c1806eb8
[   42.596904]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   42.596904] Process swapper (pid: 0, ti=c1806000 task=c03f6de0 
task.ti=c03f1000)
[   42.596904] Stack:
[   42.596904]  f7445000
00000004
f6fc0028
31000030
f6fc0030
f6332f40
f79b4b00
c0418140
Oct 19 15:00:46 194.146.153.99
[   42.596904] <0>
f886a444
c1806f14
c029e702
f7445000
f79b4bb0
f79b4bb0
c0418160
f7445000
Oct 19 15:00:46 194.146.153.99
[   42.596904] <0>
00000000
00000001
64886f14
c02ac3ce
f79b4b00
00000000
f753105c
c1806f58
Oct 19 15:00:46 194.146.153.99
[   42.596904] Call Trace:
[   42.596904]  [<c029e702>] ? netif_receive_skb+0x43b/0x45a
[   42.596904]  [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[   42.596904]  [<f840f736>] ? rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[   42.596904]  [<f8412191>] ? rtl8169_poll+0x2f/0x1b2 [r8169]
[   42.596904]  [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[   42.596904]  [<c029ec95>] ? net_rx_action+0x93/0x177
[   42.596904]  [<c0131bdd>] ? __do_softirq+0xa7/0x144
[   42.596904]  [<c0131b36>] ? __do_softirq+0x0/0x144
[   42.596904]  <IRQ>
Oct 19 15:00:46 194.146.153.99
[   42.596904]  [<c0131957>] ? irq_exit+0x29/0x5c
[   42.596904]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.596904]  [<c0102f49>] ? common_interrupt+0x29/0x30
[   42.596904]  [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[   42.596904]  [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[   42.596904]  [<c0101bf0>] ? cpu_idle+0x44/0x60
[   42.596904]  [<c02ef8c7>] ? rest_init+0x53/0x55
[   42.596904]  [<c04197df>] ? start_kernel+0x2b9/0x2be
[   42.596904]  [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[   42.596904] Code:
53
0a
32
53
0b
31
c2
c1
e8
08
31
c2
eb
08
0f
b6
c2
c1
f8
04
31
c2
d1
e9
83
f9
04
74
f1
89
d0
83
e0
0f
8b
1c
87
eb
35
66
8b
45
ea
Oct 19 15:00:46 194.146.153.99
39
83
98
01
00
00
75
22
8b
55
e4
8d
83
9a
01
00
00
b9
06
00
Oct 19 15:00:46 194.146.153.99
[   42.596904] EIP: [<f886865e>]
pppoe_rcv+0x153/0x1be [pppoe]
SS:ESP 0068:c1806eb8
[   42.596904] CR2: 0000000000000197
[   42.606148] ---[ end trace b477ff4ee072d9b9 ]---
[   42.606209] Kernel panic - not syncing: Fatal exception in interrupt
[   42.606273] Pid: 0, comm: swapper Tainted: G      D    2.6.31.4-build-0047 
#12
[   42.606351] Call Trace:
[   42.606413]  [<c02fc496>] ? printk+0xf/0x11
[   42.606474]  [<c02fc3e7>] panic+0x39/0xd9
[   42.606535]  [<c01059b7>] oops_end+0x8b/0x9a
[   42.606597]  [<c0118f6d>] no_context+0x13d/0x147
[   42.606658]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
[   42.606722]  [<c0121dc1>] ? check_preempt_wakeup+0x34/0x141
[   42.606862]  [<c01294bb>] ? try_to_wake_up+0x1aa/0x1b4
[   42.606930]  [<c0209541>] ? cpumask_next_and+0x26/0x37
[   42.607003]  [<c01256f5>] ? find_busiest_group+0x291/0x885
[   42.607067]  [<c019cf47>] ? pollwake+0x5a/0x63
[   42.607127]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
[   42.607189]  [<c0119301>] do_page_fault+0x114/0x26f
[   42.607251]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[   42.607313]  [<c02fe506>] error_code+0x66/0x6c
[   42.607375]  [<c02900d8>] ? pcibios_set_master+0x89/0x8d
[   42.607436]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[   42.607501]  [<f886865e>] ? pppoe_rcv+0x153/0x1be [pppoe]
[   42.607564]  [<c029e702>] netif_receive_skb+0x43b/0x45a
[   42.607625]  [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[   42.607691]  [<f840f736>] rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[   42.607759]  [<f8412191>] rtl8169_poll+0x2f/0x1b2 [r8169]
[   42.607824]  [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[   42.607886]  [<c029ec95>] net_rx_action+0x93/0x177
[   42.607948]  [<c0131bdd>] __do_softirq+0xa7/0x144
[   42.608022]  [<c0131b36>] ? __do_softirq+0x0/0x144
[   42.608082]  <IRQ>
[<c0131957>] ? irq_exit+0x29/0x5c
[   42.608183]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.608183]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.608245]  [<c0102f49>] ? common_interrupt+0x29/0x30
[   42.608307]  [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[   42.608369]  [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[   42.608431]  [<c0101bf0>] ? cpu_idle+0x44/0x60
[   42.608493]  [<c02ef8c7>] ? rest_init+0x53/0x55
[   42.608553]  [<c04197df>] ? start_kernel+0x2b9/0x2be
[   42.608616]  [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[   42.608682] Rebooting in 5 seconds..



On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote:
> Here's my theory on this after an inital look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, excuting "cmp %eax,
> 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer
> dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a
> value of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and
> release_sock() may have allowed us to free "pn".  At the bottom of the loop
> we alreayd recognize that since locks are dropped we're responsible for
> handling invalidation of objects, and perhaps that should be extended to
> "pn" as well. --
> Michal Ostrowski
> mostrows@...il.com
>
>
> ---
>  drivers/net/pppoe.c |   86 ++++++++++++++++++++++++++----
> --------------------
>  1 files changed, 45 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
> index 7cbf6f9..720c4ea 100644
> --- a/drivers/net/pppoe.c
> +++ b/drivers/net/pppoe.c
> @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)
>
>        BUG_ON(dev == NULL);
>
> +restart:
>        pn = pppoe_pernet(dev_net(dev));
>        if (!pn) /* already freed */
>                return;
> @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
>        write_lock_bh(&pn->hash_lock);
>        for (i = 0; i < PPPOE_HASH_SIZE; i++) {
>                struct pppox_sock *po = pn->hash_table[i];
> +                struct sock *sk;
>
> -               while (po != NULL) {
> -                       struct sock *sk;
> -                       if (po->pppoe_dev != dev) {
> -                               po = po->next;
> -                               continue;
> -                       }
> -                       sk = sk_pppox(po);
> -                       spin_lock(&flush_lock);
> -                       po->pppoe_dev = NULL;
> -                       spin_unlock(&flush_lock);
> -                       dev_put(dev);
> -
> -                       /* We always grab the socket lock, followed by the
> -                        * hash_lock, in that order.  Since we should
> -                        * hold the sock lock while doing any unbinding,
> -                        * we need to release the lock we're holding.
> -                        * Hold a reference to the sock so it doesn't
> disappear -                        * as we're jumping between locks.
> -                        */
> +                while (po && po->pppoe_dev != dev) {
> +                        po = po->next;
> +                }
>
> -                       sock_hold(sk);
> +                if (po == NULL) {
> +                        continue;
> +                }
>
> -                       write_unlock_bh(&pn->hash_lock);
> -                       lock_sock(sk);
> +                sk = sk_pppox(po);
>
> -                       if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND))
> { -                               pppox_unbind_sock(sk);
> -                               sk->sk_state = PPPOX_ZOMBIE;
> -                               sk->sk_state_change(sk);
> -                       }
> +                spin_lock(&flush_lock);
> +                po->pppoe_dev = NULL;
> +                spin_unlock(&flush_lock);
>
> -                       release_sock(sk);
> -                       sock_put(sk);
> +                dev_put(dev);
>
> -                       /* Restart scan at the beginning of this hash
> chain. -                        * While the lock was dropped the chain
> contents may -                        * have changed.
> -                        */
> -                       write_lock_bh(&pn->hash_lock);
> -                       po = pn->hash_table[i];
> -               }
> +                /* We always grab the socket lock, followed by the
> +                 * hash_lock, in that order.  Since we should
> +                 * hold the sock lock while doing any unbinding,
> +                 * we need to release the lock we're holding.
> +                 * Hold a reference to the sock so it doesn't disappear
> +                 * as we're jumping between locks.
> +                 */
> +
> +                sock_hold(sk);
> +
> +                write_unlock_bh(&pn->hash_lock);
> +                lock_sock(sk);
> +
> +                if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> +                        pppox_unbind_sock(sk);
> +                        sk->sk_state = PPPOX_ZOMBIE;
> +                        sk->sk_state_change(sk);
> +                }
> +
> +                release_sock(sk);
> +                sock_put(sk);
> +
> +                /* Restart the flush process from the beginning.  While
> +                 * the lock was dropped the chain contents may have
> +                 * changed, and sock_put may have made things go away.
> +                 */
> +                goto restart;
>        }
>        write_unlock_bh(&pn->hash_lock);
>  }
> --
> 1.6.3.3
>
> On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@...p.net.lb> 
wrote:
> > I have server running as pppoe NAS.
> > Tried to rename customers without dropping pppd connections first, got
> > panic after few seconds.
> > Panic triggerable at 2.6.30.4 and 2.6.31.4
> > pppoe users running on eth2
> > pppoe flags:
> >  1457 root     /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> > 172.16.1.2 -N 253 -C gpzone -S gpzone
> >
> >
> > Commands sequence that i think triggered that:
> >
> > ip link set eth0 down
> > ip link set eth1 down
> > ip link set eth2 down
> > nameif etherx 00:16:76:8D:83:BA
> > nameif eth0 00:19:e0:72:4a:37
> > nameif eth1 00:19:e0:72:4a:4b
> >
> > ip addr flush dev eth0
> > ip addr flush dev eth1
> > ip addr add X.X.X.X/29 dev eth0
> > ip addr add 192.168.2.177/24 dev eth0
> > ip addr add 192.168.0.1/32 dev eth1
> > ip addr add 127.0.0.0/8 dev lo
> > #ip link set eth0 up
> > ip link set eth0 up
> > ip link set eth1 up
> > ip link set lo up
> > ip route add 0.0.0.0/0 via X.X.X.X
> >
> >
> > [  103.428591] r8169: eth0: link up
> > [  103.430360] r8169: eth1: link up
> > [  113.361528] BUG: unable to handle kernel
> > NULL pointer dereference
> > at 0000018f
> > [  113.361717] IP:
> > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.361853] *pdpt = 000000003411a001
> > *pde = 0000000000000000
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362012] Oops: 0000 [#1]
> > SMP
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> > [  113.362246] Modules linked in:
> > netconsole
> > configfs
> > act_skbedit
> > sch_ingress
> > sch_prio
> > cls_flow
> > cls_u32
> > em_meta
> > cls_basic
> > xt_dscp
> > xt_DSCP
> > ipt_REJECT
> > ts_bm
> > xt_string
> > xt_hl
> > ifb
> > cls_fw
> > sch_tbf
> > sch_htb
> > act_ipt
> > act_mirred
> > xt_MARK
> > pppoe
> > pppox
> > ppp_generic
> > slhc
> > xt_TCPMSS
> > xt_mark
> > xt_tcpudp
> > iptable_mangle
> > iptable_nat
> > nf_nat
> > rtc_cmos
> > nf_conntrack_ipv4
> > rtc_core
> > nf_conntrack
> > rtc_lib
> > nf_defrag_ipv4
> > iptable_filter
> > ip_tables
> > x_tables
> > 8021q
> > garp
> > stp
> > llc
> > loop
> > sata_sil
> > pata_atiixp
> > pata_acpi
> > ata_generic
> > libata
> > 8139cp
> > usb_storage
> > mtdblock
> > mtd_blkdevs
> > mtd
> > sr_mod
> > cdrom
> > tulip
> > r8169
> > sky2
> > via_velocity
> > via_rhine
> > sis900
> > ne2k_pci
> > 8390
> > skge
> > tg3
> > libphy
> > 8139too
> > e1000
> > e100
> > usbhid
> > ohci_hcd
> > uhci_hcd
> > ehci_hcd
> > usbcore
> > nls_base
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344]
> > [  113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> > [  113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> > [  113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> > [  113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> > [  113.362344]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [  113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0
> > task.ti=f4145000)
> > [  113.362344] Stack:
> > [  113.362344]  f4fbe220
> > f4fbe000
> > f6cea5a0
> > f886a430
> > fffffff5
> > 00000000
> > f4145e54
> > c01422b3
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > f4fbe000
> > 00000009
> > f8a457d8
> > f4fbe000
> > f8850190
> > 00001091
> > f4145e64
> > c0142361
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > ffffffff
> > 00000000
> > f4145e74
> > c029ffbf
> > f4fbe000
> > 000010d0
> > f4145e90
> > c029fa70
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] Call Trace:
> > [  113.362344]  [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> > [  113.362344]  [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> > [  113.362344]  [<c029ffbf>] ? dev_close+0x4c/0x8c
> > [  113.362344]  [<c029fa70>] ? dev_change_flags+0xa5/0x158
> > [  113.362344]  [<c02da633>] ? devinet_ioctl+0x21a/0x503
> > [  113.362344]  [<c02db693>] ? inet_ioctl+0x8d/0xa6
> > [  113.362344]  [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> > [  113.362344]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.362344]  [<c019af2b>] ? vfs_ioctl+0x22/0x69
> > [  113.362344]  [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> > [  113.362344]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.362344]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.362344]  [<c019b49b>] ? sys_ioctl+0x2c/0x45
> > [  113.362344]  [<c0102975>] ? syscall_call+0x7/0xb
> > [  113.362344] Code:
> > c9
> > 00
> > 00
> > 00
> > 89
> > c7
> > 31
> > f6
> > 83
> > c7
> > 40
> > 89
> > f8
> > e8
> > cc
> > 60
> > a9
> > c7
> > 8b
> > 45
> > ec
> > 05
> > 20
> > 02
> > 00
> > 00
> > 89
> > 45
> > e8
> > 8b
> > 4d
> > f0
> > 8b
> > 1c
> > b1
> > e9
> > 8c
> > 00
> > 00
> > 00
> > 8b
> > 45
> > ec
> > Oct 18 23:59:40 194.146.153.93
> > 83
> > 90
> > 01
> > 00
> > 00
> > 74
> > 08
> > 8b
> > 9b
> > 8c
> > 01
> > 00
> > 00
> > eb
> > 79
> > b8
> > c0
> > a6
> > 86
> > f8
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] EIP: [<f8868269>]
> > pppoe_device_event+0x80/0x12c [pppoe]
> > SS:ESP 0068:f4145e1c
> > [  113.362344] CR2: 000000000000018f
> > [  113.373124] ---[ end trace f6fe64a307e97f3b ]---
> > [  113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> > [  113.373286] Pid: 2858, comm: pppd Tainted: G      D  
> >  2.6.31.4-build-0047 #7
> > [  113.373379] Call Trace:
> > [  113.373479]  [<c02fc496>] ? printk+0xf/0x11
> > [  113.373561]  [<c02fc3e7>] panic+0x39/0xd9
> > [  113.373656]  [<c01059b7>] oops_end+0x8b/0x9a
> > [  113.373727]  [<c0118f6d>] no_context+0x13d/0x147
> > [  113.373800]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> > [  113.373881]  [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> > [  113.373959]  [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> > [  113.374030]  [<c0131967>] ? irq_exit+0x39/0x5c
> > [  113.374107]  [<c0104393>] ? do_IRQ+0x80/0x96
> > [  113.374183]  [<c0102f49>] ? common_interrupt+0x29/0x30
> > [  113.374259]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
> > [  113.374348]  [<c0119301>] do_page_fault+0x114/0x26f
> > [  113.374526]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374605]  [<c02fe506>] error_code+0x66/0x6c
> > [  113.374683]  [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> > [  113.374764]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374850]  [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.374928]  [<c01422b3>] notifier_call_chain+0x2b/0x4a
> > [  113.375012]  [<c0142361>] raw_notifier_call_chain+0xc/0xe
> > [  113.375097]  [<c029ffbf>] dev_close+0x4c/0x8c
> > [  113.375169]  [<c029fa70>] dev_change_flags+0xa5/0x158
> > [  113.375239]  [<c02da633>] devinet_ioctl+0x21a/0x503
> > [  113.375318]  [<c02db693>] inet_ioctl+0x8d/0xa6
> > [  113.375411]  [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> > [  113.375491]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.375574]  [<c019af2b>] vfs_ioctl+0x22/0x69
> > [  113.375653]  [<c019b435>] do_vfs_ioctl+0x41f/0x459
> > [  113.375734]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.375813]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.375884]  [<c019b49b>] sys_ioctl+0x2c/0x45
> > [  113.375960]  [<c0102975>] syscall_call+0x7/0xb
> > [  113.376041] Rebooting in 5 seconds..


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html