[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200910191501.15009.denys@visp.net.lb>
Date: Mon, 19 Oct 2009 15:01:14 +0300
From: Denys Fedoryschenko <denys@...p.net.lb>
To: Michal Ostrowski <mostrows@...il.com>
Cc: netdev <netdev@...r.kernel.org>, linux-ppp@...r.kernel.org,
paulus@...ba.org, mostrows@...thlink.net
Subject: Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
Applied patch manually, still panic (maybe different now):
[ 42.596904]
[ 42.596904] Pid: 0, comm: swapper Not tainted (2.6.31.4-build-0047 #12)
[ 42.596904] EIP: 0060:[<f886865e>] EFLAGS: 00010286 CPU: 0
[ 42.596904] EIP is at pppoe_rcv+0x153/0x1be [pppoe]
[ 42.596904] EAX: 00003100 EBX: ffffffff ECX: 00000002 EDX: f74007cb
[ 42.596904] ESI: f79b4b00 EDI: f6332f00 EBP: c1806edc ESP: c1806eb8
[ 42.596904] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 42.596904] Process swapper (pid: 0, ti=c1806000 task=c03f6de0
task.ti=c03f1000)
[ 42.596904] Stack:
[ 42.596904] f7445000
00000004
f6fc0028
31000030
f6fc0030
f6332f40
f79b4b00
c0418140
Oct 19 15:00:46 194.146.153.99
[ 42.596904] <0>
f886a444
c1806f14
c029e702
f7445000
f79b4bb0
f79b4bb0
c0418160
f7445000
Oct 19 15:00:46 194.146.153.99
[ 42.596904] <0>
00000000
00000001
64886f14
c02ac3ce
f79b4b00
00000000
f753105c
c1806f58
Oct 19 15:00:46 194.146.153.99
[ 42.596904] Call Trace:
[ 42.596904] [<c029e702>] ? netif_receive_skb+0x43b/0x45a
[ 42.596904] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[ 42.596904] [<f840f736>] ? rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[ 42.596904] [<f8412191>] ? rtl8169_poll+0x2f/0x1b2 [r8169]
[ 42.596904] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[ 42.596904] [<c029ec95>] ? net_rx_action+0x93/0x177
[ 42.596904] [<c0131bdd>] ? __do_softirq+0xa7/0x144
[ 42.596904] [<c0131b36>] ? __do_softirq+0x0/0x144
[ 42.596904] <IRQ>
Oct 19 15:00:46 194.146.153.99
[ 42.596904] [<c0131957>] ? irq_exit+0x29/0x5c
[ 42.596904] [<c0104393>] ? do_IRQ+0x80/0x96
[ 42.596904] [<c0102f49>] ? common_interrupt+0x29/0x30
[ 42.596904] [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[ 42.596904] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[ 42.596904] [<c0101bf0>] ? cpu_idle+0x44/0x60
[ 42.596904] [<c02ef8c7>] ? rest_init+0x53/0x55
[ 42.596904] [<c04197df>] ? start_kernel+0x2b9/0x2be
[ 42.596904] [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[ 42.596904] Code:
53
0a
32
53
0b
31
c2
c1
e8
08
31
c2
eb
08
0f
b6
c2
c1
f8
04
31
c2
d1
e9
83
f9
04
74
f1
89
d0
83
e0
0f
8b
1c
87
eb
35
66
8b
45
ea
Oct 19 15:00:46 194.146.153.99
39
83
98
01
00
00
75
22
8b
55
e4
8d
83
9a
01
00
00
b9
06
00
Oct 19 15:00:46 194.146.153.99
[ 42.596904] EIP: [<f886865e>]
pppoe_rcv+0x153/0x1be [pppoe]
SS:ESP 0068:c1806eb8
[ 42.596904] CR2: 0000000000000197
[ 42.606148] ---[ end trace b477ff4ee072d9b9 ]---
[ 42.606209] Kernel panic - not syncing: Fatal exception in interrupt
[ 42.606273] Pid: 0, comm: swapper Tainted: G D 2.6.31.4-build-0047
#12
[ 42.606351] Call Trace:
[ 42.606413] [<c02fc496>] ? printk+0xf/0x11
[ 42.606474] [<c02fc3e7>] panic+0x39/0xd9
[ 42.606535] [<c01059b7>] oops_end+0x8b/0x9a
[ 42.606597] [<c0118f6d>] no_context+0x13d/0x147
[ 42.606658] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
[ 42.606722] [<c0121dc1>] ? check_preempt_wakeup+0x34/0x141
[ 42.606862] [<c01294bb>] ? try_to_wake_up+0x1aa/0x1b4
[ 42.606930] [<c0209541>] ? cpumask_next_and+0x26/0x37
[ 42.607003] [<c01256f5>] ? find_busiest_group+0x291/0x885
[ 42.607067] [<c019cf47>] ? pollwake+0x5a/0x63
[ 42.607127] [<c011909f>] bad_area_nosemaphore+0xd/0x10
[ 42.607189] [<c0119301>] do_page_fault+0x114/0x26f
[ 42.607251] [<c01191ed>] ? do_page_fault+0x0/0x26f
[ 42.607313] [<c02fe506>] error_code+0x66/0x6c
[ 42.607375] [<c02900d8>] ? pcibios_set_master+0x89/0x8d
[ 42.607436] [<c01191ed>] ? do_page_fault+0x0/0x26f
[ 42.607501] [<f886865e>] ? pppoe_rcv+0x153/0x1be [pppoe]
[ 42.607564] [<c029e702>] netif_receive_skb+0x43b/0x45a
[ 42.607625] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[ 42.607691] [<f840f736>] rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[ 42.607759] [<f8412191>] rtl8169_poll+0x2f/0x1b2 [r8169]
[ 42.607824] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[ 42.607886] [<c029ec95>] net_rx_action+0x93/0x177
[ 42.607948] [<c0131bdd>] __do_softirq+0xa7/0x144
[ 42.608022] [<c0131b36>] ? __do_softirq+0x0/0x144
[ 42.608082] <IRQ>
[<c0131957>] ? irq_exit+0x29/0x5c
[ 42.608183] [<c0104393>] ? do_IRQ+0x80/0x96
[ 42.608183] [<c0104393>] ? do_IRQ+0x80/0x96
[ 42.608245] [<c0102f49>] ? common_interrupt+0x29/0x30
[ 42.608307] [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[ 42.608369] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[ 42.608431] [<c0101bf0>] ? cpu_idle+0x44/0x60
[ 42.608493] [<c02ef8c7>] ? rest_init+0x53/0x55
[ 42.608553] [<c04197df>] ? start_kernel+0x2b9/0x2be
[ 42.608616] [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[ 42.608682] Rebooting in 5 seconds..
On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote:
> Here's my theory on this after an inital look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, excuting "cmp %eax,
> 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer
> dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a
> value of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and
> release_sock() may have allowed us to free "pn". At the bottom of the loop
> we alreayd recognize that since locks are dropped we're responsible for
> handling invalidation of objects, and perhaps that should be extended to
> "pn" as well. --
> Michal Ostrowski
> mostrows@...il.com
>
>
> ---
> drivers/net/pppoe.c | 86 ++++++++++++++++++++++++++----
> --------------------
> 1 files changed, 45 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
> index 7cbf6f9..720c4ea 100644
> --- a/drivers/net/pppoe.c
> +++ b/drivers/net/pppoe.c
> @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)
>
> BUG_ON(dev == NULL);
>
> +restart:
> pn = pppoe_pernet(dev_net(dev));
> if (!pn) /* already freed */
> return;
> @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
> write_lock_bh(&pn->hash_lock);
> for (i = 0; i < PPPOE_HASH_SIZE; i++) {
> struct pppox_sock *po = pn->hash_table[i];
> + struct sock *sk;
>
> - while (po != NULL) {
> - struct sock *sk;
> - if (po->pppoe_dev != dev) {
> - po = po->next;
> - continue;
> - }
> - sk = sk_pppox(po);
> - spin_lock(&flush_lock);
> - po->pppoe_dev = NULL;
> - spin_unlock(&flush_lock);
> - dev_put(dev);
> -
> - /* We always grab the socket lock, followed by the
> - * hash_lock, in that order. Since we should
> - * hold the sock lock while doing any unbinding,
> - * we need to release the lock we're holding.
> - * Hold a reference to the sock so it doesn't
> disappear - * as we're jumping between locks.
> - */
> + while (po && po->pppoe_dev != dev) {
> + po = po->next;
> + }
>
> - sock_hold(sk);
> + if (po == NULL) {
> + continue;
> + }
>
> - write_unlock_bh(&pn->hash_lock);
> - lock_sock(sk);
> + sk = sk_pppox(po);
>
> - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND))
> { - pppox_unbind_sock(sk);
> - sk->sk_state = PPPOX_ZOMBIE;
> - sk->sk_state_change(sk);
> - }
> + spin_lock(&flush_lock);
> + po->pppoe_dev = NULL;
> + spin_unlock(&flush_lock);
>
> - release_sock(sk);
> - sock_put(sk);
> + dev_put(dev);
>
> - /* Restart scan at the beginning of this hash
> chain. - * While the lock was dropped the chain
> contents may - * have changed.
> - */
> - write_lock_bh(&pn->hash_lock);
> - po = pn->hash_table[i];
> - }
> + /* We always grab the socket lock, followed by the
> + * hash_lock, in that order. Since we should
> + * hold the sock lock while doing any unbinding,
> + * we need to release the lock we're holding.
> + * Hold a reference to the sock so it doesn't disappear
> + * as we're jumping between locks.
> + */
> +
> + sock_hold(sk);
> +
> + write_unlock_bh(&pn->hash_lock);
> + lock_sock(sk);
> +
> + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> + pppox_unbind_sock(sk);
> + sk->sk_state = PPPOX_ZOMBIE;
> + sk->sk_state_change(sk);
> + }
> +
> + release_sock(sk);
> + sock_put(sk);
> +
> + /* Restart the flush process from the beginning. While
> + * the lock was dropped the chain contents may have
> + * changed, and sock_put may have made things go away.
> + */
> + goto restart;
> }
> write_unlock_bh(&pn->hash_lock);
> }
> --
> 1.6.3.3
>
> On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@...p.net.lb>
wrote:
> > I have server running as pppoe NAS.
> > Tried to rename customers without dropping pppd connections first, got
> > panic after few seconds.
> > Panic triggerable at 2.6.30.4 and 2.6.31.4
> > pppoe users running on eth2
> > pppoe flags:
> > 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> > 172.16.1.2 -N 253 -C gpzone -S gpzone
> >
> >
> > Commands sequence that i think triggered that:
> >
> > ip link set eth0 down
> > ip link set eth1 down
> > ip link set eth2 down
> > nameif etherx 00:16:76:8D:83:BA
> > nameif eth0 00:19:e0:72:4a:37
> > nameif eth1 00:19:e0:72:4a:4b
> >
> > ip addr flush dev eth0
> > ip addr flush dev eth1
> > ip addr add X.X.X.X/29 dev eth0
> > ip addr add 192.168.2.177/24 dev eth0
> > ip addr add 192.168.0.1/32 dev eth1
> > ip addr add 127.0.0.0/8 dev lo
> > #ip link set eth0 up
> > ip link set eth0 up
> > ip link set eth1 up
> > ip link set lo up
> > ip route add 0.0.0.0/0 via X.X.X.X
> >
> >
> > [ 103.428591] r8169: eth0: link up
> > [ 103.430360] r8169: eth1: link up
> > [ 113.361528] BUG: unable to handle kernel
> > NULL pointer dereference
> > at 0000018f
> > [ 113.361717] IP:
> > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.361853] *pdpt = 000000003411a001
> > *pde = 0000000000000000
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362012] Oops: 0000 [#1]
> > SMP
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> > [ 113.362246] Modules linked in:
> > netconsole
> > configfs
> > act_skbedit
> > sch_ingress
> > sch_prio
> > cls_flow
> > cls_u32
> > em_meta
> > cls_basic
> > xt_dscp
> > xt_DSCP
> > ipt_REJECT
> > ts_bm
> > xt_string
> > xt_hl
> > ifb
> > cls_fw
> > sch_tbf
> > sch_htb
> > act_ipt
> > act_mirred
> > xt_MARK
> > pppoe
> > pppox
> > ppp_generic
> > slhc
> > xt_TCPMSS
> > xt_mark
> > xt_tcpudp
> > iptable_mangle
> > iptable_nat
> > nf_nat
> > rtc_cmos
> > nf_conntrack_ipv4
> > rtc_core
> > nf_conntrack
> > rtc_lib
> > nf_defrag_ipv4
> > iptable_filter
> > ip_tables
> > x_tables
> > 8021q
> > garp
> > stp
> > llc
> > loop
> > sata_sil
> > pata_atiixp
> > pata_acpi
> > ata_generic
> > libata
> > 8139cp
> > usb_storage
> > mtdblock
> > mtd_blkdevs
> > mtd
> > sr_mod
> > cdrom
> > tulip
> > r8169
> > sky2
> > via_velocity
> > via_rhine
> > sis900
> > ne2k_pci
> > 8390
> > skge
> > tg3
> > libphy
> > 8139too
> > e1000
> > e100
> > usbhid
> > ohci_hcd
> > uhci_hcd
> > ehci_hcd
> > usbcore
> > nls_base
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362344]
> > [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> > [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> > [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> > [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> > [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0
> > task.ti=f4145000)
> > [ 113.362344] Stack:
> > [ 113.362344] f4fbe220
> > f4fbe000
> > f6cea5a0
> > f886a430
> > fffffff5
> > 00000000
> > f4145e54
> > c01422b3
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362344] <0>
> > f4fbe000
> > 00000009
> > f8a457d8
> > f4fbe000
> > f8850190
> > 00001091
> > f4145e64
> > c0142361
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362344] <0>
> > ffffffff
> > 00000000
> > f4145e74
> > c029ffbf
> > f4fbe000
> > 000010d0
> > f4145e90
> > c029fa70
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362344] Call Trace:
> > [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> > [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> > [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c
> > [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158
> > [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503
> > [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6
> > [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> > [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69
> > [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> > [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a
> > [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f
> > [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45
> > [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb
> > [ 113.362344] Code:
> > c9
> > 00
> > 00
> > 00
> > 89
> > c7
> > 31
> > f6
> > 83
> > c7
> > 40
> > 89
> > f8
> > e8
> > cc
> > 60
> > a9
> > c7
> > 8b
> > 45
> > ec
> > 05
> > 20
> > 02
> > 00
> > 00
> > 89
> > 45
> > e8
> > 8b
> > 4d
> > f0
> > 8b
> > 1c
> > b1
> > e9
> > 8c
> > 00
> > 00
> > 00
> > 8b
> > 45
> > ec
> > Oct 18 23:59:40 194.146.153.93
> > 83
> > 90
> > 01
> > 00
> > 00
> > 74
> > 08
> > 8b
> > 9b
> > 8c
> > 01
> > 00
> > 00
> > eb
> > 79
> > b8
> > c0
> > a6
> > 86
> > f8
> > Oct 18 23:59:40 194.146.153.93
> > [ 113.362344] EIP: [<f8868269>]
> > pppoe_device_event+0x80/0x12c [pppoe]
> > SS:ESP 0068:f4145e1c
> > [ 113.362344] CR2: 000000000000018f
> > [ 113.373124] ---[ end trace f6fe64a307e97f3b ]---
> > [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 113.373286] Pid: 2858, comm: pppd Tainted: G D
> > 2.6.31.4-build-0047 #7
> > [ 113.373379] Call Trace:
> > [ 113.373479] [<c02fc496>] ? printk+0xf/0x11
> > [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9
> > [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a
> > [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147
> > [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> > [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> > [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> > [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c
> > [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96
> > [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30
> > [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10
> > [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f
> > [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [ 113.374605] [<c02fe506>] error_code+0x66/0x6c
> > [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> > [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a
> > [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe
> > [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c
> > [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158
> > [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503
> > [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6
> > [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> > [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69
> > [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459
> > [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a
> > [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f
> > [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45
> > [ 113.375960] [<c0102975>] syscall_call+0x7/0xb
> > [ 113.376041] Rebooting in 5 seconds..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists