linux-kernel - Re: BUG: IPv6 stops working after a while, needs ip ne del command to reset

Message-ID: <alpine.DEB.1.10.1008171053230.21857@red.crap.retrofitta.se>
Date:	Tue, 17 Aug 2010 13:08:41 +0200 (CEST)
From:	Thomas Habets <thomas@...ets.pp.se>
To:	Eric Dumazet <eric.dumazet@...il.com>
cc:	Thomas Habets <thomas@...ets.pp.se>, linux-kernel@...r.kernel.org,
	netdev <netdev@...r.kernel.org>
Subject: Re: BUG: IPv6 stops working after a while, needs ip ne del command
 to reset


Aha! New development:

The Cisco router can't discover the address of the Linux box because Linux 
doesn't seem to be listening to ff02::1 (all-nodes).

-----------
cisco#ping ff02::1
Output Interface: GigabitEthernet1/2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to FF02::1, timeout is 2 seconds:
Packet sent with a source address of FE80::222:55FF:FE17:4B80%GigabitEthernet1/2

Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out
Success rate is 0 percent (0/5)
0 multicast replies and 0 errors.
------------

If I set promisc mode on the interface (tcpdump without -p, or "ip link set 
promisc on eth0"), it starts working (both normal ping and the above ping 
from the Cisco to ff02::1). It keeps working until, I guess, the neighbor 
table on the Cisco times out (leaving it idle overnight seems to be 
enough) or I manually do a "clear ipv6 neig".

So great news! I can reproduce it at will with no waiting time! Right 
after rebooting the Linux box I run "clear ipv6 neighbors" on the Cisco, 
and Linux can no longer ping the router.

The Linux box itself can ping ff02::1%eth0 with no problem, and gets 
replies from the fe80:: link-local of itself and the Cisco router.

So could it be that for some reason the NIC isn't listening on the 
multicast MAC address 33:33:ff:5c:00:02?
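
For reference, the MAC address above follows from the global address on 
eth0. A minimal Python sketch of the derivation, assuming the standard 
solicited-node mapping (ff02::1:ff00:0/104 plus the low 24 bits of the 
unicast address) and the usual IPv6-over-Ethernet mapping (33:33 plus the 
low 32 bits of the multicast group). The function names are made up for 
illustration only:

```python
import ipaddress


def solicited_node(addr):
    """Return the solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(str(addr))) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group):
    """Return the Ethernet MAC that an IPv6 multicast group maps to."""
    g = ipaddress.IPv6Address(str(group))
    if not g.is_multicast:
        raise ValueError(f"{g} is not a multicast address")
    low32 = int(g) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    # Addresses taken from the ifconfig output in this mail, plus all-nodes.
    for unicast in ("2a00:800:752:1::5c:2", "fe80::224:81ff:fea3:4424"):
        group = solicited_node(unicast)
        print(unicast, "->", group, "->", multicast_mac(group))
    print("ff02::1 ->", multicast_mac("ff02::1"))
```

So the NIC would need to accept at least 33:33:ff:5c:00:02 (global 
address), 33:33:ff:a3:44:24 (link-local address) and 33:33:00:00:00:01 
(all-nodes) for neighbor discovery and the ff02::1 ping to work.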

Is there a way to see the list of addresses that get past the NIC? Or can 
this perhaps be filtered after the NIC, but before tcpdump -p?
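
One possible way to check the kernel side of this, as a rough sketch: the 
kernel's per-device multicast list can be read with "ip maddr show dev 
eth0" or from /proc/net/dev_mcast. The Python below assumes the usual 
five-column layout of that file (ifindex, name, users, global use, hex 
address); the helper names are hypothetical. Note that this list is only 
what the kernel has asked the driver to accept. It does not show what the 
NIC's hardware filter actually holds, so a clean result here would not 
rule out a driver or NIC problem.

```python
import os


def parse_dev_mcast(text):
    """Map interface name -> set of multicast MACs (aa:bb:cc:dd:ee:ff)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip anything not in the assumed five-column layout
        ifname, raw = fields[1], fields[4].lower()
        mac = ":".join(raw[i:i + 2] for i in range(0, len(raw), 2))
        result.setdefault(ifname, set()).add(mac)
    return result


def missing_groups(text, ifname, wanted):
    """Return the wanted MACs that are absent from the list for ifname."""
    have = parse_dev_mcast(text).get(ifname, set())
    return sorted(set(wanted) - have)


PROC_PATH = "/proc/net/dev_mcast"
if os.path.exists(PROC_PATH):
    with open(PROC_PATH) as f:
        listing = f.read()
    # all-nodes (ff02::1) and the solicited-node MAC mentioned in this mail
    wanted = ["33:33:00:00:00:01", "33:33:ff:5c:00:02"]
    print("missing on eth0:", missing_groups(listing, "eth0", wanted))
```

If both addresses are present in the kernel's list but the frames still 
only arrive in promisc mode, the filtering is presumably happening in the 
driver or the hardware.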

Since this now looks like a NIC thing, here's some info about eth0:

$ dmesg | grep eth0
[...]
tg3 0000:03:04.0: eth0: Tigon3 [partno(N/A) rev 9003] (PCIX:133MHz:64-bit) MAC address 00:24:81:a3:44:24
tg3 0000:03:04.0: eth0: attached PHY is 5714 (10/100/1000Base-T Ethernet) (WireSpeed[1])
tg3 0000:03:04.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:03:04.0: eth0: dma_rwctrl[76148000] dma_mask[40-bit]
[...]

$ sudo lspci -v -s 03:04.0
03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)
Subsystem: Hewlett-Packard Company NC326i PCIe Dual Port Gigabit Server Adapter
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 47
Memory at fdff0000 (64-bit, non-prefetchable) [size=64K]
Memory at fdfe0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at <ignored> [disabled]
Capabilities: [40] PCI-X non-bridge device
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable+
Kernel driver in use: tg3
Kernel modules: tg3

$ sudo ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:24:81:a3:44:24
           inet addr:x.x.x.x  Bcast:x.x.x.x  Mask:255.255.255.252
           inet6 addr: 2a00:800:752:1::5c:2/112 Scope:Global
           inet6 addr: fe80::224:81ff:fea3:4424/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:928 errors:0 dropped:0 overruns:0 frame:0
           TX packets:834 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:142281 (138.9 KiB)  TX bytes:154616 (150.9 KiB)
           Interrupt:16

I have double-checked iptables, ip6tables and arptables, and they are 
either not compiled into the kernel or they are empty ACCEPT lists.

I have answered your questions below even if they may no longer be 
applicable.


On Tue, 17 Aug 2010, Eric Dumazet wrote:
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router STALE
>>
>> [try ping6 again, no reply]
>>
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router DELAY
>>
>> [try ping6 again, no reply]
>>
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router REACHABLE
>>
> This seems a bit different than previous mail. Apparently discovery now
> works ?

Last time I didn't post the "ip -6 ne sh" output taken immediately after a 
ping attempt, so I'm not sure this has changed since then.

But the tcpdump output from last time seems to indicate that ND did work 
then, at least in one direction, even though the solicitation came from 
the link-local address and not the global address. The solicitation was 
answered, after all (as seen in the tcpdump in the original mail).

> Could you have a tcpdump on both sides ?

Not easily. The other end is a Cisco and a bit inconvenient to get to. I'm 
going there tomorrow night, so I can hook up a cable and do a monitor 
port then if needed.

---------
typedef struct me_s {
   char name[]      = { "Thomas Habets" };
   char email[]     = { "thomas@...ets.pp.se" };
   char kernel[]    = { "Linux" };
   char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
   char pgp[] = { "A8A3 D1DD 4AE0 8467 7FDE  0945 286A E90A AD48 E854" };
   char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;