Message-ID: <20070719100135.GA2986@elte.hu>
Date:	Thu, 19 Jul 2007 12:01:35 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Olaf Kirch <olaf.kirch@...cle.com>
Cc:	Jarek Poplawski <jarkao2@...pl>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, davem@...emloft.net
Subject: Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll


* Olaf Kirch <olaf.kirch@...cle.com> wrote:

>  -	You say that netconsole output continues to trickle after
> 	the network gets wedged. This could be caused by the
> 	e1000 watchdog, which triggers a NIC interrupt "to ensure
> 	rx ring is cleaned". I assume that this triggers the
> 	regular e1000_intr, which succeeds in putting the NIC on
> 	the poll_list, and net_rx_action calls dev->poll once.

no - it appears that 'trickle' only happened with one of your patches 
(to which i replied with that 'trickle' mail). With what i have booted 
now (only your original patch and nothing else, 100 Hz and !dynticks), 
netconsole output stopped here:

 Calling initcall 0xc0603f55: netpoll_init+0x0/0x39()
 initcall 0xc0603f55: netpoll_init+0x0/0x39() returned 0.
 initcall 0xc0603f55 ran for 0 msecs: netpoll_init+0x0/0x39()
 Calling initcall 0xc0604257: netlink_proto_init+0x0/0x12a()
 NET: Registered protocol family 16

and no output ever since - and the box has been up for a few minutes.

> So, can you verify whether there are any interrupts arriving on the 
> NIC after the network got wedged? You could also try ethtool -s eth0 
> msglevel 65535 - would be interesting to see what dmesg contains. If 
> there's little to no debug output from the driver, let it run for 10 
> seconds or so, in order to catch the e1000 watchdog timer a few times.

eth0's irq count is stuck at 5 interrupts - and has not changed for 
minutes.
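
A check like the one above (is the NIC's interrupt count still moving?) can be scripted. This is only a minimal sketch, assuming the usual /proc/interrupts layout: a header row of CPU columns, then one row per IRQ with per-CPU counts followed by the handler names. The function names and the 10-second default interval are illustrative and not part of any existing tool.

```python
import time


def irq_count(interrupts_text, device):
    """Sum the per-CPU counts of the /proc/interrupts row that names `device`.

    Returns None if no row mentions the device.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        # Handler names follow the counts and the controller columns;
        # shared IRQs list several names separated by commas.
        names = [f.rstrip(",") for f in fields[ncpus + 1:]]
        if device in names:
            return sum(int(f) for f in fields[1:1 + ncpus])
    return None


def irq_delta(device, interval=10.0, path="/proc/interrupts"):
    """Read the count twice, `interval` seconds apart; return (before, after)."""
    with open(path) as f:
        before = irq_count(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), device)
    return before, after

# Example (not run here): before, after = irq_delta("eth0")
# If before == after, no interrupts arrived during the interval.
```
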

i tried ethtool -s eth0 msglvl 65535, but (as expected) there's no 
output. I've attached below ifconfig output and ethtool -S output - 
maybe that tells you something new about the state of eth0. (to me it 
only tells what we already know: tx timed out once and eth0 is stuck 
ever since.)
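
To make the few interesting counters stand out from a long `ethtool -S` dump, the output can be filtered down to the nonzero entries. This is a rough sketch, assuming the `name: value` line format shown in the attached output; `nonzero_counters` is a hypothetical helper name, not an ethtool feature.

```python
def nonzero_counters(ethtool_text):
    """Return {name: value} for every 'name: value' line with a nonzero integer value."""
    counters = {}
    for line in ethtool_text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skips the "NIC statistics:" header (empty value) and non-numeric lines.
        if sep and value.isdigit() and int(value) != 0:
            counters[name] = int(value)
    return counters


# Excerpt of the statistics attached to this mail:
excerpt = """NIC statistics:
     rx_packets: 0
     tx_packets: 873
     tx_timeout_count: 1
"""
print(nonzero_counters(excerpt))  # {'tx_packets': 873, 'tx_timeout_count': 1}
```
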

Btw., i definitely need your help with this bug as it's now hopelessly 
out of my league :-/

	Ingo

------------------>
eth0      Link encap:Ethernet  HWaddr 00:16:41:17:49:D2
          inet addr:10.0.1.15  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:873 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:87076 (85.0 KiB)
          Base address:0x2000 Memory:ee000000-ee020000 

NIC statistics:
     rx_packets: 0
     tx_packets: 873
     rx_bytes: 0
     tx_bytes: 87076
     rx_broadcast: 0
     tx_broadcast: 0
     rx_multicast: 0
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 1
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 0
     rx_csum_offload_good: 0
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0