lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4B878EC3.3020605@jp.fujitsu.com>
Date:	Fri, 26 Feb 2010 18:05:07 +0900
From:	Taku Izumi <izumi.taku@...fujitsu.com>
To:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Allan, Bruce W" <bruce.w.allan@...el.com>,
	"David S. Miller" <davem@...emloft.net>,
	"Ronciak, John" <john.ronciak@...el.com>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>,
	Koki Sanagi <sanagi.koki@...fujitsu.com>,
	Kenji Kaneshige <kaneshige.kenji@...fujitsu.com>,
	"chavey@...gle.com" <chavey@...gle.com>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>
Subject: Re: [PATCH v2 0/3] e1000e,igb,ixgbe: add registers etc. printout
 code just before resetting adapters

Hi Jesse,

(2010/02/18 4:10), Brandeburg, Jesse wrote:
> 
> On Tue, 26 Jan 2010, Taku Izumi wrote:
>> (2010/01/23 6:54), Brandeburg, Jesse wrote:
>>> Taku, thanks for these, we are talking the patches over and reviewing
>>> them.  While I agree with the idea of these patches is good, I still don't
>>> agree with the default being enabled.  Usually if someone is getting tx
>>> hangs they are repeatable and we can work with them to get the debug
>>> turned on.  I DO think it is useful to have the feature available by
>>> default but NOT enabled.
>>>
>>> If we wanted to enable something by default it might be useful to print
>>> something that actually draws some conclusions from known failure modes,
>>> like if TDH!=TDT after some amount of time.  I think one or two lines
>>> maximum for default printing.
>>>
>>> If you're working in this area I had an idea.  I had wanted to be able to
>>> print the large amount of ring information (especially in the ixgbe case
>>> with many rings) to the ftrace buffers in order to not overrun the syslog
>>> daemon.  Not sure if you're interested in more new features, it certainly
>>> is separate but related to this patch.
>>
>> I thought similar things, that is, all information should be dumped to
>> the private ring buffer to avoid filling syslog up with driver messages.
>> But I didn't have any good idea to extract information from ring buffers,
>> so as the first step, I decided to printout it by using printk().
>> Is there the easy way to extract it from ring buffers?
> 
> First, I need to apologize for the delay, it's my fault.
> 
> I'm not sure about an easy way to use the private ring buffers yet, I know
> you can use the ftrace buffers directly, and then when someone read
> /sys/kernel/debug/tracing/trace they would get all the output.
> 
> Lets not do ftrace yet.
> 
> I have a counter proposal to make, here is a (incomplete) patch that we
> use all the time to debug tx hangs.  This example is for e1000e.  We are
> trying to avoid changes with other users' copyright to some of the files
> in our drivers so that we can ship them under multiple license.  We would
> much prefer something like this to be used, can you review?
> 
> I also have another patch I made for ixgb to show the use of sysfs and a
> module parameter, I'll reply again with that next.
> 
> ---
> From: Jesse Brandeburg<jesse.brandeburg@...el.com>
> 
> e1000e: dump code for descriptor rings
> 
> This is debug code for descriptor ring dumps, only enable if you really want a
> lot of stuff in your syslog/dmesg when you get NETDEV WATCHDOG or tx hang
> reports from the driver.
> 
> <not implemented>  Code can be enabled at boot time with a module parameter or
> via sysfs at runtime.
> 
> Signed-off-by: Jesse Brandeburg<jesse.brandeburg@...el.com>
> ---
> 
>   drivers/net/e1000e/ethtool.c |    4 +
>   drivers/net/e1000e/hw.h      |   10 +
>   drivers/net/e1000e/netdev.c  |  298 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 311 insertions(+), 1 deletions(-)
> 
> 
> diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
> index 0aa50c2..bbcf5c7 100644
> --- a/drivers/net/e1000e/ethtool.c
> +++ b/drivers/net/e1000e/ethtool.c
> @@ -434,6 +434,8 @@ static int e1000_get_regs_len(struct net_device *netdev)
>   	return E1000_REGS_LEN * sizeof(u32);
>   }
> 
> +extern void e1000e_dump(struct e1000_adapter *);
> +
>   static void e1000_get_regs(struct net_device *netdev,
>   			   struct ethtool_regs *regs, void *p)
>   {
> @@ -443,6 +445,8 @@ static void e1000_get_regs(struct net_device *netdev,
>   	u16 phy_data;
>   	u8 revision_id;
> 
> +	e1000e_dump(adapter);
> +
>   	memset(p, 0, E1000_REGS_LEN * sizeof(u32));
> 
>   	pci_read_config_byte(adapter->pdev, PCI_REVISION_ID,&revision_id);
> diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
> index eccf29b..d23501d 100644
> --- a/drivers/net/e1000e/hw.h
> +++ b/drivers/net/e1000e/hw.h
> @@ -92,6 +92,11 @@ enum e1e_registers {
>   	E1000_FCRTL    = 0x02160, /* Flow Control Receive Threshold Low - RW */
>   	E1000_FCRTH    = 0x02168, /* Flow Control Receive Threshold High - RW */
>   	E1000_PSRCTL   = 0x02170, /* Packet Split Receive Control - RW */
> +	E1000_RDFH     = 0x02410, /* Rx Data FIFO Head - RW */
> +	E1000_RDFT     = 0x02418, /* Rx Data FIFO Tail - RW */
> +	E1000_RDFHS    = 0x02420, /* Rx Data FIFO Head Saved - RW */
> +	E1000_RDFTS    = 0x02428, /* Rx Data FIFO Tail Saved - RW */
> +	E1000_RDFPC    = 0x02430, /* Rx Data FIFO Packet Count - RW */
>   	E1000_RDBAL    = 0x02800, /* Rx Descriptor Base Address Low - RW */
>   	E1000_RDBAH    = 0x02804, /* Rx Descriptor Base Address High - RW */
>   	E1000_RDLEN    = 0x02808, /* Rx Descriptor Length - RW */
> @@ -112,6 +117,11 @@ enum e1e_registers {
>    */
>   #define E1000_RDBAL_REG(_n)   (E1000_RDBAL + (_n<<  8))
>   	E1000_KABGTXD  = 0x03004, /* AFE Band Gap Transmit Ref Data */
> +	E1000_TDFH     = 0x03410, /* TX Data FIFO Head - RW */
> +	E1000_TDFT     = 0x03418, /* TX Data FIFO Tail - RW */
> +	E1000_TDFHS    = 0x03420, /* TX Data FIFO Head Saved - RW */
> +	E1000_TDFTS    = 0x03428, /* Tx Data FIFO Tail Saved - RW */
> +	E1000_TDFPC    = 0x03430, /* TX Data FIFO Packet Count - RW */
>   	E1000_TDBAL    = 0x03800, /* Tx Descriptor Base Address Low - RW */
>   	E1000_TDBAH    = 0x03804, /* Tx Descriptor Base Address High - RW */
>   	E1000_TDLEN    = 0x03808, /* Tx Descriptor Length - RW */
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index c45965a..c990341 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -48,7 +48,7 @@
> 
>   #include "e1000.h"
> 
> -#define DRV_VERSION "1.0.2-k2"
> +#define DRV_VERSION "1.0.2-k4"
>   char e1000e_driver_name[] = "e1000e";
>   const char e1000e_driver_version[] = DRV_VERSION;
> 
> @@ -601,6 +601,301 @@ static void e1000_print_hw_hang(struct work_struct *work)
>   	      pci_status);
>   }
> 
> +#if 0
> +static void hexdump(dma_addr_t dma, u16 len)
> +{
> +	u16 offset, i;
> +	char str[80], byte[4];
> +	void *va = phys_to_virt((unsigned long)dma);
> +
> +	printk("buffer at %016llX (%d)\n", (u64)dma, len);
> +	for (offset = 0; offset<  len; offset += 16) {
> +		sprintf(str, "%04x: ", offset);
> +		for (i = 0; i<  16; i++) {
> +			if ((offset + i)<  len)
> +				sprintf(byte, "%02x ",
> +				        ((unsigned char *)va)[offset + i]);
> +			else
> +				strcpy(byte, "   ");
> +			strcat(str, byte);
> +		}
> +		printk("%s\n", str);
> +	}
> +}
> +
> +#endif
> +void e1000e_dump(struct e1000_adapter *adapter)
> +{
> +	/* this code doesn't handle multiple rings */
> +	struct e1000_ring *tx_ring = adapter->tx_ring;
> +	struct e1000_ring *rx_ring = adapter->rx_ring;
> +	struct e1000_hw *hw =&adapter->hw;
> +	int i = 0;
> +#define NUM_REGS 38 /* 1 based count */
> +	u32 regs[NUM_REGS];
> +	u32 *regs_buff = regs;
> +
> +	char *reg_name[] = {
> +	"CTRL",  "STATUS",
> +	"RCTL", "RDLEN", "RDH", "RDT", "RDTR",
> +	"TCTL", "TDBAL", "TDBAH", "TDLEN", "TDH", "TDT",
> +	"TIDV", "TXDCTL", "TADV", "TARC0",
> +	"TDBAL1", "TDBAH1", "TDLEN1", "TDH1", "TDT1",
> +	"TXDCTL1", "TARC1",
> +	"CTRL_EXT", "ERT", "RDBAL", "RDBAH",
> +	 "TDFH", "TDFT", "TDFHS", "TDFTS", "TDFPC",
> +	 "RDFH", "RDFT", "RDFHS", "RDFTS", "RDFPC",
> +	};
> +
> +	//if (!e1000e_debug_dump)
> +		//return;
> +
> +	regs_buff[0]  = er32(CTRL);
> +	regs_buff[1]  = er32(STATUS);
> +
> +	regs_buff[2]  = er32(RCTL);
> +	regs_buff[3]  = er32(RDLEN);
> +	regs_buff[4]  = er32(RDH);
> +	regs_buff[5]  = er32(RDT);
> +	regs_buff[6]  = er32(RDTR);
> +
> +	regs_buff[7]  = er32(TCTL);
> +	regs_buff[8]  = er32(TDBAL);
> +	regs_buff[9]  = er32(TDBAH);
> +	regs_buff[10] = er32(TDLEN);
> +	regs_buff[11] = er32(TDH);
> +	regs_buff[12] = er32(TDT);
> +	regs_buff[13] = er32(TIDV);
> +	regs_buff[14] = er32(TXDCTL(0));
> +	regs_buff[15] = er32(TADV);
> +	regs_buff[16] = er32(TARC(0));
> +
> +	regs_buff[22] = er32(TXDCTL(1));
> +	regs_buff[23] = er32(TARC(1));
> +	regs_buff[24] = er32(CTRL_EXT);
> +	regs_buff[25] = er32(ERT);
> +	regs_buff[26] = er32(RDBAL);
> +	regs_buff[27] = er32(RDBAH);
> +	regs_buff[28] = er32(TDFH);
> +	regs_buff[29] = er32(TDFT);
> +	regs_buff[30] = er32(TDFHS);
> +	regs_buff[31] = er32(TDFTS);
> +	regs_buff[32] = er32(TDFPC);
> +	regs_buff[33] = er32(RDFH);
> +	regs_buff[34] = er32(RDFT);
> +	regs_buff[35] = er32(RDFHS);
> +	regs_buff[36] = er32(RDFTS);
> +	regs_buff[37] = er32(RDFPC);

 I think indexing is annoying especially when the number of registers
 is large for example in case of the ixgbe driver.

> +
> +	e_err("Register dump\n");
> +	for (i = 0; i<  NUM_REGS; i++) {
> +		printk("%-15s  %08x\n",
> +		reg_name[i], regs_buff[i]);
> +	}
> +
> +	/*
> +	 * transmit dump
> +	 */
> +	printk(KERN_ERR"TX Desc ring0 dump\n");
> +
> +	/* Transmit Descriptor Formats - DEXT[29] is 0 (Legacy) or 1 (Extended)
> +	 *
> +	 * Legacy Transmit Descriptor
> +	 *   +--------------------------------------------------------------+
> +	 * 0 |         Buffer Address [63:0] (Reserved on Write Back)       |
> +	 *   +--------------------------------------------------------------+
> +	 * 8 | Special  |    CSS     | Status |  CMD    |  CSO   |  Length  |
> +	 *   +--------------------------------------------------------------+
> +	 *   63       48 47        36 35    32 31     24 23    16 15        0
> +	 *
> +	 * Extended Context Descriptor (DTYP=0x0) for TSO or checksum offload
> +	 *   63      48 47    40 39       32 31             16 15    8 7      0
> +	 *   +----------------------------------------------------------------+
> +	 * 0 |  TUCSE  | TUCS0  |   TUCSS   |     IPCSE       | IPCS0 | IPCSS |
> +	 *   +----------------------------------------------------------------+
> +	 * 8 |   MSS   | HDRLEN | RSV | STA | TUCMD | DTYP |      PAYLEN      |
> +	 *   +----------------------------------------------------------------+
> +	 *   63      48 47    40 39 36 35 32 31   24 23  20 19                0
> +	 *
> +	 * Extended Data Descriptor (DTYP=0x1)
> +	 *   +----------------------------------------------------------------+
> +	 * 0 |                     Buffer Address [63:0]                      |
> +	 *   +----------------------------------------------------------------+
> +	 * 8 | VLAN tag |  POPTS  | Rsvd | Status | Command | DTYP |  DTALEN  |
> +	 *   +----------------------------------------------------------------+
> +	 *   63       48 47     40 39  36 35    32 31     24 23  20 19        0
> +	 */
> +	printk("Tl(Legacy)   [address 63:0  ] [SpeCssSCmCsLen] [buffer_info dma"
> +	       " leng  ntw       time_stamp             skb]\n");
> +	printk("Tc(ExtCont)  [Ce CoCsIpceCoS] [MssHlRSCm0Plen] [buffer_info dma"
> +	       " leng  ntw       time_stamp             skb]\n");
> +	printk("Td(ExtData)  [address 63:0  ] [VlaPoRSCm1Dlen] [buffer_info dma"
> +	       " leng  ntw       time_stamp             skb]\n");
> +	for (i = 0; tx_ring->desc&&  (i<  tx_ring->count); i++) {
> +		struct e1000_tx_desc *tx_desc = E1000_TX_DESC(*tx_ring, i);
> +		struct e1000_buffer *buffer_info =&tx_ring->buffer_info[i];
> +		struct my_u { u64 a; u64 b; };
> +		struct my_u *u = (struct my_u *)tx_desc;
> +		printk("T%c[0x%03X]    %016llX %016llX %016llX %04X  %3X "
> +		       "%016llX %p",
> +		       (!(le64_to_cpu(u->b)&  (1<<29)) ? 'l' :
> +		         ((le64_to_cpu(u->b)&  (1<<20)) ? 'd' : 'c')), i,
> +		       le64_to_cpu(u->a), le64_to_cpu(u->b),
> +		       (u64)buffer_info->dma, buffer_info->length,
> +		       buffer_info->next_to_watch, (u64)buffer_info->time_stamp,
> +		       buffer_info->skb);
> +		if (i == tx_ring->next_to_use&&  i == tx_ring->next_to_clean)
> +			printk(" NTC/U\n");
> +		else if (i == tx_ring->next_to_use)
> +			printk(" NTU\n");
> +		else if (i == tx_ring->next_to_clean)
> +			printk(" NTC\n");
> +		else
> +			printk("\n");
> +
> +#if 0
> +		if (adapter->dump_buffers&&  (buffer_info->dma != 0))
> +			hexdump(buffer_info->dma, buffer_info->length);
> +#endif
> +	}
> +
> +	/*
> +	 * receive dump
> +	 */
> +	printk(KERN_ERR"\nRX Desc ring dump\n");
> +
> +	switch (adapter->rx_ps_pages) {
> +	case 1:
> +	case 2:
> +	case 3:
> +		/* [Extended] Packet Split Receive Descriptor Format
> +		 *
> +		 *    +-----------------------------------------------------+
> +		 *  0 |                Buffer Address 0 [63:0]              |
> +		 *    +-----------------------------------------------------+
> +		 *  8 |                Buffer Address 1 [63:0]              |
> +		 *    +-----------------------------------------------------+
> +		 * 16 |                Buffer Address 2 [63:0]              |
> +		 *    +-----------------------------------------------------+
> +		 * 24 |                Buffer Address 3 [63:0]              |
> +		 *    +-----------------------------------------------------+
> +		 */
> +		printk("R  [desc]      [buffer 0 63:0 ] [buffer 1 63:0 ] "
> +		       "[buffer 2 63:0 ] [buffer 3 63:0 ] [bi->dma       ] "
> +		       "[bi->skb]<-- Ext Pkt Split format\n");
> +		/* [Extended] Receive Descriptor (Write-Back) Format
> +		 *
> +		 *   63       48 47    32 31     13 12    8 7    4 3        0
> +		 *   +------------------------------------------------------+
> +		 * 0 | Packet   | IP     |  Rsvd   | MRQ   | Rsvd | MRQ RSS |
> +		 *   | Checksum | Ident  |         | Queue |      |  Type   |
> +		 *   +------------------------------------------------------+
> +		 * 8 | VLAN Tag | Length | Extended Error | Extended Status |
> +		 *   +------------------------------------------------------+
> +		 *   63       48 47    32 31            20 19               0
> +		 */
> +		printk("RWB[desc]      [ck ipid mrqhsh] [vl   l0 ee  es] "
> +		       "[ l3  l2  l1 hs] [reserved      ] ---------------- "
> +		       "[bi->skb]<-- Ext Rx Write-Back format\n");
> +		for (i = 0; i<  rx_ring->count; i++) {
> +			struct e1000_buffer *buffer_info =
> +						&rx_ring->buffer_info[i];
> +			union e1000_rx_desc_packet_split *rx_desc =
> +						E1000_RX_DESC_PS(*rx_ring, i);
> +			struct my_u { u64 a; u64 b; u64 c; u64 d; };
> +			struct my_u *u = (struct my_u *)rx_desc;
> +			u32 staterr;
> +			staterr = le32_to_cpu(rx_desc->wb.middle.status_error);
> +			if (staterr&  E1000_RXD_STAT_DD) {
> +				/* Descriptor Done */
> +				printk("RWB[0x%03X]     %016llX %016llX %016llX"
> +				       " %016llX ---------------- %p",
> +				       i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> +				       le64_to_cpu(u->c), le64_to_cpu(u->d),
> +				       buffer_info->skb);
> +			} else {
> +				printk("R  [0x%03X]     %016llX %016llX %016llX"
> +				       " %016llX %016llX %p",
> +				       i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> +				       le64_to_cpu(u->c), le64_to_cpu(u->d),
> +				       (u64)buffer_info->dma, buffer_info->skb);
> +
> +#if 0
> +				if (adapter->dump_buffers)
> +					hexdump(buffer_info->dma,
> +					        adapter->rx_ps_bsize0);
> +#endif
> +			}
> +
> +			if (i == rx_ring->next_to_use)
> +				printk(" NTU\n");
> +			else if (i == rx_ring->next_to_clean)
> +				printk(" NTC\n");
> +			else
> +				printk("\n");
> +		}
> +		break;
> +	default:
> +	case 0:
> +		/* Legacy Receive Descriptor Format
> +		 *
> +		 * +-----------------------------------------------------+
> +		 * |                Buffer Address [63:0]                |
> +		 * +-----------------------------------------------------+
> +		 * | VLAN Tag | Errors | Status 0 | Packet csum | Length |
> +		 * +-----------------------------------------------------+
> +		 * 63       48 47    40 39      32 31         16 15      0
> +		 */
> +		printk("Rl[desc]     [address 63:0  ] [vl er S cks ln] "
> +		       "[bi->dma       ] [bi->skb]<-- Legacy format\n");
> +		for (i = 0; rx_ring->desc&&  (i<  rx_ring->count); i++) {
> +			struct e1000_rx_desc *rx_desc =
> +						E1000_RX_DESC(*rx_ring, i);
> +			struct e1000_buffer *buffer_info =
> +						&rx_ring->buffer_info[i];
> +			struct my_u { u64 a; u64 b; };
> +			struct my_u *u = (struct my_u *)rx_desc;
> +			printk("Rl[0x%03X]    %016llX %016llX %016llX %p",
> +				i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> +				(u64)buffer_info->dma, buffer_info->skb);
> +			if (i == rx_ring->next_to_use)
> +				printk(" NTU\n");
> +			else if (i == rx_ring->next_to_clean)
> +				printk(" NTC\n");
> +			else
> +				printk("\n");
> +
> +#if 0
> +			if (adapter->dump_buffers)
> +				hexdump(buffer_info->dma,
> +				        adapter->rx_buffer_len);
> +#endif
> +		} /* for */
> +	} /* switch */
> +
> +	/* dump the descriptor caches */
> +	/* rx */
> +	printk("Rx descriptor cache in 64bit format\n");
> +	for (i = 0x6000; i<= 0x63FF ; i += 0x10) {
> +		printk("R%04X: %08X|%08X %08X|%08X\n",
> +				i,
> +				readl(adapter->hw.hw_addr + i+4),
> +				readl(adapter->hw.hw_addr + i),
> +				readl(adapter->hw.hw_addr + i+12),
> +				readl(adapter->hw.hw_addr + i+8));
> +	}
> +	/* tx */
> +	printk("Tx descriptor cache in 64bit format\n");
> +	for (i = 0x7000; i<= 0x73FF ; i += 0x10) {
> +		printk("T%04X: %08X|%08X %08X|%08X\n",
> +				i,
> +				readl(adapter->hw.hw_addr + i+4),
> +				readl(adapter->hw.hw_addr + i),
> +				readl(adapter->hw.hw_addr + i+12),
> +				readl(adapter->hw.hw_addr + i+8));
> +	}
> +}
> +
>   /**
>    * e1000_clean_tx_irq - Reclaim resources after transmit completes
>    * @adapter: board private structure
> @@ -682,6 +977,7 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
>   			       + (adapter->tx_timeout_factor * HZ))&&
>   		!(er32(STATUS)&  E1000_STATUS_TXOFF)) {
>   			schedule_work(&adapter->print_hang_task);
> +			e1000e_dump(adapter);

  I think it is not proper to invoke e1000e_dump() here because here is in interupt
  context and there is the case of resetting adapters without meeting the condition
  of "adapter->detect_tx_hung == true"

	
>   			netif_stop_queue(netdev);
>   		}
>   	}
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

Best regards,
Taku Izumi


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ