[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4B878EC3.3020605@jp.fujitsu.com>
Date: Fri, 26 Feb 2010 18:05:07 +0900
From: Taku Izumi <izumi.taku@...fujitsu.com>
To: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Allan, Bruce W" <bruce.w.allan@...el.com>,
"David S. Miller" <davem@...emloft.net>,
"Ronciak, John" <john.ronciak@...el.com>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
"Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>,
Koki Sanagi <sanagi.koki@...fujitsu.com>,
Kenji Kaneshige <kaneshige.kenji@...fujitsu.com>,
"chavey@...gle.com" <chavey@...gle.com>,
"e1000-devel@...ts.sourceforge.net"
<e1000-devel@...ts.sourceforge.net>
Subject: Re: [PATCH v2 0/3] e1000e,igb,ixgbe: add registers etc. printout
code just before resetting adapters
Hi Jesse,
(2010/02/18 4:10), Brandeburg, Jesse wrote:
>
> On Tue, 26 Jan 2010, Taku Izumi wrote:
>> (2010/01/23 6:54), Brandeburg, Jesse wrote:
>>> Taku, thanks for these, we are talking the patches over and reviewing
>>> them. While I agree with the idea of these patches is good, I still don't
>>> agree with the default being enabled. Usually if someone is getting tx
>>> hangs they are repeatable and we can work with them to get the debug
>>> turned on. I DO think it is useful to have the feature available by
>>> default but NOT enabled.
>>>
>>> If we wanted to enable something by default it might be useful to print
>>> something that actually draws some conclusions from known failure modes,
>>> like if TDH!=TDT after some amount of time. I think one or two lines
>>> maximum for default printing.
>>>
>>> If you're working in this area I had an idea. I had wanted to be able to
>>> print the large amount of ring information (especially in the ixgbe case
>>> with many rings) to the ftrace buffers in order to not overrun the syslog
>>> daemon. Not sure if you're interested in more new features, it certainly
>>> is separate but related to this patch.
>>
>> I thought similar things, that is, all information should be dumped to
>> the private ring buffer to avoid filling syslog up with driver messages.
>> But I didn't have any good idea to extract information from ring buffers,
>> so as the first step, I decided to printout it by using printk().
>> Is there the easy way to extract it from ring buffers?
>
> First, I need to apologize for the delay, it's my fault.
>
> I'm not sure about an easy way to use the private ring buffers yet, I know
> you can use the ftrace buffers directly, and then when someone read
> /sys/kernel/debug/tracing/trace they would get all the output.
>
> Lets not do ftrace yet.
>
> I have a counter proposal to make, here is a (incomplete) patch that we
> use all the time to debug tx hangs. This example is for e1000e. We are
> trying to avoid changes with other users' copyright to some of the files
> in our drivers so that we can ship them under multiple license. We would
> much prefer something like this to be used, can you review?
>
> I also have another patch I made for ixgb to show the use of sysfs and a
> module parameter, I'll reply again with that next.
>
> ---
> From: Jesse Brandeburg<jesse.brandeburg@...el.com>
>
> e1000e: dump code for descriptor rings
>
> This is debug code for descriptor ring dumps, only enable if you really want a
> lot of stuff in your syslog/dmesg when you get NETDEV WATCHDOG or tx hang
> reports from the driver.
>
> <not implemented> Code can be enabled at boot time with a module parameter or
> via sysfs at runtime.
>
> Signed-off-by: Jesse Brandeburg<jesse.brandeburg@...el.com>
> ---
>
> drivers/net/e1000e/ethtool.c | 4 +
> drivers/net/e1000e/hw.h | 10 +
> drivers/net/e1000e/netdev.c | 298 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 311 insertions(+), 1 deletions(-)
>
>
> diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
> index 0aa50c2..bbcf5c7 100644
> --- a/drivers/net/e1000e/ethtool.c
> +++ b/drivers/net/e1000e/ethtool.c
> @@ -434,6 +434,8 @@ static int e1000_get_regs_len(struct net_device *netdev)
> return E1000_REGS_LEN * sizeof(u32);
> }
>
> +extern void e1000e_dump(struct e1000_adapter *);
> +
> static void e1000_get_regs(struct net_device *netdev,
> struct ethtool_regs *regs, void *p)
> {
> @@ -443,6 +445,8 @@ static void e1000_get_regs(struct net_device *netdev,
> u16 phy_data;
> u8 revision_id;
>
> + e1000e_dump(adapter);
> +
> memset(p, 0, E1000_REGS_LEN * sizeof(u32));
>
> pci_read_config_byte(adapter->pdev, PCI_REVISION_ID,&revision_id);
> diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
> index eccf29b..d23501d 100644
> --- a/drivers/net/e1000e/hw.h
> +++ b/drivers/net/e1000e/hw.h
> @@ -92,6 +92,11 @@ enum e1e_registers {
> E1000_FCRTL = 0x02160, /* Flow Control Receive Threshold Low - RW */
> E1000_FCRTH = 0x02168, /* Flow Control Receive Threshold High - RW */
> E1000_PSRCTL = 0x02170, /* Packet Split Receive Control - RW */
> + E1000_RDFH = 0x02410, /* Rx Data FIFO Head - RW */
> + E1000_RDFT = 0x02418, /* Rx Data FIFO Tail - RW */
> + E1000_RDFHS = 0x02420, /* Rx Data FIFO Head Saved - RW */
> + E1000_RDFTS = 0x02428, /* Rx Data FIFO Tail Saved - RW */
> + E1000_RDFPC = 0x02430, /* Rx Data FIFO Packet Count - RW */
> E1000_RDBAL = 0x02800, /* Rx Descriptor Base Address Low - RW */
> E1000_RDBAH = 0x02804, /* Rx Descriptor Base Address High - RW */
> E1000_RDLEN = 0x02808, /* Rx Descriptor Length - RW */
> @@ -112,6 +117,11 @@ enum e1e_registers {
> */
> #define E1000_RDBAL_REG(_n) (E1000_RDBAL + (_n<< 8))
> E1000_KABGTXD = 0x03004, /* AFE Band Gap Transmit Ref Data */
> + E1000_TDFH = 0x03410, /* TX Data FIFO Head - RW */
> + E1000_TDFT = 0x03418, /* TX Data FIFO Tail - RW */
> + E1000_TDFHS = 0x03420, /* TX Data FIFO Head Saved - RW */
> + E1000_TDFTS = 0x03428, /* Tx Data FIFO Tail Saved - RW */
> + E1000_TDFPC = 0x03430, /* TX Data FIFO Packet Count - RW */
> E1000_TDBAL = 0x03800, /* Tx Descriptor Base Address Low - RW */
> E1000_TDBAH = 0x03804, /* Tx Descriptor Base Address High - RW */
> E1000_TDLEN = 0x03808, /* Tx Descriptor Length - RW */
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index c45965a..c990341 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -48,7 +48,7 @@
>
> #include "e1000.h"
>
> -#define DRV_VERSION "1.0.2-k2"
> +#define DRV_VERSION "1.0.2-k4"
> char e1000e_driver_name[] = "e1000e";
> const char e1000e_driver_version[] = DRV_VERSION;
>
> @@ -601,6 +601,301 @@ static void e1000_print_hw_hang(struct work_struct *work)
> pci_status);
> }
>
> +#if 0
> +static void hexdump(dma_addr_t dma, u16 len)
> +{
> + u16 offset, i;
> + char str[80], byte[4];
> + void *va = phys_to_virt((unsigned long)dma);
> +
> + printk("buffer at %016llX (%d)\n", (u64)dma, len);
> + for (offset = 0; offset< len; offset += 16) {
> + sprintf(str, "%04x: ", offset);
> + for (i = 0; i< 16; i++) {
> + if ((offset + i)< len)
> + sprintf(byte, "%02x ",
> + ((unsigned char *)va)[offset + i]);
> + else
> + strcpy(byte, " ");
> + strcat(str, byte);
> + }
> + printk("%s\n", str);
> + }
> +}
> +
> +#endif
> +void e1000e_dump(struct e1000_adapter *adapter)
> +{
> + /* this code doesn't handle multiple rings */
> + struct e1000_ring *tx_ring = adapter->tx_ring;
> + struct e1000_ring *rx_ring = adapter->rx_ring;
> + struct e1000_hw *hw =&adapter->hw;
> + int i = 0;
> +#define NUM_REGS 38 /* 1 based count */
> + u32 regs[NUM_REGS];
> + u32 *regs_buff = regs;
> +
> + char *reg_name[] = {
> + "CTRL", "STATUS",
> + "RCTL", "RDLEN", "RDH", "RDT", "RDTR",
> + "TCTL", "TDBAL", "TDBAH", "TDLEN", "TDH", "TDT",
> + "TIDV", "TXDCTL", "TADV", "TARC0",
> + "TDBAL1", "TDBAH1", "TDLEN1", "TDH1", "TDT1",
> + "TXDCTL1", "TARC1",
> + "CTRL_EXT", "ERT", "RDBAL", "RDBAH",
> + "TDFH", "TDFT", "TDFHS", "TDFTS", "TDFPC",
> + "RDFH", "RDFT", "RDFHS", "RDFTS", "RDFPC",
> + };
> +
> + //if (!e1000e_debug_dump)
> + //return;
> +
> + regs_buff[0] = er32(CTRL);
> + regs_buff[1] = er32(STATUS);
> +
> + regs_buff[2] = er32(RCTL);
> + regs_buff[3] = er32(RDLEN);
> + regs_buff[4] = er32(RDH);
> + regs_buff[5] = er32(RDT);
> + regs_buff[6] = er32(RDTR);
> +
> + regs_buff[7] = er32(TCTL);
> + regs_buff[8] = er32(TDBAL);
> + regs_buff[9] = er32(TDBAH);
> + regs_buff[10] = er32(TDLEN);
> + regs_buff[11] = er32(TDH);
> + regs_buff[12] = er32(TDT);
> + regs_buff[13] = er32(TIDV);
> + regs_buff[14] = er32(TXDCTL(0));
> + regs_buff[15] = er32(TADV);
> + regs_buff[16] = er32(TARC(0));
> +
> + regs_buff[22] = er32(TXDCTL(1));
> + regs_buff[23] = er32(TARC(1));
> + regs_buff[24] = er32(CTRL_EXT);
> + regs_buff[25] = er32(ERT);
> + regs_buff[26] = er32(RDBAL);
> + regs_buff[27] = er32(RDBAH);
> + regs_buff[28] = er32(TDFH);
> + regs_buff[29] = er32(TDFT);
> + regs_buff[30] = er32(TDFHS);
> + regs_buff[31] = er32(TDFTS);
> + regs_buff[32] = er32(TDFPC);
> + regs_buff[33] = er32(RDFH);
> + regs_buff[34] = er32(RDFT);
> + regs_buff[35] = er32(RDFHS);
> + regs_buff[36] = er32(RDFTS);
> + regs_buff[37] = er32(RDFPC);
I think indexing is annoying especially when the number of registers
is large for example in case of the ixgbe driver.
> +
> + e_err("Register dump\n");
> + for (i = 0; i< NUM_REGS; i++) {
> + printk("%-15s %08x\n",
> + reg_name[i], regs_buff[i]);
> + }
> +
> + /*
> + * transmit dump
> + */
> + printk(KERN_ERR"TX Desc ring0 dump\n");
> +
> + /* Transmit Descriptor Formats - DEXT[29] is 0 (Legacy) or 1 (Extended)
> + *
> + * Legacy Transmit Descriptor
> + * +--------------------------------------------------------------+
> + * 0 | Buffer Address [63:0] (Reserved on Write Back) |
> + * +--------------------------------------------------------------+
> + * 8 | Special | CSS | Status | CMD | CSO | Length |
> + * +--------------------------------------------------------------+
> + * 63 48 47 36 35 32 31 24 23 16 15 0
> + *
> + * Extended Context Descriptor (DTYP=0x0) for TSO or checksum offload
> + * 63 48 47 40 39 32 31 16 15 8 7 0
> + * +----------------------------------------------------------------+
> + * 0 | TUCSE | TUCS0 | TUCSS | IPCSE | IPCS0 | IPCSS |
> + * +----------------------------------------------------------------+
> + * 8 | MSS | HDRLEN | RSV | STA | TUCMD | DTYP | PAYLEN |
> + * +----------------------------------------------------------------+
> + * 63 48 47 40 39 36 35 32 31 24 23 20 19 0
> + *
> + * Extended Data Descriptor (DTYP=0x1)
> + * +----------------------------------------------------------------+
> + * 0 | Buffer Address [63:0] |
> + * +----------------------------------------------------------------+
> + * 8 | VLAN tag | POPTS | Rsvd | Status | Command | DTYP | DTALEN |
> + * +----------------------------------------------------------------+
> + * 63 48 47 40 39 36 35 32 31 24 23 20 19 0
> + */
> + printk("Tl(Legacy) [address 63:0 ] [SpeCssSCmCsLen] [buffer_info dma"
> + " leng ntw time_stamp skb]\n");
> + printk("Tc(ExtCont) [Ce CoCsIpceCoS] [MssHlRSCm0Plen] [buffer_info dma"
> + " leng ntw time_stamp skb]\n");
> + printk("Td(ExtData) [address 63:0 ] [VlaPoRSCm1Dlen] [buffer_info dma"
> + " leng ntw time_stamp skb]\n");
> + for (i = 0; tx_ring->desc&& (i< tx_ring->count); i++) {
> + struct e1000_tx_desc *tx_desc = E1000_TX_DESC(*tx_ring, i);
> + struct e1000_buffer *buffer_info =&tx_ring->buffer_info[i];
> + struct my_u { u64 a; u64 b; };
> + struct my_u *u = (struct my_u *)tx_desc;
> + printk("T%c[0x%03X] %016llX %016llX %016llX %04X %3X "
> + "%016llX %p",
> + (!(le64_to_cpu(u->b)& (1<<29)) ? 'l' :
> + ((le64_to_cpu(u->b)& (1<<20)) ? 'd' : 'c')), i,
> + le64_to_cpu(u->a), le64_to_cpu(u->b),
> + (u64)buffer_info->dma, buffer_info->length,
> + buffer_info->next_to_watch, (u64)buffer_info->time_stamp,
> + buffer_info->skb);
> + if (i == tx_ring->next_to_use&& i == tx_ring->next_to_clean)
> + printk(" NTC/U\n");
> + else if (i == tx_ring->next_to_use)
> + printk(" NTU\n");
> + else if (i == tx_ring->next_to_clean)
> + printk(" NTC\n");
> + else
> + printk("\n");
> +
> +#if 0
> + if (adapter->dump_buffers&& (buffer_info->dma != 0))
> + hexdump(buffer_info->dma, buffer_info->length);
> +#endif
> + }
> +
> + /*
> + * receive dump
> + */
> + printk(KERN_ERR"\nRX Desc ring dump\n");
> +
> + switch (adapter->rx_ps_pages) {
> + case 1:
> + case 2:
> + case 3:
> + /* [Extended] Packet Split Receive Descriptor Format
> + *
> + * +-----------------------------------------------------+
> + * 0 | Buffer Address 0 [63:0] |
> + * +-----------------------------------------------------+
> + * 8 | Buffer Address 1 [63:0] |
> + * +-----------------------------------------------------+
> + * 16 | Buffer Address 2 [63:0] |
> + * +-----------------------------------------------------+
> + * 24 | Buffer Address 3 [63:0] |
> + * +-----------------------------------------------------+
> + */
> + printk("R [desc] [buffer 0 63:0 ] [buffer 1 63:0 ] "
> + "[buffer 2 63:0 ] [buffer 3 63:0 ] [bi->dma ] "
> + "[bi->skb]<-- Ext Pkt Split format\n");
> + /* [Extended] Receive Descriptor (Write-Back) Format
> + *
> + * 63 48 47 32 31 13 12 8 7 4 3 0
> + * +------------------------------------------------------+
> + * 0 | Packet | IP | Rsvd | MRQ | Rsvd | MRQ RSS |
> + * | Checksum | Ident | | Queue | | Type |
> + * +------------------------------------------------------+
> + * 8 | VLAN Tag | Length | Extended Error | Extended Status |
> + * +------------------------------------------------------+
> + * 63 48 47 32 31 20 19 0
> + */
> + printk("RWB[desc] [ck ipid mrqhsh] [vl l0 ee es] "
> + "[ l3 l2 l1 hs] [reserved ] ---------------- "
> + "[bi->skb]<-- Ext Rx Write-Back format\n");
> + for (i = 0; i< rx_ring->count; i++) {
> + struct e1000_buffer *buffer_info =
> + &rx_ring->buffer_info[i];
> + union e1000_rx_desc_packet_split *rx_desc =
> + E1000_RX_DESC_PS(*rx_ring, i);
> + struct my_u { u64 a; u64 b; u64 c; u64 d; };
> + struct my_u *u = (struct my_u *)rx_desc;
> + u32 staterr;
> + staterr = le32_to_cpu(rx_desc->wb.middle.status_error);
> + if (staterr& E1000_RXD_STAT_DD) {
> + /* Descriptor Done */
> + printk("RWB[0x%03X] %016llX %016llX %016llX"
> + " %016llX ---------------- %p",
> + i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> + le64_to_cpu(u->c), le64_to_cpu(u->d),
> + buffer_info->skb);
> + } else {
> + printk("R [0x%03X] %016llX %016llX %016llX"
> + " %016llX %016llX %p",
> + i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> + le64_to_cpu(u->c), le64_to_cpu(u->d),
> + (u64)buffer_info->dma, buffer_info->skb);
> +
> +#if 0
> + if (adapter->dump_buffers)
> + hexdump(buffer_info->dma,
> + adapter->rx_ps_bsize0);
> +#endif
> + }
> +
> + if (i == rx_ring->next_to_use)
> + printk(" NTU\n");
> + else if (i == rx_ring->next_to_clean)
> + printk(" NTC\n");
> + else
> + printk("\n");
> + }
> + break;
> + default:
> + case 0:
> + /* Legacy Receive Descriptor Format
> + *
> + * +-----------------------------------------------------+
> + * | Buffer Address [63:0] |
> + * +-----------------------------------------------------+
> + * | VLAN Tag | Errors | Status 0 | Packet csum | Length |
> + * +-----------------------------------------------------+
> + * 63 48 47 40 39 32 31 16 15 0
> + */
> + printk("Rl[desc] [address 63:0 ] [vl er S cks ln] "
> + "[bi->dma ] [bi->skb]<-- Legacy format\n");
> + for (i = 0; rx_ring->desc&& (i< rx_ring->count); i++) {
> + struct e1000_rx_desc *rx_desc =
> + E1000_RX_DESC(*rx_ring, i);
> + struct e1000_buffer *buffer_info =
> + &rx_ring->buffer_info[i];
> + struct my_u { u64 a; u64 b; };
> + struct my_u *u = (struct my_u *)rx_desc;
> + printk("Rl[0x%03X] %016llX %016llX %016llX %p",
> + i, le64_to_cpu(u->a), le64_to_cpu(u->b),
> + (u64)buffer_info->dma, buffer_info->skb);
> + if (i == rx_ring->next_to_use)
> + printk(" NTU\n");
> + else if (i == rx_ring->next_to_clean)
> + printk(" NTC\n");
> + else
> + printk("\n");
> +
> +#if 0
> + if (adapter->dump_buffers)
> + hexdump(buffer_info->dma,
> + adapter->rx_buffer_len);
> +#endif
> + } /* for */
> + } /* switch */
> +
> + /* dump the descriptor caches */
> + /* rx */
> + printk("Rx descriptor cache in 64bit format\n");
> + for (i = 0x6000; i<= 0x63FF ; i += 0x10) {
> + printk("R%04X: %08X|%08X %08X|%08X\n",
> + i,
> + readl(adapter->hw.hw_addr + i+4),
> + readl(adapter->hw.hw_addr + i),
> + readl(adapter->hw.hw_addr + i+12),
> + readl(adapter->hw.hw_addr + i+8));
> + }
> + /* tx */
> + printk("Tx descriptor cache in 64bit format\n");
> + for (i = 0x7000; i<= 0x73FF ; i += 0x10) {
> + printk("T%04X: %08X|%08X %08X|%08X\n",
> + i,
> + readl(adapter->hw.hw_addr + i+4),
> + readl(adapter->hw.hw_addr + i),
> + readl(adapter->hw.hw_addr + i+12),
> + readl(adapter->hw.hw_addr + i+8));
> + }
> +}
> +
> /**
> * e1000_clean_tx_irq - Reclaim resources after transmit completes
> * @adapter: board private structure
> @@ -682,6 +977,7 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
> + (adapter->tx_timeout_factor * HZ))&&
> !(er32(STATUS)& E1000_STATUS_TXOFF)) {
> schedule_work(&adapter->print_hang_task);
> + e1000e_dump(adapter);
I think it is not proper to invoke e1000e_dump() here because here is in interupt
context and there is the case of resetting adapters without meeting the condition
of "adapter->detect_tx_hung == true"
> netif_stop_queue(netdev);
> }
> }
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
Best regards,
Taku Izumi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists