Date:	Sun, 31 Jan 2010 01:34:49 +0100
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Michael Breuer <mbreuer@...jas.com>
Cc:	Stephen Hemminger <shemminger@...ux-foundation.org>,
	David Miller <davem@...emloft.net>, akpm@...ux-foundation.org,
	flyboy@...il.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, Michael Chan <mchan@...adcom.com>,
	Don Fry <pcnet32@...izon.net>,
	Francois Romieu <romieu@...zoreil.com>,
	Matt Carlson <mcarlson@...adcom.com>
Subject: Re: [PATCH] sky2:  receive dma mapping error handling

On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
> On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> >Please try this patch (and only this patch) on 2.6.33-rc5[*],
> >without any of the other patches that did not make it upstream,
> >because mixing them in confuses things too much.
> >
> >The code that checks for DMA mapping errors on receive buffers would
> >not handle errors correctly.  I doubt you have these errors, but if you
> >did then it would explain the problems.  The code has to be a little
> >tricky and build mapping for new rx buffer before releasing old one,
> >that way if new mapping fails, the old one can be reused.
> >
> >If it works for you, I will resubmit with signed-off.
> >
> >-
> >
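The ordering described above (map the new buffer first, release the old
one only on success) can be sketched in user space roughly as follows.
This is only an illustration, not the sky2 code: struct rx_slot,
rx_slot_replace() and the map_ok()/map_fail() stubs are made-up names,
and a return value of 0 from the mapping hook stands in for a DMA
mapping error.

```c
#include <assert.h>
#include <stddef.h>

struct rx_slot {
	void *buf;		/* buffer currently owned by the ring */
	unsigned long dma;	/* its pretend DMA address */
};

/* Hypothetical mapping hook: returns 0 to signal a mapping error. */
typedef unsigned long (*map_fn)(void *buf);

static unsigned long map_ok(void *buf)   { (void)buf; return 0x1000; }
static unsigned long map_fail(void *buf) { (void)buf; return 0; }

/*
 * Try to swap a fresh buffer into the slot.  The new buffer is mapped
 * first; only if that succeeds is the old buffer handed back to the
 * caller.  On failure the slot is left untouched, so the old buffer and
 * its mapping can be reused, and NULL is returned.
 */
static void *rx_slot_replace(struct rx_slot *slot, void *new_buf, map_fn map)
{
	unsigned long new_dma;
	void *old;

	assert(slot && new_buf && map);

	new_dma = map(new_buf);
	if (!new_dma)
		return NULL;	/* keep old buffer and mapping */

	old = slot->buf;	/* real code would unmap slot->dma here */
	slot->buf = new_buf;
	slot->dma = new_dma;
	return old;
}
```

With map_fail() the slot keeps its old buffer and address; with map_ok()
the old buffer is returned and the slot points at the new one.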
> Nope - tx crash again. This time the system stayed up (but hosed)
> for a few hours. When I tried to recover eth0 the system then
> crashed.
> 
> Brief summary of events (log extract below):
> 
> System start Jan 28 19:29
> Everything seemed good (load and all) until 17:13:11 the following
> day when I got rx errors:
> 
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518

These are length errors, but the status words encode lengths greater
than 1518 (e.g. 2036 in the second one), unless I'm missing something.
Please don't use jumbo frames on your network until we have fully
debugged this with regular frames (Stephen admitted that sky2 jumbo
support might be broken).
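
A minimal sketch of that decoding, assuming the frame length sits in
bits 30..16 of the status word and bit 4 flags a too-long frame (this
mirrors my reading of GMR_FS_LEN and GMR_FS_LONG_ERR in sky2.h; the
RX_* names and helper functions below are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, modelled on the GMAC rx status bits in sky2.h. */
#define RX_LEN_SHIFT	16
#define RX_LEN_MASK	(0x7fffu << RX_LEN_SHIFT)	/* bits 30..16 */
#define RX_LONG_ERR	(1u << 4)			/* frame too long */

static_assert((RX_LEN_MASK >> RX_LEN_SHIFT) == 0x7fff,
	      "length field is 15 bits wide");

/* Frame length recorded in the status word. */
static unsigned int rx_status_len(uint32_t status)
{
	return (status & RX_LEN_MASK) >> RX_LEN_SHIFT;
}

/* Non-zero if the "too long" bit is set. */
static int rx_status_too_long(uint32_t status)
{
	return (status & RX_LONG_ERR) != 0;
}
```

Under these assumptions rx_status_len(0x6230010) gives 1571 and
rx_status_len(0x7f40010) gives 2036, and both words have the too-long
bit set.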

...
> As I started looking at logs, the system hung and rebooted. I'm up
> now with dma debug enabled, however as with 2.6.32.4 num_entries is
> dropping and I don't think that dma debug will remain enabled long
> enough to catch a crash.

Could you try the patch below? It dumps per-device counts of dma-debug
entries, so we can see whether some other users are eating them up.

Jarek P.
---

 lib/dma-debug.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 7d2f0b3..e2dcc9c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -310,6 +310,53 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 	list_del(&entry->list);
 }
 
+struct dma_debug_dev {
+	struct device *dev;
+	unsigned int cnt;
+};
+
+#define DMA_DEBUG_DEVS 100
+static struct dma_debug_dev dma_debug_devs[DMA_DEBUG_DEVS];
+
+static void debug_dma_dump_devs(void)
+{
+	int idx, i;
+
+	memset(dma_debug_devs, 0, sizeof(struct dma_debug_dev) * DMA_DEBUG_DEVS);
+
+	for (idx = 0; idx < HASH_SIZE; idx++) {
+		struct hash_bucket *bucket = &dma_entry_hash[idx];
+		struct dma_debug_entry *entry;
+		unsigned long flags;
+
+		spin_lock_irqsave(&bucket->lock, flags);
+
+		list_for_each_entry(entry, &bucket->list, list) {
+			for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+				struct device *dev = dma_debug_devs[i].dev;
+
+				if (!dev || dev == entry->dev) {
+					dma_debug_devs[i].dev = entry->dev;
+					dma_debug_devs[i].cnt++;
+					break;
+				}
+			}
+		}
+
+		spin_unlock_irqrestore(&bucket->lock, flags);
+	}
+
+	for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+		struct device *dev = dma_debug_devs[i].dev;
+
+		if (!dev)
+			break;
+
+		pr_info("DMA-API: %s: entries: %d\n", dev_name(dev),
+			dma_debug_devs[i].cnt);
+	}
+}
+
 /*
  * Dump mapping entries for debugging purposes
  */
@@ -363,8 +410,11 @@ static struct dma_debug_entry *__dma_entry_alloc(void)
 	memset(entry, 0, sizeof(*entry));
 
 	num_free_entries -= 1;
-	if (num_free_entries < min_free_entries)
+	if (num_free_entries < min_free_entries) {
 		min_free_entries = num_free_entries;
+		if ((min_free_entries & 0xffff) == 0)
+			debug_dma_dump_devs();
+	}
 
 	return entry;
 }
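
For anyone reading along, the counting and the trigger in the patch
above can be modelled in user space roughly like this. It is a sketch
under assumptions, not kernel code: struct dev_count, tally_dev() and
should_dump() are made-up names, a plain pointer stands in for
struct device *, and the table is shrunk to 4 slots.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 4		/* the patch uses DMA_DEBUG_DEVS = 100 */

struct dev_count {
	const void *dev;	/* stand-in for struct device * */
	unsigned int cnt;
};

/*
 * Count one entry for 'dev': use the slot that already holds 'dev', or
 * claim the first empty slot.  Returns 0 on success, -1 if the table is
 * full and the entry went uncounted (the patch drops it silently).
 */
static int tally_dev(struct dev_count *tab, size_t n, const void *dev)
{
	size_t i;

	assert(dev != NULL);

	for (i = 0; i < n; i++) {
		if (!tab[i].dev || tab[i].dev == dev) {
			tab[i].dev = dev;
			tab[i].cnt++;
			return 0;
		}
	}
	return -1;
}

/* Same test as in __dma_entry_alloc(): low 16 bits all zero. */
static int should_dump(unsigned int min_free_entries)
{
	return (min_free_entries & 0xffff) == 0;
}
```

Note that should_dump() is true only when the new minimum is a multiple
of 65536, which includes 0, so the dump is rare by design.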