Message-ID: <1336903839.7390.13.camel@marge.simpson.net>
Date: Sun, 13 May 2012 12:10:39 +0200
From: Mike Galbraith <mgalbraith@...e.de>
To: Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>
Cc: netdev <netdev@...r.kernel.org>
Subject: [patch] Re: qlge driver corrupting kernel memory
On Fri, 2012-05-11 at 13:38 +0200, Mike Galbraith wrote:
> On Tue, 2012-05-08 at 09:07 -0300, Thadeu Lima de Souza Cascardo wrote:
> > On Tue, May 08, 2012 at 01:00:18PM +0200, Mike Galbraith wrote:
> > > Greetings network wizards,
> > >
> > > $subject is happening in a 2.6.32 enterprise kernel with the driver
> > > updated to what looks to me to be 2.6.38 or so.
> > >
> > > Allegedly, IFF boxen are running dual CNAs with storage and LAN sharing
> > > a port, $subject happens fairly regularly. Rummaging in crashdumps
> > > seems to show corruption happens because we somehow end up stuffing
> > > loads of frags into skb_shared_info, scribbling all over the place.
> > >
> > > Before I proceed, what I know about skbs can be found here..
> > >
> > > http://vger.kernel.org/~davem/skb_data.html
> > >
> > > ..and that's the sum and total ;-)
> > >
> > > I guess the first thing I should ask is whether anyone has seen such
> > > scribbling with this driver. Known issue would be a case of happiness,
> > > but I doubt that will be the case from searching, so onward.
> > >
> >
> > Hi, Mike.
> >
> > From what you describe, I suspect this is related to this fix:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=782428535e0819b5b7c9825cd3faa2ad37032a70
> >
> > Please, apply and report if that works for you.
>
> Nope, box exploded. I haven't seen a dump yet, but expect it'll be more
> of the same scribbling.
Something else has popped up in the meantime. Twice now, shortly after an
order-5 allocation failure for tx_ring->q followed by
ql_release_adapter_resources(), "BUG: Bad page state" has arrived to
muddy the water.
[ 3537.150327] Node 0 DMA: 2*4kB 2*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 360kB
[ 3537.150345] Node 0 DMA32: 318*4kB 144*8kB 89*16kB 17*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4712kB
[ 3537.150364] 5248 total pagecache pages
[ 3537.150367] 211 pages in swap cache
[ 3537.150372] Swap cache stats: add 1437, delete 1226, find 1641/1752
[ 3537.150377] Free swap = 67109880kB
[ 3537.150381] Total swap = 67111528kB
[ 3537.152314] 73723 pages RAM
[ 3537.152319] 13128 pages reserved
[ 3537.152322] 4910 pages shared
[ 3537.152326] 22795 pages non-shared
[ 3537.152333] qlge 0000:04:00.0: ql_alloc_mem_resources: TX resource allocation failed.
[ 3537.152343] qlge 0000:04:00.0: ql_get_adapter_resources: Unable to allocate memory.
[ 3537.152499] qlge 0000:04:00.0: ql_set_mac_addr_reg: Adding UNICAST address 00:c0:dd:1a:46:ac at index 0 in the CAM.
[ 3537.440237] BUG: Bad page state in process ifdown-dhcp pfn:10940
[ 3537.440244] page:ffffea00003a0600 flags:0020000000000000 count:-1 mapcount:0 mapping:(null) index:0
[ 3537.440249] Pid: 4317, comm: ifdown-dhcp Tainted: G X 2.6.32.54-0.3.1.4242.0.TEST-default #1
[ 3537.440253] Call Trace:
[ 3537.440265] [<ffffffff810061dc>] dump_trace+0x6c/0x2d0
[ 3537.440271] [<ffffffff8139b366>] dump_stack+0x69/0x73
[ 3537.440279] [<ffffffff810badb3>] bad_page+0xe3/0x170
[ 3537.440284] [<ffffffff810bbedb>] prep_new_page+0xab/0x1b0
[ 3537.440289] [<ffffffff810bc2e4>] get_page_from_freelist+0x304/0x720
[ 3537.440295] [<ffffffff810bc9ba>] __alloc_pages_slowpath+0x11a/0x5f0
[ 3537.440300] [<ffffffff810bcfca>] __alloc_pages_nodemask+0x13a/0x140
[ 3537.440305] [<ffffffff810bbdd9>] __get_free_pages+0x9/0x50
[ 3537.440314] [<ffffffff8104ba62>] dup_task_struct+0x42/0x150
[ 3537.440320] [<ffffffff8104cc54>] copy_process+0xb4/0xe50
[ 3537.440324] [<ffffffff8104da7c>] do_fork+0x8c/0x3c0
[ 3537.440331] [<ffffffff81003263>] stub_clone+0x13/0x20
[ 3537.441094] DWARF2 unwinder stuck at stub_clone+0x13/0x20
[ 3537.441097]
[ 3537.441098] Leftover inexact backtrace:
[ 3537.441099]
[ 3537.441103] [<ffffffff81002f7b>] ? system_call_fastpath+0x16/0x1b
[ 3537.441107] Disabling lock debugging due to kernel taint
[ 3537.899545] bonding: bond0 is being deleted..
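The count:-1 in the bad page report is consistent with the same buffer being freed twice. Below is a hypothetical user-space model of that sequence, not qlge code. It assumes, as the patch description below implies, that ql_free_tx_resources() frees tx_ring->wq_base whenever the pointer is still non-NULL. All model_* names are made up, and model_free() stands in for pci_free_consistent(). The sketch shows the pre-patch err label releasing the work queue once without clearing the pointer, and the later teardown releasing it again.

```c
#include <stdlib.h>

/*
 * Hypothetical user-space model of the pre-patch error path; not qlge
 * code. model_free() stands in for pci_free_consistent() and drops a
 * page-style reference count without any check.
 */
struct model_page {
	int count; /* 1 while allocated, 0 once returned */
};

struct model_tx_ring {
	struct model_page *wq_base; /* stands in for the DMA work queue */
	void *q;                    /* stands in for the descriptor array */
};

static void model_free(struct model_page *pg)
{
	pg->count--; /* unchecked: a second free drives the count to -1 */
}

/* Pre-patch shape: the err label frees wq_base but leaves the pointer set. */
static int model_alloc_tx_resources_old(struct model_tx_ring *r,
					struct model_page *pg,
					int q_alloc_fails)
{
	pg->count = 1;
	r->wq_base = pg;
	r->q = q_alloc_fails ? NULL : malloc(64);
	if (r->q == NULL)
		goto err;
	return 0;
err:
	model_free(r->wq_base); /* first free; r->wq_base is now stale */
	return -1;
}

/* Assumed teardown shape: free whatever the ring still points at. */
static void model_free_tx_resources(struct model_tx_ring *r)
{
	if (r->wq_base) {
		model_free(r->wq_base); /* second free of the same buffer */
		r->wq_base = NULL;
	}
	free(r->q);
	r->q = NULL;
}
```

Calling model_alloc_tx_resources_old() with q_alloc_fails set and then model_free_tx_resources() leaves the model page at count -1. That count matches the report above.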
qlge: Fix double pci_free_consistent() upon tx_ring->q allocation failure
Let ql_free_tx_resources() do its job. You are not helping.
Signed-off-by: Mike Galbraith <mgalbraith@...e.de>
---
drivers/net/qlge/qlge_main.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -2664,11 +2664,8 @@ static int ql_alloc_tx_resources(struct
pci_alloc_consistent(qdev->pdev, tx_ring->wq_size,
&tx_ring->wq_base_dma);
- if ((tx_ring->wq_base == NULL) ||
- tx_ring->wq_base_dma & WQ_ADDR_ALIGN) {
- QPRINTK(qdev, IFUP, ERR, "tx_ring alloc failed.\n");
- return -ENOMEM;
- }
+ if ((tx_ring->wq_base == NULL) || tx_ring->wq_base_dma & WQ_ADDR_ALIGN)
+ goto err;
tx_ring->q =
kmalloc(tx_ring->wq_len * sizeof(struct tx_ring_desc), GFP_KERNEL);
if (tx_ring->q == NULL)
@@ -2676,8 +2673,7 @@ static int ql_alloc_tx_resources(struct
return 0;
err:
- pci_free_consistent(qdev->pdev, tx_ring->wq_size,
- tx_ring->wq_base, tx_ring->wq_base_dma);
+ QPRINTK(qdev, IFUP, ERR, "tx_ring alloc failed.\n");
return -ENOMEM;
}
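With the patch applied, both failure checks in ql_alloc_tx_resources() jump to err, which only reports the failure. The work queue is left for ql_free_tx_resources() to release exactly once. Below is a hypothetical user-space model of that flow, not driver code. The model_* names are made up, and MODEL_WQ_ADDR_ALIGN assumes the low two bits of the bus address must be clear. It is not necessarily the driver's WQ_ADDR_ALIGN value.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space model of the patched ql_alloc_tx_resources()
 * flow; not qlge code. MODEL_WQ_ADDR_ALIGN is an assumed mask, not
 * necessarily the driver's WQ_ADDR_ALIGN.
 */
#define MODEL_WQ_ADDR_ALIGN 0x3UL

struct model_page {
	int count; /* 1 while allocated, 0 once returned */
};

struct model_tx_ring {
	struct model_page *wq_base; /* stands in for the DMA work queue */
	uintptr_t wq_base_dma;      /* stands in for its bus address */
	void *q;                    /* stands in for the descriptor array */
};

static void model_free(struct model_page *pg)
{
	pg->count--; /* unchecked, so freeing twice would go negative */
}

/* Patched shape: every failure jumps to err, which only reports. */
static int model_alloc_tx_resources(struct model_tx_ring *r,
				    struct model_page *pg, uintptr_t dma,
				    int q_alloc_fails)
{
	r->q = NULL;
	if (pg)
		pg->count = 1;
	r->wq_base = pg;
	r->wq_base_dma = dma;
	if (r->wq_base == NULL || (r->wq_base_dma & MODEL_WQ_ADDR_ALIGN))
		goto err;
	r->q = q_alloc_fails ? NULL : malloc(64);
	if (r->q == NULL)
		goto err;
	return 0;
err:
	fprintf(stderr, "tx_ring alloc failed.\n");
	return -1; /* wq_base is left for the teardown routine to free */
}

/* Teardown owns the buffers: free what is set, then clear the pointer. */
static void model_free_tx_resources(struct model_tx_ring *r)
{
	if (r->wq_base) {
		model_free(r->wq_base);
		r->wq_base = NULL;
	}
	free(r->q);
	r->q = NULL;
}
```

The design choice is a single owner for each buffer. The allocation routine never frees on failure, and the teardown routine clears each pointer after freeing it. As a result, calling teardown twice, or after a partial allocation, leaves the model page at count 0 instead of -1.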