[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200707171607.08644.olaf.kirch@oracle.com>
Date: Tue, 17 Jul 2007 16:07:07 +0200
From: Olaf Kirch <olaf.kirch@...cle.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Jarek Poplawski <jarkao2@...pl>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, davem@...emloft.net
Subject: Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll
On Tuesday 17 July 2007 10:57, Ingo Molnar wrote:
> i've got a new observation: changing CONFIG_HZ from 250 to 1000 makes
> the problem go away. So it's somehow also related to jiffies.
There are several "Tx Hang detected" messages in the log, which looks
a lot as if net_rx_action never runs, or at least never calls
dev->poll on the e1000 nic.
Can you check whether/how often it bails out of net_rx_action taking
the softnet_break path?
My suspicion right now is that dev->quota goes way negative when
pushing out netconsole output. Normally, we bump the quota in
__net_rx_schedule. But the whole point of the patch is that
netpoll has no business removing the device from the poll_list,
so it stays there, and we don't end up calling __net_rx_schedule
as often as we would otherwise.
Can you try what happens if you change netif_rx_complete to
something like this:
if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, &dev->state)) {
dev->quota = dev->weight;
return;
}
This is just a hack to make sure that we don't go to insanely
negative quotas while sending packets through netpoll.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@....de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists