Message-ID: <20100608003707.GA30604@sysclose.org>
Date: Mon, 7 Jun 2010 21:37:07 -0300
From: Flavio Leitner <fbl@...close.org>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org, amwang@...hat.com, fubar@...ibm.com,
mpm@...enic.com, gospo@...hat.com, nhorman@...driver.com,
jmoyer@...hat.com, shemminger@...ux-foundation.org,
linux-kernel@...r.kernel.org, bridge@...ts.linux-foundation.org,
bonding-devel@...ts.sourceforge.net
Subject: Re: [PATCH] netconsole: queue console messages to send later
On Mon, Jun 07, 2010 at 04:50:24PM -0700, David Miller wrote:
> From: Flavio Leitner <fleitner@...hat.com>
> Date: Mon, 7 Jun 2010 16:24:52 -0300
>
> > There are some networking drivers that hold a lock in the
> > transmit path. Therefore, if a console message is printed
> > after that, netconsole will push it through the transmit path,
> > resulting in a deadlock.
> >
> > This patch fixes the re-injection problem by queuing the console
> > messages in a preallocated circular buffer and then scheduling a
> > workqueue to send them later with another context.
> >
> > Signed-off-by: Flavio Leitner <fleitner@...hat.com>
>
> You absolutely and positively MUST NOT do this. Otherwise netconsole
> becomes completely useless. Your idea has been proposed several times
> as far back as 6 years ago, it was unacceptable then and it's
> unacceptable now.
>
> The whole point of netconsole is that we may be deep in an interrupt
> or other atomic context, the machine is about to hard hang, and it's
> absolutely essential that we get out any and all kernel logging
> messages that we can, immediately.

Got it. I never assumed that netconsole would work reliably in
such situations, so I thought that, since we have better ways now,
it would be helpful. See another idea below.

> There may not be another timer or workqueue able to execute after the
> printk() we're trying to emit. We may never get to that point.

What if, in netpoll, before we push the skb to the driver, we check
a bit saying that it is already pushing another skb? In that case, we
queue the new skb inside netpoll, and as soon as the first call returns
and tries to clear the bit, it sends the next skb.

printk("message 1")
...
netconsole called
netpoll sets the flag bit
pushes to the bonding driver, which does another printk("message 2")
netconsole called again
netpoll checks the flag, queues the message, returns
so bonding can finish sending the first message
netpoll is about to return, checks for newly queued messages, and pushes them
bonding finishes sending the second message
....

No deadlocks, the skbs stay ordered, and we still get the same
opportunity to send something. Does that sound acceptable?
It's off the top of my head, so this idea probably has some problems.
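
To make the flag-and-queue idea concrete, here is a minimal user-space
sketch in C. It is not netpoll code: poll_send(), driver_xmit(), tx_busy
and the fixed-size queue are all hypothetical names made up for
illustration, and the "driver" is a stub that logs one nested message to
imitate bonding calling printk() from its transmit path. A real version
would presumably need an atomic bit operation such as test_and_set_bit()
and would queue skbs rather than strings; none of that is shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_LEN 8
#define MSG_LEN   64

/* All names below are hypothetical; this only models the control flow. */
static int tx_busy;                        /* the "flag bit" */
static char queue[QUEUE_LEN][MSG_LEN];     /* messages deferred on re-entry */
static unsigned int q_head, q_tail;

static char sent_log[QUEUE_LEN][MSG_LEN];  /* what reached the "wire", in order */
static int sent_count;
static int nested_done;                    /* make the stub log only once */

static void poll_send(const char *msg);

/* Stand-in for a driver transmit path that itself prints a message. */
static void driver_xmit(const char *msg)
{
    assert(tx_busy);                /* only ever entered with the flag set */

    if (!nested_done) {
        nested_done = 1;
        poll_send("message 2");     /* nested printk() from inside xmit */
    }

    if (sent_count < QUEUE_LEN) {
        strncpy(sent_log[sent_count], msg, MSG_LEN - 1);
        sent_log[sent_count][MSG_LEN - 1] = '\0';
        sent_count++;
    }
    printf("sent: %s\n", msg);
}

/* Stand-in for the netpoll send path described above. */
static void poll_send(const char *msg)
{
    if (tx_busy) {
        /* Re-entered while already transmitting: queue and return. */
        if (q_tail - q_head < QUEUE_LEN) {
            strncpy(queue[q_tail % QUEUE_LEN], msg, MSG_LEN - 1);
            queue[q_tail % QUEUE_LEN][MSG_LEN - 1] = '\0';
            q_tail++;
        }
        return;
    }

    tx_busy = 1;
    driver_xmit(msg);

    /* Before returning, push whatever was queued in the meantime. */
    while (q_head != q_tail) {
        driver_xmit(queue[q_head % QUEUE_LEN]);
        q_head++;
    }
    tx_busy = 0;
}
```

Calling poll_send("message 1") once walks through the sequence above:
the nested "message 2" is queued instead of re-entering the driver, and
it is sent from the same call before the flag is cleared, so the order
is preserved without a timer or workqueue.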
> Fix the locking in the drivers or layers that cause the issue instead
> of breaking netconsole.

Someday, somewhere (I know because I did this before), someone will
use a debugging printk() and will see the entire box hang with
absolutely no message on any console because of this problem.
I'm not saying that fixing the drivers isn't the right way to go, but
it doesn't seem like enough to me.

--
Flavio