lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 13 Aug 2008 12:21:38 +0200
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Alexey Dobriyan" <adobriyan@...il.com>
Cc:	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	"Jeff Garzik" <jgarzik@...ox.com>
Subject: Re: [RFC][PATCH] netconsole: avoid deadlock on printk from driver code

(Hm, did my original e-mail not make it to the lists?)

>> The least invasive fix is to detect that we're trying to re-enter the
>> driver code. We provide a netdev_busy() function which can be used to
>> determine whether a deadlock can occur if we try to transmit another
>> packet.
>>
>> Note that this may lead to lost messages if the driver is active on
>> another CPU while we try to use the same device for netconsole.
>
> This sucks.

Indeed. What can be done about it?

>> It would probably be best to set a "lost messages" flag in this case and
>> add it to the stream when the device becomes ready again.

With such a flag, we can at least avoid silent loss of messages. But
what is the likelihood of this happening in the first place? I have no
idea.

I still think that this is better than a possible deadlock, which
gives NO CLUE as to what happened (because it jams the other
consoles). Hopefully the interface will not be used for other traffic.

But I agree. This possibly penalizes all the other cases where people
*don't* pull out their netconsole-device cables.

We could simply not print out the "link up/down" message if the device
is running netconsole. But it feels like papering over the design
error. Because there are other functions in there which may cause
additional messages to be printed, not just the very visible printk().

Can we detect if the lock was taken on the same CPU or not?

Maybe it is possible to postpone the packet instead of discarding it?

>> +static bool rtl8139_busy (struct net_device *dev)
>> +{
>> +     struct rtl8139_private *tp = netdev_priv(dev);
>> +     return spin_is_locked(&tp->lock);
>> +}
>
> How do I know if my driver is suspectible to this sort of deadlock?

if the ->hard_start_xmit() takes any locks which can be taken from any
code that calls (also indirectly!) printk(), in this case the
interrupt handler that is invoked when I replace the cable. My guess
is that _all_ drivers need to take some sort of lock in
hard_start_xmit().

It's a bit fragile, I agree. But the completely silent deadlock is nasty too!


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ