Message-ID: <560E8B1C.9030209@eclis.ch>
Date: Fri, 02 Oct 2015 15:48:12 +0200
From: Jean-Christian de Rivaz <jc@...is.ch>
To: Thomas Osterried <thomas@...erried.de>
Cc: Peter Hurley <peter@...leysoftware.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jslaby@...e.com>, David Ranch <dranch@...nnet.net>,
Ralf Bächle DL5RB <ralf@...ux-mips.org>,
linux-hams@...nnet.net, linux-hams@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: Force mkiss to reset the line discipline when serial device is
removed
On 02.10.15 12:35, Thomas Osterried wrote:
> Hello,
>
>
>> On 02.10.2015 at 10:30, Jean-Christian de Rivaz <jc@...is.ch> wrote:
>>
>> On 02.10.15 00:57, Peter Hurley wrote:
>>> On 10/01/2015 12:56 PM, Jean-Christian de Rivaz wrote:
>>>> Hi Greg and Jiri,
>>>>
>>>> I am trying to fix a kernel panic bug related to the AX25 (and probably SLIP) line discipline when the corresponding serial device is removed [1]. I proposed some patches [2] [3] on the linux-hams mailing list, but I think they raise more questions about how tty_ldisc_hangup() should work when a serial device is removed [4].
>>>>
>>>> I actually see the following options:
>>>>
>>>> a) Let the specific line discipline set the TTY_DRIVER_RESET_TERMIOS flag in tty->driver as in [2] but this is suspected bad practice [5].
>>>>
>>>> b) Let the specific line discipline set the TTY_OTHER_CLOSED flag in tty and check it in tty_ldisc_hangup() as in [3].
> If I understand correctly, in current kernels TTY_OTHER_DONE is introduced, instead of TTY_OTHER_CLOSED.
>
>>>> c) Let the specific line discipline set the TTY_LDISC_HALTED flag in tty and check it in tty_ldisc_hangup().
>>>>
>>>> d) Let the specific line discipline set a new flag for that purpose, for example TTY_LDISC_RESET, and check it in tty_ldisc_hangup().
>>>>
>>>> e) Close the tty earlier so that tty_ldisc_reinit() is not even called. I need some advice on how this should be done.
>>>>
>>>> f) That's all wrong, something else needs to be changed.
>>>>
>>>> I would appreciate some comments from tty subsystem experts about this issue.
>>>>
>>>> [1] http://www.spinics.net/lists/linux-hams/msg03500.html
>> Hi Peter, thanks for your time,
>>
>>> The crash reported here appears to be related to how mkiss handles its netdev;
>>> maybe prematurely freeing the tx/rx buffers? I'd relook at how slip handles
>>> netdev teardown.
>> Yes, but this is a consequence of the fact that the ax0 interface was re-opened uninitialized while the corresponding serial device is no longer connected to the system. I don’t see any rationale for creating this bogus interface: the serial device is gone.
> I also tried 6pack, and the traditional slip interface. The same thing happens - device reappears.
>
> I don’t think there’s a good reason for this, because after reinitialization the iface is down, and the ip address and routes over it have disappeared. Thus it’s not even usable anymore as a not-active-dummy-interface for ip/routes.
>
> I’ve not tested ppp and possibly other line-based protocols, but I assume they all have the same issue, and nobody noticed before. Would anyone like to track down ppp’s behavior?
>
> Do we have a way to determine if the interface was re-initialized by the ldisc handler? Then we (and any other line-based driver) could try to check in the .open call and decide what to do.
> But imho, it may always cause problems when people write new drivers and overlook the non-obvious situation where devices may reappear. The default should be not to call open again.
Fully agree on that.
> I also wonder why userspace processes like kissattach do not get a signal from the kernel, indicating that the file descriptor is not valid anymore.
> Whose job would it be to signal: the serial driver’s (slip, ppp, mkiss, ..), or the ldisc’s?
That is a completely different problem, and not a kernel one. The safety of the
kernel cannot depend on a user application closing a file descriptor.
Even if the user application closes its file descriptor, process
scheduling can delay this long enough to let a packet reach the
parasitic uninitialized interface and completely crash the system. It
would at best only narrow the race window, and would do nothing to fix the
real bug. That said, kissattach uses a while (1) { sleep(); } loop that can
be cheaply replaced by a single plain old select() waiting on the file
descriptor. My understanding is that once the AX25 discipline is in
place, the only event that can happen is that the descriptor is about to be
closed. I will test a kissattach patch for this.
AFAIK tty_ldisc_hangup() already signals EOF to the file descriptor owner
with these lines:
wake_up_interruptible_poll(&tty->write_wait, POLLOUT);
wake_up_interruptible_poll(&tty->read_wait, POLLIN);
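To make the select() idea concrete, here is a minimal user-space sketch of what
could replace the sleep loop. It is not the actual kissattach patch: the
function name wait_for_hangup() is made up for illustration, and whether
select() is really woken while N_AX25 is attached is exactly what still has to
be tested.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from kissattach): block until fd reports
 * end-of-file or a read error. Returns 0 on EOF, -1 on error.
 */
int wait_for_hangup(int fd)
{
    char buf[256];

    for (;;) {
        fd_set rfds;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Sleep in the kernel until the descriptor becomes readable. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }

        n = read(fd, buf, sizeof(buf));
        if (n == 0)
            return 0;           /* EOF: the other side is gone */
        if (n < 0 && errno != EINTR && errno != EAGAIN)
            return -1;          /* hard error such as EIO */
        /* n > 0: unexpected data, discard it and keep waiting */
    }
}
```

The caller would close the descriptor and exit once this returns. As said
above, this only narrows the race window from user space; it does not fix the
kernel bug.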
>>> I don't see a problem with the ACM tty/tty core side of this.
>>>
>>> At the time the hangup occurs, there is actually still an ACM tty device.
>> Not physically, sorry. The physical serial device was unplugged from the system (or forced into a hardware reset in the case of my test), causing a USB disconnect. It's important to understand that the USB disconnect has already occurred seconds before the crash. The fact that there is still an ACM tty structure in the kernel corresponding to nothing real is the cause of the problem.
>>
>>> The line discipline is reinited as a security precaution to prevent a previous
>>> session's data from being visible in the new session.
>> Pragmatically, reinitializing to N_TTY is OK; this is in fact how my proposed patches work. But reinitializing to N_AX25 while the serial device no longer exists makes no sense at all, and causes the crash when the new uninitialized parasitic interface tries to send a packet.
>>
>>> The tty core does not know
>>> at the time the vhangup() occurs that the ACM driver plans to unregister the
>>> tty device.
>> That’s the root problem: it must at least know that it must not call mkiss_open().
> Or at least mkiss_open() must have a way to detect that a re-open was initiated.
> But as said, I’d prefer that it did not happen, because otherwise it depends on every serial protocol driver implementing it correctly.
>
Fully agree again. There is absolutely no doubt that the N_AX25 line
discipline must not be opened again when the serial device is removed. The
fact is that tty_ldisc_hangup() is currently a mandatory path and that it
calls tty_ldisc_reinit() if a line discipline exists for the tty (always
true in this case). So there are at least the following options:
A) Let tty_ldisc_hangup() call tty_ldisc_reinit(), but with N_TTY, since
the existing code already makes that possible. This is how my patches
work. We only have to agree on the condition/flag to be used. My patch relies
on code in the line discipline driver, but something more clever could
maybe be done.
B) Same as A), but make the corresponding serial driver responsible for
setting the TTY_DRIVER_RESET_TERMIOS flag when the device is removed,
since the existing code handles that case. Unless some generic code in
the serial device layer can do that, this implies modifying all serial
drivers.
C) Modify tty_ldisc_hangup() to call tty_ldisc_close() in that case, but
we still have to agree on the condition.
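To illustrate the decision that options A) and B) share, here is a toy
user-space model. It is not kernel code: the struct, its fields and the
function name are made up for illustration, the reset_termios field merely
stands in for the TTY_DRIVER_RESET_TERMIOS flag, and device_removed is the
still-undecided condition discussed above. Only the N_TTY and N_AX25 values are
the real ones from <linux/tty.h>.

```c
#include <stdbool.h>

enum { N_TTY = 0, N_AX25 = 5 };  /* same values as in <linux/tty.h> */

/* Hypothetical snapshot of what the hangup path would look at. */
struct hangup_state {
    int  cur_ldisc;       /* discipline attached before the hangup */
    bool reset_termios;   /* stands in for TTY_DRIVER_RESET_TERMIOS */
    bool device_removed;  /* assumed condition, still to be agreed on */
};

/* Return the line discipline that would be attached after the hangup. */
int ldisc_after_hangup(const struct hangup_state *s)
{
    /* Fall back to N_TTY instead of re-opening the old discipline. */
    if (s->reset_termios || s->device_removed)
        return N_TTY;

    /* Otherwise the previous discipline is re-opened. */
    return s->cur_ldisc;
}
```

With this model, a removed device never ends up with N_AX25 re-opened, so
mkiss_open() is not called on a tty that has no hardware behind it. Options A)
and B) differ only in who is responsible for setting the condition.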
This raises a question for me: when an application still has an open
file descriptor on a removed serial device, how long is the kernel tty
structure supposed to live?
1) Only until the serial device is removed; the file descriptor is a separate
structure.
2) Until the application closes the file descriptor, even if that is days
after the serial device has been removed.
>> That's the bug that must be fixed. Or maybe the option e) fix must be developed.
>>
>>> Don't do any of the things you suggest above.
>>>
>> Can I ask what you suggest to solve the problem? The bug is real, causing a kernel panic and a complete crash of the system, requiring a hardware reset to reboot.
>>
>> Best Regards,
>> Jean-Christian de Rivaz
> vy 73,
> - Thomas dl9sau
Jean-Christian