linux-kernel - Re: [PATCH] serial: flush ldisc after hangup

Message-ID: <56D5DDA6.5080605@fb.com>
Date:	Tue, 1 Mar 2016 13:21:26 -0500
From:	Josef Bacik <jbacik@...com>
To:	Peter Hurley <peter@...leysoftware.com>,
	<gregkh@...uxfoundation.org>, <jslaby@...e.com>,
	<linux-serial@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] serial: flush ldisc after hangup

On 03/01/2016 01:17 PM, Peter Hurley wrote:
> Hi Josef,
>
> On 03/01/2016 10:02 AM, Josef Bacik wrote:
>> We hit a panic pretty consistently in production that looked like this
>>
>> PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
>>   #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
>>   #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
>>   #2 [ffff88015834ba60] oops_end at ffffffff81006478
>>   #3 [ffff88015834ba90] no_context at ffffffff818c5262
>>   #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
>>   #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
>>   #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
>>   #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
>>   #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
>>      [exception RIP: __uart_start+0x1a]
>>      RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
>>      RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
>>      RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
>>      RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
>>      R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
>>      R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
>
> Thanks for the report, but where's the rest of the stack trace?

Whoops, sorry about that.  Here is the full trace:

crash> bt
PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
  #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
  #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
  #2 [ffff88015834ba60] oops_end at ffffffff81006478
  #3 [ffff88015834ba90] no_context at ffffffff818c5262
  #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
  #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
  #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
  #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
  #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
     [exception RIP: __uart_start+0x1a]
     RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
     RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
     RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
     RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
     R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
     R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
#10 [ffff88015834bcb0] uart_flush_chars at ffffffff8152fc1e
#11 [ffff88015834bcc0] n_tty_receive_buf_common at ffffffff81516cf1
#12 [ffff88015834bd80] n_tty_receive_buf2 at ffffffff81517414
#13 [ffff88015834bd90] flush_to_ldisc at ffffffff8151ab6d
#14 [ffff88015834bdf0] process_one_work at ffffffff81069871
#15 [ffff88015834be40] worker_thread at ffffffff81069c53
#16 [ffff88015834bec0] kthread at ffffffff8106f429
#17 [ffff88015834bf50] ret_from_fork at ffffffff818d50c8

>
>> It was a NULL pointer dereference: state->port.tty was NULL, so we panicked
>> when uart_tx_stopped() went to check tty->stopped.  Looking at the other CPUs,
>> we were in the middle of uart_open(), and the core actually had a valid pointer
>> in state->port.tty.  That points to a race between open and either close or
>> hangup (the only two places that set state->port.tty to NULL).  Close already
>> flushes the ldisc but hangup does not, which means characters can be left in
>> the receive buffer between the hangup and the open, and we end up in this
>> situation.
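
For readers following the quoted description, the ordering can be modeled in plain user-space C.  This is only a rough sketch under stated assumptions, not the kernel code and not the posted patch: every model_* name below is a hypothetical stand-in, pending_rx stands for characters left in the receive buffer, and the flush flag stands for the idea of flushing the ldisc at hangup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the thread. */
struct model_tty {
    bool stopped;
};

struct model_port {
    struct model_tty *tty;      /* plays the role of state->port.tty */
};

struct model_state {
    struct model_port port;
    size_t pending_rx;          /* characters still queued for the ldisc */
};

/* Open installs a valid tty pointer. */
void model_open(struct model_state *s, struct model_tty *tty)
{
    assert(s != NULL);
    s->port.tty = tty;
}

/*
 * Hangup clears the tty pointer.  With flush == false, queued receive
 * data survives the hangup, which is the situation described above.
 */
void model_hangup(struct model_state *s, bool flush)
{
    assert(s != NULL);
    if (flush)
        s->pending_rx = 0;
    s->port.tty = NULL;
}

/*
 * Models the deferred receive work: if data is still queued, the work
 * reaches the transmit path, which reads tty->stopped.  Returns true
 * when that read would dereference a NULL tty pointer.
 */
bool model_flush_work_would_fault(const struct model_state *s)
{
    assert(s != NULL);
    if (s->pending_rx == 0)
        return false;           /* nothing queued, transmit path not reached */
    return s->port.tty == NULL;
}
```

In this model, a hangup without a flush followed by the deferred work reports a fault, while a hangup that drops the queued data first does not.  It says nothing about where the real fix belongs.
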
>
> Yeah, the race is that the ldisc should not be attempting i/o to
> the driver at all. This problem is fixed in -next already, but in the
> tty core rather than in each individual tty driver.
>

Great!  Which patch or patches fix this?  I looked at linux-next and
there is a lot of refactoring there, so do I need all of it, or is there
a specific patch that fixes this problem?  Thanks,

Josef