linux-kernel - Re: race between flush_to_ldisc and pty_cleanup

Message-ID: <20190201142642.GB3211@kroah.com>
Date:   Fri, 1 Feb 2019 15:26:42 +0100
From:   "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
To:     Maninder Singh <maninder1.s@...sung.com>
Cc:     "peter@...leysoftware.com" <peter@...leysoftware.com>,
        "jslaby@...e.com" <jslaby@...e.com>,
        "keun-o.park@...kmatter.ae" <keun-o.park@...kmatter.ae>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        AMIT SAHRAWAT <a.sahrawat@...sung.com>,
        Vaneet Narang <v.narang@...sung.com>,
        Rohit Thapliyal <r.thapliyal@...sung.com>,
        Ayush Mittal <ayush.m@...sung.com>
Subject: Re: race between flush_to_ldisc and pty_cleanup

On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote:
> Hi,
> 
> 
> There is a race condition between tty_port_put and flush_to_ldisc
> which leads to a use-after-free:
> (Kernel 4.1)
> 
> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83
> ...
> ...
> ...
> 
> [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] (tty_ldisc_ref+0x24/0x60)
> [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] (flush_to_ldisc+0x6c/0x21c)
> [1403.5132]  r5:dbcd4a84 r4:00000000
> [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] (process_one_work+0x214/0x570)
> [1403.5132]  r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 r5:cb37b780
> [1403.5132]  r4:dbcd4a84
> [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] (worker_thread+0x60/0x580)
> [1403.5132]  r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 r5:cb37b798
> [1403.5132]  r4:cb37b780
> [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] (kthread+0xec/0x104)
> [1403.5132]  r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 r5:d8a3fc80
> [1403.5132]  r4:00000000
> [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c)
> 
> 
> To check further, we added some debug prints and a delay in flush_to_ldisc to reproduce it,
> and it seems there is some issue with the workqueue implementation of TTY:
> 
> bool tty_buffer_cancel_work(struct tty_port *port)
> {
>         bool ret;
>         ret = cancel_work_sync(&port->buf.work); // Check return value of cancel_work_sync
>         pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned int)&port->buf.work, (void *)_RET_IP_, ret); 
>         return ret;
> }
> 
> static void flush_to_ldisc(struct work_struct *work)
> {
> ...
>         mdelay(100);   // Added Delay to reproduce race
> 
>         if (flag_work_cancel) {
>                 pr_emerg("scheduled work after stopping work %x\n", (unsigned int)work);
> 
> ....
> }
> 
> static void pty_cleanup(struct tty_struct *tty)
> {
> ...
>         flag_work_cancel = 1;
> 		...
> }
> 
> 
> [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c  0       // Return is 0, so no work was pending
> 
> [1403.5129] scheduled work after stopping work dbcd4a84    // The same work still runs after being cancelled
> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83   // Kernel oops occurred because of a use-after-free

Ok, after my initial "use a newer kernel" comment, this really does look
strange.  There have also been a lot of workqueue fixes and rework since
4.1, and that might be the thing that fixes this issue here.

However, are you sure you are not just calling flush_to_ldisc() directly
through some codepath somehow?  If you look at the stack in the
pr_emerg() message, where did it come from?  From the same workqueue
that you already stopped?
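
As a rough illustration of why a return value of 0 from the cancel call does
not by itself rule out a later run, here is a minimal userspace sketch.  It is
a toy model only: the toy_* names are hypothetical and this is not the kernel
workqueue code.  It assumes just two documented behaviours: cancel_work_sync()
returns true only if the work was pending, and queue_work() returns false only
if the work was already queued.  It does not model cancel_work_sync() waiting
for a running handler, and the "late producer" that re-queues the work is an
assumption for the sake of the example, not a diagnosis of this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct work_struct: only tracks whether it is queued. */
struct toy_work {
	bool pending;	/* queued but not yet run */
	int runs;	/* how many times the handler has executed */
};

/* Models queue_work(): returns false if the work was already pending. */
static bool toy_queue_work(struct toy_work *w)
{
	assert(w);
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models cancel_work_sync(): returns true only if pending work was removed. */
static bool toy_cancel_work_sync(struct toy_work *w)
{
	bool was_pending;

	assert(w);
	was_pending = w->pending;
	w->pending = false;
	return was_pending;
}

/* Models a worker picking up a pending item and running its handler. */
static bool toy_run_pending(struct toy_work *w)
{
	assert(w);
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}

/* Replays the order seen in the log: cancel reports 0, a later queue runs. */
static int toy_replay(void)
{
	struct toy_work w = { false, 0 };
	bool cancelled = toy_cancel_work_sync(&w);	/* nothing queued yet */

	toy_queue_work(&w);	/* hypothetical late producer re-queues it */
	toy_run_pending(&w);	/* handler runs after the "cancel" */
	printf("cancelled=%d runs=%d\n", cancelled, w.runs);
	return w.runs;
}
```

Calling toy_replay() from a main() prints "cancelled=0 runs=1".  In this model
the cancel only clears what is queued at that instant, so the stack at the
pr_emerg() call is what would show who queued or called the handler afterwards.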

Testing on a newer kernel would be great, if possible.

thanks,

greg k-h