linux-kernel - Re: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt context issue

Message-ID: <CAFH1YnN4_BSP8OywYpLfBHnRRThpk27PEmfYe4baaOkO6FQaNA@mail.gmail.com>
Date:   Mon, 6 Jan 2020 10:44:42 +0800
From:   Zhenzhong Duan <zhenzhong.duan@...il.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt
 context issue

On Fri, Jan 3, 2020 at 4:43 PM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
>
> On Fri, Jan 03, 2020 at 11:45:41AM +0800, Zhenzhong Duan wrote:
> > Google syzbot reports:
> > BUG: sleeping function called from invalid context at
> > kernel/locking/mutex.c:938
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
> > 1 lock held by swapper/1/0:
> > ...
> > Call Trace:
> >   <IRQ>
> >   dump_stack+0x197/0x210
> >   ___might_sleep.cold+0x1fb/0x23e
> >   __might_sleep+0x95/0x190
> >   __mutex_lock+0xc5/0x13c0
> >   mutex_lock_nested+0x16/0x20
> >   tpk_write+0x5d/0x340
> >   resync_tnc+0x1b6/0x320
> >   call_timer_fn+0x1ac/0x780
> >   run_timer_softirq+0x6c3/0x1790
> >   __do_softirq+0x262/0x98c
> >   irq_exit+0x19b/0x1e0
> >   smp_apic_timer_interrupt+0x1a3/0x610
> >   apic_timer_interrupt+0xf/0x20
> >   </IRQ>
> >
> > Fix it by using spinlock in process context instead of mutex and having
> > interrupt disabled in critical section.
> >
> > Signed-off-by: Zhenzhong Duan <zhenzhong.duan@...il.com>
> > Cc: Arnd Bergmann <arnd@...db.de>
> > Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > ---
> >  drivers/char/ttyprintk.c | 15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
>
> Why was this resent?  What differs from the first version that required
> it to be resent?
>
> Always give us a clue here please :)
Sorry, I should have done that.
patch-bot told me my last version was malformed (tabs converted to
spaces), probably because I sent the patch through the Gmail web
interface. I now have direct access to smtp.gmail.com and use
'git send-email', so that is no longer an issue. There are no
functional changes compared to the last version.

>
>
> >
> > diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
> > index 4f24e46ebe7c..56db949a7b70 100644
> > --- a/drivers/char/ttyprintk.c
> > +++ b/drivers/char/ttyprintk.c
> > @@ -15,10 +15,11 @@
> >  #include <linux/serial.h>
> >  #include <linux/tty.h>
> >  #include <linux/module.h>
> > +#include <linux/spinlock.h>
> >
> >  struct ttyprintk_port {
> >       struct tty_port port;
> > -     struct mutex port_write_mutex;
> > +     spinlock_t spinlock;
> >  };
> >
> >  static struct ttyprintk_port tpk_port;
> > @@ -99,11 +100,12 @@ static int tpk_open(struct tty_struct *tty, struct file *filp)
> >  static void tpk_close(struct tty_struct *tty, struct file *filp)
> >  {
> >       struct ttyprintk_port *tpkp = tty->driver_data;
> > +     unsigned long flags;
> >
> > -     mutex_lock(&tpkp->port_write_mutex);
> > +     spin_lock_irqsave(&tpkp->spinlock, flags);
> >       /* flush tpk_printk buffer */
> >       tpk_printk(NULL, 0);
>
> Are you sure you can call this with a spinlock held?
I think so.

>
> Doesn't your trace above show the opposite?
That's why I use the spin_lock_irqsave() variant rather than spin_lock().

The issue here is that tpk_write()/tpk_close() could be interrupted
while holding the mutex; tpk_write() is then called again from the
timer handler and tries to acquire the same mutex, leading to a
deadlock.

With spin_lock_irqsave(), interrupts are disabled in process context
while the lock is held, so there is no such issue.
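
To illustrate the argument above, here is a minimal userspace sketch, not
kernel code and not part of the patch. It assumes a POSIX system and uses a
signal as a stand-in for the timer interrupt and sigprocmask() as a rough
analogue of spin_lock_irqsave()/spin_unlock_irqrestore(). All names
(fake_irq, fake_lock_irqsave, fake_unlock_irqrestore, demo_masked_section)
are hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical state for the analogy; none of this exists in ttyprintk. */
static volatile sig_atomic_t in_critical;  /* 1 while the "lock" is held */
static volatile sig_atomic_t reentered;    /* 1 if the handler ran inside it */
static volatile sig_atomic_t handler_runs; /* number of handler invocations */

/* Stand-in for the timer handler that ends up calling tpk_write(). */
static void fake_irq(int signo)
{
	(void)signo;
	handler_runs++;
	if (in_critical)
		reentered = 1; /* would be the re-entry the thread worries about */
}

/* Rough analogue of spin_lock_irqsave(): mask the "interrupt", save old state. */
static void fake_lock_irqsave(sigset_t *saved)
{
	sigset_t block;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, saved);
	in_critical = 1;
}

/* Rough analogue of spin_unlock_irqrestore(): leave, then restore old state. */
static void fake_unlock_irqrestore(const sigset_t *saved)
{
	in_critical = 0;
	sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Returns 1 if the handler observed the critical section, 0 otherwise. */
static int demo_masked_section(void)
{
	struct sigaction sa;
	sigset_t saved;
	int rc;

	sa.sa_handler = fake_irq;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	rc = sigaction(SIGUSR1, &sa, NULL);
	assert(rc == 0);
	(void)rc;

	reentered = 0;
	handler_runs = 0;

	fake_lock_irqsave(&saved);
	raise(SIGUSR1);                 /* stays pending while masked */
	fake_unlock_irqrestore(&saved); /* handler runs here, after the section */

	return reentered;
}
```

Calling demo_masked_section() from a single-threaded program should return 0:
the "interrupt" raised inside the critical section is deferred until the mask
is restored, which is the property the reply relies on. The analogy covers
only the masking; it says nothing about whether tpk_printk() is safe to call
with a spinlock held.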

>
> What is wrong with sleeping during the mutex you currently have?  How is
> syzbot reporting this error, is there a reproducer somewhere?
See https://syzkaller.appspot.com/bug?extid=2eeef62ee31f9460ad65

I could not reproduce the deadlock locally, nor even the warning
syzbot reported, but syzbot can.

Thanks
Zhenzhong