linux-kernel - Re: [PATCH] xen/xenbus: fix self-deadlock after killing user process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191007132412.GA27773@dingwall.me.uk>
Date:   Mon, 7 Oct 2019 13:24:12 +0000
From:   James Dingwall <james@...gwall.me.uk>
To:     xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org
Cc:     Stefano Stabellini <sstabellini@...nel.org>,
        stable@...r.kernel.org, Juergen Gross <jgross@...e.com>
Subject: Re: [PATCH] xen/xenbus: fix self-deadlock after killing user process

On Tue, Oct 01, 2019 at 01:37:24PM -0400, Boris Ostrovsky wrote:
> On 10/1/19 11:03 AM, Juergen Gross wrote:
> > In case a user process using xenbus has open transactions and is killed
> > e.g. via ctrl-C the following cleanup of the allocated resources might
> > result in a deadlock due to trying to end a transaction in the xenbus
> > worker thread:
> >
> > [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> > [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
> > [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 2551.528585] xenbus          D    0    37      2 0x80000080
> > [ 2551.528590] Call Trace:
> > [ 2551.528603]  __schedule+0x2c0/0x870
> > [ 2551.528606]  ? _cond_resched+0x19/0x40
> > [ 2551.528632]  schedule+0x2c/0x70
> > [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> > [ 2551.528642]  ? wait_woken+0x80/0x80
> > [ 2551.528645]  xs_single+0x53/0x80
> > [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> > [ 2551.528651]  xenbus_file_free+0x5a/0x160
> > [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> > [ 2551.528657]  xenbus_thread+0x7de/0x880
> > [ 2551.528660]  ? wait_woken+0x80/0x80
> > [ 2551.528665]  kthread+0x121/0x140
> > [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> > [ 2551.528670]  ? kthread_park+0x90/0x90
> > [ 2551.528673]  ret_from_fork+0x35/0x40
> >
> > Fix this by doing the cleanup via a workqueue instead.
> >
> > Reported-by: James Dingwall <james@...gwall.me.uk>
> > Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
> > Cc: <stable@...r.kernel.org> # 4.11
> > Signed-off-by: Juergen Gross <jgross@...e.com>
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> 

Tested-by: James Dingwall <james@...gwall.me.uk>

This patch does resolve the observed issue although for my (extreme and 
not representative of our normal workload) test case the worker still 
gets blocked for some time if the xenstore-rm is interrupted and no 
concurrent xenstore commands can run.  I assume that the worker 
completes the rm and then does a rollback in the background rather than 
being interrupted early as a result of the userspace program being 
terminated.

Thanks,
James