Message-ID: <b95d9a22-727e-f919-ae3b-6567f7ba5543@suse.com>
Date: Tue, 1 Oct 2019 13:33:36 +0200
From: Jürgen Groß <jgross@...e.com>
To: James Dingwall <james@...gwall.me.uk>, linux-kernel@...r.kernel.org
Cc: Stefano Stabellini <sstabellini@...nel.org>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: xenbus hang after userspace ctrl-c of xenstore-rm

On 01.10.19 11:57, James Dingwall wrote:
> Hi,
>
> I have been investigating a problem where xenstore becomes unresponsive
> during domain shutdowns. My test script seems to trigger the problem
> but without definitively being the same. It is possible to replicate
> the issue in dom0 or a domU. If the test script is run in dom0 it seems
> that it is possible to affect xenstore access in domUs but I have not
> observed any negative impact in dom0 or other guests when running in a
> domU.
>
> The environment is a default Ubuntu 5.0.0-29-generic kernel, xen
> 4.11.3-pre (built from current head of staging-4.11), xenstore is
> running in a stubdom. I did try a kernel with
> d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
> suspend due to open transactions" but that didn't help, this stack trace
> is with that patch applied.
>
> [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> [ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
> [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2551.528585] xenbus D 0 37 2 0x80000080
> [ 2551.528590] Call Trace:
> [ 2551.528603] __schedule+0x2c0/0x870
> [ 2551.528606] ? _cond_resched+0x19/0x40
> [ 2551.528632] schedule+0x2c/0x70
> [ 2551.528637] xs_talkv+0x1ec/0x2b0
> [ 2551.528642] ? wait_woken+0x80/0x80
> [ 2551.528645] xs_single+0x53/0x80
> [ 2551.528648] xenbus_transaction_end+0x3b/0x70
> [ 2551.528651] xenbus_file_free+0x5a/0x160
> [ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
> [ 2551.528657] xenbus_thread+0x7de/0x880
> [ 2551.528660] ? wait_woken+0x80/0x80
> [ 2551.528665] kthread+0x121/0x140
> [ 2551.528667] ? xb_read+0x1d0/0x1d0
> [ 2551.528670] ? kthread_park+0x90/0x90
> [ 2551.528673] ret_from_fork+0x35/0x40

Yes, this is a self-deadlock when cleaning up a user's file context.
Thanks for the nice debug data. :-)

I need to do the cleanup via a workqueue instead of calling it directly.
Cooking up a patch now...
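
To illustrate the pattern only (a userspace toy model, not the kernel
patch): in the trace above the xenbus thread runs xenbus_file_free(),
which ends the open transaction and then sleeps in xs_talkv() waiting
for a reply that, as far as I can tell, only the xenbus thread itself
would deliver. The sketch below models that with one reader thread and
shows how handing the cleanup to a separate worker avoids the hang. All
names (run, reader, worker, cleanup) are made up for the illustration,
and the timeout merely stands in for the indefinite hang so the example
terminates.

```python
import queue
import threading


def run(defer_cleanup, timeout=0.5):
    """Toy model: one reader thread delivers every reply.

    Returns "stuck" if the cleanup never got its reply (the timeout
    stands in for the real, indefinite hang) and "completed" otherwise.
    """
    inbox = queue.Queue()     # messages handled by the reader thread
    deferred = queue.Queue()  # cleanup jobs handed to the worker
    result = queue.Queue()    # outcome reported by the cleanup

    def request():
        # Send a request and wait until the reader delivers the reply.
        replied = threading.Event()
        inbox.put(("request", replied))
        return replied.wait(timeout)

    def cleanup():
        # Stand-in for ending an open transaction: it needs a reply.
        result.put("completed" if request() else "stuck")

    def reader():
        while True:
            kind, replied = inbox.get()
            if kind == "request":
                replied.set()            # deliver the reply
            elif kind == "release":      # last reference was dropped
                if defer_cleanup:
                    deferred.put(cleanup)  # let the worker do it
                else:
                    cleanup()            # reader now waits on itself
            elif kind == "stop":
                return

    def worker():
        while True:
            job = deferred.get()
            if job is None:
                return
            job()

    threads = [threading.Thread(target=reader, daemon=True),
               threading.Thread(target=worker, daemon=True)]
    for t in threads:
        t.start()

    inbox.put(("release", None))
    outcome = result.get(timeout=5)

    # Shut both threads down cleanly.
    inbox.put(("stop", None))
    deferred.put(None)
    for t in threads:
        t.join(timeout=5)
    return outcome


if __name__ == "__main__":
    print("direct call:", run(defer_cleanup=False))
    print("deferred   :", run(defer_cleanup=True))
```

With the direct call the reader is blocked inside cleanup() and cannot
deliver the reply it is waiting for; with the deferred variant the
reader stays free to deliver the reply while the worker waits for it.
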
Juergen