Date:   Tue, 1 Oct 2019 13:33:36 +0200
From:   Jürgen Groß <jgross@...e.com>
To:     James Dingwall <james@...gwall.me.uk>, linux-kernel@...r.kernel.org
Cc:     Stefano Stabellini <sstabellini@...nel.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: xenbus hang after userspace ctrl-c of xenstore-rm

On 01.10.19 11:57, James Dingwall wrote:
> Hi,
> 
> I have been investigating a problem where xenstore becomes unresponsive
> during domain shutdowns.  My test script appears to trigger the problem,
> although I cannot say definitively that it is the same issue.  The issue
> can be replicated in dom0 or in a domU.  When the test script is run in
> dom0 it seems to affect xenstore access in domUs as well, but when it is
> run in a domU I have not observed any negative impact on dom0 or on
> other guests.
> 
> The environment is a default Ubuntu 5.0.0-29-generic kernel and Xen
> 4.11.3-pre (built from the current head of staging-4.11), with xenstore
> running in a stubdom.  I did try a kernel with
> d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
> suspend due to open transactions", but that didn't help; the stack
> trace below is with that patch applied.
> 
> [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
> [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2551.528585] xenbus          D    0    37      2 0x80000080
> [ 2551.528590] Call Trace:
> [ 2551.528603]  __schedule+0x2c0/0x870
> [ 2551.528606]  ? _cond_resched+0x19/0x40
> [ 2551.528632]  schedule+0x2c/0x70
> [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> [ 2551.528642]  ? wait_woken+0x80/0x80
> [ 2551.528645]  xs_single+0x53/0x80
> [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> [ 2551.528651]  xenbus_file_free+0x5a/0x160
> [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> [ 2551.528657]  xenbus_thread+0x7de/0x880
> [ 2551.528660]  ? wait_woken+0x80/0x80
> [ 2551.528665]  kthread+0x121/0x140
> [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> [ 2551.528670]  ? kthread_park+0x90/0x90
> [ 2551.528673]  ret_from_fork+0x35/0x40

Yes, this is a self-deadlock when cleaning up a user's file context:
the xenbus thread ends up in xenbus_transaction_end() via
xenbus_file_free(), and xs_talkv() then blocks waiting for a response
that only the xenbus thread itself can deliver.  Thanks for the nice
debug data. :-)

I need to do the cleanup via a workqueue instead of calling it directly.
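
Roughly this shape (an untested sketch, not the final patch; field and
function names are illustrative):

#include <linux/kref.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

struct xenbus_file_priv {
        struct kref kref;
        struct work_struct wq;          /* deferred cleanup */
        /* ... existing fields ... */
};

static void xenbus_file_free_work(struct work_struct *work)
{
        struct xenbus_file_priv *u =
                container_of(work, struct xenbus_file_priv, wq);

        /*
         * Workqueue context: ending the user's open transactions may
         * block in xs_talkv(), but the xenbus thread keeps running
         * and can deliver the responses we are waiting for.
         */
        /* ... end pending transactions, release watches ... */
        kfree(u);
}

static void xenbus_file_free(struct kref *kref)
{
        struct xenbus_file_priv *u =
                container_of(kref, struct xenbus_file_priv, kref);

        /* Never clean up synchronously on the xenbus thread. */
        INIT_WORK(&u->wq, xenbus_file_free_work);
        schedule_work(&u->wq);
}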

Cooking up a patch now...


Juergen
