Date:   Tue, 1 Oct 2019 13:33:36 +0200
From:   Jürgen Groß <jgross@...e.com>
To:     James Dingwall <james@...gwall.me.uk>, linux-kernel@...r.kernel.org
Cc:     Stefano Stabellini <sstabellini@...nel.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: xenbus hang after userspace ctrl-c of xenstore-rm

On 01.10.19 11:57, James Dingwall wrote:
> Hi,
> 
> I have been investigating a problem where xenstore becomes unresponsive
> during domain shutdowns.  My test script appears to trigger the problem,
> although I cannot say definitively that it is the same issue.  The issue
> can be replicated in dom0 or in a domU.  When the test script is run in
> dom0 it seems to affect xenstore access in domUs as well, but when it is
> run in a domU I have not observed any negative impact on dom0 or on
> other guests.
> 
> The environment is a default Ubuntu 5.0.0-29-generic kernel and Xen
> 4.11.3-pre (built from the current head of staging-4.11), with xenstore
> running in a stubdom.  I did try a kernel with
> d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
> suspend due to open transactions", but that didn't help; the stack
> trace below is with that patch applied.
> 
> [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
> [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2551.528585] xenbus          D    0    37      2 0x80000080
> [ 2551.528590] Call Trace:
> [ 2551.528603]  __schedule+0x2c0/0x870
> [ 2551.528606]  ? _cond_resched+0x19/0x40
> [ 2551.528632]  schedule+0x2c/0x70
> [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> [ 2551.528642]  ? wait_woken+0x80/0x80
> [ 2551.528645]  xs_single+0x53/0x80
> [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> [ 2551.528651]  xenbus_file_free+0x5a/0x160
> [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> [ 2551.528657]  xenbus_thread+0x7de/0x880
> [ 2551.528660]  ? wait_woken+0x80/0x80
> [ 2551.528665]  kthread+0x121/0x140
> [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> [ 2551.528670]  ? kthread_park+0x90/0x90
> [ 2551.528673]  ret_from_fork+0x35/0x40

Yes, this is a self-deadlock when cleaning up a user's file context:
the xenbus thread ends up in xenbus_transaction_end() via
xenbus_file_free(), and xs_talkv() then blocks waiting for a response
that only the xenbus thread itself can deliver.  Thanks for the nice
debug data. :-)

I need to do the cleanup via a workqueue instead of calling it directly.
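
Roughly this shape (an untested sketch, not the final patch; field and
function names are illustrative):

#include <linux/kref.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

struct xenbus_file_priv {
        struct kref kref;
        struct work_struct wq;          /* deferred cleanup */
        /* ... existing fields ... */
};

static void xenbus_file_free_work(struct work_struct *work)
{
        struct xenbus_file_priv *u =
                container_of(work, struct xenbus_file_priv, wq);

        /*
         * Workqueue context: ending the user's open transactions may
         * block in xs_talkv(), but the xenbus thread keeps running
         * and can deliver the responses we are waiting for.
         */
        /* ... end pending transactions, release watches ... */
        kfree(u);
}

static void xenbus_file_free(struct kref *kref)
{
        struct xenbus_file_priv *u =
                container_of(kref, struct xenbus_file_priv, kref);

        /* Never clean up synchronously on the xenbus thread. */
        INIT_WORK(&u->wq, xenbus_file_free_work);
        schedule_work(&u->wq);
}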

Cooking up a patch now...


Juergen
