linux-kernel - Re: [PATCH v2] xen/xenbus: make xs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2deac9ce-0c27-a472-7d51-b91a640d92ed@citrix.com>
Date:   Tue, 15 Dec 2020 20:59:32 +0000
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Juergen Gross <jgross@...e.com>, <xen-devel@...ts.xenproject.org>,
        <linux-kernel@...r.kernel.org>
CC:     Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Stefano Stabellini <sstabellini@...nel.org>
Subject: Re: [PATCH v2] xen/xenbus: make xs_talkv() interruptible

On 15/12/2020 11:10, Juergen Gross wrote:
> In case a process waits for any Xenstore action in the xenbus driver
> it should be interruptible by signals.
>
> Signed-off-by: Juergen Gross <jgross@...e.com>
> ---
> V2:
> - don't special case SIGKILL as libxenstore is handling -EINTR fine
> ---
>  drivers/xen/xenbus/xenbus_xs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index 3a06eb699f33..17c8f8a155fd 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -205,8 +205,15 @@ static bool test_reply(struct xb_req_data *req)
>  
>  static void *read_reply(struct xb_req_data *req)
>  {
> +	int ret;
> +
>  	do {
> -		wait_event(req->wq, test_reply(req));
> +		ret = wait_event_interruptible(req->wq, test_reply(req));
> +
> +		if (ret == -ERESTARTSYS && signal_pending(current)) {
> +			req->msg.type = XS_ERROR;
> +			return ERR_PTR(-EINTR);
> +		}

So now I can talk fully about the situations which lead to this, I think
there is a bit more complexity.

It turns out there are a number of issues related to running a Xen
system with no xenstored.

1) If a xenstore-write occurs during startup before init-xenstore-domain
runs, the former blocks on /dev/xen/xenbus waiting for xenstored to
reply, while the latter blocks on /dev/xen/xenbus_backend when trying to
tell the dom0 kernel that xenstored is in dom1.  This effectively
deadlocks the system.

2) If xenstore-watch is running when xenstored dies, it spins at 100%
cpu usage making no system calls at all.  This is caused by bad error
handling from xs_watch(), and attempting to debug found:

3) (this issue).  If anyone starts xenstore-watch with no xenstored
running at all, it blocks in D in the kernel.

The cause is the special handling for watch/unwatch commands which,
instead of just queuing up the data for xenstore, explicitly waits for
an OK for registering the watch.  This causes a write() system call to
block waiting for a non-existent entity to reply.

So while this patch does resolve the major usability issue I found (I
can't even SIGINT and get my terminal back), I think there are issues.

The reason why XS_WATCH/XS_UNWATCH are special cased is because they do
require special handling.  The main kernel thread for processing
incoming data from xenstored does need to know how to associate each
async XS_WATCH_EVENT to the caller who watched the path.

Therefore, depending on when this cancellation hits, we might be in any
of the following states:

1) the watch is queued in the kernel, but not even sent to xenstored yet
2) the watch is queued in the xenstored ring, but not acted upon
3) the watch is queued in the xenstored ring, and the xenstored has seen
it but not replied yet
4) the watch has been processed, but the XS_WATCH reply hasn't been
received yet
5) the watch has been processed, and the XS_WATCH reply received

State 5 (and a little bit) is the normal success path when xenstored has
acted upon the request, and the internal kernel infrastructure is set up
appropriately to handle XS_WATCH_EVENTs.

States 1 and 2 can be very common if there is no xenstored (or at least,
it hasn't started up yet).  In reality, there is either no xenstored, or
it is up and running (and for a period of time during system startup,
these cases occur in sequence).

As soon as the XS_WATCH event has been written into the xenstored ring,
it is not safe to cancel.  You've committed to xenstored processing the
request (if it is up).

If xenstored is actually up and running, its fine and necessary to
block.  The request will be processed in due course (timing subject to
the client and server load).  If xenstored isn't up, blocking isn't ok.

Therefore, I think we need to distinguish "not yet on the ring" from "on
the ring", as our distinction as to whether cancelling is safe, and
ensure we don't queue anything on the ring before we're sure xenstored
has started up.

Does this make sense?

~Andrew