lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8737y728r4.fsf@vitty.brq.redhat.com>
Date:	Mon, 21 Sep 2015 15:37:51 +0200
From:	Vitaly Kuznetsov <vkuznets@...hat.com>
To:	Olaf Hering <olaf@...fle.de>
Cc:	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Greg KH <gregkh@...uxfoundation.org>,
	linux-kernel@...r.kernel.org, devel@...uxdriverproject.org,
	apw@...onical.com, jasowang@...hat.com
Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state

Olaf Hering <olaf@...fle.de> writes:

> On Mon, Sep 21, Vitaly Kuznetsov wrote:
>
>> I'd like to see a trace from the hang, it is not obvious to me how it
>> happened and what caused it. (or if you have such hang scenario in your
>> head, can you please reveal it?)
>
> There is no trace. I think fcopy_respond_to_host notifies the host,
> which in turn triggers an interrupt right away which is processed while
> fcopy_on_msg is executing somewhere between the return from
> fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.
>

I think it is fcopy_transaction.fcopy_context which gets out of sync.

When we're done processing some request we have the following code:

		fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
		fcopy_respond_to_host(*val);
		fcopy_transaction.state = HVUTIL_READY;
		hv_poll_channel(fcopy_transaction.fcopy_context,
				hv_fcopy_onchannelcallback);

If interrupt happens after we did fcopy_respond_to_host()
fcopy_transaction.state will still be HVUTIL_USERSPACE_RECV or even its
previous HVUTIL_USERSPACE_REQ but it's OK as we have the following in
hv_fcopy_onchannelcallback()

	if (fcopy_transaction.state > HVUTIL_READY) {
		/*
		 * We will defer processing this callback once
		 * the current transaction is complete.
		 */
		fcopy_transaction.fcopy_context = context;
		return;
	}

And we're supposed to process the work with hv_poll_channel(). The
problem is (I guess) that fcopy_transaction.fcopy_context gets out of
sync and it still has its previous value (possibly NULL). We call
hv_poll_channel() with NULL and everything gets stuck as we'll never
process the request.

AFAICS proper locking is requred here (and probably in all three
drivers), we need to protect not only .state but the whole transaction.

[...]

-- 
  Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ