linux-kernel - Re: [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160725121121.6ym4wjjgqyp7akgx@mac>
Date:	Mon, 25 Jul 2016 14:11:21 +0200
From:	Roger Pau Monné <roger.pau@...rix.com>
To:	Bob Liu <bob.liu@...cle.com>
CC:	<linux-kernel@...r.kernel.org>, <xen-devel@...ts.xenproject.org>,
	<konrad.wilk@...cle.com>, <jgross@...e.com>
Subject: Re: [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd
 resources

On Mon, Jul 25, 2016 at 07:08:36PM +0800, Bob Liu wrote:
> 
> On 07/25/2016 06:53 PM, Roger Pau Monné wrote:
> ..[snip..]
> >>>>  * We get the device lock and blk_mq_freeze_queue() in dynamic_reconfig_device(),
> >>>>    and then have to release in blkif_recover() asynchronously which makes the code more difficult to readable.
> >>>
> >>> I'm not sure I'm following here, do you mean that you will pick the lock in 
> >>> dynamic_reconfig_device and release it in blkif_recover? Why wouldn't you 
> >>
> >> Yes.
> >>
> >>> release the lock in dynamic_reconfig_device after doing whatever is needed?
> >>>
> >>
> >> Both 'dynamic configuration' and migration:xenbus_dev_resume() use { blkfront_resume(); blkif_recover() } and depends on the change of xbdev->state.
> >> If they happen simultaneously, the State machine of xbdev->state is going to be a mess and very difficult to track.
> >>
> >> The lock(actually a mutex) is like a big lock to make sure no race would happen at all.
> > 
> > So the only thing that you should do is set the frontend state to closed and 
> > wait for the backend to also switch to state closed, and then switch the
> > frontend state to init again in order to trigger a reconnection.
> > 
> 
> But if migration:xenbus_dev_resume() is triggered at the same time, any state be set might be changed.
> =====
> E.g
> Dynamic_reconfig_device:                                Migration:xenbus_dev_resume()
> 
> Set the frontend state to closed       
>                  
> 							^^^^
> 							frontend state will be reset to XenbusStateInitialising by xenbus_dev_resume()
> 
> Wait for the backend to also switch to state closed

This is not really a race, the backend will not switch to state closed, and 
instead will appear at state init again, which is what we wanted anyway in 
order to reconnect, so I don't see an issue with this flow.

> =====
> Similar situation may happen at any other place regarding set the state.

Right, but I don't see how your proposed patch also avoids this issues. I 
see that you pick the device lock in dynamic_reconfig_device, but I don't 
see xenbus_dev_resume picking the lock at all, and in any case I don't think 
you should prevent the VM from migrating.

Depending on the toolstack it might decide to kill the VM because it has not 
been able to migrate it, in which case the result is not better.

Roger.