linux-kernel - Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages

Message-ID: <20120828175716.GA4595@redhat.com>
Date:	Tue, 28 Aug 2012 20:57:16 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Rafael Aquini <aquini@...hat.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org,
	Rusty Russell <rusty@...tcorp.com.au>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mel@....ul.ie>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Minchan Kim <minchan@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to
 balloon pages

On Tue, Aug 28, 2012 at 02:37:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 28, 2012 at 06:54:10PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 27, 2012 at 04:47:13PM -0300, Rafael Aquini wrote:
> > > On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> > > > 
> > > > Reading two atomics and doing math? Result can even be negative.
> > > > I did not look at the use closely, but it looks suspicious.
> > > Doc on atomic_read says:
> > > "
> > > The read is atomic in that the return value is guaranteed to be one of the
> > > values initialized or modified with the interface operations if a proper
> > > implicit or explicit memory barrier is used after possible runtime
> > > initialization by any other thread and the value is modified only with the
> > > interface operations.
> > > "
> > > 
> > > There's no runtime init by any thread other than the balloon's own, at device registration,
> > > and the operations (inc, dec) are performed through the proper interface operations
> > > only while protected by the pages_lock spinlock. It does not look suspicious, IMHO.
> > 
> > Any use of multiple atomics is suspicious.
> > Please just avoid it if you can. What's wrong with locking?
> > 
> > > I'm failing to see how it could become negative in that case, since you cannot
> > > isolate more pages than were previously inflated onto the balloon's list.
> > 
> > There is no ordering guarantee. So in
> > A - B, you can read B long after both A and B have been incremented.
> > Maybe it is safe in this case but it needs careful documentation
> > to explain how ordering works. Much easier to keep it all simple.
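To make the ordering point concrete, here is a minimal C11 sketch that replays one possible interleaving on a single thread; it is an illustration, not a real concurrency test. The struct and function names are hypothetical stand-ins for the driver's counters. Starting from an empty balloon, a reader that loads A before four pages are inflated and isolated, and loads B only afterwards, computes 0 - 4 = -4, even though the settled values give 4 - 4 = 0.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for vb->num_pages and vb->num_isolated_pages. */
struct balloon_counters {
	atomic_int num_pages;
	atomic_int num_isolated;
};

/*
 * Replays, on one thread, one legal interleaving of an unlocked
 * "num_pages - num_isolated" read. Every individual load and add is
 * atomic, yet the difference mixes a value from before the update
 * with a value from after it.
 */
static int replay_unlocked_difference(struct balloon_counters *c,
				      int inflated, int isolated)
{
	/* Pages can only be isolated after being inflated into the balloon. */
	assert(isolated <= inflated);

	int a = atomic_load(&c->num_pages);           /* reader: load A */
	atomic_fetch_add(&c->num_pages, inflated);    /* other CPU: inflate */
	atomic_fetch_add(&c->num_isolated, isolated); /* other CPU: isolate */
	int b = atomic_load(&c->num_isolated);        /* reader: load B, later */
	return a - b;
}
```

Reading both counters under pages_lock rules this interleaving out, since (per the thread) the inc/dec updates are themselves done under that lock and so cannot land between the two loads.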
> > 
> > > 
> > > > It's already the case everywhere except __wait_on_isolated_pages,
> > > > so just fix that, and then we can keep using int instead of atomics.
> > > > 
> > > Sorry, I didn't quite get you here. Fix what?
> > 
> > It's in the text you removed above. Access values under lock.
> >
> 
> So, you prefer this way:
> 
> /*
>  * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
>  *                            pages before proceeding with the page release.
>  * @vb         : pointer to the struct virtio_balloon describing this device.
>  * @leak_target: how many pages we are attempting to release this round.
>  */
> static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
>                                             size_t leak_target)
> {
>         unsigned int num_pages, isolated_pages;
>         spin_lock(&vb->pages_lock);
>         num_pages = vb->num_pages;
>         isolated_pages = vb->num_isolated_pages;
>         spin_unlock(&vb->pages_lock);
>         /*
>          * If isolated pages make our leak target bigger than the
>          * total number of pages we can release this round, wait for
>          * migration to return enough pages to the balloon's list.
>          */
>         wait_event(vb->config_change,
>                    (!isolated_pages ||
>                     leak_target <= (num_pages - isolated_pages)));

This logic looks strange too - it does not 100% match the comment.

> }
> 
> ?

Except that it does not work. You need to do the lock/unlock
dance and retest within wait_event.
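A minimal userspace sketch of that "lock/unlock dance", assuming a pthread mutex and condition variable stand in for vb->pages_lock and the vb->config_change wait queue (all names below are hypothetical). The difference from the quoted __wait_on_isolated_pages() is that the condition is re-evaluated under the lock every time the waiter wakes up. In the quoted version, num_pages and isolated_pages are local copies taken once before waiting, so the condition can never change while waiting. The sketch keeps the quoted `!isolated_pages ||` short-circuit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for vb->pages_lock, vb->config_change,
 * vb->num_pages and vb->num_isolated_pages. */
struct balloon {
	pthread_mutex_t lock;
	pthread_cond_t changed;
	size_t num_pages;
	size_t num_isolated;
};

/* Caller must hold b->lock, so both counters come from one consistent
 * snapshot. Same shape as the quoted condition; the extra
 * num_isolated <= num_pages test avoids unsigned wrap-around. */
static bool leak_target_reachable_locked(const struct balloon *b, size_t target)
{
	return b->num_isolated == 0 ||
	       (b->num_isolated <= b->num_pages &&
		target <= b->num_pages - b->num_isolated);
}

/* Lock, test, and either return or sleep; the test is repeated with fresh
 * values after every wake-up, never with a copy taken before waiting. */
static void wait_on_isolated_pages(struct balloon *b, size_t target)
{
	pthread_mutex_lock(&b->lock);
	while (!leak_target_reachable_locked(b, target))
		pthread_cond_wait(&b->changed, &b->lock);
	pthread_mutex_unlock(&b->lock);
}

/* Migration side: one isolated page goes back onto the balloon list. */
static void putback_isolated_page(struct balloon *b)
{
	pthread_mutex_lock(&b->lock);
	assert(b->num_isolated > 0);
	b->num_isolated--;
	pthread_cond_broadcast(&b->changed);
	pthread_mutex_unlock(&b->lock);
}

/* Simulated migration thread: returns every currently isolated page. */
static void *migration_thread(void *arg)
{
	struct balloon *b = arg;

	pthread_mutex_lock(&b->lock);
	size_t n = b->num_isolated;
	pthread_mutex_unlock(&b->lock);
	while (n--)
		putback_isolated_page(b);
	return NULL;
}
```

Under these assumptions, a balloon holding 10 pages with 4 isolated cannot satisfy a target of 8 until the migration thread returns at least 2 of them. The waiter then wakes, re-reads both counters under the lock, and proceeds. In the kernel, wait_event() likewise re-evaluates its condition expression after each wake-up, which is why the locked read belongs inside that expression rather than before the call.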


> > >  
> > > > That's 1K on stack - and can become more if we increase
> > > > VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> > > > we use vb->pfns.
> > > >
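The "1K" figure can be checked with a little arithmetic. This sketch assumes VIRTIO_BALLOON_ARRAY_PFNS_MAX is 256 and that each PFN is stored as a 32-bit value, as in the vb->pfns array of the driver at the time; the typedef and helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the value used by drivers/virtio/virtio_balloon.c at the time. */
#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

/* Assumption: each PFN handed to the host is a 32-bit value, as in vb->pfns. */
typedef uint32_t balloon_pfn_t;

/* Bytes a local PFN array with max_pfns entries would take on the stack. */
static size_t pfn_array_bytes(size_t max_pfns)
{
	return max_pfns * sizeof(balloon_pfn_t);
}

/* 256 entries * 4 bytes = 1024 bytes; growing the array grows the
 * stack cost linearly. */
static_assert(VIRTIO_BALLOON_ARRAY_PFNS_MAX * sizeof(balloon_pfn_t) == 1024,
	      "256 32-bit PFNs occupy 1 KiB");
```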
> > > If we want to use vb->pfns, we'll have to make leak_balloon mutually exclusive with
> > > page migration (as it was before), but that will inevitably bring us back to
> > > the discussion on breaking the loop when isolated pages make leak_balloon find
> > > fewer pages than it wants to release in each leak round.
> > > 
> > 
> > I don't think this is an issue. The issue was busy waiting in that case.
> >
> But, in fact, it is.
> Since we couldn't drop the mutex that prevents migration from happening (otherwise
> the migration threads would clobber our vb->pfns array), there would be no point
> in waiting for isolated pages to be reinserted on the balloon's list, because the
> migration threads that would accomplish that task are also waiting for us to drop
> the mutex.
> 
> You may argue that we could make virtballoon_migratepage() give up and return
> before even trying to acquire the mutex if a leak is ongoing, deferring the work
> to virtballoon_putbackpage(). However, I'm inclined to think that in this case
> the CPU time we spent isolating pages for compaction would simply be wasted and,
> perhaps, no effective compaction would be achieved at all.
> That makes me think it would have been better to stick with the old logic of
> breaking the loop, since leak_balloon() originally also keeps busy-waiting
> while pursuing its target anyway.
> 
> That's the trade-off here, IMO. If one really wants to wait for potentially isolated
> pages to get back onto the list before proceeding, we'll unfortunately have to burn
> a little more stack space on local variables.


Sorry, I do not understand what you are saying here. So find
a different locking strategy.

For example, something like:

         wait_event(vb->config_change,
		({ 
		   lock
		   if (target <= (num_pages - isolated_pages))
			   leak balloon
		   cond = target <= (num_pages - isolated_pages);
		   unlock;
		   cond;
		})
		)

seems to have no issues?
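One way to read that pseudocode, sketched in userspace C with hypothetical pthread stand-ins for the lock and the wait queue: the availability check and the leak happen in the same critical section, so no other thread can isolate pages between the test and the action. As an assumption of this sketch, the condition returns whether the leak was performed, rather than re-testing the comparison after num_pages has already dropped. leak_locked() is a hypothetical placeholder that only adjusts the counter and does not talk to a host.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the balloon lock, wait queue and counters. */
struct balloon {
	pthread_mutex_t lock;
	pthread_cond_t changed;
	size_t num_pages;
	size_t num_isolated;
};

/* Hypothetical placeholder for the real leak: only drops the count. */
static void leak_locked(struct balloon *b, size_t n)
{
	b->num_pages -= n;
}

/* Caller holds b->lock. Test and act in one critical section: leak now if
 * enough non-isolated pages are present, otherwise report failure. */
static bool try_leak_locked(struct balloon *b, size_t target)
{
	if (b->num_isolated > b->num_pages ||
	    target > b->num_pages - b->num_isolated)
		return false;
	leak_locked(b, target);
	return true;
}

/* Sleep until try_leak_locked() succeeds; exactly one leak per call. */
static void leak_when_possible(struct balloon *b, size_t target)
{
	pthread_mutex_lock(&b->lock);
	while (!try_leak_locked(b, target))
		pthread_cond_wait(&b->changed, &b->lock);
	pthread_mutex_unlock(&b->lock);
}

/* Migration side: one isolated page goes back onto the balloon list. */
static void putback_isolated_page(struct balloon *b)
{
	pthread_mutex_lock(&b->lock);
	assert(b->num_isolated > 0);
	b->num_isolated--;
	pthread_cond_broadcast(&b->changed);
	pthread_mutex_unlock(&b->lock);
}

/* Simulated migration thread: returns every currently isolated page. */
static void *migration_thread(void *arg)
{
	struct balloon *b = arg;

	pthread_mutex_lock(&b->lock);
	size_t n = b->num_isolated;
	pthread_mutex_unlock(&b->lock);
	while (n--)
		putback_isolated_page(b);
	return NULL;
}
```

With 10 pages of which 4 are isolated, a request for 8 sleeps until at least 2 pages come back, then leaks 8 in one step and leaves 2 pages in the balloon.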

-- 
MST