linux-kernel - Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to balloon pages

Message-ID: <20120828173713.GA1750@t510.redhat.com>
Date:	Tue, 28 Aug 2012 14:37:13 -0300
From:	Rafael Aquini <aquini@...hat.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org,
	Rusty Russell <rusty@...tcorp.com.au>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mel@....ul.ie>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Minchan Kim <minchan@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v9 3/5] virtio_balloon: introduce migration primitives to
 balloon pages

On Tue, Aug 28, 2012 at 06:54:10PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 27, 2012 at 04:47:13PM -0300, Rafael Aquini wrote:
> > On Sun, Aug 26, 2012 at 10:42:44AM +0300, Michael S. Tsirkin wrote:
> > > 
> > > Reading two atomics and doing math? Result can even be negative.
> > > I did not look at use closely but it looks suspicious.
> > Doc on atomic_read says:
> > "
> > The read is atomic in that the return value is guaranteed to be one of the
> > values initialized or modified with the interface operations if a proper
> > implicit or explicit memory barrier is used after possible runtime
> > initialization by any other thread and the value is modified only with the
> > interface operations.
> > "
> > 
> > There's no runtime init by any thread other than the balloon's own at device
> > registration, and the operations (inc, dec) are made through the proper interface
> > operations, only while protected by the spinlock pages_lock. It does not look
> > suspicious, IMHO.
> 
> Any use of multiple atomics is suspicious.
> Please just avoid it if you can. What's wrong with locking?
> 
> > I fail to see how it could become negative in that case, since you cannot
> > isolate more pages than were previously inflated to the balloon's list.
> 
> There is no order guarantee. So in
> A - B you can read B long after both A and B have been incremented.
> Maybe it is safe in this case but it needs careful documentation
> to explain how ordering works. Much easier to keep it all simple.
> 
> > 
> > > It's already the case everywhere except __wait_on_isolated_pages,
> > > so just fix that, and then we can keep using int instead of atomics.
> > > 
> > Sorry, I didn't quite get you here. Fix what?
> 
> It's in the text you removed above. Access values under lock.
>
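
To make the ordering concern quoted above concrete, here is a minimal
single-threaded sketch of one possible interleaving. The struct and function
names are hypothetical and are not taken from the patch; "a" and "b" stand in
for any two counters read back to back without a common lock.

```c
/* Hypothetical pair of counters, for illustration only. */
struct counters {
        int a;
        int b;
};

/*
 * Models an unlocked reader: it loads a, a writer then bumps both
 * counters by delta, and only afterwards the reader loads b.
 * This simulates a single interleaving; it is not real concurrency.
 */
static int torn_difference(struct counters *c, int delta)
{
        int a_seen = c->a;      /* reader loads A (old value) */

        c->a += delta;          /* writer runs in between */
        c->b += delta;

        int b_seen = c->b;      /* reader loads B (new value) */

        return a_seen - b_seen;
}

/* Models a reader holding the lock: both loads see the same state. */
static int consistent_difference(const struct counters *c)
{
        return c->a - c->b;
}
```

With both counters starting at 5 and delta 3, the torn read computes 5 - 8,
which is negative, while the consistent read gives 0. Stored in an unsigned
variable, the negative result would wrap to a very large value.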

So, you prefer this way:

/*
 * __wait_on_isolated_pages - check if leak_balloon() must wait on isolated
 *                            pages before proceeding with the page release.
 * @vb         : pointer to the struct virtio_balloon describing this device.
 * @leak_target: how many pages we are attempting to release this round.
 */
static inline void __wait_on_isolated_pages(struct virtio_balloon *vb,
                                            size_t leak_target)
{
        unsigned int num_pages, isolated_pages;
        spin_lock(&vb->pages_lock);
        num_pages = vb->num_pages;
        isolated_pages = vb->num_isolated_pages;
        spin_unlock(&vb->pages_lock);
        /*
         * If isolated pages make our leak target bigger than the total
         * pages that we can release this round, wait for migration to
         * return enough pages to the balloon's list.
         */
        wait_event(vb->config_change,
                   (!isolated_pages ||
                    leak_target <= (num_pages - isolated_pages)));
}

?
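
For comparison, a variant that re-reads the counters under the lock on every
wakeup. wait_event() re-evaluates its condition expression each time the
waiter is woken, so locals captured once before the wait would not reflect
later changes. What follows is only a userspace sketch, assuming pthreads as
a stand-in for the spinlock and the wait queue; struct balloon, can_leak()
and putback_page() are hypothetical names, not part of the driver.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct virtio_balloon. */
struct balloon {
        pthread_mutex_t pages_lock;
        pthread_cond_t config_change;
        unsigned int num_pages;
        unsigned int num_isolated_pages;
};

/* Evaluate the leak condition; the caller must hold pages_lock. */
static bool can_leak(const struct balloon *vb, size_t leak_target)
{
        return !vb->num_isolated_pages ||
               leak_target <= (size_t)(vb->num_pages - vb->num_isolated_pages);
}

/* Sleep until the condition holds, re-checking fresh values each wakeup. */
static void wait_on_isolated_pages(struct balloon *vb, size_t leak_target)
{
        pthread_mutex_lock(&vb->pages_lock);
        while (!can_leak(vb, leak_target))
                pthread_cond_wait(&vb->config_change, &vb->pages_lock);
        pthread_mutex_unlock(&vb->pages_lock);
}

/* Models migration handing one isolated page back to the balloon's list. */
static void putback_page(struct balloon *vb)
{
        pthread_mutex_lock(&vb->pages_lock);
        vb->num_isolated_pages--;
        pthread_cond_broadcast(&vb->config_change);
        pthread_mutex_unlock(&vb->pages_lock);
}
```

In the kernel, the equivalent would presumably be a small helper that takes
pages_lock and returns the condition, called from inside wait_event().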

> >  
> > > That's 1K on stack - and can become more if we increase
> > > VIRTIO_BALLOON_ARRAY_PFNS_MAX.  Probably too much - this is the reason
> > > we use vb->pfns.
> > >
> > If we want to use vb->pfns we'll have to make leak_balloon mutually exclusive with
> > page migration (as it was before), but that will inevitably bring us back to
> > the discussion on breaking the loop when isolated pages make leak_balloon find
> > fewer pages than it wants to release at each leak round.
> > 
> 
> I don't think this is an issue. The issue was busy waiting in that case.
>
But, in fact, it is.
Since we couldn't drop the mutex that prevents migration from happening
(otherwise the migration threads would mess up our vb->pfns array), there would
be no point in waiting for isolated pages to be reinserted on the balloon's
list, because the migration threads that would accomplish that task are also
waiting for us to drop the mutex.

You may argue that we could flag virtballoon_migratepage() to give up and return
before even trying to acquire the mutex, if a leak is ongoing -- deferring work
to virtballoon_putbackpage(). However, I'm inclined to think that in this case
the CPU time we spent isolating pages for compaction would simply be wasted and,
perhaps, no effective compaction would even be reached.
And that makes me think it would have been better to stick with the old logic of
breaking the loop, since leak_balloon() originally also remained busy waiting
while pursuing its target, anyway.

That's the trade-off here, IMO. If one really wants to wait for potentially
isolated pages to get back to the list before proceeding, we'll have to burn a
little more stack space with local variables, unfortunately.
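
For reference, the stack cost being discussed can be checked with simple
arithmetic. This sketch assumes 32-bit pfn entries and an array length of 256,
which is the value that matches the "1K" figure quoted above; both are
assumptions here, and local_pfns_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value: 256 entries matches the "1K on stack" figure quoted above. */
#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

/* Bytes a local array of n 32-bit pfn entries would occupy on the stack. */
static size_t local_pfns_bytes(size_t n)
{
        return n * sizeof(uint32_t);
}
```

Under those assumptions a local array costs 1024 bytes, and doubling
VIRTIO_BALLOON_ARRAY_PFNS_MAX would double it to 2048.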

