Message-ID: <59EC7FF5.6070906@intel.com>
Date: Sun, 22 Oct 2017 19:24:37 +0800
From: Wei Wang <wei.w.wang@...el.com>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>, mst@...hat.com
CC: mhocko@...nel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v1 1/3] virtio-balloon: replace the coarse-grained balloon_lock
On 10/22/2017 01:20 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> The balloon_lock was used to synchronize access to the members of struct
>> virtio_balloon and to its queue operations (please see commit
>> e22504296d). It prevents leak_balloon and fill_balloon from running
>> concurrently, which results in a deadlock on OOM:
>>
>> fill_balloon: take balloon_lock and wait for OOM to get some memory;
>> oom_notify: release some inflated memory via leak_balloon();
>> leak_balloon: wait for balloon_lock to be released by fill_balloon.
>>
>> This patch breaks the lock into two fine-grained inflate_lock and
>> deflate_lock, and eliminates the unnecessary use of the shared data
>> (i.e. vb->pfns, vb->num_pfns). This enables leak_balloon and
>> fill_balloon to run concurrently and solves the deadlock issue.
>>
>> @@ -162,20 +160,20 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>> msleep(200);
>> break;
>> }
>> - set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>> - vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>> + set_page_pfns(vb, pfns + num_pfns, page);
>> if (!virtio_has_feature(vb->vdev,
>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>> adjust_managed_page_count(page, -1);
>> }
>>
>> - num_allocated_pages = vb->num_pfns;
>> + mutex_lock(&vb->inflate_lock);
>> /* Did we get any? */
>> - if (vb->num_pfns != 0)
>> - tell_host(vb, vb->inflate_vq);
>> - mutex_unlock(&vb->balloon_lock);
>> + if (num_pfns != 0)
>> + tell_host(vb, vb->inflate_vq, pfns, num_pfns);
>> + mutex_unlock(&vb->inflate_lock);
>> + atomic64_add(num_pfns, &vb->num_pages);
> Isn't this addition too late? If leak_balloon() is called due to
> out_of_memory(), it will fail to see an up-to-date vb->num_pages value.
Not really. I think the old implementation above,
"vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE",
isn't quite accurate, because "vb->num_pages" should reflect the number of
pages that have already been inflated, i.e. pages that have already been
given to the host via tell_host().
If we updated "vb->num_pages" earlier, before tell_host(), it would include
pages that haven't been given to the host yet, which I think shouldn't be
counted as inflated pages.
On the other hand, OOM uses leak_balloon() to release pages that have
already been inflated.
In addition, I think we would also need to move balloon_page_insert(), which
puts the page onto the inflated page list, to after tell_host().
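To make that ordering concrete, the tail of fill_balloon() would then look
roughly like this (illustrative sketch only, not the final code;
"local_pages" is a hypothetical list of the pages allocated in the loop
above, and the declarations of page/tmp/flags are elided):

	mutex_lock(&vb->inflate_lock);
	/* Did we get any? */
	if (num_pfns != 0)
		tell_host(vb, vb->inflate_vq, pfns, num_pfns);
	mutex_unlock(&vb->inflate_lock);

	/*
	 * Only now have the pages really been inflated (handed to the
	 * host), so account them and make them visible to leak_balloon()
	 * via the balloon page list.
	 */
	atomic64_add(num_pfns, &vb->num_pages);
	spin_lock_irqsave(&vb->vb_dev_info.pages_lock, flags);
	list_for_each_entry_safe(page, tmp, &local_pages, lru) {
		list_del(&page->lru);
		balloon_page_insert(&vb->vb_dev_info, page);
	}
	spin_unlock_irqrestore(&vb->vb_dev_info.pages_lock, flags);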
>>
>> - return num_allocated_pages;
>> + return num_pfns;
>> }
>>
>> static void release_pages_balloon(struct virtio_balloon *vb,
>> @@ -194,38 +192,39 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>>
>> static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>> {
>> - unsigned num_freed_pages;
>> struct page *page;
>> struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>> LIST_HEAD(pages);
>> + unsigned int num_pfns;
>> + __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
> This array consumes 1024 bytes of kernel stack, doesn't it?
> leak_balloon() might be called from out_of_memory() where kernel stack
> is already largely consumed before entering __alloc_pages_nodemask().
> For reducing possibility of stack overflow, since out_of_memory() is
> serialized by oom_lock, I suggest using static (maybe kmalloc()ed as
> vb->oom_pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]) buffer when called from
> out_of_memory().
In that case, we might as well use

	vb->inflate_pfns = kmalloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX..);
	vb->deflate_pfns = kmalloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX..);

which would be allocated in probe().
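Roughly, something along these lines in virtballoon_probe() (just a sketch
of the idea; the field names and the error label are made up for
illustration):

	/*
	 * One scratch PFN buffer per direction, protected by the
	 * corresponding inflate_lock / deflate_lock, so neither
	 * fill_balloon() nor leak_balloon() needs a large on-stack array.
	 */
	vb->inflate_pfns = kmalloc_array(VIRTIO_BALLOON_ARRAY_PFNS_MAX,
					 sizeof(__virtio32), GFP_KERNEL);
	vb->deflate_pfns = kmalloc_array(VIRTIO_BALLOON_ARRAY_PFNS_MAX,
					 sizeof(__virtio32), GFP_KERNEL);
	if (!vb->inflate_pfns || !vb->deflate_pfns)
		goto out_free_vb;	/* hypothetical error path */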
>>
>> /* We can only do one array worth at a time. */
>> - num = min(num, ARRAY_SIZE(vb->pfns));
>> + num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
>>
>> - mutex_lock(&vb->balloon_lock);
>> /* We can't release more pages than taken */
>> - num = min(num, (size_t)vb->num_pages);
>> - for (vb->num_pfns = 0; vb->num_pfns < num;
>> - vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>> + num = min_t(size_t, num, atomic64_read(&vb->num_pages));
>> + for (num_pfns = 0; num_pfns < num;
>> + num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>> page = balloon_page_dequeue(vb_dev_info);
> If balloon_page_dequeue() can be concurrently called by both host's request
> and guest's OOM event, is (!dequeued_page) test in balloon_page_dequeue() safe?
I'm not sure I follow the question. "dequeued_page" is a local variable in
that function, so why would it be unsafe for two concurrent invocations?
The shared b_dev_info->pages list is operated on under a lock.
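For reference, the part of balloon_page_dequeue() I mean looks roughly like
this (simplified from mm/balloon_compaction.c from memory, not verbatim):

	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
		if (!trylock_page(page))
			continue;	/* e.g. racing with isolation */
		balloon_page_delete(page);
		unlock_page(page);
		dequeued_page = true;	/* local to this invocation */
		break;
	}
	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

So two concurrent callers each walk the shared list under pages_lock and
each gets its own dequeued_page.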
> Is such concurrency needed?
Thanks for this question; it points to another optimization, which I'd like
to introduce if this direction is accepted:
I don't think it is necessary to deflate pages in OOM-->leak_balloon() while
a host-requested leak_balloon() is already running. In that case, OOM can
simply count the pages that are deflated by the host's request.
The implementation logic would be simple; here is the major part:

1) Introduce a "vb->deflating" flag to tell whether deflating is in
progress.

2) At the beginning of leak_balloon():

	if (READ_ONCE(vb->deflating)) {
		npages = atomic64_read(&vb->num_pages);
		/* Wait till the other run of leak_balloon() returns */
		while (READ_ONCE(vb->deflating))
			;
		npages -= atomic64_read(&vb->num_pages);
	} else {
		WRITE_ONCE(vb->deflating, true);
	}
	...

3) At the end of leak_balloon():

	WRITE_ONCE(vb->deflating, false);

(The above vb->deflating doesn't have to live in vb, though; it could be a
static variable inside leak_balloon(). We can discuss the implementation in
more detail when we reach that step.)
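To put the whole thing together, the skeleton I have in mind is roughly
(untested sketch; in a real patch the check-and-set of the flag would need
to be atomic, e.g. test_and_set_bit(), and as said above the flag could
just as well be a static inside leak_balloon()):

static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
{
	unsigned num_freed_pages;

	if (READ_ONCE(vb->deflating)) {
		/*
		 * A host-requested leak_balloon() is already deflating;
		 * don't deflate again, just report how many pages that
		 * run releases while we wait for it to finish.
		 */
		s64 before = atomic64_read(&vb->num_pages);

		while (READ_ONCE(vb->deflating))
			cpu_relax();
		return before - atomic64_read(&vb->num_pages);
	}
	WRITE_ONCE(vb->deflating, true);

	/*
	 * ... the existing deflate work, which sets num_freed_pages:
	 * balloon_page_dequeue(), tell_host(vb, vb->deflate_vq, ...),
	 * release_pages_balloon(), atomic64_sub(..., &vb->num_pages) ...
	 */

	WRITE_ONCE(vb->deflating, false);
	return num_freed_pages;
}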
Best,
Wei