linux-kernel - Re: xen-blkfront: weird behavior of "iostat" after VM live-migrate which xen-blkfront module has indirect descriptors

Message-ID: <54CB42BF.5050003@huawei.com>
Date:	Fri, 30 Jan 2015 16:37:19 +0800
From:	"Ouyang Zhaowei (Charles)" <ouyangzhaowei@...wei.com>
To:	Roger Pau Monné <roger.pau@...rix.com>
CC:	<linux-kernel@...r.kernel.org>, <weiping.ding@...wei.com>,
	xen-devel <xen-devel@...ts.xenproject.org>,
	David Vrabel <david.vrabel@...rix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: xen-blkfront: weird behavior of "iostat" after VM live-migrate
 which xen-blkfront module has indirect descriptors



On 2015.1.26 10:30, Ouyang Zhaowei (Charles) wrote:
> 
> On 2015.1.23 19:15, Roger Pau Monné wrote:
>> Hello,
>>
>> On 23/01/15 at 8.59, Ouyang Zhaowei (Charles) wrote:
>>> Hi Roger,
>>>
>>> We have been testing the indirect descriptor feature of the xen-blkfront module recently.
>>> We found that after live-migrating a VM a couple of times, the "%util" reported by iostat stays at 100%, and several requests remain stuck in "avgqu-sz".
>>> We have checked several recent Linux distributions, and it happens on Ubuntu 14.04, Ubuntu 14.10 and RHEL 7.0.
>>>
>>> The iostat output looks like this:
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            0.00    0.00    0.00    0.00    0.00  100.00
>>>
>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     4.00    0.00    0.00    0.00   0.00 100.00
>>> dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>>> dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>>>
>>> Could you tell us why this is happening? Is this a bug?
>>
>> It is a bug indeed, thanks for reporting it. The problem seems to be 
>> that blk_put_request (which is used to discard the old requests before 
>> requeuing them) doesn't update the queue statistics. The following 
>> patch solves the problem for me, could you try it and report back?

Hi Roger,

After nearly 1000 migration tests, the "%util" reported by iostat no longer gets stuck at 100%, so the patch seems to fix this bug.

Thanks

Ouyang Zhaowei

>>
>> ---
>> commit bb4317c051ca81a2906edb7ccc505cbd6d1d80c7
>> Author: Roger Pau Monne <roger.pau@...rix.com>
>> Date:   Fri Jan 23 12:10:51 2015 +0100
>>
>>     xen-blkfront: fix accounting of reqs when migrating
>>     
>>     Current migration code uses blk_put_request in order to finish a request
>>     before requeuing it. This function doesn't update the statistics of the
>>     queue, which completely screws accounting. Use blk_end_request_all instead
>>     which properly updates the statistics of the queue.
>>     
>>     Signed-off-by: Roger Pau Monné <roger.pau@...rix.com>
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 5ac312f..aac41c1 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1493,7 +1493,7 @@ static int blkif_recover(struct blkfront_info *info)
>>  		merge_bio.tail = copy[i].request->biotail;
>>  		bio_list_merge(&bio_list, &merge_bio);
>>  		copy[i].request->bio = NULL;
>> -		blk_put_request(copy[i].request);
>> +		blk_end_request_all(copy[i].request, 0);
>>  	}
>>  
>>  	kfree(copy);
>> @@ -1516,7 +1516,7 @@ static int blkif_recover(struct blkfront_info *info)
>>  		req->bio = NULL;
>>  		if (req->cmd_flags & (REQ_FLUSH | REQ_FUA))
>>  			pr_alert("diskcache flush request found!\n");
>> -		__blk_put_request(info->rq, req);
>> +		__blk_end_request_all(req, 0);
>>  	}
>>  	spin_unlock_irq(&info->io_lock);
>>  
>>
> 
> Hi Roger,
> 
> Thanks for answering this question. Sure, I'll try this patch and test VM migration. So far the patch seems to have solved the bug (after 10 migrations).
> I'll keep testing and will let you know whether it holds up.
> 
> Regards,
> Ouyang Zhaowei
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
