[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4E7760AA.3050004@bootc.net>
Date: Mon, 19 Sep 2011 16:32:58 +0100
From: Chris Boot <bootc@...tc.net>
To: James Bottomley <James.Bottomley@...senPartnership.com>
CC: "Woodhouse, David" <david.woodhouse@...el.com>,
adam radford <aradford@...il.com>,
lkml <linux-kernel@...r.kernel.org>,
Adam Radford <linuxraid@....com>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: iommu_iova leak [inside 3w-9xxx]
On 18/09/2011 17:19, Chris Boot wrote:
> On 18/09/2011 15:56, James Bottomley wrote:
>> On Sun, 2011-09-18 at 18:25 +0400, James Bottomley wrote:
>>> On Sun, 2011-09-18 at 15:05 +0100, Chris Boot wrote:
>>>>> Hardly ... all it's saying is that twa_exit doesn't wait for
>>>>> pending I/O
>>>>> to complete, so when you remove the module it tears down in the
>>>>> middle
>>>>> of an I/O. A bug, yes, but it's not indicative of any sort of
>>>>> leak in
>>>>> the maps/unmaps.
>>>>
>>>> James,
>>>>
>>>> I don't think that's the case - I had unmounted all filesystems,
>>>> deactivated all volume groups, and performed a sync before waiting
>>>> a few seconds and running rmmod. Next time I'll also 'echo 1>
>>>> /sys/block/sdX/device/delete' if that's helpful.
>>> Actually, I take all that back: the driver has a bug in QUEUE_FULL
>>> handling: twa_scsi_queue() calls twa_scsiop_execute_scsi(), which maps
>>> the dma buffer, but if the card responds QUEUE_FULL it just returns
>>> SCSI_MLQUEUE_HOST_BUSY without ever unmapping. That leg in the code
>>> frees the request but also doesn't unmap it. In fact any error return
>>> from twa_scsiop_execute_scsi() seems to have the same problem (but
>>> QUEUE_FULL is the only silent one).
>>>
>>> I trust Adam will fix this.
>> Actually, while Adam's mulling this, try the following. It should at
>> least confirm we're on the right track.
>>
>> James
>>
>> ---
>>
>> diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
>> index b7bd5b0..3868ab2 100644
>> --- a/drivers/scsi/3w-9xxx.c
>> +++ b/drivers/scsi/3w-9xxx.c
>> @@ -1800,10 +1800,12 @@ static int twa_scsi_queue_lck(struct
>> scsi_cmnd *SCpnt, void (*done)(struct scsi_
>> switch (retval) {
>> case SCSI_MLQUEUE_HOST_BUSY:
>> twa_free_request_id(tw_dev, request_id);
>> + twa_unmap_scsi_data(tw_dev, request_id);
>> break;
>> case 1:
>> tw_dev->state[request_id] = TW_S_COMPLETED;
>> twa_free_request_id(tw_dev, request_id);
>> + twa_unmap_scsi_data(tw_dev, request_id);
>> SCpnt->result = (DID_ERROR<< 16);
>> done(SCpnt);
>> retval = 0;
>
> James,
>
> After testing quite thoroughly on my test install (lots of parallel
> e2fsck and an rsync from one LV to another) I couldn't trigger the
> warning. It's probably too early to tell for sure if this fixes things
> on my normal workload (I'll be able to tell definitely tomorrow), but
> it certainly seems much better right now. I'll keep you posted.
The patch above certainly seems to fix things for me. After almost 24
hours uptime my iommu_iova looks entirely reasonable:
iommu_iova 592 885 64 59 1 : tunables 120 60
8 : slabdata 15 15 60
We'll see in a few days' time if this resolves my entire performance
problem, but I'm nearly 100% confident it does.
Thanks.
Chris
--
Chris Boot
bootc@...tc.net
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists