[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4E761A2B.1020509@bootc.net>
Date: Sun, 18 Sep 2011 17:19:55 +0100
From: Chris Boot <bootc@...tc.net>
To: James Bottomley <James.Bottomley@...senPartnership.com>
CC: "Woodhouse, David" <david.woodhouse@...el.com>,
adam radford <aradford@...il.com>,
lkml <linux-kernel@...r.kernel.org>,
Adam Radford <linuxraid@....com>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: iommu_iova leak [inside 3w-9xxx]
On 18/09/2011 15:56, James Bottomley wrote:
> On Sun, 2011-09-18 at 18:25 +0400, James Bottomley wrote:
>> On Sun, 2011-09-18 at 15:05 +0100, Chris Boot wrote:
>>>> Hardly ... all it's saying is that twa_exit doesn't wait for pending I/O
>>>> to complete, so when you remove the module it tears down in the middle
>>>> of an I/O. A bug, yes, but it's not indicative of any sort of leak in
>>>> the maps/unmaps.
>>>
>>> James,
>>>
>>> I don't think that's the case - I had unmounted all filesystems, deactivated all volume groups, and performed a sync before waiting a few seconds and running rmmod. Next time I'll also 'echo 1> /sys/block/sdX/device/delete' if that's helpful.
>> Actually, I take all that back: the driver has a bug in QUEUE_FULL
>> handling: twa_scsi_queue() calls twa_scsiop_execute_scsi(), which maps
>> the dma buffer, but if the card responds QUEUE_FULL it just returns
>> SCSI_MLQUEUE_HOST_BUSY without ever unmapping. That leg in the code
>> frees the request but also doesn't unmap it. In fact any error return
>> from twa_scsiop_execute_scsi() seems to have the same problem (but
>> QUEUE_FULL is the only silent one).
>>
>> I trust Adam will fix this.
> Actually, while Adam's mulling this, try the following. It should at
> least confirm we're on the right track.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
> index b7bd5b0..3868ab2 100644
> --- a/drivers/scsi/3w-9xxx.c
> +++ b/drivers/scsi/3w-9xxx.c
> @@ -1800,10 +1800,12 @@ static int twa_scsi_queue_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_
> switch (retval) {
> case SCSI_MLQUEUE_HOST_BUSY:
> twa_free_request_id(tw_dev, request_id);
> + twa_unmap_scsi_data(tw_dev, request_id);
> break;
> case 1:
> tw_dev->state[request_id] = TW_S_COMPLETED;
> twa_free_request_id(tw_dev, request_id);
> + twa_unmap_scsi_data(tw_dev, request_id);
> SCpnt->result = (DID_ERROR<< 16);
> done(SCpnt);
> retval = 0;
James,
After testing quite thoroughly on my test install (lots of parallel
e2fsck and an rsync from one LV to another) I couldn't trigger the
warning. It's probably too early to tell for sure if this fixes things
on my normal workload (I'll be able to tell definitely tomorrow), but it
certainly seems much better right now. I'll keep you posted.
Thanks,
Chris
--
Chris Boot
bootc@...tc.net
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists