Message-ID: <x49iq62lak7.fsf@segfault.boston.devel.redhat.com>
Date:	Tue, 01 Jun 2010 17:14:48 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Sergey Temerkhanov <temerkhanov@...dex.ru>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-aio" <linux-aio@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Zach Brown <zach.brown@...cle.com>,
	Suparna Bhattacharya <suparna@...ibm.com>
Subject: Re: [PATCH][RFC] AIO: always reinitialize iocb->ki_run_list at the end of aio_run_iocb()

Sergey Temerkhanov <temerkhanov@...dex.ru> writes:

> On Wednesday 26 May 2010 23:38:35 Jeff Moyer wrote:
> ...
>> I can vaguely recall discussion surrounding the reference counting of
>> cancel methods, but I have no idea what the actual contents of those
>> discussions were.  Sorry, my memory has failed me.  Either Zach or
>> Suparna might remember better.
>> 
>> Sergey, the cancellation path, unfortunately, is not well exercised as
>> I'm sure you are aware.  As you pointed out, the only implementation of
>> a cancel method is the usb gadget interface.  Now, given that they've
>> worked fine with the extra put in their cancel method, I'm not sure why
>> you can't do the same. 
> Well, in fact, they have only one aio_put_req() in their cancel method. This 
> is the code from 2.6.34:

I was referring to the aio_put_req done deeper in the call chain by the
completion methods for the usb gadgetfs request.

> And adding extra aio_put_req() to the cancel method will not fix failing 
> kick_iocb() which is another problem and this patch is supposed to address it.

I guess I'm confused.  You wrote the following:

> I've written the driver code which implements a zero-copy DMA char device. It 
> has aio_read() and aio_write() methods which return -EIOCBQUEUED after the 
> successful preparation of the buffers described by kiocb and posting it to the 
> descriptor chain. When the descriptors are processed, the DMA engine raises 
> the interrupt and the cleanup work is done in the handler, including 
> aio_complete() for the completed kiocbs.
>
> This works fine, however, there is a problem with canceling the queued 
> requests, especially on io_destroy() syscall. Since there is no simple way to 
> remove single kiocb from the descriptor chain, I'm removing all of them from 
> the queue using aio_complete() or aio_put_req() in the ki_cancel() callback 
> routine of my driver. The main problem is the reference counting in 
> aio_cancel_all():
>
> 		if (cancel) {
> 			iocb->ki_users++;
> 			spin_unlock_irq(&ctx->ctx_lock);
> 			cancel(iocb, &res);
> 			spin_lock_irq(&ctx->ctx_lock);
> 		}
>
> Here the iocb->ki_users gets incremented which already has the value 1 at this 
> point (after the io_submit_one() completion) and it's never released (). So I 
> have to call aio_put_req() twice for the given kiocb (this seems to be the 
> hack to me) or I'll end up with the unkillable process stuck in 
> wait_for_all_aios() at the io_schedule(). I've posted the patches where I've 
> added aio_put_req() but I think it needs more testing. 
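
The reference counting described in the quoted text can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_iocb, model_put_req() and model_cancel() are hypothetical names standing in for struct kiocb, aio_put_req() and the aio_cancel_all() fragment quoted above. It assumes, as the quoted text states, that ki_users is 1 once io_submit_one() has completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kiocb: only the reference count. */
struct model_iocb {
	int users;	/* plays the role of iocb->ki_users */
	bool freed;	/* set once the last reference is dropped */
};

/* Models aio_put_req(): drop one reference, "free" on the last one. */
static void model_put_req(struct model_iocb *iocb)
{
	assert(iocb->users > 0);
	if (--iocb->users == 0)
		iocb->freed = true;
}

/*
 * Models the quoted aio_cancel_all() fragment followed by a driver
 * cancel method that drops `puts` references.
 * Returns true if the request ended up freed.
 */
static bool model_cancel(int puts)
{
	/* Assumed state after submission: one reference held. */
	struct model_iocb iocb = { .users = 1, .freed = false };

	iocb.users++;			/* iocb->ki_users++ before cancel() */
	for (int i = 0; i < puts; i++)
		model_put_req(&iocb);	/* done by the driver's ki_cancel() */

	return iocb.freed;
}
```

In this model, model_cancel(1) leaves one reference outstanding, which corresponds to the process stuck in wait_for_all_aios(), while model_cancel(2) releases the request.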

OK, you tried two aio_put_req() calls, and it worked, but you thought
maybe it wasn't the right approach.  So:

> So, I've tried another approach (hack) - requeue the kiocb with
> kick_iocb() before calling aio_put_req() in the ki_cancel() callback
> (that's because aio_run_iocb() takes some special actions for the
> canceled kiocbs). And I've found out that kick_iocb() fails because
> aio_run_iocb() does this:
> 	iocb->ki_run_list.next = iocb->ki_run_list.prev = NULL;
> and only reinitializes iocb->ki_run_list when iocb->ki_retry() returns 
> -EIOCBRETRY but kick_iocb() is exported and looks like intended for usage 
> (though not recommended).
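
One plausible reading of why the requeue fails can be sketched in user space. This is an illustration under stated assumptions, not the kernel implementation: it assumes the queuing step behind kick_iocb() only enqueues a kiocb whose ki_run_list passes a list_empty()-style test (next pointing back at itself). The list helpers below are re-implemented locally, and model_queue_kicked() and model_kick_after_run() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal local copy of the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty node points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same test as list_empty(): empty means next points back at the node. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Append node n at the tail of the list anchored at head. */
static void list_add_tail_model(struct list_head *n, struct list_head *head)
{
	assert(head->prev != NULL);
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
}

/* Assumed queuing rule: only enqueue a node that is not already on a list. */
static bool model_queue_kicked(struct list_head *node, struct list_head *run_list)
{
	if (!list_is_empty(node))
		return false;	/* looks "already queued", so nothing happens */
	list_add_tail_model(node, run_list);
	return true;
}

/* Returns true if a kick succeeds after the run-list node was cleared. */
static bool model_kick_after_run(bool reinit)
{
	struct list_head run_list, node;

	init_list_head(&run_list);
	node.next = node.prev = NULL;	/* state left by the quoted assignment */
	if (reinit)
		init_list_head(&node);	/* what the patch proposes to always do */

	return model_queue_kicked(&node, &run_list);
}
```

Under these assumptions, a node with NULL pointers never looks empty, so model_kick_after_run(false) fails to queue, while model_kick_after_run(true) succeeds.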

You implemented this other hack, and ran into troubles, so you're
modifying the aio core to fix it.  Am I wrong in concluding that if you
keep your first solution above, you no longer need this second?

You may also find the following an interesting read:

http://permalink.gmane.org/gmane.linux.kernel.aio.general/2571

Cheers,
Jeff