linux-kernel - Re: [PATCH AIO 2/2] Add listio support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <1158245022.3890.68.camel@frecb000686>
Date:	Thu, 14 Sep 2006 16:43:42 +0200
From:	Sébastien Dugué <sebastien.dugue@...l.net>
To:	Zach Brown <zach.brown@...cle.com>
Cc:	linux-kernel@...r.kernel.org, linux-aio@...ck.org,
	drepper@...hat.com, suparna@...ibm.com, pbadari@...ibm.com,
	hch@...radead.org, johnpol@....mipt.ru, davem@...emloft.net,
	Jean Pierre Dion <jean-pierre.dion@...l.net>
Subject: Re: [PATCH AIO 2/2] Add listio support

On Tue, 2006-09-12 at 17:41 -0700, Zach Brown wrote:
> >   As io_submit already accepts an array of iocbs, as part of listio submission,
> > the user process prepends to a list of requests an empty special aiocb with
> > an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields.
> 
> Hmm, overloading an iocb as the _GROUP indication doesn't make much
> sense to me, other than being an expedient way to shoe-horn the
> semantics into the existing API.  That we have to worry about multiple
> _GROUP iocbs in the array (ignoring them currently) seems like
> complexity that we shouldn't even have to consider.
> 
> Am I just missing the discussion that lead us to avoid a syscall?  It
> seems like we're getting a pretty funky API in return.

  The original intent was to use the io_submit() interface which already
accepts a list of iocbs without having to resort on yet another syscall.
  But I agree that it's kludgy and a new syscall might end up being much
cleaner.

  Discussion on this issue would be much welcomed.

> 
> I guess I could also see an iocb command whose buf/bytes pointed to an
> array of iocb pointers that it should issue and wait for completion on.
>  You could call it, uh, IOCB_CMD_IO_SUBMIT :).  I'm only sort of kidding
> here :).  With a sys_io_wait_for_one_iocb() that might not be entirely
> ridiculous.  Then you could get an io_getevents() completion of a group
> instead of only being able to get a signal or waiting on sync
> completion.  It makes it less of a special case.
> 
> In any case, this current approach doesn't seem complete.  It doesn't
> deal with the case where canceled iocbs don't come through
> aio_complete() and so don't drop their lio_users count.

  Yep, will have to think a bit more about this.

> 
> Also, it seems that the lio_submit() interface allows NULL iocb pointers
> by skipping them.  sys_io_submit() looks like it'd fault on them.  It
> sounds unpleasant to have an app allow nulls in the array and have the
> library copy and compress the members, or something.  Maybe we could fix
> up that mismatch in an API that explicitly took the sigevent struct?  Am
> I making up this problem?

  Effectively, right now the NULL pointers have to be handled by the 
library and the new syscall approach would allow to solve this in
an elegant way.

> 
> more inline..
> 
> > +static void lio_wait(struct kioctx *ctx, struct lio_event *lio)
> > +{
> > +       struct task_struct *tsk = current;
> > +       DECLARE_WAITQUEUE(wait, tsk);
> > +
> > +       add_wait_queue(&ctx->wait, &wait);
> > +       set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> > +
> > +       while ( atomic_read(&lio->lio_users) ) {
> > +               schedule();
> > +               set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> > +       }
> > +       __set_task_state(tsk, TASK_RUNNING);
> > +       remove_wait_queue(&ctx->wait, &wait);
> 
> wait_event(ctx->wait, atomic_read(&lio->lio_users));

  Nice shortcut, thanks.

> 
> > +static inline void lio_check(struct lio_event *lio)
> > +{
> > +	int ret;
> > +
> > +	ret = atomic_dec_and_test(&(lio->lio_users));
> > +
> > +	if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
> > +		/* last one -> notify process */
> > +		aio_send_signal(&lio->lio_notify);
> > +		kfree(lio);
> 
> Since io_submit() doesn't hold a ref on lio_users this can race during
> submission to send the signal and free the lio before all the operations
> have been submitted.

  Right.

> 
> > +	if (*lio) {
> > +               printk (KERN_DEBUG "lio_create: already have an lio\n");
> 
> We should get rid of the printk so that the user doesn't have a trivial
> way to flood the kernel message log.

  OK.

> 
> > +	*lio = kmalloc(sizeof(*lio), GFP_KERNEL);
> > +
> > +	if (!*lio)
> > +		return -EAGAIN;
> > +
> > +	memset (*lio, 0, sizeof(struct lio_event));
> 
> kzalloc(), and I imagine it would be a lot cleaner to return the lio or
> ERR_PTR() instead of passing in the pointer to the struct and filling it in.

  Makes sense.

> 
> And is it legal to memset(&atomic_t, 0, )?  I would have expected a
> atomic_set(&lio->lio_users, 0).  Maybe I'm stuck in the bad old days of
> 24 bit atomic_t and embedded locks :).

  Well, it's just an allocation so lio->lio_users is not even ready to 
be used. So I don't think it makes any difference whether this field
is cleared during the memset (or kzalloc) or using an explicit
atomic_set(). Or are there any architectures still out there that stuff
state bits in an atomic_t?

> 
> > +		if (lio && ((tmp.aio_lio_opcode == IOCB_CMD_PREAD) ||
> > +			    (tmp.aio_lio_opcode == IOCB_CMD_PWRITE))) {
> 
> It doesn't make sense to me to restrict which ops are allowed to be in a
> list or not.  Why not PREADV?  An iocb is an iocb, I think.

  Right, I wrongly focused on the POSIX aspects, but we should be more
versatile here.

> 
> > +	IOCB_CMD_GROUP = 7,
> 
> 7 is already taken in -mm by IOCB_CMD_PREADV .

  Argh, forgot to check -mm. Sorry.

> 
> > +struct lio_event {
> > +	atomic_t		lio_users;
> > +	int			lio_wait;
> > +	struct aio_notify	lio_notify;
> 
> I'm pretty sure you don't need lio_wait here, given that it's only
> tested in the submission path.
> 

  Yes, we can do without.

  Thanks,

  Sébastien.

-- 
-----------------------------------------------------

  Sébastien Dugué                BULL/FREC:B1-247
  phone: (+33) 476 29 77 70      Bullcom: 229-7770

  mailto:sebastien.dugue@...l.net

  Linux POSIX AIO: http://www.bullopensource.org/posix
                   http://sourceforge.net/projects/paiol

-----------------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/