linux-kernel - Re: [patch] convert aio event reap to use atomic-op instead of spin

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070411180038.GN28322@mami.zabbo.net>
Date:	Wed, 11 Apr 2007 11:00:38 -0700
From:	Zach Brown <zach.brown@...cle.com>
To:	Ken Chen <kenchen@...gle.com>
Cc:	akpm@...ux-foundation.org, linux-aio@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [patch] convert aio event reap to use atomic-op instead of spin_lock

First, I'll NAK this and all AIO patches until the patch description
says that it's been run through the regression tests that we've started
collecting in autotest.  They're trivial to run, never fear:

 cd /usr/local
 svn checkout svn://test.kernel.org/autotest/trunk/client autotest
 cd autotest
 bin/autotest tests/aio_dio_bugs/control

(We'll be moving this to LTP soon, I think, but that will just change
the commands to cut and paste.) 

> Resurrect an old patch that uses atomic operation to update ring buffer
> index on AIO event queue.  This work allows futher application/libaio
> optimization to run fast path io_getevents in user space.

I have mixed feelings.  I think the userspace getevents support was
poorly designed and the simple fact that we've gone this long without it
says just how desperately the feature isn't needed.

At the same time, it's a small change and the code is already there.  So
it's hard to get too worked up about it.  We can always remove support
in the future by never changing the indexes if we require that apps fall
back to sys_io_getevents().

In any case, let's see a test which exercises userspace event collection
before we claim to support it.  I'd prefer a little stand-alone C app
like the ones in autotest for simple unit testing.  A fio patch to
stress test it would probably also be well received.

> I've also added one more change on top of old implementation that rounds
> ring buffer size to power of 2 to allow fast head/tail calculation. With
> the new scheme, there is no more modulo operation on them and we simply
> increment either pointer index directly.  

Please quantify the improvement.  We're allocating more ring elements
than userspace asked for and than were accounted for in aio_max_nr.
That cost, however teeny, will be measured against the benefit.

> -	/* Compensate for the ring buffer's head/tail overlap entry */
> -	nr_events += 2;	/* 1 is required, 2 for good luck */
> +	/* round nr_event to next power of 2 */
> +	nr_events = roundup_pow_of_two(nr_events);

Is that buggy?  How will the code tell if head == tail means a full ring
or an empty ring?  The old code added that extra event to let it tell
the ring was full before head == tail and it would think it's empty
again, I think.  I'm guessing roundup(nr_events + 1) is needed.

> -#define aio_ring_event(info, nr, km) ({					\
> -	unsigned pos = (nr) + AIO_EVENTS_OFFSET;			\
> +#define aio_ring_event(info, __nr, km) ({				\
> +	unsigned pos = ((__nr) & ((info)->nr - 1)) + AIO_EVENTS_OFFSET;	\
>  	struct io_event *__event;					\
>  	__event = kmap_atomic(						\
>  			(info)->ring_pages[pos / AIO_EVENTS_PER_PAGE], km); \

(info) is now evaluated twice.  __info = (info), please, just like
__event there.

>  	put_aio_ring_event(event, KM_IRQ0);
>  	kunmap_atomic(ring, KM_IRQ1);

The ring page, which is mmaped to userspace at some weird address, was
just written through a kmap.  Do we need a flush_dcache_page()?  This
isn't this patch's problem, but it'd need to be addressed if we're using
the ring from userspace.

> -
> -	struct io_event		io_events[0];
> +	struct io_event	io_events[0];

Try to minimize gratuitous reformatting, please.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/