Date:	Thu, 16 Apr 2015 08:36:20 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Jens Axboe <axboe@...com>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@....de>, Theodore Ts'o <tytso@....edu>,
	"Elliott, Robert (Server Storage)" <elliott@...com>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file
 systems

On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
> do_blockdev_direct_IO() increments and decrements the inode
> ->i_dio_count for each IO operation. It does this to protect against
> truncate of a file. Block devices don't need this sort of protection.
> 
> For a capable multiqueue setup, this atomic int is the only shared
> state between applications accessing the device for O_DIRECT, and it
> presents a scaling wall for that. In my testing, as much as 30% of
> system time is spent incrementing and decrementing this value. A mixed
> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
> better latencies too. Before:
.....
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..c4901c40ad65 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
>  EXPORT_SYMBOL(inode_dio_wait);
>  
>  /*
> - * inode_dio_done - signal finish of a direct I/O requests
> + * inode_dio_begin - signal start of a direct I/O requests
>   * @inode: inode the direct I/O happens on
>   *
>   * This is called once we've finished processing a direct I/O request,
>   * and is used to wake up callers waiting for direct I/O to be quiesced.
>   */
> -void inode_dio_done(struct inode *inode)
> +void inode_dio_inc(struct inode *inode)

The function name (inode_dio_inc) does not match the docbook comment
(inode_dio_begin)....

> +{
> +	atomic_inc(&inode->i_dio_count);
> +}
> +EXPORT_SYMBOL(inode_dio_inc);
> +
> +/*
> + * inode_dio_dec - signal finish of a direct I/O requests
> + * @inode: inode the direct I/O happens on
> + *
> + * This is called once we've finished processing a direct I/O request,
> + * and is used to wake up callers waiting for direct I/O to be quiesced.
> + */
> +void inode_dio_dec(struct inode *inode)
>  {
>  	if (atomic_dec_and_test(&inode->i_dio_count))
>  		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
>  }
> -EXPORT_SYMBOL(inode_dio_done);
> +EXPORT_SYMBOL(inode_dio_dec);

Bikeshedding: I think this would be better suited to inode_dio_begin()
and inode_dio_end() because now we are trying to say "this is where
the DIO starts, and this is where it ends". It's not really a
"reference counting" interface, we're trying to annotate the
boundaries of where DIO is protected against truncate....
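
To make the naming suggestion concrete, here is a minimal userspace
sketch of the begin/end pair. It is not the kernel code: struct inode
is a stand-in holding only the counter, C11 atomics replace the kernel
atomic_t helpers, and the bool return value is an assumption standing
in for the wake_up_bit() call in the quoted patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct inode (assumption:
 * only the field this sketch needs). */
struct inode {
	atomic_int i_dio_count;
};

/* Mark the start of a direct I/O that must be protected against
 * truncate. */
static void inode_dio_begin(struct inode *inode)
{
	atomic_fetch_add(&inode->i_dio_count, 1);
}

/*
 * Mark the end of a direct I/O. Returns true when this was the last
 * I/O in flight; the kernel version would wake the waiters at this
 * point instead of returning a value.
 */
static bool inode_dio_end(struct inode *inode)
{
	/* atomic_fetch_sub() returns the value before the decrement */
	return atomic_fetch_sub(&inode->i_dio_count, 1) == 1;
}
```

The bodies are the same as the inc/dec versions; only the names change,
so that a call site reads as the boundary of the protected region.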

And, realistically, if we are pushing this up into the filesystems
again, we should push it up into *all* filesystems and get rid of it
completely from the DIO layer. That way no new twisty passages in
the direct IO code are needed.
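
Roughly, the split I have in mind looks like the following userspace
sketch. Everything in it is hypothetical: dio_submit(), fs_direct_io()
and blkdev_direct_io() are made-up names, the I/O is assumed to be
synchronous (real async DIO would end the region at I/O completion),
and dio_submit() returns the in-flight count it observed purely so the
difference between the two paths is visible.

```c
#include <stdatomic.h>

/* Userspace stand-in for the kernel's struct inode. */
struct inode {
	atomic_int i_dio_count;
};

/*
 * Hypothetical DIO layer entry point. It does no counting of its
 * own; for illustration it just reports how many DIOs are marked as
 * in flight on the inode while it runs.
 */
static int dio_submit(struct inode *inode)
{
	return atomic_load(&inode->i_dio_count);
}

/* Filesystem path: the filesystem brackets the I/O itself so that
 * truncate can wait for it. */
static int fs_direct_io(struct inode *inode)
{
	int ret;

	atomic_fetch_add(&inode->i_dio_count, 1);	/* DIO begins */
	ret = dio_submit(inode);
	atomic_fetch_sub(&inode->i_dio_count, 1);	/* DIO ends */
	return ret;
}

/* Block device path: nothing to protect against truncate, so the
 * shared counter is never touched. */
static int blkdev_direct_io(struct inode *inode)
{
	return dio_submit(inode);
}
```

With this shape the DIO layer has no special case for block devices at
all; the callers that need the protection are the only ones that pay
for it.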

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com