lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Jan 2021 18:19:43 +0100
From:   Jan Kara <jack@...e.cz>
To:     Xiaoguang Wang <xiaoguang.wang@...ux.alibaba.com>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>,
        joseph qi <joseph.qi@...ux.alibaba.com>
Subject: Re: code questions about ext4_inode_datasync_dirty()

Hi!

On Tue 12-01-21 19:45:06, Xiaoguang Wang wrote:
> I use io_uring to evaluate ext4 randread performance(direct io), observed
> obvious overhead in jbd2_transaction_committed():
> Samples: 124K of event 'cycles:ppp', Event count (approx.): 80630088951
> Overhead  Command          Shared Object      Symbol
>    7.02%  io_uring-sq-per  [kernel.kallsyms]  [k] jbd2_transaction_committed

Hum, that's quite a bit. Likely due to cacheline contention on
j_state_lock. Thanks for reporting this!

> The codes:
> 	/*
> 	 * Writes that span EOF might trigger an I/O size update on completion,
> 	 * so consider them to be dirty for the purpose of O_DSYNC, even if
> 	 * there is no other metadata changes being made or are pending.
> 	 */
> 	iomap->flags = 0;
> 	if (ext4_inode_datasync_dirty(inode) ||
> 	    offset + length > i_size_read(inode))
> 		iomap->flags |= IOMAP_F_DIRTY;
> 
> ext4_inode_datasync_dirty() calls jbd2_transaction_committed(). Sorry, I
> don't spend much time to learn iomap codes yet, just ask a quick question
> here. Do we need to call ext4_inode_datasync_dirty() for a read
> operation?

So strictly speaking we don't need to know the value of IOMAP_F_DIRTY in
that path (or any read path for that matter) but I'm somewhat reluctant to
optimize out setting of this flag because later some user might start to
depend on it being set correctly.

> If we must call ext4_inode_datasync_dirty() for a read operation, can we improve
> jbd2_transaction_committed() a bit, for example, have a quick check between
> inode->i_datasync_tid and j_commit_sequence, if inode->i_datasync_tid is less than
> or equal to j_commit_sequence, we also don't call jbd2_transaction_committed()?

Well, the modification would belong directly to
jbd2_transaction_committed(). Using j_commit_sequence is somewhat awkward
since TIDs can wrap around and so very old TIDs can suddently start to give
false positives leading to a strange results (this was one of motivations
to switch jbd2_transaction_committed() to a comparison for equality). But
it could certainly be done.

But j_state_lock is a scalability bottleneck in other cases as well. So
what I rather have in mind is that we could change transactions to be RCU
freed and then a majority of places where we use

  read_lock(&journal->j_state_lock);

can be switched to using plain RCU which should significantly reduce the
contention on the lock.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ