linux-kernel - Re: trying to understand READ_META, READ_SYNC, WRITE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100621213618.GC6474@redhat.com>
Date:	Mon, 21 Jun 2010 17:36:18 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Christoph Hellwig <hch@....de>
Cc:	Jens Axboe <axboe@...nel.dk>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: trying to understand READ_META, READ_SYNC, WRITE_SYNC & co

On Mon, Jun 21, 2010 at 09:14:10PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 21, 2010 at 08:56:55PM +0200, Jens Axboe wrote:
> > FWIW, Windows marks meta data writes and they go out with FUA set
> > on SATA disks. And SATA firmware prioritizes FUA writes, it's essentially
> > a priority bit as well as a platter access bit. So at least we have some
> > one else using a meta data boost. I agree that it would be a lot more
> > trivial to add the annotations if they didn't have scheduler impact
> > as well, but I still think it's a sane thing.
> 
> And we still disable the FUA bit in libata unless people set a
> non-standard module option..
> 
> > >> Reads are sync by nature in the block layer, so they don't get that
> > >> special annotation.
> > > 
> > > Well, we do give them this special annotation in a few places, but we
> > > don't actually use it.
> > 
> > For unplugging?
> 
> We use the explicit unplugging, yes - but READ_META also includes
> REQ_SYNC which is not used anywhere.
> 
> > > But that leaves the question why disabling the idling logical for
> > > data integrity ->writepage is fine?  This gets called from ->fsync
> > > or O_SYNC writes and will have the same impact as O_DIRECT writes.
> > 
> > We have never enabled idling for those. O_SYNC should get a nice
> > boost too, it just needs to be benchmarked and tested and then
> > there would be no reason not to add it.
> 
> We've only started using any kind of sync tag last year in ->writepage in
> commit a64c8610bd3b753c6aff58f51c04cdf0ae478c18 "block_write_full_page:
> Use synchronous writes for WBC_SYNC_ALL writebacks", switching from
> WRITE_SYNC to WRITE_SYNC_PLUG a bit later in commit
> 6e34eeddf7deec1444bbddab533f03f520d8458c "block_write_full_page: switch
> synchronous writes to use WRITE_SYNC_PLUG"
> 
> Before that we used plain WRITE, which had the normal idling logic.

I have got very little understanding of file system layer, but if I had
to guess, i think following might have happened.

- We switched from WRITE to WRITE_SYNC for fsync() path.

- This might have caused issues with idling as for SYNC_WRITE we will idle
  in CFQ but probably it is not desirable in certain cases where next set
  of WRITES is going to come from journaling thread. 

- That might have prompted us to introduce the rq_noidle() to make sure
  we don't idle in WRITE_SYNC path but direct IO path was avoided to make
  sure good throughput is maintained. But this left one question open 
  and that is it good to disable idling on all WRITE_SYNC path in kernel.

- Slowly cfq code emerged and as it stands today, to me rq_noidle() is
  practically of not much use. For sync-idle tree (where idling is
  enabled), we are ignoring the rq_noidle() and always arming the timer.
  For sync-noidle, we choose not to idle based on if there was some other
  thread who did even a single IO with rq_noidle=0.

  I think in practice, there is on thread of other which is doing some
  read or write with rq_noidle=0 and if that's the case, we will end up
  idling on sync-noidle tree also and rq_noidle() practically does
  not take effect.

So if rq_noidle() was introduced to solve the issue of not idling on 
fsync() path (as jbd thread will send more data now), then probably slice
yielding patch of jeff might come handy here and and we can get rid of
rq_noidle() logic. This is just a guess work and I might be completely
wrong here...

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/