Date:	Wed, 24 Jun 2009 09:25:55 +0200
From:	Ralf Gross <Ralf-Lists@...fgross.de>
To:	linux-kernel@...r.kernel.org, fengguang.wu@...el.com
Subject: Re: io-scheduler tuning for better read/write ratio

Jeff Moyer wrote:
> Ralf Gross <Ralf-Lists@...fgross.de> writes:
> 
> > Jeff Moyer wrote:
> >> Ralf Gross <rg@...-Softwaretechnik.com> writes:
> >> 
> >> > Jeff Moyer wrote:
> >> >> Jeff Moyer <jmoyer@...hat.com> writes:
> >> >> 
> >> >> > Ralf Gross <rg@...-softwaretechnik.com> writes:
> >> >> >
> >> >> >> Casey Dahlin wrote:
> >> >> >>> On 06/16/2009 02:40 PM, Ralf Gross wrote:
> >> >> >>> > David Newall wrote:
> >> >> >>> >> Ralf Gross wrote:
> >> >> >>> >>> write throughput is much higher than the read throughput (40 MB/s
> >> >> >>> >>> read, 90 MB/s write).
> >> >> >>> > 
> >> >> >>> > Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> >> >> >>> > to the device at the same time.
> >> >> >>> > 
> >> >> >>> > Ralf
> >> >> >>> 
> >> >> >>> How specifically are you testing? It could depend a lot on the
> >> >> >>> particular access patterns you're using to test.
> >> >> >>
> >> >> >> I did the basic tests with tiobench. The real test is a test backup
> >> >> >> (bacula) with 2 jobs that create two 30 GB spool files on that device.
> >> >> >> The jobs partially write to the device in parallel. Depending on which
> >> >> >> spool file reaches 30 GB first, one job starts reading from that file
> >> >> >> and writing to tape, while the other is still spooling.
> >> >> >
> >> >> > We are missing a lot of details here.  I guess the first thing I'd try
> >> >> > would be bumping up the read_ahead_kb parameter, since I'm guessing
> >> >> > that your backup application isn't driving very deep queue depths.  If
> >> >> > that doesn't work, then please provide exact invocations of tiobench
> >> >> > that reproduce the problem or some blktrace output for your real test.
> >> >> 
> >> >> Any news, Ralf?
> >> >
> >> > Sorry for the delay. At the moment there are large backups running that
> >> > use the raid device for spooling, so I can't do any tests.
> >> >
> >> > Re: read-ahead: I tested different settings from 8Kb to 65Kb; this
> >> > didn't help.
> >> >
> >> > I'll do some more tests when the backups are done (3-4 more days).
> >> 
> >> The default is 128KB, I believe, so it's strange that you would test
> >> smaller values.  ;)  I would try something along the lines of 1 or 2 MB.
> >
> > Err, yes this should have been MB not KB.
> >
> >
> > $cat /sys/block/sdc/queue/read_ahead_kb 
> > 16384
> > $cat /sys/block/sdd/queue/read_ahead_kb 
> > 16384
> >
> > I also tried different values for max_sectors_kb and nr_requests, but the
> > trend that writes were much faster than reads under combined read and
> > write load didn't change.
> >
> > Changing the deadline parameters writes_starved, write_expire,
> > read_expire, front_merges, or fifo_batch didn't change this behavior.
> 
> OK, bumping up readahead and changing the deadline parameters listed
> should have given some better results, I would think.  Can you give the
> invocation of tiobench you used so I can try to reproduce this?
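
A blktrace capture of the device during such a run, as requested above, would
be gathered with something like this (the 30-second window is just an example):

blktrace -d /dev/md1 -w 30 -o md1trace   # trace md1 for 30 s under the mixed load
blkparse md1trace                        # decode the per-CPU trace files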

The main problem is with bacula: it reads from and writes to two spool files
on the same device.

I get the same behavior with two dd processes, one reading from the device and
one writing to it.

Here's the output from dstat (5-second interval).

--dsk/md1--
_read _writ
  26M   95M
  31M   96M
  20M   85M
  31M  108M
  28M   89M
  24M   95M
  26M   79M
  32M  115M
  50M   74M
 129M   15k
 147M 1638B
 147M    0 
 147M    0 
 113M    0


At the end I stopped the dd process that was writing to the device, so you can
see that the md device is capable of reading at >120 MB/s.

I did this with these two commands, run in parallel:

dd if=/dev/zero of=test bs=1MB      # writer; 'test' is a file on the md1 filesystem
dd if=/dev/md1 of=/dev/null bs=1M   # reader; reads the raw md1 device
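
The dstat table above came from something like the following, run in a second
shell (the exact flags are my reconstruction):

dstat -d -D md1 5   # disk throughput for md1 only, 5-second interval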


Maybe this is too simple, but I see the same behavior with a real-world
application. md1 is an md raid0 device with two disks.


md1 : active raid0 sdc[0] sdd[1]
      781422592 blocks 64k chunks
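
Since sdc and sdd are configured identically, the settings below can be dumped
in one go with something like:

grep . /sys/block/sd[cd]/queue/* /sys/block/sd[cd]/queue/iosched/* 2>/dev/null
# prints path:value pairs; the stderr redirect hides the iosched directory entry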

sdc:

/sys/block/sdc/queue/hw_sector_size
512
/sys/block/sdc/queue/max_hw_sectors_kb
32767
/sys/block/sdc/queue/max_sectors_kb
512
/sys/block/sdc/queue/nomerges
0
/sys/block/sdc/queue/nr_requests
128
/sys/block/sdc/queue/read_ahead_kb
16384
/sys/block/sdc/queue/scheduler
noop anticipatory [deadline] cfq

/sys/block/sdc/queue/iosched/fifo_batch
16
/sys/block/sdc/queue/iosched/front_merges
1
/sys/block/sdc/queue/iosched/read_expire
500
/sys/block/sdc/queue/iosched/write_expire
5000
/sys/block/sdc/queue/iosched/writes_starved
2


sdd:

/sys/block/sdd/queue/hw_sector_size
512
/sys/block/sdd/queue/max_hw_sectors_kb
32767
/sys/block/sdd/queue/max_sectors_kb
512
/sys/block/sdd/queue/nomerges
0
/sys/block/sdd/queue/nr_requests
128
/sys/block/sdd/queue/read_ahead_kb
16384
/sys/block/sdd/queue/scheduler
noop anticipatory [deadline] cfq

/sys/block/sdd/queue/iosched/fifo_batch
16
/sys/block/sdd/queue/iosched/front_merges
1
/sys/block/sdd/queue/iosched/read_expire
500
/sys/block/sdd/queue/iosched/write_expire
5000
/sys/block/sdd/queue/iosched/writes_starved
2


The deadline parameters are the defaults. I expected that setting
writes_starved much higher would change the read/write ratio, but I didn't see
any change.
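
For reference, the kind of changes I tried looked like this (example values
only; the same was done for sdd, and the defaults are the values shown above):

echo 16    > /sys/block/sdc/queue/iosched/writes_starved  # more read batches per write batch (default 2)
echo 10000 > /sys/block/sdc/queue/iosched/write_expire    # relax the write deadline (default 5000 ms)
echo 100   > /sys/block/sdc/queue/iosched/read_expire     # tighten the read deadline (default 500 ms)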



Ralf
