linux-kernel - Re: [RFC] ext3: per-process soft-syncing data=ordered mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200802052220.33135.a1426z@gawab.com>
Date:	Tue, 5 Feb 2008 22:20:33 +0300
From:	Al Boldi <a1426z@...ab.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Chris Mason <chris.mason@...cle.com>,
	Andreas Dilger <adilger@....com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode

Jan Kara wrote:
> On Tue 05-02-08 10:07:44, Al Boldi wrote:
> > Jan Kara wrote:
> > > On Sat 02-02-08 00:26:00, Al Boldi wrote:
> > > > Chris Mason wrote:
> > > > > Al, could you please compare the write throughput from vmstat for
> > > > > the data=ordered vs data=writeback runs?  I would guess the
> > > > > data=ordered one has a lower overall write throughput.
> > > >
> > > > That's what I would have guessed, but it's actually going up 4x fold
> > > > for mysql from 559mb to 2135mb, while the db-size ends up at 549mb.
> > >
> > >   So you say we write 4-times as much data in ordered mode as in
> > > writeback mode. Hmm, probably possible because we force all the dirty
> > > data to disk when committing a transation in ordered mode (and don't
> > > do this in writeback mode). So if the workload repeatedly dirties the
> > > whole DB, we are going to write the whole DB several times in ordered
> > > mode but in writeback mode we just keep the data in memory all the
> > > time. But this is what you ask for if you mount in ordered mode so I
> > > wouldn't consider it a bug.
> >
> > Ok, maybe not a bug, but a bit inefficient.  Check out this workload:
> >
> > sync;
> >
> > while :; do
> >   dd < /dev/full > /mnt/sda2/x.dmp bs=1M count=20
> >   rm -f /mnt/sda2/x.dmp
> >   usleep 10000
> > done
:
:
> > Do you think these 12mb redundant writeouts could be buffered?
>
>   No, I don't think so. At least when I run it, number of blocks written
> out varies which confirms that these 12mb are just data blocks which
> happen to be in the file when transaction commits (which is every 5
> seconds).

Just a thought, but maybe double-buffering can help?

> And to satisfy journaling gurantees in ordered mode you must
> write them so you really have no choice...

Making this RFC rather useful.

What we need now is an implementation, which should be easy.

Maybe something on these lines:

<< in ext3_ordered_write_end >>
  if (current->soft_sync & 1)
    return ext3_writeback_write_end;

<< in ext3_ordered_writepage >>
  if (current->soft_sync & 2)
    return ext3_writeback_writepage;

<< in ext3_sync_file >>
  if (current->soft_sync & 4)
    return ret;

<< in ext3_file_write >>
  if (current->soft_sync & 8)
    return ret;

As you can see soft_sync is masked and bits are ordered by importance.

It would be neat if somebody interested could cook-up a patch.


Thanks!

--
Al

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/