linux-ext4 - Re: fsck performance.

Message-ID: <20110220215531.GA21917@bitwizard.nl>
Date:	Sun, 20 Feb 2011 22:55:31 +0100
From:	Rogier Wolff <R.E.Wolff@...Wizard.nl>
To:	Ted Ts'o <tytso@....edu>
Cc:	Rogier Wolff <R.E.Wolff@...Wizard.nl>, linux-ext4@...r.kernel.org
Subject: Re: fsck performance.


Hi Ted, 

Thanks for looking into this... 

On Sun, Feb 20, 2011 at 02:34:06PM -0500, Ted Ts'o wrote:
> On Sun, Feb 20, 2011 at 12:09:31PM -0500, Ted Ts'o wrote:
> > 
> > Ah, you're using tdb.  Tdb can be really slow.  It's been on my todo
> > list to replace tdb with something else, but I haven't gotten around
> > to it.
> 
> Hmm... after taking a quick look at the TDB sources, why don't you try
> this.  In lib/ext2fs/icount.c and e2fsck/dirinfo.c, try replacing the
> flag TDB_CLEAR_IF_FIRST with TDB_NOLOCK | TDB_NOSYNC.  i.e., try
> replacing:
> 
> 	icount->tdb = tdb_open(fn, 0, TDB_CLEAR_IF_FIRST,
> 			       O_RDWR | O_CREAT | O_TRUNC, 0600);
> 
> with:
> 
> 	icount->tdb = tdb_open(fn, 0, TDB_NOLOCK | TDB_NOSYNC,
> 			       O_RDWR | O_CREAT | O_TRUNC, 0600);

I looked into this myself as well. Suspecting the locking calls, I put
a "return 0" in the first line of the tdb locking function. This makes
all locking requests a no-op. Doing it the proper way as you suggest
may be nicer, but this was a method that existed within my
abilities...

Anyway, this removed all the fcntl calls to lock and unlock the
database... It didn't solve the performance issue, though.

Here is an strace... 

0.000379  .525531 munmap(0x8d03e000, 108937216) = 0
0.008008  .533540 ftruncate(5, 108941312) = 0
0.000207  .533748 pwrite64(5, "BBBBBBBBBB"..., 1024, 108937216) = 1024
0.000235  .533983 pwrite64(5, "BBBBBBBBBB"..., 1024, 108938240) = 1024
0.000108  .534092 pwrite64(5, "BBBBBBBBBB"..., 1024, 108939264) = 1024
0.000138  .534230 pwrite64(5, "BBBBBBBBBB"..., 1024, 108940288) = 1024
0.000106  .534336 mmap2(NULL, 108941312, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x8d03d000
1.994850 2.529190 fstat64(6, {st_mode=S_IFREG|0600, st_size=92045312, ...}) = 0

The first column is the difference between the timestamp on THIS line
and the one on the previous line; the second column is the timestamp
itself. Consider the difference to be mostly CPU time.

The system calls all take between 17 and 127 microseconds. i.e.  fast.
The exception is the munmap call, which takes 7
milliseconds. Acceptable.

The performance killer is the almost two seconds of CPU time spent
before the fstat on file descriptor 5 or 6.
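
To make that reading of the trace mechanical, here is a minimal sketch
in C that takes the absolute timestamps (the second column) and reports
which line follows the longest gap. The function name
largest_gap_index is made up for this illustration; it is not part of
strace or e2fsprogs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index i (1 <= i < n) of the trace
 * line whose timestamp lies furthest after its predecessor, i.e. the
 * call preceded by the longest stretch of time not spent in the
 * system calls shown. Timestamps are in seconds. */
size_t largest_gap_index(const double *ts, size_t n)
{
    assert(n >= 2);                 /* need at least one pair */
    size_t best = 1;
    double best_gap = ts[1] - ts[0];

    for (size_t i = 2; i < n; i++) {
        double gap = ts[i] - ts[i - 1];
        if (gap > best_gap) {
            best_gap = gap;
            best = i;
        }
    }
    return best;
}
```

Fed the eight timestamps from the excerpt above (0.525531 through
2.529190), it returns 7, which is the fstat64 line.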

It seems wasteful to mmap and munmap the whole 100M of those two
files all the time. 

The "BBBBB" strings in the pwrite calls are the padding. 0x42, get it?

I checked... The full 4x1024 bytes are just padding. Nothing else.
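
To put a number on how often the whole file would get unmapped and
remapped, here is a rough sketch in C. It assumes that every expansion
adds 4096 bytes, as the single ftruncate in the trace does (108937216
to 108941312); whether tdb always grows in steps of that size is an
assumption, not something the trace proves. The doubling variant is
only a point of comparison, not a description of what tdb does. Both
function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of grow steps needed to take a file
 * from `start` to at least `target` bytes when every step adds a
 * fixed `step` bytes. */
uint64_t grow_steps_fixed(uint64_t start, uint64_t target, uint64_t step)
{
    assert(step > 0);
    if (target <= start)
        return 0;
    return (target - start + step - 1) / step;   /* ceiling division */
}

/* Hypothetical helper: same question, but every step doubles the
 * current size. `start` must be non-zero or the loop never advances. */
uint64_t grow_steps_doubling(uint64_t start, uint64_t target)
{
    assert(start > 0);
    uint64_t steps = 0;
    while (start < target) {
        start *= 2;
        steps++;
    }
    return steps;
}
```

Under that assumption, growing from empty to 108941312 bytes takes
26597 fixed-size steps, each one a munmap/ftruncate/mmap cycle, against
15 steps when doubling from 4096 bytes. This only counts remap cycles;
it does not account for the two seconds of CPU time before the fstat.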


> Could you let me know what this does to the performance of e2fsck
> with scratch files enabled?

I apparently have scratch files enabled, right? I just typed

./configure ; make ; scp e2fsck/e2fsck othermachine:e2fsck.test 

so I didn't mess with the configuration. 


I just straced the running fsck:

1298236533.396622 _llseek(3, 522912374784, [], SEEK_SET) = 0 <0.000038>
1298236540.311416 _llseek(3, 522912407552, [], SEEK_SET) = 0 <0.000035>
1298236547.288401 _llseek(3, 522912440320, [], SEEK_SET) = 0 <0.000035>

and I see it seeking to somewhere in the 486 GB range. Does this mean
it has 6x more to go? I don't really see the numbers increasing
significantly. Although out-of-order numbers appear in the llseek
outputs, the most common numbers are slowly increasing. 

I had first estimated the ETA around the end of this century, but that
seems to be a bit overly pessimistic. I probably missed a factor of
1000 somewhere. I now get about 9 days. That means I'm likely to live
long enough to see the end of this..... :-)
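
For reference, a minimal sketch in C of the kind of linear
extrapolation behind such an estimate: take two (timestamp, seek
offset) samples and a total size, and extrapolate. The function name
eta_seconds and the `total` parameter are made up for this
illustration; the real device size is not stated here. It also assumes
the offsets advance steadily, which the out-of-order seeks contradict,
so samples taken close together will give a poor estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: seconds remaining until offset `total` is
 * reached, extrapolated linearly from two (time, offset) samples.
 * Returns 0.0 if the second sample is already at or past `total`,
 * and -1.0 if the offset did not move forward between the samples. */
double eta_seconds(double t0, uint64_t off0,
                   double t1, uint64_t off1, uint64_t total)
{
    assert(t1 > t0);                /* samples must be in time order */
    if (off1 >= total)
        return 0.0;
    if (off1 <= off0)
        return -1.0;

    double rate = (double)(off1 - off0) / (t1 - t0);   /* bytes/second */
    return (double)(total - off1) / rate;
}
```

For example, eta_seconds(0.0, 0, 10.0, 100, 1100) sees 100 bytes of
progress in 10 seconds and 1000 bytes left, and returns 100.0.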

Whenever the time to completion seems longer than optimizing it a bit
and then restarting, I'll restart. But in this case, if I keep
estimating the "normal fsck time" as 8 hours, and "a bit of coding" as
2 hours, I'm afraid it will never finish.

To estimate the time-to-run, would it be safe to suspend the running
fsck, and start an fsck -n ? I've invested 10 CPU hours in this fsck
instance already, I would like it to finish eventually... 9 days seems
doable...


out-of-order example: 

1298236950.540958 _llseek(3, 523986247680, [], SEEK_SET) = 0 <0.000035>
1298236950.646999 _llseek(3, 523986280448, [], SEEK_SET) = 0 <0.000038>
1298236952.813587 _llseek(3, 630728769536, [], SEEK_SET) = 0 <0.000036>
1298236953.947109 _llseek(3, 523986313216, [], SEEK_SET) = 0 <0.000035>
1298236953.948982 _llseek(3, 523986345984, [], SEEK_SET) = 0 <0.000015>

(I've deleted the number in the brackets, it's the same as the number
before.)


> Oh, and BTW, it would be useful if you tried configuring
> tests/test_config so that it sets E2FSCK_CONFIG with a test
> e2fsck.conf that enables the scratch files somewhere in tmp, and then
> run the regression test suite with these changes.

I'm not sure I understand correctly. You're saying that e2fsck honors
an environment variable, E2FSCK_CONFIG, which, although undocumented,
lets me specify a config file other than /etc/e2fsck.conf?

I've created an e2fsck.conf file in the tests directory and changed it
to: 
[options]
        buggy_init_scripts = 1
[scratch_files]
  directory=/tmp

I've then pointed E2FSCK_CONFIG to this file (absolute pathname). I
then chickened out and edited my system /etc/e2fsck.conf to be the
same.

Next I typed "make" and got: 
102 tests succeeded     0 tests failed

> If they work, and it solves the performance problem, let me know and
> send me patches.  If we can figure out some way of improving the
> performance without needing to replace tdb, that would be great...

The system where the large filesystem is running already has an
e2fsck.conf that holds:

[scratch_files]
        directory = /var/cache/e2fsck

By "send me patches", do you mean patches with the NOSYNC option enabled?


	Roger. 

-- 
** R.E.Wolff@...Wizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ