lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140717180908.GB8628@birch.djwong.org>
Date:	Thu, 17 Jul 2014 11:09:08 -0700
From:	"Darrick J. Wong" <darrick.wong@...cle.com>
To:	Xiaoguang Wang <wangxg.fnst@...fujitsu.com>
Cc:	linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
	a-fujita@...jp.nec.com, t-sato@...jp.nec.com, sandeen@...hat.com
Subject: Re: Question about e2fsprogs/e4defrag

On Thu, Jul 17, 2014 at 11:29:26AM +0800, Xiaoguang Wang wrote:
> Hi,
> 
> On 07/17/2014 03:22 AM, Darrick J. Wong wrote:
> > On Wed, Jul 16, 2014 at 12:16:37PM +0800, Xiaoguang Wang wrote:
> >> Hi,
> >>
> >> When I run xfstests/tests/generic/018 for ext4 file system in RHEL7.0GA,
> >> sometimes it fails and sometime it succeeds. After looking into this case, I
> >> think it's not a kernel ext4 bug, it maybe an e4dfrag bug. I compiled the
> >> newest e2fsprogs to have test, it seems this issue still exits, so I still
> >> send a mail to this list to look for some help, thanks.
> >>
> >> The issue is that sometimes e4defrag does not defrag file correctly.
> >> Steps to reproduce this issue:
> >> 1. cd mntpoint
> >> 2. rm -f lege
> >> 3. for I in `seq 9 -1 0`;
> >>        do dd if=/dev/zero of=lege bs=4k count=1 conv=notrunc seek=$I oflag=sync &>/dev/null;
> >>    done;
> >> 4. e4defrag -c -v lege
> >>
> >> Repeatedly execute the 2, 3, 4 steps until you get a file which have the similar extent layout like below:
> >> ################################################################
> >> [root@...alhost test_e2fsprogs]# e4defrag -c -v lege
> >> <File>
> >> [ext 1]:        start 49365571: logical 0:      len 1
> >> [ext 2]:        start 49365570: logical 1:      len 1
> >> [ext 3]:        start 49365569: logical 2:      len 1
> >> [ext 4]:        start 49365568: logical 3:      len 1
> >> [ext 5]:        start 49365567: logical 4:      len 1
> >> [ext 6]:        start 49365566: logical 5:      len 1
> >> [ext 7]:        start 49365565: logical 6:      len 1
> >> [ext 8]:        start 49365564: logical 7:      len 1
> >> [ext 9]:        start 49365563: logical 8:      len 1
> >> [ext 10]:       start 49365562: logical 9:      len 1
> >>
> >>  Total/best extents                             10/1
> >>  Average size per extent                        4 KB
> >>  Fragmentation score                            98
> >>  [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
> >>  This file (lege) needs defragmentation.
> >>  Done.
> >> ################################################################
> >> The physical blocks are continuous but reversed.
> >>
> >> If we call e4defrag against this file, the output would be:
> >> ################################################################
> >> [root@...alhost test_e2fsprogs]# /tmp/e4defrag -v lege 
> >> ext4 defragmentation for lege
> >> [1/1]lege:      100%  extents: 10 -> 10 [ OK ]
> >>  Success:                       [1/1]
> >> [root@...alhost test_e2fsprogs]# /tmp/e4defrag -v -c lege 
> >> <File>
> >> [ext 1]:        start 49365571: logical 0:      len 1
> >> [ext 2]:        start 49365570: logical 1:      len 1
> >> [ext 3]:        start 49365569: logical 2:      len 1
> >> [ext 4]:        start 49365568: logical 3:      len 1
> >> [ext 5]:        start 49365567: logical 4:      len 1
> >> [ext 6]:        start 49365566: logical 5:      len 1
> >> [ext 7]:        start 49365565: logical 6:      len 1
> >> [ext 8]:        start 49365564: logical 7:      len 1
> >> [ext 9]:        start 49365563: logical 8:      len 1
> >> [ext 10]:       start 49365562: logical 9:      len 1
> >>
> >>  Total/best extents                             10/1
> >>  Average size per extent                        4 KB
> >>  Fragmentation score                            98
> >>  [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
> >>  This file (lege) needs defragmentation.
> >>  Done.
> >> ################################################################
> >> According to my understanding, this file is not defraged correctly and should
> >> be convert into one extent.  Or because if the physical blocks are continuous
> >> though reversed, we do not need to do defragment?
> > 
> > Oh, I think we /do/ need to defragment.  Granted, file readahead might paper
> > over the symptoms, but since the user explicitly ran e4defrag we can try
> > to do better.
> 
> Yeah, agree.
> > 
> >> I have checked the e4defrag source code, whether to do real defragment
> >> depends on some conditions, please
> >> see this code(e4defrag.c).
> >> --main
> >>   --file_defrag
> >>     
> >> In file_defrag(), there is such a judgement:
> >> "if (file_frags_start <= best || orig_physical_cnt <= donor_physical_cnt)", If this returns true, the e4defrag will
> >> not call call_defrag() to do real defragment work.
> >>
> >> Here file_frags_start: number of file fragments before defrag
> >>      orig_physical_cnt: number of original file's continuous physical region
> >>      donor_physical_cnt: number of donor file's continuous physical region
> >>
> >> In this "lege" file, the orig_physical_cnt is 1, and donor_physical_cnt is also 1, so the "if" is satisfied and
> >> call_defrag() won't be called.
> > 
> > This is a curious corner case of e4defrag -- if you look in get_file_extents(),
> > the list of extents is insertion-sorted by physical block, which means that
> > get_physical_count() (stupidly) looks only for gaps in the runs of physical
> > blocks.  Therefore, e4defrag thinks that this "lege" file has one physical
> > extent.  Ignoring logical block ordering, this is true, but as you point out,
> > this leaves the "file written backwards" case in a fragmented state.  So let's
> > not ignore the logical block ordering:
> > 
> > What I think we really need to do here is make get_physical_count() smarter --
> > if there's a gap either in the physical or logical offsets of extents, then we
> > need to increment *_physical_cnt so that we later decide to defragment the
> > file.
> > 
> > (Please keep reading)
> I checked the code again, you are right, thanks for your explanation
>  
> > 
> >> Here I'd like to know the comparison "orig_physical_cnt <=
> >> donor_physical_cnt" is useful? According to my understanding, what should we
> >> have comparison are number of extents or average extent size.
> >>
> >> When I have this change:
> >> diff --git a/misc/e4defrag.c b/misc/e4defrag.c
> >> index a204793..cd95698 100644
> >> --- a/misc/e4defrag.c
> >> +++ b/misc/e4defrag.c
> >> @@ -1598,8 +1598,7 @@ check_improvement:
> >>                 extents_before_defrag += file_frags_start;
> >>         }
> >>  
> >> -       if (file_frags_start <= best ||
> >> -                       orig_physical_cnt <= donor_physical_cnt) {
> >> +       if (file_frags_start <= best) { 
> > 
> > This is incorrect, since the point of the "orig_physical_cnt <=
> > donor_physical_cnt" check is to ensure that we don't increase the fragmentation
> > of a file by swapping it with pieces from a donor file whose contents are
> > spread out over a larger number of runs of physical blocks.
> 
> Ah, I see. I hadn't realized that, thanks.
> > 
> > (It does, however, force defragmentation for all files, so you get the results
> > you wanted.)
> > 
> > Please try the patch at the end of this message on for size.  It fixes things
> > on my test VM; does it fix yours?
> 
> Yeah, it works, thanks.
> Would you send a new version patch to fix this issue, or should I do it?

I'll send the patch (with a proper changelog) along in my -maint fixes rollup
in a few days.

--D

> 
> Regards,
> Xiaoguang Wang
> > 
> > --D
> > 
> >>                 printf("\033[79;0H\033[K[%u/%u]%s:\t%3d%%",
> >>                         defraged_file_count, total_count, file, 100);
> >>                 if (mode_flag & DETAIL)
> >>
> >> Then the "lege" file could be defraged correctly.
> >> ##################################################################
> >> [root@...alhost test_e2fsprogs]# /tmp/e4defrag -v  lege
> >> ext4 defragmentation for lege
> >> [1/1]lege:      100%  extents: 10 -> 1  [ OK ]
> >>  Success:                       [1/1]
> >> [root@...alhost test_e2fsprogs]# /tmp/e4defrag -v -c lege
> >> <File>
> >> [ext 1]:        start 49366583: logical 0:      len 10
> >>
> >>  Total/best extents                             1/1
> >>  Average size per extent                        40 KB
> >>  Fragmentation score                            0
> >>  [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
> >>  This file (lege) does not need defragmentation.
> >>  Done.
> >> ##################################################################
> >>
> >> Any opinion or suggestions will be appreciated!
> >> If I'm wrong, please correct me, thanks!
> >>
> >> Regards,
> >> Xiaoguang Wang
> > 
> > diff --git a/misc/e4defrag.c b/misc/e4defrag.c
> > index a204793..d0eac60 100644
> > --- a/misc/e4defrag.c
> > +++ b/misc/e4defrag.c
> > @@ -888,7 +888,9 @@ static int get_physical_count(struct fiemap_extent_list *physical_list_head)
> >  
> >  	do {
> >  		if ((ext_list_tmp->data.physical + ext_list_tmp->data.len)
> > -				!= ext_list_tmp->next->data.physical) {
> > +				!= ext_list_tmp->next->data.physical ||
> > +		    (ext_list_tmp->data.logical + ext_list_tmp->data.len)
> > +				!= ext_list_tmp->next->data.logical) {
> >  			/* This extent and next extent are not continuous. */
> >  			ret++;
> >  		}
> > .
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ