linux-ext4 - Re: [PATCH] ext4: fix 50% disk write performance regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4C7C8DAE.50902@redhat.com>
Date:	Tue, 31 Aug 2010 00:05:50 -0500
From:	Eric Sandeen <sandeen@...hat.com>
To:	Bill Fink <billfink@...dspring.com>
CC:	tytso@....edu, adilger@....com, linux-ext4@...r.kernel.org,
	bill.fink@...a.gov
Subject: Re: [PATCH] ext4: fix 50% disk write performance regression

Bill Fink wrote:
> On Mon, 30 Aug 2010, Eric Sandeen wrote:
> 
>> Can you give this a shot?
>>
>> The first hunk is, I think, the biggest problem.  Even if
>> we get the max number of pages we need, we keep scanning forward
>> until "done" without doing any more actual, useful work.
>>
>> The 2nd hunk is an oddity, some places assign nr_to_write
>> to LONG_MAX, and we get here and multiply -that- by 8... giving
>> us "-8" for nr_to_write, that can't help things when we
>> do later comparisons on that number...
>>
>> I also see us asking to find pages starting at "idx" and
>> the first dirty page we find is well ahead of that,
>> I'm not sure if that's indicative of a problem or not.
>>
>> Anyway, want to give this a shot, in place of the patch you sent,
>> and see how it fares compared to stock and/or with your patch?
>>
>> It's build-and-sanity tested but not really performance tested here.
>>
>> Thanks,
>> -Eric
> 
> Great!  It looks like that does the trick.
> 
> 2.6.35 + your patch:
> 
> i7test7% dd if=/dev/zero of=/i7raid/bill/testfile1 bs=1M count=32768
> 32768+0 records in
> 32768+0 records out
> 34359738368 bytes (34 GB) copied, 50.6702 s, 678 MB/s
> 
> That's the same performance as with my patch, and pretty darn
> close to the original 2.6.31 performance.

hah, that's good esp. considering my followup email that found
what I think is a problem with my patch.  ;)

What happens if you change:

	if (!range_cyclic && range_whole && wbc->nr_to_write != LONG_MAX)
		desired_nr_to_write = wbc->nr_to_write * 8;
  	else
  		desired_nr_to_write = ext4_num_dirty_pages(inode, index,

to:

        if (!range_cyclic && range_whole) {
                if (wbc->nr_to_write != LONG_MAX)
                        desired_nr_to_write = wbc->nr_to_write * 8;
                else
                        desired_nr_to_write = wbc->nr_to_write;
        } else
  		desired_nr_to_write = ext4_num_dirty_pages(inode, index,

and see how that fares?  I think that makes a little more sense, if we
got there with LONG_MAX that means "write everything" and there's no need
to bump it up or to go counting pages.  It may not make any real difference.

But I'm seeing really weird behavior in writeback, it starts out nicely
writing 32768 pages at a time, and then goes all wonky, revisiting pages
it's already done and doing IO in little chunks.   This is going to take
some staring I think.

-Eric



> 						-Thanks a bunch
> 
> 						-Bill
> 
> 
> 
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 4b8debe..33c2167 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1207,8 +1207,10 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
>>  				break;
>>  			idx++;
>>  			num++;
>> -			if (num >= max_pages)
>> -				break;
>> +			if (num >= max_pages) {
>> +				pagevec_release(&pvec);
>> +				return num;
>> +			}
>>  		}
>>  		pagevec_release(&pvec);
>>  	}
>> @@ -3002,7 +3004,7 @@ static int ext4_da_writepages(struct address_space *mapping,
>>  	 * sbi->max_writeback_mb_bump whichever is smaller.
>>  	 */
>>  	max_pages = sbi->s_max_writeback_mb_bump << (20 - PAGE_CACHE_SHIFT);
:

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html