Date:	Tue, 25 Sep 2012 15:54:03 +0900
From:	Namjae Jeon <linkinjeon@...il.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Fengguang Wu <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
	linux-kernel@...r.kernel.org,
	Namjae Jeon <namjae.jeon@...sung.com>,
	Vivek Trivedi <t.vivek@...sung.com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

2012/9/25, Namjae Jeon <linkinjeon@...il.com>:
> 2012/9/25, Dave Chinner <david@...morbit.com>:
>> On Thu, Sep 20, 2012 at 04:44:22PM +0800, Fengguang Wu wrote:
>>> [ CC FS and MM lists ]
>>>
>>> Patch looks good to me, however we need to be careful because it's
>>> introducing a new interface. So it's desirable to get some acks from
>>> the FS/MM developers.
>>>
>>> Thanks,
>>> Fengguang
>>>
>>> On Sun, Sep 16, 2012 at 08:25:42AM -0400, Namjae Jeon wrote:
>>> > From: Namjae Jeon <namjae.jeon@...sung.com>
>>> >
>>> > This patch is based on a suggestion by Wu Fengguang:
>>> > https://lkml.org/lkml/2011/8/19/19
>>> >
>>> > The kernel has a mechanism to do writeback as per the dirty_ratio and
>>> > dirty_background_ratio tunables. It also maintains a per-task dirty
>>> > rate limit to keep the amount of dirty pages balanced at any given
>>> > instant by doing bdi bandwidth estimation.
>>> >
>>> > The kernel also has max_ratio/min_ratio tunables to specify the
>>> > percentage of the write cache used to control per-bdi dirty limits
>>> > and task throttling.
>>> >
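As a rough illustration of those min_ratio/max_ratio bounds, here is a small
userspace sketch, loosely modelled on the kernel's per-bdi dirty limit
scaling; the struct and field names below are illustrative, not the kernel's:

#include <stdio.h>

/*
 * Sketch: a device gets a share of the global dirty threshold proportional
 * to its recent writeout fraction, with min_ratio as a floor and max_ratio
 * as a cap (both expressed as percentages of the global threshold).
 */
struct bdi_share {
        unsigned int min_ratio;         /* guaranteed share, % of global threshold */
        unsigned int max_ratio;         /* cap, % of global threshold */
        double writeout_fraction;       /* recent share of writeout, 0..1 */
};

static unsigned long bdi_dirty_limit(const struct bdi_share *b,
                                     unsigned long global_thresh)
{
        /* proportional share of what is left after the guaranteed minimum */
        unsigned long limit = (global_thresh * (100 - b->min_ratio) / 100)
                              * b->writeout_fraction;

        limit += global_thresh * b->min_ratio / 100;            /* floor */
        if (limit > global_thresh * b->max_ratio / 100)         /* cap */
                limit = global_thresh * b->max_ratio / 100;
        return limit;
}

int main(void)
{
        /* e.g. a USB disk doing ~20% of recent writeout, 200 MB global threshold */
        struct bdi_share usb = { .min_ratio = 0, .max_ratio = 100,
                                 .writeout_fraction = 0.2 };

        printf("per-bdi dirty limit: %lu MB\n", bdi_dirty_limit(&usb, 200));
        return 0;
}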
>>> > However, there might be a use case where the user wants a per-bdi
>>> > writeback tuning parameter to flush dirty data once the per-bdi dirty
>>> > data reaches a threshold, especially on an NFS server.
>>> >
>>> > dirty_background_centisecs provides an interface where the user can
>>> > tune the background writeback start threshold using
>>> > /sys/block/sda/bdi/dirty_background_centisecs
>>> >
>>> > dirty_background_centisecs is used along with the estimated average
>>> > bdi write bandwidth to start background writeback.
>>> >
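In other words, background writeback should kick in once the bdi holds more
dirty data than it could flush within the configured interval at its
estimated bandwidth. A minimal userspace sketch of that check (illustrative
names only, not the actual patch code):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for the per-bdi state such a check would consult. */
struct bdi_estimate {
        unsigned long long write_bandwidth;      /* bytes/sec, estimated */
        unsigned long long dirty_bytes;          /* reclaimable dirty backlog */
        unsigned int dirty_background_centisecs; /* 0 = knob disabled */
};

/* Start background writeback once the backlog exceeds N centiseconds
 * worth of writeback at the estimated bandwidth. */
static bool over_bdi_bg_thresh(const struct bdi_estimate *b)
{
        unsigned long long thresh;

        if (b->dirty_background_centisecs == 0)
                return false;   /* fall back to the global thresholds */

        thresh = b->write_bandwidth * b->dirty_background_centisecs / 100;
        return b->dirty_bytes > thresh;
}

int main(void)
{
        /* ~25 MB/s disk, 100 centisecs => start once ~25 MB is dirty */
        struct bdi_estimate usb = {
                .write_bandwidth = 25ULL << 20,
                .dirty_bytes = 30ULL << 20,
                .dirty_background_centisecs = 100,
        };

        printf("start background writeback: %s\n",
               over_bdi_bg_thresh(&usb) ? "yes" : "no");
        return 0;
}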
>>> > One use case that demonstrates the patch functionality is an NFS
>>> > setup:
>>> > We have an NFS setup with a 100 Mbps Ethernet line and a USB disk
>>> > attached to the server, which has a local speed of 25 MB/s. Server
>>> > and client are both ARM target boards.
>>> >
>>> > Now if we perform a write operation over NFS (client to server), the
>>> > data can travel at a maximum speed of 100 Mbps as per the network
>>> > speed. But the default write speed of the USB HDD over NFS comes out
>>> > to around 8 MB/s, far below the speed of the network.
>>> >
>>> > The reason is that, as per the NFS logic, during a write operation
>>> > pages are initially dirtied on the NFS client side; then, after
>>> > reaching the dirty threshold/writeback limit (or in case of sync),
>>> > the data is actually sent to the NFS server (so pages are dirtied
>>> > again on the server side). This is done in the COMMIT call from
>>> > client to server, i.e. if 100 MB of data is dirtied and sent, it
>>> > takes a minimum of 100 MB / ~12.5 MB/s (100 Mbps) ~ 8-9 seconds.
>>> >
>>> > After the data is received, it takes approximately 100/25 ~ 4 seconds
>>> > to write the data to the USB HDD on the server side, making the
>>> > overall time to write this much data ~12 seconds, which in practice
>>> > comes out to be near 7 to 8 MB/s. After this, a COMMIT response is
>>> > sent to the NFS client.
>>> >
>>> > However, we may improve this write performance by making use of the
>>> > NFS server's idle time, i.e. while data is being received from the
>>> > client, simultaneously initiate the writeback thread on the server
>>> > side. So instead of waiting for the complete data to arrive and only
>>> > then starting the writeback, we can work in parallel while the
>>> > network is still busy receiving the data. In this way the overall
>>> > performance is improved.
>>> >
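A quick back-of-the-envelope comparison of the serial path described above
and the overlapped (pipelined) path, using the assumed numbers from this
example (100 MB of data, ~12.5 MB/s network, 25 MB/s disk):

#include <stdio.h>

int main(void)
{
        const double data_mb = 100.0;
        const double net_mbps = 12.5;   /* ~100 Mbit/s Ethernet */
        const double disk_mbps = 25.0;  /* local USB disk */

        /* serial: receive everything first, then write it out */
        double serial = data_mb / net_mbps + data_mb / disk_mbps;   /* ~12 s */

        /* pipelined: writeback overlaps the transfer, so the slower stage
         * (the network here) dominates the total time */
        double pipelined = data_mb / net_mbps;                      /* ~8 s */

        printf("serial:    %.1f s -> %.1f MB/s\n", serial, data_mb / serial);
        printf("pipelined: %.1f s -> %.1f MB/s\n", pipelined, data_mb / pipelined);
        return 0;
}

The ~11 MB/s reported below sits between the serial ~8 MB/s and the ideal
pipelined bound of ~12.5 MB/s.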
>>> > If we tune dirty_background_centisecs, we can see an increase in
>>> > performance: it comes out to be ~11 MB/s. The results are:
>>> >
>>> > Write test (create a 1 GB file) results at the NFS client after
>>> > changing /sys/block/sda/bdi/dirty_background_centisecs
>>> > on the *** NFS Server only - not on the NFS Client ***
>>
>
> Hi Dave.
>
>> What is the configuration of the client and server? How much RAM,
>> what their dirty_* parameters are set to, network speed, server disk
>> speed for local sequential IO, etc?
> These results are on ARM, with 512 MB RAM and XFS over NFS, using default
> writeback settings (only our writeback setting, dirty_background_centisecs,
> was changed, at the NFS server only). Network speed is ~100MB/sec and
Sorry, that is a typo: the network speed is ~100Mb/sec, not ~100MB/sec.
> local disk speed is ~25MB/sec.
>
>>
>>> > ---------------------------------------------------------------------
>>> > |WRITE Test with various 'dirty_background_centisecs' at NFS Server |
>>> > ---------------------------------------------------------------------
>>> > |          | default = 0 | 300 centisec| 200 centisec| 100 centisec |
>>> > ---------------------------------------------------------------------
>>> > |RecSize(B)|  WriteSpeed |  WriteSpeed |  WriteSpeed |  WriteSpeed  |
>>> > ---------------------------------------------------------------------
>>> > |10485760  |  8.44MB/sec |  8.60MB/sec |  9.30MB/sec |  10.27MB/sec |
>>> > | 1048576  |  8.48MB/sec |  8.87MB/sec |  9.31MB/sec |  10.34MB/sec |
>>> > |  524288  |  8.37MB/sec |  8.42MB/sec |  9.84MB/sec |  10.47MB/sec |
>>> > |  262144  |  8.16MB/sec |  8.51MB/sec |  9.52MB/sec |  10.62MB/sec |
>>> > |  131072  |  8.48MB/sec |  8.81MB/sec |  9.42MB/sec |  10.55MB/sec |
>>> > |   65536  |  8.38MB/sec |  9.09MB/sec |  9.76MB/sec |  10.53MB/sec |
>>> > |   32768  |  8.65MB/sec |  9.00MB/sec |  9.57MB/sec |  10.54MB/sec |
>>> > |   16384  |  8.27MB/sec |  8.80MB/sec |  9.39MB/sec |  10.43MB/sec |
>>> > |    8192  |  8.52MB/sec |  8.70MB/sec |  9.40MB/sec |  10.50MB/sec |
>>> > |    4096  |  8.20MB/sec |  8.63MB/sec |  9.80MB/sec |  10.35MB/sec |
>>> > ---------------------------------------------------------------------
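For reference, a minimal example of setting the knob from userspace once the
patch is applied (the sysfs file does not exist in an unpatched kernel); it is
just the programmatic equivalent of
echo 100 > /sys/block/sda/bdi/dirty_background_centisecs:

#include <stdio.h>

int main(void)
{
        const char *path = "/sys/block/sda/bdi/dirty_background_centisecs";
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return 1;
        }
        /* ~1 second worth of dirty data before bdi background writeback starts */
        fprintf(f, "%d\n", 100);
        fclose(f);
        return 0;
}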
>>
>> While this set of numbers looks good, it is very limited in scope.
>> I can't evaluate whether the change is worthwhile or not from this
>> test. If I were writing this patch, the questions I'd be seeking to
>> answer before proposing it for inclusion are as follows....
>>
>> 1. what's the comparison in performance to typical NFS
>> server writeback parameter tuning? i.e. dirty_background_ratio=5,
>> dirty_ratio=10, dirty_expire_centisecs=1000,
>> dirty_writeback_centisecs=1? i.e. does this change give any
>> benefit over the current common practice for configuring NFS
>> servers?
>>
>> 2. what happens when you have 10 clients all writing to the server
>> at once? Or 100? NFS servers rarely have a single writer to a
>> single file at a time, so what impact does this change have on
>> multiple concurrent file write performance from multiple clients?
>>
>> 3. Following on from the multiple client test, what difference does it
>> make to file fragmentation rates? Writing more frequently means
>> smaller allocations and writes, and that tends to lead to higher
>> fragmentation rates, especially when multiple files are being
>> written concurrently. Higher fragmentation also means lower
>> performance over time as fragmentation accelerates filesystem aging
>> effects on performance.  IOWs, it may be faster when new, but it
>> will be slower 3 months down the track and that's a bad tradeoff to
>> make.
>>
>> 4. What happens for higher bandwidth network links? e.g. gigE or
>> 10gigE? Are the improvements still there? Or does it cause
>> regressions at higher speeds? I'm especially interested in what
>> happens to multiple writers at higher network speeds, because that's
>> a key performance metric used to measure enterprise level NFS
>> servers.
>>
>> 5. Are the improvements consistent across different filesystem
>> types?  We've had writeback changes in the past cause improvements
>> on one filesystem but significant regressions on others.  I'd
>> suggest that you need to present results for ext4, XFS and btrfs so
>> that we have a decent idea of what we can expect from the change to
>> the generic code.
>>
>> Yeah, I'm asking a lot of questions. That's because the generic
>> writeback code is extremely important to performance and the impact
>> of a change cannot be evaluated from a single test.
> Yes, I agree.
> I will share the patch behavior with gigabit Ethernet, different
> filesystems (e.g. ext4, XFS and btrfs) and a multiple-NFS-client setup.
>
> Thanks.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@...morbit.com
>>
>
