Message-Id: <675877DB-FCD7-40AF-940D-4AD91288B4BD@dilger.ca>
Date: Wed, 16 May 2018 12:12:00 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: "Theodore Y. Ts'o" <tytso@....edu>
Cc: Wang Shilong <wangshilong1991@...il.com>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Wang Shilong <wshilong@....com>
Subject: Re: [PATCH] ext4: add an interface to load block bitmaps
On May 15, 2018, at 8:58 PM, Theodore Y. Ts'o <tytso@....edu> wrote:
>
> On Tue, May 15, 2018 at 10:06:23PM +0900, Wang Shilong wrote:
>> From: Wang Shilong <wshilong@....com>
>>
>> During our benchmarking, we found that write performance is
>> sometimes not stable enough, and there are some small reads
>> during writes which could drop throughput (~30%).
>
> Out of curiosity, what sort of benchmarks are you doing?
This is related to Lustre, though I can't comment on the specific
benchmarks. We've had a variety of reports about this issue. The
current workaround (hack) is "dumpe2fs /dev/XXX > /dev/null" at
mount time and periodically at runtime via cron to keep the bitmaps
loaded, but we wanted to get a better solution in place that works
for everyone. Being able to load all of the bitmaps at mount time
also helps get the server performance "up to speed" quickly, rather
than on-demand loading of a few hundred MB of bitmaps over time.
>> It turned out that block bitmap loading could add some
>> latency here; also, for a heavily fragmented filesystem,
>> we might need to load many bitmaps to find some free blocks.
>>
>> To improve the above situation, we have a patch to load block
>> bitmaps into memory and pin that memory until umount, or until
>> we release it on purpose. This could stabilize write performance
>> and improve the performance of a heavily fragmented
>> filesystem.
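Why fragmentation multiplies bitmap reads can be shown with a toy model. This is a sketch under stated assumptions, not ext4's multiblock allocator (which selects groups far more cleverly): groups are scanned in order, every group whose bitmap is not cached costs one small read, and the scan stops at the first group with a large enough free extent. The function name and its inputs are hypothetical.

```python
def scan_for_free_extent(largest_free, goal, cached):
    """Toy model of a linear block-group scan (hypothetical, not mballoc).

    largest_free[g] is the largest free extent, in blocks, in group g.
    cached is the set of groups whose bitmap is already in memory; it is
    updated in place as bitmaps are "read".
    Returns (group or None, number of bitmap reads performed).
    """
    reads = 0
    for group, largest in enumerate(largest_free):
        if group not in cached:
            reads += 1            # uncached bitmap: a small synchronous read
            cached.add(group)
        if largest >= goal:
            return group, reads   # first group that can satisfy the request
    return None, reads            # nothing large enough anywhere
```

On a fragmented layout such as `[0, 3, 1, 2, 64]` with an 8-block goal, a cold cache needs five reads before the allocation can proceed, while a fully cached (pinned) set of bitmaps needs none.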
>
> This is true, but I wonder how realistic this is on real production
> systems. For a 1 TiB file system, pinning all of the block bitmaps
> will require 32 megabytes of memory. Is that really realistic for
> your use case?
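The 32 MB figure follows from one bitmap bit per block. The sketch below assumes default ext4 geometry (4 KiB blocks, each group's block bitmap is exactly one block, so a group spans 8 * block_size blocks, no bigalloc); the helper name is hypothetical.

```python
MiB = 1 << 20
TiB = 1 << 40


def pinned_bitmap_bytes(fs_bytes, block_size=4096):
    """Memory needed to keep every block bitmap of a filesystem in RAM.

    Assumes default ext4 geometry: one bitmap block per block group, so a
    group covers 8 * block_size blocks. A partial last group still needs
    a full bitmap block.
    """
    blocks = fs_bytes // block_size
    blocks_per_group = 8 * block_size          # one bit per block in one block
    groups = -(-blocks // blocks_per_group)    # ceiling division
    return groups * block_size
```

With the defaults this gives 32 MiB for 1 TiB, and 6400 MiB (about 6.25 GiB) for a hypothetical 200 TiB server, in line with the ~6GB estimate for Lustre servers.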
In the case of Lustre servers, they typically have 128GB of RAM or
more, and are serving a few hundred TB of storage each these days,
but do not have any other local users, so the ~6GB RAM usage isn't a
huge problem if it improves the performance/consistency. The real
issue is that the bitmaps do not get referenced often enough compared
to the 10GB/s of data flowing through the server, so they are pushed
out of memory too quickly.
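The eviction problem, and why pinning avoids it, can be illustrated with a toy cache. This is a simplified model, assuming a single LRU list; the kernel page cache actually uses active/inactive lists, and the class and method names here are hypothetical. A bitmap that is touched rarely while data streams through is evicted unless it is pinned.

```python
from collections import OrderedDict


class PinnableLRU:
    """Toy LRU cache whose pinned entries are exempt from eviction.

    Illustration only: not the kernel page cache, which uses
    active/inactive lists rather than one LRU list.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> value, least recently used first
        self.pinned = set()

    def get(self, key):
        if key not in self.entries:
            return None                # miss: caller must re-read from disk
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value, pin=False):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if pin:
            self.pinned.add(key)
        self._evict()

    def unpin(self, key):
        self.pinned.discard(key)       # e.g. at umount or on explicit release
        self._evict()

    def _evict(self):
        # Drop least recently used unpinned entries until within capacity.
        # Pinned entries may leave the cache over capacity: that overshoot
        # is the memory cost of pinning.
        for key in list(self.entries):
            if len(self.entries) <= self.capacity:
                break
            if key not in self.pinned:
                del self.entries[key]
```

Streaming ten data entries through a two-entry cache evicts an unpinned bitmap, while a pinned one survives until `unpin()` is called.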
> So is this something just for benchmarking (in which case what are you
> trying to benchmark)? Or is this something that you want to use in
> production? And if so, perhaps something to consider is analyzing how
> fragmented and how full you want to run your file system.
Fullness and fragmentation are up to the end user, and not really under
our control. I think filesystems are typically kept 75-80% full, and
when they pass 90% there is a "purge" of old files to bring usage back
down to the 75% range.
> Something else to investigate is *why* is the file system getting so
> fragmented in the first place, and are there things we can do to
> prevent the file system from getting that fragmented in the first
> place....
I think it is just filesystem aging, and some users aren't good at keeping
the space usage below 90-95%, but then are unhappy when the performance
goes down afterward.
> Something to perhaps consider doing is storing the bitmap in memory in
> a compressed form. For example, you could use a run length encoding
> scheme where 2 bytes are used to encode the starting block of a free
> extent, and 2 bytes to encode the length of the free extent. For a
> large number of mostly full (or mostly empty, for that matter) block
> allocation bitmaps, this will be a much more efficient way to cache
> the information in memory if you really want to keep all of the
> allocation information in memory.
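A minimal sketch of the run-length encoding described above, assuming 4 KiB blocks (32768 bits per group, so starts and lengths fit in 16 bits) and ext4's little-endian bit order within the bitmap. The function names and the little-endian packing of each (start, length) pair are illustrative choices, not an existing ext4 format.

```python
import struct


def encode_free_extents(bitmap):
    """Run-length encode the free (zero) bits of one block bitmap.

    Each free extent becomes a 2-byte start and a 2-byte length, both
    relative to the start of the group, packed little-endian.
    """
    nbits = len(bitmap) * 8
    if nbits > 0xFFFF:
        raise ValueError("starts/lengths would not fit in 16 bits")
    out = bytearray()
    start = None
    for i in range(nbits + 1):
        # Treat the position past the end as "in use" to close a final run.
        in_use = i == nbits or (bitmap[i >> 3] >> (i & 7)) & 1
        if not in_use and start is None:
            start = i                                    # a free extent begins
        elif in_use and start is not None:
            out += struct.pack("<HH", start, i - start)  # close the extent
            start = None
    return bytes(out)


def decode_free_extents(encoded, nbytes):
    """Rebuild the raw bitmap: everything in use except the listed extents."""
    bitmap = bytearray(b"\xff" * nbytes)
    for start, length in struct.iter_unpack("<HH", encoded):
        for i in range(start, start + length):
            bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF
    return bytes(bitmap)
```

A mostly full bitmap with one free run shrinks from 4096 bytes to 4. The worst case (alternating free and used blocks) expands to 16384 extents, or 64 KiB, so a cache built this way would want to fall back to the raw bitmap whenever the encoding is larger.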
I was thinking of something like this for the inode bitmaps, especially
since they are often going to be mostly unused (default is 1MB/inode for
Lustre OSTs, so only 16 of the 4096 bytes in each inode bitmap are used). Another
option is to store only in-use bytes instead of the full inode bitmap.
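One reading of "store only in-use bytes" is to keep just the leading bytes of each inode bitmap that describe real inodes. The sketch below assumes default geometry (4 KiB blocks, 128 MiB groups) and ignores mke2fs rounding of the inode count; all function names are hypothetical. At 1 MiB per inode it reproduces the 16-byte figure.

```python
def inodes_per_group(bytes_per_inode, block_size=4096):
    """Approximate inodes per group, assuming 8 * block_size blocks per group."""
    group_bytes = 8 * block_size * block_size   # 128 MiB with 4 KiB blocks
    return group_bytes // bytes_per_inode


def trim_inode_bitmap(bitmap, ipg):
    """Keep only the leading bytes that describe real inodes in this group."""
    return bytes(bitmap[: (ipg + 7) // 8])


def inode_in_use(trimmed, index):
    """Test inode `index` (0-based within the group), little-endian bit order."""
    return bool((trimmed[index >> 3] >> (index & 7)) & 1)
```

The bytes past `(ipg + 7) // 8` do not describe any inode, so dropping them loses no allocation information.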
Cheers, Andreas