[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201124024459.GN132317@mit.edu>
Date: Mon, 23 Nov 2020 21:44:59 -0500
From: "Theodore Y. Ts'o" <tytso@....edu>
To: Saranya Muruganandam <saranyamohan@...gle.com>
Cc: linux-ext4@...r.kernel.org, adilger.kernel@...ger.ca,
Wang Shilong <wshilong@....com>
Subject: Re: [RFC PATCH v3 46/61] ext2fs: parallel bitmap loading
On Wed, Nov 18, 2020 at 07:39:32AM -0800, Saranya Muruganandam wrote:
> From: Wang Shilong <wshilong@....com>
>
> In our benchmarking for PiB size filesystem, pass5 takes
> 10446s to finish and 99.5% of time takes on reading bitmaps.
>
> It makes sense to reading bitmaps using multiple threads,
> a quickly benchmark show 10446s to 626s with 64 threads.
>
> Signed-off-by: Wang Shilong <wshilong@....com>
> Signed-off-by: Saranya Muruganandam <saranyamohan@...gle.com>
Note: This patch will *explode* with much hilarity if num_threads is
greater than the number of block groups. That's because the
ext2fs_get_avg_group() will return 1 if fs_num_threads is greater than
group_desc_count.
This will result in the group start and end limits to go beyond the
array boundaries.... and then *boom*. So there will probably need to
be some kind of safety checks if the caller has set fs_num_threads to
a value which is much larger than is appropriate for a given file
system.
Speaking of which, relying on fs_num_threads being set by e2fsck means
that we won't get the benefits of the parallel block bitmap reads for
debugfs and dumpe2fs. So we should think about how the other tools
should trigger read bitmaps. And this might be something that we want
to do independent of whether we are doing parallel fsck.
Suggested approach:
1) Create create ext2fs_is_device_rotational() which returns whether
or not a particular device is a HDD, or a non-rotational device (e.g.,
SSD, GCE PD, AWS EBS, etc.), using rotational as a proxy for "reading
using multiple threads is a good thing".
2) Create an ext2fs_get_num_procs() which calls
sysconf(_SC_NPROCESSOR_ONLN) if sysconf and _SC_NPROESSOR_ONLN is
available. If not, there may be other OS-specific ways of determining
the number of CPU's available.
3) If HAVE_PTHREADS and the number of block groups is greater than the
number of CPU's * 2, a function that (for now) we drop in libsupport
will set fs->fs_num_threads to the number of processors as the
default. There may not be a reason to change the default for debugfs
and dumpe2fs, but for e2fsck, this would be used for the default, but
it could be over-ridden via "-E multithread=<number of threads>".
- Ted
Powered by blists - more mailing lists