Message-ID: <c56e7e9c-90ca-29ca-2003-1a9a88d75fa6@huaweicloud.com>
Date: Mon, 29 May 2023 16:50:32 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>
Cc: song@...nel.org, akpm@...l.org, neilb@...e.de,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH -next v2 7/7] md/raid1-10: limit the number of plugged bio
Hi,
On 2023/05/29 15:57, Xiao Ni wrote:
> On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>
>> Hi,
>>
>>> On 2023/05/29 11:10, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 2023/05/29 10:08, Xiao Ni wrote:
>>>>> Hi Kuai
>>>>>
>>>>> Your test imposes a memory limitation, but in most situations
>>>>> customers won't set this. Can this change introduce a
>>>>> performance regression in other situations?
>>>>
>>>> Note that this limitation is just to trigger writeback as soon as
>>>> possible in the test; it's 100% sure that real workloads can trigger
>>>> dirty-page writeback asynchronously while continuing to produce new
>>>> dirty pages.
>>>
>>> Hi
>>>
>>> I'm confused here. If we want to trigger writeback quickly, shouldn't
>>> these two values be set to smaller numbers, rather than 0 and 60?
>>
>> 60 is not required, I'll remove this setting.
>>
>> 0 just means write back if there are any dirty pages.
>
> Hi Kuai
>
> Does 0 mean disabling write back? I tried to find the doc that
> describes the meaning when setting dirty_background_ratio to 0, but I
> didn't find it.
> The page https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html
> doesn't describe this, but it does say the following:
>
> Note:
> dirty_background_bytes is the counterpart of dirty_background_ratio. Only
> one of them may be specified at a time. When one sysctl is written it is
> immediately taken into account to evaluate the dirty memory limits and the
> other appears as 0 when read.
>
> Maybe you can specify dirty_background_ratio to 1 if you want to
> trigger write back ASAP.
The purpose here is to trigger writeback ASAP. I'm not an expert here,
but based on the test results, 0 obviously doesn't mean writeback is
disabled. Setting dirty_background_bytes to a value causes
dirty_background_ratio to be set to 0 as well, which means
dirty_background_ratio is disabled. However, changing
dirty_background_ratio from its default value to 0 leaves both
dirty_background_ratio and dirty_background_bytes at 0, and based on
the related code below, I think 0 just means write back whenever there
are any dirty pages.
domain_dirty_limits():
    bg_bytes = dirty_background_bytes                          -> 0
    bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100      -> 0

    if (bg_bytes)
        bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
    else
        bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; -> 0

    dtc->bg_thresh = bg_thresh;                                -> 0

balance_dirty_pages():
    nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);

    if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
        !writeback_in_progress(wb))
        wb_start_background_writeback(wb);                     -> writeback ASAP
Thanks,
Kuai
>
>>>>
>>>> If not many bios are plugged, then it's the same as before; if a lot
>>>> of bios are plugged, note that before this patchset those bios would
>>>> spend quite a long time in the plug, and hence I think performance
>>>> should be better.
>>>
>>> Hmm, doesn't it depend on whether the I/O is sequential or not? If
>>> it's a big I/O request, could it miss the merge opportunity?
>>
>> The bios will still be merged into the underlying disks' requests (if
>> they are rq-based); an underlying disk won't flush its plug until the
>> number of requests exceeds the threshold.
>
> Thanks for this.
>
> Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>>
>>> Regards
>>> Xiao
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Best Regards
>>>>> Xiao
>>>>>
>>>>> On Wed, Apr 26, 2023 at 4:24 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>>>>
>>>>>> From: Yu Kuai <yukuai3@...wei.com>
>>>>>>
>>>>>> bios can be added to the plug without limit, and the following
>>>>>> writeback test can trigger a huge number of plugged bios:
>>>>>>
>>>>>> Test script:
>>>>>> modprobe brd rd_nr=4 rd_size=10485760
>>>>>> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
>>>>>> echo 0 > /proc/sys/vm/dirty_background_ratio
>>>>>> echo 60 > /proc/sys/vm/dirty_ratio
>>>>>> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
>>>>>>
>>>>>> Test result:
>>>>>> Monitoring /sys/block/md0/inflight shows that the inflight count
>>>>>> keeps increasing until fio finishes writing; after running for
>>>>>> about 2 minutes:
>>>>>>
>>>>>> [root@...ora ~]# cat /sys/block/md0/inflight
>>>>>> 0 4474191
>>>>>>
>>>>>> Fix the problem by limiting the number of plugged bio based on the number
>>>>>> of copies for original bio.
>>>>>>
>>>>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>>>>> ---
>>>>>> drivers/md/raid1-10.c | 9 ++++++++-
>>>>>> drivers/md/raid1.c | 2 +-
>>>>>> drivers/md/raid10.c | 2 +-
>>>>>> 3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>>>>>> index 98d678b7df3f..35fb80aa37aa 100644
>>>>>> --- a/drivers/md/raid1-10.c
>>>>>> +++ b/drivers/md/raid1-10.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>> #define IO_MADE_GOOD ((struct bio *)2)
>>>>>>
>>>>>> #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
>>>>>> +#define MAX_PLUG_BIO 32
>>>>>>
>>>>>> /* for managing resync I/O pages */
>>>>>> struct resync_pages {
>>>>>> @@ -31,6 +32,7 @@ struct resync_pages {
>>>>>> struct raid1_plug_cb {
>>>>>> struct blk_plug_cb cb;
>>>>>> struct bio_list pending;
>>>>>> + unsigned int count;
>>>>>> };
>>>>>>
>>>>>> static void rbio_pool_free(void *rbio, void *data)
>>>>>> @@ -127,7 +129,7 @@ static inline void md_submit_write(struct bio *bio)
>>>>>> }
>>>>>>
>>>>>> static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>>> - blk_plug_cb_fn unplug)
>>>>>> + blk_plug_cb_fn unplug, int copies)
>>>>>> {
>>>>>> struct raid1_plug_cb *plug = NULL;
>>>>>> struct blk_plug_cb *cb;
>>>>>> @@ -147,6 +149,11 @@ static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>>>
>>>>>> plug = container_of(cb, struct raid1_plug_cb, cb);
>>>>>> bio_list_add(&plug->pending, bio);
>>>>>> + if (++plug->count / MAX_PLUG_BIO >= copies) {
>>>>>> + list_del(&cb->list);
>>>>>> + cb->callback(cb, false);
>>>>>> + }
>>>>>> +
>>>>>>
>>>>>> return true;
>>>>>> }
>>>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>>>> index 639e09cecf01..c6066408a913 100644
>>>>>> --- a/drivers/md/raid1.c
>>>>>> +++ b/drivers/md/raid1.c
>>>>>> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>>>> r1_bio->sector);
>>>>>> /* flush_pending_writes() needs access to the rdev so...*/
>>>>>> mbio->bi_bdev = (void *)rdev;
>>>>>> - if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
>>>>>> + if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
>>>>>> spin_lock_irqsave(&conf->device_lock, flags);
>>>>>> bio_list_add(&conf->pending_bio_list, mbio);
>>>>>> spin_unlock_irqrestore(&conf->device_lock, flags);
>>>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>>>> index bd9e655ca408..7135cfaf75db 100644
>>>>>> --- a/drivers/md/raid10.c
>>>>>> +++ b/drivers/md/raid10.c
>>>>>> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
>>>>>>
>>>>>> atomic_inc(&r10_bio->remaining);
>>>>>>
>>>>>> - if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
>>>>>> + if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
>>>>>> spin_lock_irqsave(&conf->device_lock, flags);
>>>>>> bio_list_add(&conf->pending_bio_list, mbio);
>>>>>> spin_unlock_irqrestore(&conf->device_lock, flags);
>>>>>> --
>>>>>> 2.39.2
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>