Message-ID: <37fc3178-c812-ee5e-bd90-34f8e0816a3d@huaweicloud.com>
Date:   Tue, 30 May 2023 09:19:50 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     Xiao Ni <xni@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>
Cc:     song@...nel.org, akpm@...l.org, neilb@...e.de,
        linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        yi.zhang@...wei.com, yangerkun@...wei.com,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH -next v2 7/7] md/raid1-10: limit the number of plugged bio

Hi,

On 2023/05/30 8:58, Xiao Ni wrote:
> On Mon, May 29, 2023 at 4:50 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/05/29 15:57, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/05/29 11:10, Xiao Ni wrote:
>>>>> On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2023/05/29 10:08, Xiao Ni wrote:
>>>>>>> Hi Kuai
>>>>>>>
>>>>>>> There is a limitation of the memory in your test. But for most
>>>>>>> situations, customers should not set this. Can this change introduce a
>>>>>>> performance regression against other situations?
>>>>>>
>>>>>> Note that this limit is only there to trigger writeback as soon as
>>>>>> possible in the test. Real workloads can certainly trigger asynchronous
>>>>>> writeback of dirty pages while continuing to produce new dirty
>>>>>> pages.
>>>>>
>>>>> Hi
>>>>>
>>>>> I'm confused here. If we want to trigger write back quickly, we need
>>>>> to set these two values to smaller numbers, rather than 0 and 60.
>>>>> Right?
>>>>
>>>> 60 is not required, I'll remove this setting.
>>>>
>>>> 0 just means write back if there are any dirty pages.
>>>
>>> Hi Kuai
>>>
>>> Does 0 mean disabling write back? I tried to find the doc that
>>> describes the meaning when setting dirty_background_ratio to 0, but I
>>> didn't find it.
>>> In https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html it
>>> doesn't describe this. But it says something like this
>>>
>>> Note:
>>>     dirty_background_bytes is the counterpart of dirty_background_ratio. Only
>>>     one of them may be specified at a time. When one sysctl is written it is
>>>     immediately taken into account to evaluate the dirty memory limits and the
>>>     other appears as 0 when read.
>>>
>>> Maybe you can specify dirty_background_ratio to 1 if you want to
>>> trigger write back ASAP.
>>
>> The purpose here is to trigger write back ASAP. I'm not an expert here,
>> but based on the test results, 0 obviously doesn't mean that write back
>> is disabled.
>>
>> Setting dirty_background_bytes to a value sets dirty_background_ratio
>> to 0 at the same time, which means dirty_background_ratio is disabled.
>> However, changing dirty_background_ratio from its default value to 0
>> leaves both dirty_background_ratio and dirty_background_bytes at 0, and
>> based on the following related code, I think 0 just means write back
>> whenever there are any dirty pages.
>>
>> domain_dirty_limits:
>>    bg_bytes = dirty_background_bytes -> 0
>>    bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100 -> 0
>>
>>    if (bg_bytes)
>>           bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
>>    else
>>           bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; -> 0
>>
>>    dtc->bg_thresh = bg_thresh; -> 0
>>
>> balance_dirty_pages
>>    nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
>>    if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
>>         !writeback_in_progress(wb))
>>      wb_start_background_writeback(wb); -> writeback ASAP
>>
>> Thanks,
>> Kuai
> 
> Hi Kuai
> 
> I'm not an expert about this either. Thanks for all your patches, I
> can study more things too. But I still have some questions.
> 
> I did a test in my environment something like this:
> modprobe brd rd_nr=4 rd_size=10485760
> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> echo 0 > /proc/sys/vm/dirty_background_ratio
> fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> It will cause OOM and the system hangs

OOM means you triggered this problem: the plug holds lots of bios, which
costs lots of memory. It's not that write back is disabled; you can verify
this by monitoring md inflight. Note that the ramdisk (rd_nr * rd_size)
shouldn't use too much memory in the test, so that OOM won't be triggered.
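
A rough sketch of how the two checks above could be scripted, in Python.
It assumes the md device is md0 and reads /sys/block/md0/inflight, which
reports in-flight reads and writes as two numbers. It also assumes brd's
rd_size is given in KiB, as in the modprobe line quoted below. The helper
names are made up for illustration and are not part of any tool.

```python
from pathlib import Path


def parse_inflight(text):
    """Parse the content of /sys/block/<dev>/inflight.

    The file holds two whitespace-separated counters:
    in-flight reads, then in-flight writes.
    """
    reads, writes = (int(field) for field in text.split()[:2])
    return reads, writes


def read_inflight(dev="md0"):
    """Read the current in-flight counters for a block device.

    "md0" is an assumption taken from the test commands in this thread.
    """
    return parse_inflight(Path(f"/sys/block/{dev}/inflight").read_text())


def ramdisk_total_gib(rd_nr, rd_size_kib):
    """Total memory the brd ramdisks can consume, in GiB.

    Assumes rd_size is in KiB, so KiB -> GiB is a division by 1024 * 1024.
    """
    return rd_nr * rd_size_kib / (1024 * 1024)
```

Polling read_inflight() in a loop while fio runs should show whether
writes are still being issued to md0. For the quoted setup,
ramdisk_total_gib(4, 10485760) gives the upper bound on memory the four
ramdisks can take, which is worth comparing with the machine's RAM.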

Have you tried to test with this patchset?

> 
> modprobe brd rd_nr=4 rd_size=10485760
> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> echo 1 > /proc/sys/vm/dirty_background_ratio (THIS is the only different place)
> fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> It can finish successfully.  The value of dirty_background_ratio is 1
> here means it flushes ASAP

This really doesn't mean it flushes ASAP. Our testing hit this problem in
a real test that doesn't modify dirty_background_ratio. I guess
something triggers io_scheduler(), probably because the background thread
thinks dirty pages don't match the threshold, but I'm not sure for now.
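
To make the threshold arithmetic concrete, here is a small Python model of
the domain_dirty_limits and balance_dirty_pages excerpt quoted earlier in
this thread. It is only a sketch of those quoted lines: it assumes a page
size of 4096 bytes, takes available memory in pages, and ignores every
other adjustment the kernel makes. The function names are mine.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def div_round_up(n, d):
    """Integer division rounding up, like the kernel's DIV_ROUND_UP."""
    return (n + d - 1) // d


def bg_thresh_pages(dirty_background_bytes, dirty_background_ratio,
                    available_pages):
    """Background writeback threshold in pages, per the quoted excerpt."""
    bg_bytes = dirty_background_bytes
    # The ratio is scaled by PAGE_SIZE before dividing by 100.
    bg_ratio = (dirty_background_ratio * PAGE_SIZE) // 100
    if bg_bytes:
        return div_round_up(bg_bytes, PAGE_SIZE)
    return (bg_ratio * available_pages) // PAGE_SIZE


def starts_background_writeback(nr_reclaimable, bg_thresh,
                                laptop_mode=False,
                                writeback_in_progress=False):
    """Mirror of the quoted balance_dirty_pages condition."""
    return (not laptop_mode
            and nr_reclaimable > bg_thresh
            and not writeback_in_progress)
```

With both sysctls at 0 the threshold is 0 pages, so a single dirty page
already satisfies the condition. With dirty_background_ratio set to 1 the
threshold is roughly 1% of available memory, so background writeback only
starts once that many pages are dirty.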

Thanks,
Kuai
> 
> So your method works the opposite way from what you intended. The
> memory can't be flushed in time, so all of it is used up very soon,
> the memory runs out and the system hangs. The reason I'm looking at
> the test is to find out whether we really need this change, because in
> the real world most customers don't disable write back. Anyway, it
> depends on Song's decision, and thanks for your patches again. I'll
> review V3 and try to do some performance tests.
> 
> Best Regards
> Xiao
