[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87ilj9ul7k.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 21 Nov 2022 09:53:35 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Chen Wandun <chenwandun@...wei.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: <hugh@...itas.com>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <wangkefeng.wang@...wei.com>,
<sunnanyong@...wei.com>, <xialonglong1@...wei.com>
Subject: Re: [RFC PATCH] swapfile: fix soft lockup in scan_swap_map_slots
Andrew Morton <akpm@...ux-foundation.org> writes:
> On Fri, 18 Nov 2022 21:38:50 +0800 Chen Wandun <chenwandun@...wei.com> wrote:
>
>> A soft lockup occur in scan free swap slot by constructing
>> huge memory pressure.
>> The test scenario is: 64 CPU cores, 64GB memory, and 28
>> zram devices, the disksize of each zram device is 50MB.
>>
>> LATENCY_LIMIT is used to prevent soft lockup in function
>> scan_swap_map_slots, but the real loop number would more
>> than LATENCY_LIMIT because of "goto checks and goto scan"
>> repeatly without decrease of latency limit.
>>
>> In order to fix it, move decrease latency_ration code in advance.
>>
>> There is also a suspicious place that will cause soft lockup in
>> function get_swap_pages, in this function, the "goto start_over"
>> may result in continuous scanning of swap partition, if there is
>> no cond_sched in scan_swap_map_slots, it would cause soft lockup
>> (I am not sure about this).
>>
>> ...
>>
>
> Looks sensible.
Yes. LGTM.
Reviewed-by: "Huang, Ying" <ying.huang@...el.com>
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -972,23 +972,23 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>> scan:
>> spin_unlock(&si->lock);
>> while (++offset <= READ_ONCE(si->highest_bit)) {
>> - if (swap_offset_available_and_locked(si, offset))
>> - goto checks;
>> if (unlikely(--latency_ration < 0)) {
>> cond_resched();
>> latency_ration = LATENCY_LIMIT;
>> scanned_many = true;
>> }
>> + if (swap_offset_available_and_locked(si, offset))
>> + goto checks;
>> }
>> offset = si->lowest_bit;
>> while (offset < scan_base) {
>> - if (swap_offset_available_and_locked(si, offset))
>> - goto checks;
>> if (unlikely(--latency_ration < 0)) {
>> cond_resched();
>> latency_ration = LATENCY_LIMIT;
>> scanned_many = true;
>> }
>> + if (swap_offset_available_and_locked(si, offset))
>> + goto checks;
>> offset++;
>> }
>> spin_lock(&si->lock);
>
> But this does somewhat alter the `scanned_many' logic. We'll now set
> 'scanned_many` earlier. What are the effects of this?
>
> The ed43af10975eef7e changelog outlines tests which could be performed
> to ensure we aren't regressing from this.
Per my understanding, this will not influence `scanned_many` logic
much. Because `scanned_many` flag will be set just a little earlier
(one less slot).
Best Regards,
Huang, Ying
Powered by blists - more mailing lists