Message-ID: <8734tosyb9.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 19 Feb 2024 15:33:30 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Daniel Bristot de Oliveira <bristot@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben
Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Daniel
Bristot de Oliveira <bristot@...hat.com>, Valentin Schneider
<vschneid@...hat.com>, linux-kernel@...r.kernel.org, Luca Abeni
<luca.abeni@...tannapisa.it>, Tommaso Cucinotta
<tommaso.cucinotta@...tannapisa.it>, Thomas Gleixner
<tglx@...utronix.de>, Joel Fernandes <joel@...lfernandes.org>, Vineeth
Pillai <vineeth@...byteword.org>, Shuah Khan <skhan@...uxfoundation.org>,
Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lu@...el.com>, Kairui Song
<kasong@...cent.com>, Guo Ziliang <guo.ziliang@....com.cn>
Subject: Re: [PATCH v5 0/7] SCHED_DEADLINE server infrastructure
Hi, Daniel,
Thanks a lot for your great patchset!
We have a similar starvation issue in the mm subsystem too. Details are in
the patch description of the commit below. In short, task A busy-loops
waiting for some event, while task B signals the event after doing some
work. If the priority of task A is higher than that of task B, task B
may be starved.
IIUC, if task A is an RT task while task B is a fair task, then your
patchset will solve the issue. If both task A and task B are RT tasks,
is there some way to solve the issue?
Now, we use an ugly schedule_timeout_uninterruptible(1) in the loop to
resolve the issue. Is there something better?
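For illustration only, here is a rough userspace analogue of that loop,
assuming POSIX threads and C11 atomics. It is not the kernel code: the
names event_done, task_b, wait_for_event and run_once are hypothetical,
and nanosleep() merely stands in for schedule_timeout_uninterruptible(1).
The point it shows is that the waiter sleeps between polls instead of
spinning, so it leaves the run queue each time around the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag: 0 until task B signals the event. */
static atomic_int event_done;

/* Task B: do some work, then signal the event. */
static void *task_b(void *arg)
{
	struct timespec work = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };

	(void)arg;
	nanosleep(&work, NULL);		/* placeholder for the real work */
	atomic_store(&event_done, 1);
	return NULL;
}

/*
 * Task A: poll for the event, sleeping about 1 ms between polls so the
 * caller blocks instead of spinning. Returns the number of polls made.
 */
static int wait_for_event(void)
{
	struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000 * 1000 };
	int polls = 1;

	while (!atomic_load(&event_done)) {
		nanosleep(&tick, NULL);
		polls++;
	}
	return polls;
}

/* Run both tasks once. Returns the poll count, or -1 if B cannot start. */
static int run_once(void)
{
	pthread_t b;
	int polls;

	atomic_store(&event_done, 0);
	if (pthread_create(&b, NULL, task_b, NULL) != 0)
		return -1;
	polls = wait_for_event();
	pthread_join(b, NULL);
	return polls;
}
```

Calling run_once() returns once task B has set the flag. The cost of the
approach is visible here too: the waiter can notice the event up to one
sleep interval late.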
Best Regards,
Huang, Ying
--------------------------8<---------------------------------------
commit 029c4628b2eb2ca969e9bf979b05dc18d8d5575e
Author: Guo Ziliang <guo.ziliang@....com.cn>
Date: Wed Mar 16 16:15:03 2022 -0700
mm: swap: get rid of livelock in swapin readahead
In our testing, a livelocked task was found. Sysrq output showed the
same stack every time, as follows:
__swap_duplicate+0x58/0x1a0
swapcache_prepare+0x24/0x30
__read_swap_cache_async+0xac/0x220
read_swap_cache_async+0x58/0xa0
swapin_readahead+0x24c/0x628
do_swap_page+0x374/0x8a0
__handle_mm_fault+0x598/0xd60
handle_mm_fault+0x114/0x200
do_page_fault+0x148/0x4d0
do_translation_fault+0xb0/0xd4
do_mem_abort+0x50/0xb0
The reason for the livelock is that swapcache_prepare() always returns
-EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the task
cannot break out of the loop. We suspected that the task that clears the
SWAP_HAS_CACHE flag never got a chance to run. We tried lowering the
priority of the livelocked task so that the task that clears
the SWAP_HAS_CACHE flag could run. The results showed that the system
returned to normal after the priority was lowered.
In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async(), it is an
empty function on a preemptible kernel and cannot achieve the purpose
of releasing the CPU. A high-priority task cannot release the CPU
unless preempted by a higher-priority task. But when this task is
already the highest-priority task on this core, other tasks will not be
able to be scheduled. So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1). schedule_timeout_uninterruptible()
sets the task state before calling schedule_timeout(), so the task is
removed from the run queue, which gives up the CPU and prevents the task
from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)
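The argument above can be sketched with a toy model, assuming one CPU, two
fixed-priority tasks and a discrete tick. This is not kernel code: the
names wait_policy, WAIT_SPIN, WAIT_SLEEP_ONE_TICK and simulate are
hypothetical. Task A polls a flag and task B needs work_ticks ticks of
CPU time to clear it. With WAIT_SPIN (a cond_resched() that does nothing)
A stays runnable and B never runs. With WAIT_SLEEP_ONE_TICK (standing in
for schedule_timeout_uninterruptible(1)) A is off the run queue every
other tick, so B makes progress.

```c
#include <stdbool.h>

/* Hypothetical waiting strategies for the high-priority poller. */
enum wait_policy { WAIT_SPIN, WAIT_SLEEP_ONE_TICK };

/*
 * Simulate up to max_ticks ticks on one CPU. Returns the tick at which
 * task A sees the flag cleared, or -1 if that never happens in time.
 */
static int simulate(enum wait_policy policy, int work_ticks, int max_ticks)
{
	bool flag_set = true;		/* stands in for SWAP_HAS_CACHE */
	bool a_sleeping = false;	/* A is off the run queue this tick */
	int b_done = 0;			/* ticks of work B has completed */

	for (int tick = 0; tick < max_ticks; tick++) {
		if (!a_sleeping) {
			/* A is runnable and has the highest priority. */
			if (!flag_set)
				return tick;
			if (policy == WAIT_SLEEP_ONE_TICK)
				a_sleeping = true;
			/* WAIT_SPIN: A stays runnable, B gets no CPU. */
			continue;
		}
		/* A is asleep, so B is the only runnable task. */
		if (++b_done >= work_ticks)
			flag_set = false;
		a_sleeping = false;	/* A's one-tick timeout expires */
	}
	return -1;
}
```

In this model simulate(WAIT_SPIN, 3, 1000) returns -1, while
simulate(WAIT_SLEEP_ONE_TICK, 3, 1000) returns 6, because A and B
alternate ticks until B has finished its three ticks of work.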
Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
Signed-off-by: Guo Ziliang <guo.ziliang@....com.cn>
Reported-by: Zeal Robot <zealci@....com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@....com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@....com.cn>
Reviewed-by: Yang Yang <yang.yang29@....com.cn>
Acked-by: Hugh Dickins <hughd@...gle.com>
Cc: Naoya Horiguchi <naoya.horiguchi@....com>
Cc: Michal Hocko <mhocko@...nel.org>
Cc: Minchan Kim <minchan@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Roger Quadros <rogerq@...nel.org>
Cc: Ziliang Guo <guo.ziliang@....com.cn>
Cc: <stable@...r.kernel.org>
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d4104242100..ee67164531c0 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
 		 * in swap_map, but not yet added its page to swap cache.
 		 */
-		cond_resched();
+		schedule_timeout_uninterruptible(1);
 	}
 
 	/*