Message-ID: <b4891cca-4da3-4411-bc9c-669118bf825a@intel.com>
Date: Thu, 3 Jul 2025 20:04:19 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Michal Hocko <mhocko@...e.com>, Peter Zijlstra <peterz@...radead.org>
CC: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
<vschneid@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, Tim Chen
<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>, Jirka Hladky
<jhladky@...hat.com>, Srikanth Aithal <Srikanth.Aithal@....com>, Suneeth D
<Suneeth.D@....com>, Libo Chen <libo.chen@...cle.com>
Subject: Re: [PATCH] sched/numa: Fix NULL pointer access to mm_struct during
task swap
On 7/3/2025 8:01 PM, Michal Hocko wrote:
> On Thu 03-07-25 13:50:06, Peter Zijlstra wrote:
>> On Thu, Jul 03, 2025 at 11:28:46AM +0200, Michal Hocko wrote:
>>
>>> But thinking about this some more, this would be racy same as the
>>> PF_EXITING check. This is not my area but is this performance sensitive
>>> path that couldn't live with the proper find_lock_task_mm?
>>
>> find_lock_task_mm() seems eminently unsuitable for accounting --
>> iterating the task group is insane.
>>
>> Looking at this, the mm_struct lifetimes suck.. task_struct reference
>> doesn't help, rcu doesn't help :-(
>>
>> Also, whatever the solution it needs to be inside this count_memcg_*()
>> nonsense, because nobody wants this overhead, esp. not for something
>> daft like accounting.
>>
>> My primary desire at this point is to just revert the patch that caused
>> this. Accounting just isn't worth it. Esp. not since there is already a
>> tracepoint in this path -- people that want to count crap can very well
>> get their numbers from that.
>
> I would tend to agree with this. Doing the accounting race free on a
> remote task is nasty and if this is a rare event that could be avoided
> then it should be just dropped than racy and oops prone.
>
OK, Michal and Peter,
how about keeping the per-task schedstat and dropping the memcg statistics?
Users can still get the per-task information without having to filter
the ftrace log.
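To illustrate the idea, here is a rough userspace sketch, not the actual
kernel code. struct task_model, struct mm_model and both helpers are
made-up names for illustration only. It just shows that a per-task counter
can be bumped without dereferencing the mm pointer, while memcg-style
accounting has to go through it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's real structures. */
struct mm_model {
    unsigned long memcg_numa_task_swap;  /* models a per-memcg event counter */
};

struct task_model {
    struct mm_model *mm;                 /* NULL once the task has dropped its mm */
    unsigned long numa_task_swapped;     /* models the per-task schedstat */
};

/* Per-task accounting: touches only the task itself, p->mm is never read. */
static void account_swap_per_task(struct task_model *p)
{
    assert(p != NULL);
    p->numa_task_swapped++;
}

/*
 * memcg-style accounting: has to go through p->mm.
 * Returns 0 on success, -1 if the mm is already gone.
 * In this single-threaded model the NULL check is enough; on a remote
 * task in the kernel such a check alone would still be racy.
 */
static int account_swap_memcg(struct task_model *p)
{
    assert(p != NULL);
    if (p->mm == NULL)
        return -1;
    p->mm->memcg_numa_task_swap++;
    return 0;
}
```

With a task whose mm is NULL, account_swap_per_task() still works, while
account_swap_memcg() has nothing safe to update.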
thanks,
Chenyu