Message-ID: <1d4ef30a-69c5-c4dc-c3bd-8d7c0c99b3f3@sony.com>
Date: Sun, 25 Apr 2021 08:42:05 +0200
From: peter enderborg <peter.enderborg@...y.com>
To: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Guenter Roeck <linux@...ck-us.net>,
Wim Van Sebroeck <wim@...ux-watchdog.org>,
Andrew Morton <akpm@...ux-foundation.org>,
<linux-watchdog@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>, Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [RFC PATCH] watchdog: Adding softwatchdog

On 4/25/21 3:08 AM, Tetsuo Handa wrote:
> On 2021/04/25 1:19, peter enderborg wrote:
>>> I don't think this proposal is a watchdog. I think this proposal is
>>> a timer based process killer, based on an assumption that any slowdown
>>> which prevents the monitor process from pinging for more than 0.5 seconds
>>> (if HZ == 1000) is caused by memory pressure.
>> You are missing the point. The oom killer is an example of the work that it can do.
>> It is one policy. The idea is that you should have a policy that fits your needs.
> Implementing policy which can run in kernel from timer interrupt context is
> quite limited, for it is not allowed to perform operations that might sleep. See
>
> [RFC] memory reserve for userspace oom-killer
> https://lkml.kernel.org/r/CALvZod7vtDxJZtNhn81V=oE-EPOf=4KZB2Bv6Giz+u3bFFyOLg@mail.gmail.com
>
> for implementing possibly useful policy.

If you need a more complex approach, you might need a work queue.
For example, a SIGTERM solution might work like that: you send
SIGTERM, wait some time, and then send SIGKILL.

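
As a rough user-space illustration of that escalation (this is not code from
the patch; `terminate_with_grace`, `spawn_idle_child` and the 10 ms poll
interval are made up for the example, and an in-kernel version would more
likely queue a delayed work item than poll), something like this:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/*
 * Hypothetical helper: send SIGTERM to child `pid`, poll for up to
 * grace_ms milliseconds, then escalate to SIGKILL if it is still alive.
 * Returns the signal that terminated the child, 0 if it exited on its
 * own, or -1 on error.
 */
static int terminate_with_grace(pid_t pid, long grace_ms)
{
    int status = 0;
    long waited;
    pid_t r = 0;

    if (kill(pid, SIGTERM) == -1)
        return -1;

    /* Grace period: check every 10 ms whether the child has gone away. */
    for (waited = 0; waited < grace_ms; waited += 10) {
        r = waitpid(pid, &status, WNOHANG);
        if (r != 0)
            break;
        sleep_ms(10);
    }
    if (r == -1)
        return -1;

    if (r == 0) {
        /* Still running after the grace period: escalate. */
        if (kill(pid, SIGKILL) == -1)
            return -1;
        do {
            r = waitpid(pid, &status, 0);
        } while (r == -1 && errno == EINTR);
        if (r == -1)
            return -1;
    }

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Demo helper: fork a child that idles, optionally ignoring SIGTERM. */
static pid_t spawn_idle_child(int ignore_sigterm)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    /* Set SIG_IGN before fork() so the child inherits it without a race. */
    if (ignore_sigterm)
        old = signal(SIGTERM, SIG_IGN);

    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }

    /* Restore the parent's original SIGTERM disposition. */
    if (ignore_sigterm)
        signal(SIGTERM, old);
    return pid;
}
```

A child that honours SIGTERM is gone within the grace period; one that
ignores it gets SIGKILL once the grace period runs out.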
>> oom_score_adj is suitable for the Android world. But it might be based on
>> uids if your priority is some users over others. Or a memcg. Or, as
>> Christophe Leroy wants, the current task. The policy is only an example that
>> fits one area.
> Horrible idea. Imagine a kernel module that randomly sends SIGTERM/SIGKILL
> to "current" thread. How can normal systems survive? A normal system is not
> designed to survive random signals.

I think you need to see it in the context of a watchdog. It might be
problematic, but it has a good statistical chance of hitting a CPU hog.
And, seen as a watchdog, the alternative is a system reset. You
take a chance. Reboot should be the last resort.

I can imagine a kernel module that randomly sends SIGTERM/SIGKILL;
we already have that. It is called oom-kill. This is *exactly* the problem.
>
>> You need to describe your prioritization; in Android it is
>> oom_score_adj. For example, I would very much like to have a policy that sends
>> SIGTERM instead of SIGKILL.
> That's because Android framework is designed to survive random signals
> (in order to survive memory pressure situation).

Android uses signals a lot to control the system. It uses them differently
than you would with a shell or window manager.
>
>> But the integration with oom is there because
>> it is needed. Maybe a bad choice for political reasons, but I don't think it is a
>> good idea to hide the intention. Please don't focus on the oom part.
> I wonder what system other than Android framework can utilize this module.
I think it will be useful for embedded systems as well.
> By the way, there already is "Software Watchdog" ( drivers/watchdog/softdog.c )
> which some people might call "soft watchdog". It is very confusing to name
> your module as "softwatchdog". Please find a different name.
>
It is mentioned in the patch set. I had the idea of adding this function to that one,
but I decided it was better to keep it separate, to point out that the feature is
meant to be "soft" rather than so hard.