lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y9ebzUL+fqGeQuAj@alley>
Date:   Mon, 30 Jan 2023 11:28:29 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Weiping Zhang <zwp10758@...il.com>
Cc:     akpm@...ux-foundation.org, peterz@...radead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] hung_task: show sysctl_hung_task_warnings

On Sun 2023-01-29 17:08:29, Weiping Zhang wrote:
> Sorry for the late reply.
> 
> On Fri, Jan 13, 2023 at 9:43 PM Petr Mladek <pmladek@...e.com> wrote:
> >
> > On Thu 2023-01-12 17:17:45, Weiping Zhang wrote:
> > > This patch try to add more debug info to detect lost kernel log or no
> > > hung task was detected.
> > >
> > > The user set 10 to the hung_task_timeout_secs, the kernel log:
> > >
> > > [ 3942.642220] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3952.876768] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3952.877088] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3952.878212] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3963.116805] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3963.117137] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3963.118275] INFO: task mount:19066 blocked for more than 10 seconds.
> > > [ 3973.356837] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3973.357148] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3973.358247] INFO: task mount:19066 blocked for more than 10 seconds.
> 
> Expect but no hung task log at this moment.
> 
> > > [ 3993.836899] INFO: task kworker/u81:0:7 blocked for more than 10 seconds.
> > > [ 3993.837238] INFO: task scsi_eh_0:506 blocked for more than 10 seconds.
> > > [ 3993.838356] INFO: task mount:19066 blocked for more than 10 seconds.
> > >
> > > There is no any log at about 3983, it's hard to know if kernel log was
> > > lost or there is no hung task was detected at that moment. So this patch
> > > print sysctl_hung_task_warnings to distinguish the above two cases.
> > >
> > > Signed-off-by: Weiping Zhang <zhangweiping@...iglobal.com>
> > > ---
> > >  kernel/hung_task.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > index c71889f3f3fc..ca917931473d 100644
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -127,8 +127,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > >        * complain:
> > >        */
> > >       if (sysctl_hung_task_warnings) {
> > > -             if (sysctl_hung_task_warnings > 0)
> > > +             if (sysctl_hung_task_warnings > 0) {
> > >                       sysctl_hung_task_warnings--;
> > > +                     pr_err("sysctl_hung_task_warnings: %d\n",
> > > +                             sysctl_hung_task_warnings);
> > > +             }
> >
> > It is too much noise. But it might make sense to report it when
> > the counter gets down to zero. Something like:
> >
> >                 if (sysctl_hung_task_warnings)
> >                         pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> >
> > and move this down after printing all the details for this hung task report.
> I'm fine to print this warning when it gets down to 0.

I prefer this variant.

> The warning counter is useful to detect kernel log lost or not, if add
> a new line to print this count is too noise,
> how about append this counter at the end of the following line:
>         pr_err("INFO: task %s:%d blocked for more than %ld seconds,
> sysctl_hung_task_warnings: %d\n",
>                 t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
> sysctl_hung_task_warnings);

Is it really important to print this on every line, please? IMHO, most people
do not care. IMHO. it would add more harm (confusion, noise) than good.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ