lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFCwf12E7CQFG1=etvYogp7rvg8+R=hYR0OBD3j1Zqba1u5Hug@mail.gmail.com>
Date:   Sat, 10 Aug 2019 20:45:54 +0300
From:   Oded Gabbay <oded.gabbay@...il.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     "Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
        Omer Shpigelman <oshpigelman@...ana.ai>,
        Tomer Tayar <ttayar@...ana.ai>
Subject: Re: [PATCH] habanalabs: print to kernel log when reset is finished

On Sat, Aug 10, 2019 at 8:23 PM Greg KH <gregkh@...uxfoundation.org> wrote:
>
> On Sat, Aug 10, 2019 at 06:29:16PM +0300, Oded Gabbay wrote:
> > On Sat, Aug 10, 2019 at 3:53 PM Greg KH <gregkh@...uxfoundation.org> wrote:
> > >
> > > On Sat, Aug 10, 2019 at 03:38:08PM +0300, Oded Gabbay wrote:
> > > > Now that we don't print the queue testing messages, we need to print when
> > > > the reset is finished so whoever looks at the kernel log will know the
> > > > reset process was finished successfully and the driver is not stuck.
> > > >
> > > > Signed-off-by: Oded Gabbay <oded.gabbay@...il.com>
> > > > ---
> > > >  drivers/misc/habanalabs/device.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
> > > > index 9a5926888b99..1fac808c2546 100644
> > > > --- a/drivers/misc/habanalabs/device.c
> > > > +++ b/drivers/misc/habanalabs/device.c
> > > > @@ -907,6 +907,8 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
> > > >       else
> > > >               hdev->soft_reset_cnt++;
> > > >
> > > > +     dev_info(hdev->dev, "Successfully finished resetting the device\n");
> > >
> > > Really?  For doing things "properly" there is no need to spam the kernel
> > > log.  Only spit stuff out if an error happens.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > I beg to differ for two reasons:
> > 1. Reset happens very rarely, if at all. So this message (that get
> > printed after reset is done) will definitely not spam the kernel log.
> > 2. When a reset starts we print an appropriate error message. I think
> > it is expected by the user that we will also print if and when the
> > reset has finished successfully. I really believe that lack of this
> > printing might be deceiving for users.
>
> How is anyone going to parse the kernel log for anything to know if
> something happens?

Actually our jenkins server parse the kernel log and look for bad
messages from the driver...
But the main reason for this, IMO, is for developers (in userspace)
that run with a terminal open on the kernel log to see if something
bad happens while they run their applications.

>
> How do you trigger a reset?  Is it done by userspace?  If so, just
> notify them then.

Reset can be triggered by two main reasons, but a userspace
application is *not* one of them.
It can be due to a command submission not finishing in time or from a
major error in the device (e.g. double ECC error).
That's why I think it is important to tell them the reset +
re-initialization flow was finished successfully.
Of course all this information, let's call it operational status, can
be acquired by the INFO IOCTL the driver provides, but it is *really*
convenient, IMHO, to see these type of messages in the kernel log
while they happen.

>
> thanks,
>
> greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ