lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 24 Sep 2018 14:42:33 -0600
From:   Shuah Khan <shuah@...nel.org>
To:     Jerry.Hoemann@....com
Cc:     erosca@...adit-jv.com, linux-kselftest@...r.kernel.org,
        linux-kernel@...r.kernel.org, Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH] selftests: watchdog: Add gettimeout and get|set
 pretimeout

On 09/23/2018 07:47 PM, Jerry Hoemann wrote:
> On Fri, Sep 21, 2018 at 05:42:00PM -0600, Shuah Khan wrote:
>> Hi Jerry,
>>
>> Thanks for the patch. A few comments below:
> 
>   Replies inline.
> 
>>
>> On 09/21/2018 04:55 PM, Jerry Hoemann wrote:
>>> Add command line arguments to call ioctl WDIOC_GETTIMEOUT,
>>> WDIOC_GETPRETIMEOUT and WDIOC_SETPRETIMEOUT.
>>>
>>> Signed-off-by: Jerry Hoemann <jerry.hoemann@....com>
>>> ---
>>>  tools/testing/selftests/watchdog/watchdog-test.c | 30 +++++++++++++++++++++++-
>>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
>>> index 6e29087..4861e2c 100644
>>> --- a/tools/testing/selftests/watchdog/watchdog-test.c
>>> +++ b/tools/testing/selftests/watchdog/watchdog-test.c
>>> @@ -19,7 +19,7 @@
>>>  
>>>  int fd;
>>>  const char v = 'V';
>>> -static const char sopts[] = "bdehp:t:";
>>> +static const char sopts[] = "bdehp:t:Tn:N";
>>>  static const struct option lopts[] = {
>>>  	{"bootstatus",          no_argument, NULL, 'b'},
>>>  	{"disable",             no_argument, NULL, 'd'},
>>> @@ -27,6 +27,9 @@
>>>  	{"help",                no_argument, NULL, 'h'},
>>>  	{"pingrate",      required_argument, NULL, 'p'},
>>>  	{"timeout",       required_argument, NULL, 't'},
>>> +	{"gettimeout",          no_argument, NULL, 'T'},
>>> +	{"pretimeout",    required_argument, NULL, 'n'},
>>> +	{"getpretimeout",       no_argument, NULL, 'N'},
>>>  	{NULL,                  no_argument, NULL, 0x0}
>>>  };
>>>  
>>> @@ -71,6 +74,9 @@ static void usage(char *progname)
>>>  	printf(" -h, --help          Print the help message\n");
>>>  	printf(" -p, --pingrate=P    Set ping rate to P seconds (default %d)\n", DEFAULT_PING_RATE);
>>>  	printf(" -t, --timeout=T     Set timeout to T seconds\n");
>>> +	printf(" -T, --gettimeout    Get the timeout\n");
>>> +	printf(" -n, --pretimeout    Set the pretimeout to T seconds\n");
>>> +	printf(" -N, --getpretimeout Get the pretimeout\n");
>>
>> How are the new arguments used?
> 
> I forgot the param.  Should be:
> 
>     +	printf(" -n, --pretimeout=T  Set the pretimeout to T seconds\n");
> 
> 
> I'll update in v2.

Okay.

> 
> Is this what you mean?  Or did I misunderstand?
> > 
>>
>>>  	printf("\n");
>>>  	printf("Parameters are parsed left-to-right in real-time.\n");
>>>  	printf("Example: %s -d -t 10 -p 5 -e\n", progname);
>>
>> Please add an example usage for each of these new arguments.
> 
> Will do.

okay.

> 
>>
>>> @@ -135,6 +141,28 @@ int main(int argc, char *argv[])
>>>  			else
>>>  				printf("WDIOC_SETTIMEOUT errno '%s'\n", strerror(errno));
>>>  			break;
>>> +		case 'T':
>>> +			ret = ioctl(fd, WDIOC_GETTIMEOUT, &flags);
>>> +			if (!ret)
>>> +				printf("Watchdog timeout set to %u seconds.\n", flags);
>>
>> It would good to make this message different from the WDIOC_SETTIMEOUT message.
>> Please update it to reflect that this is the result of a WDIOC_GETTIMEOUT.
> 
> Will update message to make distinct.
> 
>>
>> What would user intend to do with this GETTIMEOUT? Shouldn't this be the case that it
>> prints the current value and exits instead of the same logic as SETTIMEOUT option?
> 
> Are you suggesting setting the "oneshot" flag so the test app doesn't actually
> go into the while(1) keep_alive loop?
> 
> Watchdog drivers may adjust the requested value to match hardware constraints.
> Callers of set timeout (and set pretimeout) should call get timeout to see what
> value was actually set.
> 
> B/c of above, I just got into the habit of specifying both flags: first set,
> then get to make sure value set was what I intended.
> 
> But I can make the "Get" a one shot.  Just let me know if this is your preference.

I prefer that both GETs be oneshot. GETs should just print the current value and go
follow oneshot path. It doesn't make sense for them to do more.
> 
> 
>>
>>> +			else
>>> +				printf("WDIOC_GETTIMEOUT errno '%s'\n", strerror(errno))
>>
>> Shouldn't this error be an exit condition?
> 
> Hmmm, I don't see this error path much different than the error path for the
> other failing ioctl.  Am I missing something?

Yeah that is what I don't understand with the new code as well as the existing.
Shouldn't error path be handled differently. What is the point in doing more
other than gracefully exit closing the file? I don't think existing error paths
are doing this, probably they should.
> 
> 
> But, If we make the "GET" a one shot, then we wouldn't really need to
> special case the failure case as we wouldn't go into the keep_alive
> loop in either case.
> 
> 

Right.

> 
>>
>>> +			break;
>>> +		case 'n':
>>> +			flags = strtoul(optarg, NULL, 0);
>>> +			ret = ioctl(fd, WDIOC_SETPRETIMEOUT, &flags);
>>> +			if (!ret)
>>> +				printf("Watchdog pretimeout set to %u seconds.\n", flags);
>>> +			else
>>> +				printf("WDIOC_SETPRETIMEOUT errno '%s'\n", strerror(errno));
>>> +			break;
>>> +		case 'N':
>>> +			ret = ioctl(fd, WDIOC_GETPRETIMEOUT, &flags);
>>> +			if (!ret)
>>> +				printf("Watchdog pretimeout set to %u seconds.\n", flags);
>>
>> It would good to make this message different from the WDIOC_GETPRETIMEOUT message.
>> Please update it to reflect that this is the result of a WDIOC_GETPRETIMEOUT
> 
> will do.
> 

Okay.

>>
>> What would user intend to do with this GETTIMEOUT? Shouldn't this be the case that it
>> prints the current value and exits instead of the same logic as WDIOC_SETPRETIMEOUT?
> 
> I think you're just asking me to set the "oneshot" flag on this,
> which I can certainly do.

Correct. For couple of reasons. GET/SET_PRETIMEOUG might not be supported on all
platforms/drivers. It would make sense to handle error paths correctly.

> 
> But, some background on pretimeout that (I think) is interesting:
> 
> The underling HW for the watchdog on proliants allows for the pre-timeout to
> be enabled or disabled.  But if the pretimeout is enabled, the value of
> the pretimeout is hard coded by HW (9 seconds.)
> 
> The hpwdt driver allows for setting pretimeout by passing in a value
> 	0 < pretimeout < timeout
> to enable a pretimeout.  The user then needs to call get pretimeout to
> determine the actual value.
> 
> Failure to take into account the pretimeout when pinging the WD can lead to
> unexpected system crashes.
> 
> I've handled the following issue multiple times:
> 
> A user wants to set the timeout to value T and ping the WD every T/2 seconds.
> He fails to take into account the pretimeout value of P.  The system crashes
> with the pretimeout NMI when (T/2) < P.
> 
> The basic misunderstanding is that to prevent the WD from acting, the WD
> only needs to be pinged at least once every T seconds, when in actuality the
> WD needs to be pinged at least once every (T-P) seconds.
> 
> Specifically for Proliants, I've seen people set the timeout to 10 seconds
> thinking they had plenty of time to ping the WD only to be surprised when
> the pretimeout NMI takes the system down 1 second later.

In this case, this patch really doesn't solve the problem. You will still run
into this problem if user does a set. You are providing a way to check pretimeout,
however that is a separate operation. So I am not clear on how this patch solves
the issue of pretimeout NMI takes the system down.

> 
> Note: a WD doesn't need to support the pretimeout feature.

It isn't clear what this means?

> 
>>
>>> +			else
>>> +				printf("WDIOC_GETPRETIMEOUT errno '%s'\n", strerror(errno));
>>
>> Shouldn't this error be an exit condition?
> 
> Similar to above.  I can make GETPRETIMEOUT a "oneshot" to handle both the
> success/failing case of the ioctl call.
> 
>>
>>> +			break;
>>>  		default:
>>>  			usage(argv[0]);
>>>  			goto end;
>>>
>>
>> Also can you run this test as normal user?
> 
> No.  Must be run as root to open /dev/watchdog.  When /dev/watchdog is opened, the
> WD is started and if not updated properly, the system will crash.

Hmm. I don't understand why the system would panic if non-root user can't open the
device, at least in the context of this test. 

        fd = open("/dev/watchdog", O_WRONLY);

        if (fd == -1) {
                printf("Watchdog device not enabled.\n");
                exit(-1);
        }


Shouldn't it just exit based on the code above?

> 

> "cat /dev/watchdog" is one of my favorite ways to crash a system.  :)  :)

That doesn't sound great, if a non-root user can bring the system down!!
 
thanks,
-- Shuah

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ