Message-id: <167823124256.8008.4738010782615192469@noble.neil.brown.name>
Date: Wed, 08 Mar 2023 10:20:42 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Jerry Zhang" <jerry@...dio.com>
Cc: embedded@...dio.com, "Chuck Lever" <chuck.lever@...cle.com>,
"Jeff Layton" <jlayton@...nel.org>,
"Trond Myklebust" <trond.myklebust@...merspace.com>,
"Anna Schumaker" <anna@...nel.org>,
"J. Bruce Fields" <bfields@...hat.com>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sunrpc: Fix incorrect parsing of expiry time
On Wed, 08 Mar 2023, Jerry Zhang wrote:
> On Tue, Mar 7, 2023 at 2:31 PM NeilBrown <neilb@...e.de> wrote:
> >
> > On Wed, 08 Mar 2023, Jerry Zhang wrote:
> > > The expiry time field is meant to be expressed in seconds since boot.
> >
> > Correct.
> >
> > > The get_expiry() function parses a relative time value in seconds.
> >
> > Incorrect. It parses an absolute wall-clock time.
> I'm not familiar with the source of truth for this info. Is there a
> specification of some sort?
>
> For reference, we were seeing writes to
> /proc/net/rpc/nfsd.export/channel randomly fail with EINVAL despite
> usually succeeding with the same invocation. Upon investigation this
> was the string that exportfs was writing "-test-client- /path/to/mount
> 3 0 65534 65534 0". "3" was the value for expiry in this message,
> which led me to conclude that this is a relative field. If it isn't,
> perhaps this is a bug in userspace nfs tools?
The above information is very useful. This sort of detail should always
be included with a bug report, or a patch proposing to fix a bug.
The intent of that "3" is to be a time in the past. We don't want the
-test-client- entry to be added to the cache, but we want a failure
message if the path cannot be exported. So we set a time in the past as
the expiry time.
Using 0 is awkward as it often has special meaning, so I chose '3'.
>
> The failure in this was if nfs-server starts exactly 3s after bootup,
> boot.tv_sec would be 3 and thus get_expiry() returns 0, causing a
> failure to be returned.
I don't understand this. getboottime64() doesn't report time since boot.
It reports the time when the system booted. It only changes when the
system time is deliberately changed.
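To make the distinction concrete, here is a minimal user-space sketch (not kernel code) of the quantity described above: the wall-clock time at which the system booted, approximated as the current wall-clock time minus the time elapsed since boot. The function names are hypothetical, and CLOCK_BOOTTIME is Linux-specific.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: wall-clock time of boot, in seconds since the
 * epoch, given a wall-clock reading and an uptime reading. */
int64_t boot_wall_seconds(int64_t realtime_sec, int64_t uptime_sec)
{
	return realtime_sec - uptime_sec;
}

/* Hypothetical helper: read both clocks and compute the boot wall-clock
 * time. Returns 0 on success, -1 if either clock cannot be read. */
int read_boot_wall_seconds(int64_t *out)
{
	struct timespec real, up;

	if (clock_gettime(CLOCK_REALTIME, &real) != 0)
		return -1;
	if (clock_gettime(CLOCK_BOOTTIME, &up) != 0)
		return -1;
	*out = boot_wall_seconds(real.tv_sec, up.tv_sec);
	return 0;
}
```

The result stays roughly constant while the system runs, because both clocks advance together. It shifts only when the wall clock itself is stepped.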
At boot, it presumably reports 0. As soon as some tool (e.g. systemd or
ntpdate) determines what the current time is and calls settimeofday() or
a similar function, the system time is changed, and the boot-time is
changed by the same amount. Typically this will make it well over 1
billion (for anything booted this century).
So for the boot time to report as '3', something would need to set the
current time to a moment early in January 1970. I'd be surprised if
anything is doing that.
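The arithmetic behind this can be sketched in user space. The model below is an assumption based on this thread, not the kernel's get_expiry() itself: it takes the conversion to be "absolute expiry minus boot wall-clock time" and treats a result of 0 as the failure case, as the report describes. Both function names and the sample boot time are illustrative.

```c
#include <stdint.h>

/* Hypothetical model of the conversion discussed in this thread:
 * an absolute wall-clock expiry (seconds since the epoch) becomes a
 * time relative to boot by subtracting the boot wall-clock time. */
int64_t expiry_to_boot_relative(int64_t wall_expiry, int64_t boot_wall_sec)
{
	return wall_expiry - boot_wall_sec;
}

/* Assumption taken from the report: a converted value of exactly 0
 * is indistinguishable from a parse failure. */
int result_is_failure(int64_t converted)
{
	return converted == 0;
}
```

With a boot wall-clock time in 2023 (for example 1678231142), an expiry of 3 converts to a large negative number, that is, a time well in the past, which is the intent. The result is 0 only when the boot wall-clock time is itself 3.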
How much tracing have you done? Have you printed out the value of
boot.tv_sec and confirmed that it is '3' or have you only deduced it
from other evidence?
Exactly what firm evidence do you have?
Thanks,
NeilBrown