Message-ID: <1b701f2b-d185-dd30-0aca-ba6d280221d5@rothenpieler.org>
Date: Fri, 26 Feb 2021 16:03:46 +0100
From: Timo Rothenpieler <timo@...henpieler.org>
To: Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Bruce Fields <bfields@...ldses.org>
Cc: Salvatore Bonaccorso <carnil@...ian.org>,
Chuck Lever <chuck.lever@...cle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"940821@...s.debian.org" <940821@...s.debian.org>,
Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
trond.myklebust@...merspace.com, anna.schumaker@...app.com
Subject: Re: NFS Caching broken in 4.19.37
I think I can reproduce this, or at least something that looks very
similar, on 5.10, namely 5.10.17 (on both client and server).
We are running slurm, and for a while now the logs it writes have been
truncated, but only while they are being observed on the client with
tail -f or something similar. This coincides with updating from 5.4 to
5.10, but a whole bunch of other stuff was updated at the same time, so
it took me a while to correlate the two.
It then looks like this:
On Server:
> store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
> store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
> 61 slurm-41101.out
On Client:
> timo@...in01 ~/TestRun $ ls -l slurm-41101.out
> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
> timo@...in01 ~/TestRun $ wc -l slurm-41101.out
> 24 slurm-41101.out
See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for
the respective file contents.
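
For anyone who wants to compare the two views without eyeballing ls and
wc output, here is a minimal sketch of the same check in Python. The
helper name file_view is made up for illustration; it only uses
os.stat, a plain read, and hashlib. The idea is to run it on the server
and on the client against the same file and compare the results: in the
case above the sizes agree, but the line counts (and so presumably the
digests) would not.

```python
import hashlib
import os


def file_view(path):
    """Summarize how this host currently sees `path`.

    Returns (stat_size, bytes_read, newline_count, sha256_hex).
    `file_view` is a hypothetical helper, not part of any NFS tooling.
    """
    # Size as reported by the file attributes (what `ls -l` shows).
    stat_size = os.stat(path).st_size
    # Data actually returned by reading the file (what `wc -l` counts).
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    return stat_size, len(data), data.count(b"\n"), digest
```

Calling file_view("slurm-41101.out") on both hosts and diffing the
tuples should show whether the attributes and the data disagree.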
If I run the same test job, wait until it's done, and then look at its
slurm.out file, it matches between NFS client and server.
If I tail -f the slurm.out on an NFS client, the file stops getting
updated on the client, but keeps getting more logs written to it on the
NFS server.
The slurm.out file is being written to by another NFS client, which is
running on one of the compute nodes of the system. It is being read from
a login node.
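
A stand-in for the writer side, in case someone wants to try this
without slurm: a small Python sketch that appends one line at a time to
a file on the NFS mount. This is an assumption about what is sufficient
to trigger the problem, not a description of how slurm writes its
output; the function name append_lines and the fsync after every line
are both choices made for this sketch.

```python
import os
import time


def append_lines(path, count, interval=1.0):
    """Append `count` numbered lines to `path`, one every `interval` seconds.

    Hypothetical stand-in for a job writing its log; whether slurm
    flushes or fsyncs like this is not known from the report.
    """
    with open(path, "a") as f:
        for i in range(count):
            f.write(f"line {i}\n")
            # Push the line out of Python's buffer and ask the kernel
            # to write it back, so it reaches the server promptly.
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval)
```

Run append_lines on one NFS client (the "compute node") and tail -f the
same file on a second client (the "login node"), then compare against
the file as seen on the server.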
Timo
On 21.02.2021 16:53, Anton Ivanov wrote:
> Client side. This seems to be an entirely client side issue.
>
> A variety of kernels on the clients starting from 4.9 and up to 5.10
> using 4.19 servers. I have observed it on a 4.9 client versus 4.9 server
> earlier.
>
> 4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.
>
> At present the server is at 4.19.67 in all tests.
>
> Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
> x86_64 GNU/Linux
>
> I can set-up a couple of alternative servers during the week, but so far
> everything is pointing towards a client fs cache issue, not a server one.
>
> Brgds,
>