Message-ID: <1b701f2b-d185-dd30-0aca-ba6d280221d5@rothenpieler.org>
Date:   Fri, 26 Feb 2021 16:03:46 +0100
From:   Timo Rothenpieler <timo@...henpieler.org>
To:     Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        Bruce Fields <bfields@...ldses.org>
Cc:     Salvatore Bonaccorso <carnil@...ian.org>,
        Chuck Lever <chuck.lever@...cle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "940821@...s.debian.org" <940821@...s.debian.org>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        trond.myklebust@...merspace.com, anna.schumaker@...app.com
Subject: Re: NFS Caching broken in 4.19.37

I think I can reproduce this, or at least something that looks very 
similar, on 5.10, specifically 5.10.17 on both client and server.

We are running slurm, and for a while now (this coincides with updating 
from 5.4 to 5.10, but a whole bunch of other stuff was updated at the 
same time, so it took me a while to correlate this) the logs it writes 
have appeared truncated, but only while they're being observed on the 
client, using tail -f or something like that.

It then looks like this:

On Server:
> store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
> store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
> 61 slurm-41101.out

On Client:
> timo@...in01 ~/TestRun $ ls -l slurm-41101.out
> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
> timo@...in01 ~/TestRun $ wc -l slurm-41101.out
> 24 slurm-41101.out

See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for 
the respective file-contents.
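
The mismatch above (identical size and mtime in ls -l, different line 
counts in wc -l) can be checked in one step by comparing the size from 
the file attributes with what read() actually returns. A minimal sketch, 
assuming Python 3 is available on both hosts; the helper name 
size_vs_content is made up for illustration:

```python
import os
import sys


def size_vs_content(path):
    """Return (st_size, bytes_read, newline_count) for path.

    st_size comes from the file attributes (what ls -l shows), while the
    other two values come from the data actually returned by read().
    """
    st_size = os.stat(path).st_size
    with open(path, "rb") as f:
        data = f.read()
    return st_size, len(data), data.count(b"\n")


if __name__ == "__main__":
    # Usage: python3 size_vs_content.py FILE [FILE...]
    for p in sys.argv[1:]:
        print(p, *size_vs_content(p))
```

Run against the same file on server and client, the first value should 
match (1931 in the listing above), while the newline count corresponds 
to the differing wc -l results.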

If I run the same test job, wait until it's done, and then look at its 
slurm.out file, it matches between NFS client and server.
If I tail -f the slurm.out on an NFS client while the job is running, 
the file stops getting updated on that client, but keeps getting more 
logs written to it on the NFS server.

The slurm.out file is being written to by another NFS client, which is 
running on one of the compute nodes of the system. It is being read 
from a login node.
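
The setup described above (one NFS client appending to a file, another 
one following it) might be reduced to something like the following 
sketch. This is not how slurm writes its logs, and it is untested 
against the actual problem; append_lines and follow are hypothetical 
helpers, and the path given on the command line stands for any file on 
the shared export.

```python
import os
import sys
import time


def append_lines(path, count, delay=0.0):
    """Append `count` numbered lines, flushing and fsyncing after each."""
    with open(path, "a") as f:
        for i in range(count):
            f.write(f"line {i}\n")
            f.flush()
            os.fsync(f.fileno())  # assumption: push every line out at once
            time.sleep(delay)


def follow(path, duration, poll=0.5):
    """Poll `path` roughly like tail -f; return the number of lines seen.

    Stops once `duration` seconds have passed and no more data is readable.
    """
    seen = 0
    deadline = time.monotonic() + duration
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if line.endswith("\n"):
                seen += 1
                continue
            # Nothing (or only a partial line) available right now.
            if time.monotonic() >= deadline:
                break
            time.sleep(poll)
    return seen


if __name__ == "__main__":
    # Writer (e.g. on a compute node):  python3 repro.py write FILE
    # Reader (e.g. on a login node):    python3 repro.py follow FILE
    mode, target = sys.argv[1], sys.argv[2]
    if mode == "write":
        append_lines(target, count=60, delay=1.0)
    else:
        print(follow(target, duration=90.0))
```

If the reader prints fewer lines than the writer produced while wc -l on 
the server shows all of them, that would match the behaviour described 
above.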



Timo


On 21.02.2021 16:53, Anton Ivanov wrote:
> Client side. This seems to be an entirely client side issue.
> 
> A variety of kernels on the clients starting from 4.9 and up to 5.10 
> using 4.19 servers. I have observed it on a 4.9 client versus 4.9 server 
> earlier.
> 
> 4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.
> 
> At present the server is at 4.19.67 in all tests.
> 
> Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) 
> x86_64 GNU/Linux
> 
> I can set-up a couple of alternative servers during the week, but so far 
> everything is pointing towards a client fs cache issue, not a server one.
> 
> Brgds,
> 


