Message-ID: <72e16f18-d4ae-f963-fd09-5f1fa6885a1d@cambridgegreys.com>
Date:   Fri, 26 Feb 2021 15:40:13 +0000
From:   Anton Ivanov <anton.ivanov@...bridgegreys.com>
To:     Timo Rothenpieler <timo@...henpieler.org>,
        Bruce Fields <bfields@...ldses.org>
Cc:     Salvatore Bonaccorso <carnil@...ian.org>,
        Chuck Lever <chuck.lever@...cle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "940821@...s.debian.org" <940821@...s.debian.org>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        trond.myklebust@...merspace.com, anna.schumaker@...app.com
Subject: Re: NFS Caching broken in 4.19.37

On 26/02/2021 15:03, Timo Rothenpieler wrote:
> I think I can reproduce this, or something that at least looks very 
> similar to this, on 5.10. Namely on 5.10.17 (On both Client and Server).

I think this is a different issue - see below.

>
> We are running slurm, and for a while now (this coincides with updating
> from 5.4 to 5.10, but a whole bunch of other stuff was updated at the
> same time, so it took me a while to correlate it) the logs it writes
> have been truncated, but only while they're being observed on the
> client, using tail -f or something like that.
>
> It then looks like this:
>
> On Server:
>> store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
>> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
>> store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
>> 61 slurm-41101.out
>
> On Client:
>> timo@...in01 ~/TestRun $ ls -l slurm-41101.out
>> -rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
>> timo@...in01 ~/TestRun $ wc -l slurm-41101.out
>> 24 slurm-41101.out
>
> See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for 
> the respective file-contents.
>
> If I run the same test job, wait until it's done, and then look at its
> slurm.out file, it matches between NFS Client and Server.
> If I tail -f the slurm.out on an NFS client, the file stops getting
> updated on the client, but keeps getting more logs written to it on
> the NFS server.
>
> The slurm.out file is being written to by another NFS client, which is
> running on one of the compute nodes of the system. It's being read
> from a login node.

If these are two different clients, then what you see is possible on NFS
with client-side caching. If you have multiple clients reading from and
writing to the same files, you usually need to tune the caching options
and/or use locking. I suspect that if you leave it for a while (until the
cache expires) it will sort itself out.
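
As a rough illustration of the locking suggestion above, here is a minimal
sketch of a reader that takes a shared lock before each read. It assumes
the Linux NFS client revalidates its cached data for a file when a lock is
acquired (the behaviour described in nfs(5)) and that locking works on the
mount in question. The function name read_appended is hypothetical, and
this has not been tried against the slurm setup discussed in this thread.

```python
import fcntl


def read_appended(path, offset):
    """Return (data, new_offset) for bytes found in path past offset.

    A shared POSIX lock is held around the read. On an NFS mount this is
    assumed to make the client revalidate its cache for the file first.
    """
    with open(path, "rb") as f:
        # Shared (read) lock over the whole file; blocks while another
        # process holds a conflicting exclusive lock.
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            f.seek(offset)
            data = f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return data, offset + len(data)
```

Calling this in a loop with the returned offset gives a crude replacement
for tail -f that does not rely only on attribute cache timeouts.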

In my test case there is just one client. It missed a file deletion, and
nothing short of an unmount and remount fixes that. I have waited for 30+
minutes; it does not seem to refresh or expire. I also see the opposite
pattern across kernel versions: the bug shows up on 4.x up to at least
5.4, and I do not see it on 5.10.

Brgds,


>
> Timo
>
>
> On 21.02.2021 16:53, Anton Ivanov wrote:
>> Client side. This seems to be an entirely client side issue.
>>
>> A variety of kernels on the clients starting from 4.9 and up to 5.10 
>> using 4.19 servers. I have observed it on a 4.9 client versus 4.9 
>> server earlier.
>>
>> 4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.
>>
>> At present the server is at 4.19.67 in all tests.
>>
>> Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 
>> (2019-11-11) x86_64 GNU/Linux
>>
>> I can set up a couple of alternative servers during the week, but so
>> far everything is pointing towards a client fs cache issue, not a
>> server one.
>>
>> Brgds,
>>
>
>

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
