linux-kernel - Re: [PATCH v5 00/19] vfs: add the ability to retry on ESTALE to several syscalls

Message-ID: <20120809081832.022e6a7d@corrin.poochiereds.net>
Date:	Thu, 9 Aug 2012 08:18:32 -0400
From:	Jeff Layton <jlayton@...hat.com>
To:	Namjae Jeon <linkinjeon@...il.com>
Cc:	viro@...iv.linux.org.uk, linux-fsdevel@...r.kernel.org,
	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
	michael.brantley@...haw.com, hch@...radead.org, miklos@...redi.hu,
	pstaubach@...grid.com
Subject: Re: [PATCH v5 00/19] vfs: add the ability to retry on ESTALE to
 several syscalls

On Thu, 9 Aug 2012 20:57:14 +0900
Namjae Jeon <linkinjeon@...il.com> wrote:

> Hi Jeff.
> 
> I still see ESTALE errors even after applying this patch set.
> Is my test method correct? I run estale_test on both the NFS
> server and the client at the same time.
> 
> ./estale_test
> chmod: Stale NFS[  281.720000] ##### send signal from USER, SIG : 2,
> estale_test(107)->estale_test(102) sys_kill
> [  281.728000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(103) sys_kill
> [  281.736000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(104) sys_kill
> [  281.744000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(105) sys_kill
> [  281.752000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(106) sys_kill
> [  281.760000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(107) sys_kill
> [  281.768000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(108) sys_kill
> [  281.780000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(109) sys_kill
> [  281.788000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(110) sys_kill
> [  281.796000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(111) sys_kill
> [  281.804000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(112) sys_kill
> [  281.812000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(113) sys_kill
> [  281.820000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(114) sys_kill
> [  281.828000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(115) sys_kill
> [  281.840000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(116) sys_kill
> [  281.848000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(117) sys_kill
> [  281.856000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(118) sys_kill
> [  281.864000] ##### send signal from USER, SIG : 15,
> estale_test(102)->estale_test(119) sys_kill
>  file handle
> VDLinux#> chdir: Stale NFS[  282.664000] ##### send signal from USER,
> SIG : 2, estale_test(120)->???(102) sys_kill
>  file handle
> 
> Thanks.
> 

I guess you didn't read my response earlier? I'll re-post it here...

> It's a bit labor intensive, I'm afraid...
>
> Attached is a cleaned-up copy of the test program that Peter wrote to
> test his original patchset. The basic idea is to run this on both the
> client and server at the same time so they race against each other. He
> was able to run it overnight when testing with his patchset.
>
> With this patchset, that doesn't work since we're only retrying the
> lookup and call once. So, what I've been doing is modifying the program
> so that it just runs one test at a time, and sniffing traffic to see
> whether the lookups and calls are retried after an ESTALE return from
> the server. 


So, ESTALE errors are still expected when running that test. This
patchset only fixes a very specific set of circumstances: an entry
goes stale once between the lookup and the actual operation(s).
It won't help with anything outside of that.

That test is very aggressive and can make the same operation race
multiple times, which a single retry cannot cover. To verify the
patchset, you actually have to sniff traffic and check whether the
lookup and call were reattempted after the ESTALE error.
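
To illustrate why one stale entry is recovered but repeated races are
not, here is a minimal userspace sketch of a retry-once policy. It is
not code from the patchset: struct sim_server, sim_lookup(),
sim_operation() and do_op_retry_once() are hypothetical names that only
simulate a server returning ESTALE a fixed number of times.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simulated server: fails the next N operations with ESTALE. */
struct sim_server {
	int stale_remaining;	/* operations that will still return -ESTALE */
	int lookups;		/* number of lookups issued so far */
	int calls;		/* number of operations issued so far */
};

/* Hypothetical lookup: always succeeds, only counts the attempt. */
static int sim_lookup(struct sim_server *s)
{
	s->lookups++;
	return 0;
}

/* Hypothetical operation (think chmod): stale while the race persists. */
static int sim_operation(struct sim_server *s)
{
	s->calls++;
	if (s->stale_remaining > 0) {
		s->stale_remaining--;
		return -ESTALE;
	}
	return 0;
}

/*
 * Do the lookup and the operation; if the operation reports ESTALE,
 * redo both exactly once. A second ESTALE is returned to the caller.
 */
static int do_op_retry_once(struct sim_server *s)
{
	int tries = 0;
	int err;

	assert(s != NULL);
	do {
		err = sim_lookup(s);
		if (err == 0)
			err = sim_operation(s);
	} while (err == -ESTALE && ++tries < 2);

	return err;
}
```

In this sketch, a server that goes stale once makes the call succeed on
the second attempt, with two lookups and two operations issued. A
server that goes stale twice in a row still returns -ESTALE to the
caller, which matches what an aggressive racing test would report.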

-- 
Jeff Layton <jlayton@...hat.com>