linux-kernel - Re: [PATCH v2 1/5] fat: allocate persistent inode numbers

Message-ID: <CAKYAXd-5CQrE=232NKk9pRjzP3q8OBryeLt9mH==f=zrKdjzNg@mail.gmail.com>
Date:	Thu, 13 Sep 2012 17:11:54 +0900
From:	Namjae Jeon <linkinjeon@...il.com>
To:	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Cc:	"J. Bruce Fields" <bfields@...ldses.org>,
	"Steven J. Magnani" <steve@...idescorp.com>,
	Al Viro <viro@...iv.linux.org.uk>, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org,
	Namjae Jeon <namjae.jeon@...sung.com>,
	Ravishankar N <ravi.n1@...sung.com>,
	Amit Sahrawat <a.sahrawat@...sung.com>
Subject: Re: [PATCH v2 1/5] fat: allocate persistent inode numbers

> >> -> ESTALE will be returned -> discard old FH -> restart from LOOKUP ->
> >> make cached inode -> use returned new FH.
> >>
> >> Yeah, I know this is unstable (there is no perfect solution for now),
> >
> > You may end up with a totally different file, of course:
> >
> >     client:                 server:
> >
> >     open "/foo/bar"
> >                             rename "/foo/baz"->"/foo/bar"
> >     write to file
> >
> > And now we're writing to the file that was originally named /foo/baz
> > when we should have gotten ESTALE.
>
> I see. So, client can't solve the ESTALE if inode cache was evicted,
> right? (without application changes)

There can be situations where we get not only ESTALE but also EIO.

For example,
-------------------------------
fd = open("foo.txt", O_WRONLY);
while (1) {
        sleep(1);
        write(fd, ...);
}
--------------------------------

Here "write" may fail when the inode number of "foo.txt" changes on the
server because the inode was evicted from the cache under memory pressure.
When we ran a similar test, we found that "write" returns EIO
instead of ESTALE:

---------------------------------------------------------------------------------------------------------
#> ./write_test_dbg bbb 1000 0
FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
write failed after 60080128 bytes:, errno = 5: Input/output error
---------------------------------------------------------------------------------------------------------
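
The loop above can be sketched as a small C helper. This is not the
write_test_dbg tool used for the run above; write_loop() and its
parameters are hypothetical names chosen for illustration. It writes
fixed-size records with an optional delay and returns the errno of the
first failing write, so that EIO and ESTALE can be told apart:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test tool): write 'count'
 * records of 'record_size' bytes to 'path', sleeping 'delay_sec' seconds
 * between writes. Returns 0 on success or the errno of the first failure.
 * '*written' receives the number of bytes written so far.
 */
static int write_loop(const char *path, size_t record_size, long count,
                      unsigned int delay_sec, long long *written)
{
    char buf[4096];
    int fd, err = 0;
    long i;

    *written = 0;
    if (record_size > sizeof(buf))
        return EINVAL;
    memset(buf, 'a', sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    for (i = 0; i < count; i++) {
        ssize_t n = write(fd, buf, record_size);

        if (n < 0) {
            err = errno;
            fprintf(stderr, "write failed after %lld bytes: errno = %d: %s\n",
                    *written, err, strerror(err));
            break;
        }
        *written += n;
        if (delay_sec)
            sleep(delay_sec);
    }

    if (close(fd) < 0 && err == 0)
        err = errno;
    return err;
}
```

Running such a loop against a file on an NFS mount of the FAT export,
while the server is under memory pressure, would be one way to observe
which errno the client actually sees.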

As we get EIO instead of ESTALE, it may be difficult for the client to
decide when to "restart from LOOKUP" in such a situation.
Also, in Bruce's opinion, we cannot avoid ESTALE caused by an inode number
change when the server is rebooted.
The reboot case is the worst one, as the client may write to a different
file if the NFS file handle held by the client happens to match the inode
number of some other file on the server.
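
To illustrate why the errno matters, here is a minimal sketch of what an
application-level "restart from LOOKUP" could look like. The names
action_for_errno() and write_reopen_on_estale() are hypothetical. The
sketch only retries on ESTALE, so an EIO like the one above is simply
passed back to the caller. Note that reopening by path has the problem
Bruce described: the path may now refer to a different file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum retry_action { RETRY_NONE, RETRY_REOPEN };

/* Only ESTALE tells us the handle is stale; EIO and others are ambiguous. */
static enum retry_action action_for_errno(int err)
{
    return err == ESTALE ? RETRY_REOPEN : RETRY_NONE;
}

/*
 * Hypothetical helper: write 'len' bytes to '*fd'. If the write fails with
 * ESTALE, reopen 'path' (forcing a new lookup), restore the file offset and
 * retry once. Any other error is returned to the caller unchanged.
 */
static ssize_t write_reopen_on_estale(const char *path, int *fd,
                                      const void *buf, size_t len)
{
    off_t pos = lseek(*fd, 0, SEEK_CUR); /* remember the current offset */
    ssize_t n = write(*fd, buf, len);

    if (n >= 0 || action_for_errno(errno) != RETRY_REOPEN)
        return n;

    close(*fd);
    *fd = open(path, O_WRONLY);
    if (*fd < 0)
        return -1;
    if (pos != (off_t)-1 && lseek(*fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    return write(*fd, buf, len);
}
```

With this logic, a write that fails with EIO is never retried, which is
exactly the difficulty described above.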

> We would want to compare client solution (-mm) and server solution
> (stable ino). Or I'd like to know which my knowledges/understanding are
> wrong here.
> I see. So, client can't solve the ESTALE if inode cache was evicted,
> right? (without application changes)
> I don't see how.

Yes, I think we cannot fix the inode number change issue in these two
situations (reboot, inode cache eviction).
The inode number can change because FAT currently does not allocate
stable inode numbers in these situations.

> Grepping around... Documentation/sysctl/vm.txt mentions a
> vfs_cache_pressure parameter.
> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
I am worried that this could lead to an OOM condition on embedded
systems (little DRAM, but supporting large disks such as a 3TB HDD).
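
For reference, the vfs_cache_pressure parameter mentioned above is
exposed as /proc/sys/vm/vfs_cache_pressure. A minimal sketch for reading
its current value before experimenting with it follows;
read_sysctl_int() is a hypothetical helper name:

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a sysctl-style file
 * such as /proc/sys/vm/vfs_cache_pressure.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass "/proc/sys/vm/vfs_cache_pressure" as the path.
Lowering the value makes the kernel prefer to keep dentry and inode
caches, which is the source of the memory concern above.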

Please let me know if you have any issues or queries.

Thanks.