Message-ID: <CAHk-=wjTZ=6QkE_eksL+kzywj2cA_kiY-ydZKoz-+kBQwtNWwQ@mail.gmail.com>
Date: Fri, 29 Sep 2023 09:22:14 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Amir Goldstein <amir73il@...il.com>
Cc: "Theodore Ts'o" <tytso@....edu>, Jeff Layton <jlayton@...nel.org>,
"Darrick J. Wong" <djwong@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
David Sterba <dsterba@...e.cz>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>, Jeremy Kerr <jk@...abs.org>,
Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>,
Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joel@...lfernandes.org>,
Carlos Llamas <cmllamas@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Mattia Dongili <malattia@...ux.it>,
Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Leon Romanovsky <leon@...nel.org>,
Brad Warrum <bwarrum@...ux.ibm.com>,
Ritu Agarwal <rituagar@...ux.ibm.com>,
Hans de Goede <hdegoede@...hat.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Mark Gross <markgross@...nel.org>,
Jiri Slaby <jirislaby@...nel.org>,
Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>,
Dominique Martinet <asmadeus@...ewreck.org>,
Christian Schoenebeck <linux_oss@...debyte.com>,
David Sterba <dsterba@...e.com>,
David Howells <dhowells@...hat.com>,
Marc Dionne <marc.dionne@...istor.com>,
Ian Kent <raven@...maw.net>,
Luis de Bethencourt <luisbg@...nel.org>,
Salah Triki <salah.triki@...il.com>,
"Tigran A. Aivazian" <aivazian.tigran@...il.com>,
Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
Xiubo Li <xiubli@...hat.com>,
Ilya Dryomov <idryomov@...il.com>,
Jan Harkes <jaharkes@...cmu.edu>, coda@...cmu.edu,
Joel Becker <jlbec@...lplan.org>,
Christoph Hellwig <hch@....de>,
Nicolas Pitre <nico@...xnic.net>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>, Gao Xiang <xiang@...nel.org>,
Chao Yu <chao@...nel.org>,
Yue Hu <huyue2@...jj8bn.sched.sma.tdnsstic1.cn>,
Jeffle Xu <jefflexu@...ux.alibaba.com>,
Namjae Jeon <linkinjeon@...nel.org>,
Sungjong Seo <sj1557.seo@...sung.com>,
Jan Kara <jack@...e.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Jaegeuk Kim <jaegeuk@...nel.org>,
OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
Christoph Hellwig <hch@...radead.org>,
Miklos Szeredi <miklos@...redi.hu>,
Bob Peterson <rpeterso@...hat.com>,
Andreas Gruenbacher <agruenba@...hat.com>,
Richard Weinberger <richard@....at>,
Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Johannes Berg <johannes@...solutions.net>,
Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>,
Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <muchun.song@...ux.dev>, Jan Kara <jack@...e.cz>,
David Woodhouse <dwmw2@...radead.org>,
Dave Kleikamp <shaggy@...nel.org>, Tejun Heo <tj@...nel.org>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna@...nel.org>,
Chuck Lever <chuck.lever@...cle.com>,
Neil Brown <neilb@...e.de>,
Olga Kornievskaia <kolga@...app.com>,
Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
Ryusuke Konishi <konishi.ryusuke@...il.com>,
Anton Altaparmakov <anton@...era.com>,
Konstantin Komarov <almaz.alexandrovich@...agon-software.com>,
Mark Fasheh <mark@...heh.com>,
Joseph Qi <joseph.qi@...ux.alibaba.com>,
Bob Copeland <me@...copeland.com>,
Mike Marshall <hubcap@...ibond.com>,
Martin Brandenburg <martin@...ibond.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Iurii Zaikin <yzaikin@...gle.com>,
Tony Luck <tony.luck@...el.com>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Anders Larsen <al@...rsen.net>,
Steve French <sfrench@...ba.org>,
Paulo Alcantara <pc@...guebit.com>,
Ronnie Sahlberg <lsahlber@...hat.com>,
Shyam Prasad N <sprasad@...rosoft.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Phillip Lougher <phillip@...ashfs.org.uk>,
Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Evgeniy Dushistov <dushistov@...l.ru>,
Chandan Babu R <chandan.babu@...cle.com>,
Damien Le Moal <dlemoal@...nel.org>,
Naohiro Aota <naohiro.aota@....com>,
Johannes Thumshirn <jth@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
John Johansen <john.johansen@...onical.com>,
Paul Moore <paul@...l-moore.com>,
James Morris <jmorris@...ei.org>,
"Serge E. Hallyn" <serge@...lyn.com>,
Stephen Smalley <stephen.smalley.work@...il.com>,
Eric Paris <eparis@...isplace.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, platform-driver-x86@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-serial@...r.kernel.org,
linux-usb@...r.kernel.org, v9fs@...ts.linux.dev,
linux-afs@...ts.infradead.org, autofs@...r.kernel.org,
linux-btrfs@...r.kernel.org, ceph-devel@...r.kernel.org,
codalist@...emann.coda.cs.cmu.edu, linux-efi@...r.kernel.org,
linux-erofs@...ts.ozlabs.org, linux-ext4@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net, gfs2@...ts.linux.dev,
linux-um@...ts.infradead.org, linux-mtd@...ts.infradead.org,
jfs-discussion@...ts.sourceforge.net, linux-nfs@...r.kernel.org,
linux-nilfs@...r.kernel.org, linux-ntfs-dev@...ts.sourceforge.net,
ntfs3@...ts.linux.dev, ocfs2-devel@...ts.linux.dev,
linux-karma-devel@...ts.sourceforge.net, devel@...ts.orangefs.org,
linux-unionfs@...r.kernel.org, linux-hardening@...r.kernel.org,
reiserfs-devel@...r.kernel.org, linux-cifs@...r.kernel.org,
samba-technical@...ts.samba.org,
linux-trace-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
bpf@...r.kernel.org, Netdev <netdev@...r.kernel.org>,
apparmor@...ts.ubuntu.com, linux-security-module@...r.kernel.org,
selinux@...r.kernel.org
Subject: Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers
On Thu, 28 Sept 2023 at 20:50, Amir Goldstein <amir73il@...il.com> wrote:
>
> OTOH, it is perfectly fine if the vfs wants to stop providing sub 100ns
> services to filesystems. It's just going to be the fs problem and the
> preserved pre-historic/fine-grained time on existing files would only
> need to be provided in getattr(). It does not need to be in __i_mtime.

Hmm. That sounds technically sane, but for one thing: if the aim is to
try to do

 (a) atomic timestamp access

 (b) shrink the inode

then having the filesystem maintain its own timestamp for fine-grained
data will break both of those goals.

Yes, we'd make 'struct inode' smaller if we pack the times into one
64-bit entity, but if btrfs responds by adding mtime fields to "struct
btrfs_inode", we lost the size advantage and only made things worse.

And if ->getattr() then reads those fields without locking (and we
definitely don't want locking in that path), then we lost the
atomicity thing too.
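
To illustrate the atomicity point, here is a minimal userspace sketch (not code from this thread, and not kernel code). It assumes the timestamp has already been packed into a single 64-bit word; `struct demo_inode`, `demo_set_mtime()` and `demo_get_mtime()` are hypothetical names, and C11 atomics stand in for whatever single-word access primitive the kernel would actually use.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode that keeps mtime as one packed word. */
struct demo_inode {
	_Atomic uint64_t i_mtime_packed;
};

/* Writer: one 64-bit store, so seconds and sub-seconds change together. */
static inline void demo_set_mtime(struct demo_inode *inode, uint64_t packed)
{
	atomic_store_explicit(&inode->i_mtime_packed, packed,
			      memory_order_relaxed);
}

/* Reader: one 64-bit load, no lock, no torn seconds/nanoseconds pair. */
static inline uint64_t demo_get_mtime(struct demo_inode *inode)
{
	return atomic_load_explicit(&inode->i_mtime_packed,
				    memory_order_relaxed);
}
```

If a filesystem kept a second, finer-grained copy in its own inode structure, a lockless reader would have to combine two separate loads, and that combination is no longer atomic.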

So no. A "but the filesystem can maintain finer granularity" model is
not acceptable, I think.

If we do require nanoseconds for compatibility, what we could possibly
do is say "we guarantee nanosecond values for *legacy* dates", and say
that future dates use 100ns resolution. We'd define "legacy dates" to
be the traditional 32-bit signed time_t.

So with a 64-bit fstime_t, we'd have the "legacy format":

 - top 32 bits are seconds, bottom 32 bits are ns

which gives us that ns format.

Then, because only 30 bits are needed for nanosecond resolution, we
use the top two bits of that ns field as flags. '00' means that legacy
format, and '01' would mean "we're not doing nanosecond resolution,
we're doing 64ns resolution, and the low 6 bits of the ns field are
actually bits 32-37 of the seconds field".

That still gives us some extensibility (unless the multi-grain code
still wants to use the other top bit), and it gives us 38 bits of
seconds (the original 32 plus the 6 moved out of the ns field), which
is quite a lot.

And all the conversion functions will be simple bit field
manipulations, so there are no expensive ops here.
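
A rough sketch of what such conversion functions could look like, in plain C. This is not code from the thread: the names `fstime_pack()` and `fstime_unpack()` and the `FST_*` constants are made up for illustration, and treating the extended seconds value as signed (sign-extended from bit 37) is an assumption the message does not settle.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t fstime_t;		/* packed: seconds in top 32 bits */

#define FST_FLAG_SHIFT	30		/* top two bits of the ns field */
#define FST_LEGACY	0x0u		/* '00': 32-bit seconds, exact ns */
#define FST_EXT64	0x1u		/* '01': wider seconds, 64ns units */
#define FST_NS_MASK	0x3fffffffu	/* low 30 bits of the ns field */

/* Caller must keep sec within a signed 38-bit range (assumption). */
static fstime_t fstime_pack(int64_t sec, uint32_t nsec)
{
	uint64_t usec = (uint64_t)sec;
	uint32_t lo;

	assert(nsec < 1000000000u);

	/* Legacy dates fit a signed 32-bit time_t and keep full ns. */
	if (sec >= INT32_MIN && sec <= INT32_MAX)
		return ((usec & 0xffffffffu) << 32) | nsec;

	/* Extended: bits 29-6 hold ns/64, bits 5-0 hold seconds 32-37. */
	lo = (FST_EXT64 << FST_FLAG_SHIFT) |
	     ((nsec / 64) << 6) |
	     (uint32_t)((usec >> 32) & 0x3f);
	return ((usec & 0xffffffffu) << 32) | lo;
}

static void fstime_unpack(fstime_t t, int64_t *sec, uint32_t *nsec)
{
	uint32_t hi = (uint32_t)(t >> 32);
	uint32_t lo = (uint32_t)t;
	uint64_t s;

	if ((lo >> FST_FLAG_SHIFT) == FST_LEGACY) {
		/* Reinterpret as signed 32-bit (as gcc/clang do). */
		*sec = (int32_t)hi;
		*nsec = lo & FST_NS_MASK;
		return;
	}

	s = ((uint64_t)(lo & 0x3f) << 32) | hi;	/* 38 bits of seconds */
	if (s & (1ULL << 37))			/* sign-extend bit 37 */
		s |= ~((1ULL << 38) - 1);
	*sec = (int64_t)s;
	*nsec = ((lo & FST_NS_MASK) >> 6) * 64;
}
```

Under these assumptions a legacy date round-trips exactly, while a date past the 32-bit range loses the low 6 bits of its nanoseconds: 130 ns comes back as 128 ns.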

Anyway, I agree with the "let's introduce the accessor functions
first, we can do the 'pack into one word' decisions later".

                Linus