lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190528200659.GK5221@magnolia>
Date:   Tue, 28 May 2019 13:06:59 -0700
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Amir Goldstein <amir73il@...il.com>
Cc:     Theodore Tso <tytso@....edu>, Jan Kara <jack@...e.cz>,
        Dave Chinner <david@...morbit.com>, Chris Mason <clm@...com>,
        Al Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-btrfs@...r.kernel.org,
        linux-api@...r.kernel.org
Subject: Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

On Mon, May 27, 2019 at 08:26:55PM +0300, Amir Goldstein wrote:
> New link flags to request "atomic" link.
> 
> Signed-off-by: Amir Goldstein <amir73il@...il.com>
> ---
> 
> Hi Guys,
> 
> Following our discussions on LSF/MM and beyond [1][2], here is
> an RFC documentation patch.
> 
> Ted, I know we discussed limiting the API for linking an O_TMPFILE
> to avert the hardlinks issue, but I decided it would be better to
> document the hardlinks non-guaranty instead. This will allow me to
> replicate the same semantics and documentation to renameat(2).
> Let me know how that works out for you.
> 
> I also decided to try out two separate flags for data and metadata.
> I do not find any of those flags very useful without the other, but
> documenting them seprately was easier, because of the fsync/fdatasync
> reference.  In the end, we are trying to solve a social engineering
> problem, so this is the least confusing way I could think of to describe
> the new API.
> 
> First implementation of AT_ATOMIC_METADATA is expected to be
> noop for xfs/ext4 and probably fsync for btrfs.
> 
> First implementation of AT_ATOMIC_DATA is expected to be
> filemap_write_and_wait() for xfs/ext4 and probably fdatasync for btrfs.
> 
> Thoughts?
> 
> Amir.
> 
> [1] https://lwn.net/Articles/789038/
> [2] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjZm6E2TmCv8JOyQr7f-2VB0uFRy7XEp8HBHQmMdQg+6w@mail.gmail.com/
> 
>  man2/link.2 | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/man2/link.2 b/man2/link.2
> index 649ba00c7..15c24703e 100644
> --- a/man2/link.2
> +++ b/man2/link.2
> @@ -184,6 +184,57 @@ See
>  .BR openat (2)
>  for an explanation of the need for
>  .BR linkat ().
> +.TP
> +.BR AT_ATOMIC_METADATA " (since Linux 5.x)"
> +By default, a link operation followed by a system crash, may result in the
> +new file name being linked with old inode metadata, such as out dated time
> +stamps or missing extended attributes.
> +.BR fsync (2)
> +before linking the inode, but that involves flushing of volatile disk caches.
> +
> +A filesystem that accepts this flag will guaranty, that old inode metadata
> +will not be exposed in the new linked name.
> +Some filesystems may internally perform
> +.BR fsync (2)
> +before linking the inode to provide this guaranty,
> +but often, filesystems will have a more efficient method to provide this
> +guaranty without flushing volatile disk caches.
> +
> +A filesystem that accepts this flag does
> +.BR NOT
> +guaranty that the new file name will exist after a system crash, nor that the
> +current inode metadata is persisted to disk.

Hmmm.  I think it would be much clearer to state the two expectations in
the same place, e.g.:

"A filesystem that accepts this flag guarantees that after a successful
call completion, the filesystem will return either (a) the version of
the metadata that was on disk at the time the call completed; (b) a
newer version of that metadata; or (c) -ENOENT.  In other words, a
subsequent access of the file path will never return metadata that was
obsolete at the time that the call completed, even if the system crashes
immediately afterwards."

Did I get that right?  I /think/ this means that one could implement Ye
Olde Write And Rename as:

fd = open(".", O_TMPFILE...);
write(fd);
fsync(fd);
snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "file.txt",
	AT_EMPTY_PATH | AT_ATOMIC_DATA | AT_ATOMIC_METADATA);

(Still struggling to figure out what userspace programs would use this
for...)

--D

> +Specifically, if a file has hardlinks, the existance of the linked name after
> +a system crash does
> +.BR NOT
> +guaranty that any of the other file names exist, nor that the last observed
> +value of
> +.I st_nlink
> +(see
> +.BR stat (2))
> +has persisted.
> +.TP
> +.BR AT_ATOMIC_DATA " (since Linux 5.x)"
> +By default, a link operation followed by a system crash, may result in the
> +new file name being linked with old data or missing data.
> +One way to prevent this is to call
> +.BR fdatasync (2)
> +before linking the inode, but that involves flushing of volatile disk caches.
> +
> +A filesystem that accepts this flag will guaranty, that old data
> +will not be exposed in the new linked name.
> +Some filesystems may internally perform
> +.BR fsync (2)
> +before linking the inode to provide this guaranty,
> +but often, filesystems will have a more efficient method to provide this
> +guaranty without flushing volatile disk caches.
> +
> +A filesystem that accepts this flag does
> +.BR NOT
> +guaranty that the new file name will exist after a system crash, nor that the
> +current inode data is persisted to disk.
> +.TP
>  .SH RETURN VALUE
>  On success, zero is returned.
>  On error, \-1 is returned, and
> -- 
> 2.17.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ