Message-ID: <4130431c-0fed-6861-9c98-63efcc410d97@gmail.com>
Date: Sun, 28 May 2017 18:27:06 +0200
From: "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To: "Darrick J. Wong" <darrick.wong@...cle.com>
Cc: mtk.manpages@...il.com, Andreas Dilger <adilger@...ger.ca>,
Eric Biggers <ebiggers3@...il.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Theodore Ts'o <tytso@....edu>, Jann Horn <jannh@...gle.com>,
Andy Lutomirski <luto@...capital.net>,
linux-xfs@...r.kernel.org,
Linux FS Devel <linux-fsdevel@...r.kernel.org>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
linux-man <linux-man@...r.kernel.org>,
linux-btrfs <linux-btrfs@...r.kernel.org>
Subject: Re: [PATCH v2] ioctl_getfsmap.2: document the GETFSMAP ioctl
On 05/18/2017 04:07 AM, Darrick J. Wong wrote:
> Document the new GETFSMAP ioctl that returns the physical layout of a
> (disk-based) filesystem.
Thanks, Darrick! Applied (with a few minor edits). (Currently sitting in
a local branch, just in case anyone sends review comments that need
integrating.)
Cheers,
Michael
> Signed-off-by: Darrick J. Wong <darrick.wong@...cle.com>
> ---
> v2: emphasize that filesystems are not obligated to return inode numbers
> ---
> man2/ioctl_getfsmap.2 | 375 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 375 insertions(+)
> create mode 100644 man2/ioctl_getfsmap.2
>
> diff --git a/man2/ioctl_getfsmap.2 b/man2/ioctl_getfsmap.2
> new file mode 100644
> index 0000000..b451950
> --- /dev/null
> +++ b/man2/ioctl_getfsmap.2
> @@ -0,0 +1,375 @@
> +.\" Copyright (c) 2017, Oracle. All rights reserved.
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" <http://www.gnu.org/licenses/>.
> +.\" %%%LICENSE_END
> +.TH IOCTL_GETFSMAP 2 2017-02-10 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +ioctl_getfsmap \- retrieve the physical layout of the filesystem
> +.SH SYNOPSIS
> +.br
> +.B #include <sys/ioctl.h>
> +.br
> +.B #include <linux/fs.h>
> +.br
> +.B #include <linux/fsmap.h>
> +.sp
> +.BI "int ioctl(int " fd ", FS_IOC_GETFSMAP, struct fsmap_head * " arg );
> +.SH DESCRIPTION
> +This
> +.BR ioctl (2)
> +retrieves physical extent mappings for a filesystem.
> +This information can be used to discover which files are mapped to a physical
> +block, examine free space, or find known bad blocks, among other things.
> +
> +The sole argument to this ioctl should be a pointer to a single
> +.BR "struct fsmap_head" ":"
> +.in +4n
> +.nf
> +
> +struct fsmap {
> + __u32 fmr_device; /* device id */
> + __u32 fmr_flags; /* mapping flags */
> + __u64 fmr_physical; /* device offset of segment */
> + __u64 fmr_owner; /* owner id */
> + __u64 fmr_offset; /* file offset of segment */
> + __u64 fmr_length; /* length of segment */
> + __u64 fmr_reserved[3]; /* must be zero */
> +};
> +
> +struct fsmap_head {
> + __u32 fmh_iflags; /* control flags */
> + __u32 fmh_oflags; /* output flags */
> + __u32 fmh_count; /* # of entries in array incl. input */
> + __u32 fmh_entries; /* # of entries filled in (output). */
> + __u64 fmh_reserved[6]; /* must be zero */
> +
> + struct fsmap fmh_keys[2]; /* low and high keys for the mapping search */
> + struct fsmap fmh_recs[]; /* returned records */
> +};
> +
> +.fi
> +.in
> +The two
> +.I fmh_keys
> +array elements specify the lowest and highest reverse-mapping
> +keys, respectively, for which userspace would like physical mapping
> +information.
> +A reverse mapping key consists of the tuple (device, block, owner, offset).
> +The owner and offset fields are part of the key because some filesystems
> +support sharing physical blocks between multiple files and
> +therefore may return multiple mappings for a given physical block.
> +.PP
> +Filesystem mappings are copied into the
> +.I fmh_recs
> +array, which immediately follows the header data.
> +.SS Fields of struct fsmap_head
> +.PP
> +The
> +.I fmh_iflags
> +field is a bitmask passed to the kernel to alter the output.
> +No flags are currently defined, so callers must set this value to zero.
> +
> +.PP
> +The
> +.I fmh_oflags
> +field is a bitmask of flags set by the kernel concerning the returned mappings.
> +If
> +.B FMH_OF_DEV_T
> +is set, then the
> +.I fmr_device
> +field represents a
> +.B dev_t
> +structure containing the major and minor numbers of the block device.
> +
> +.PP
> +The
> +.I fmh_count
> +field contains the number of elements in the array being passed to the
> +kernel.
> +If this value is 0,
> +.I fmh_entries
> +will be set to the number of records that would have been returned had
> +the array been large enough;
> +no mapping information will be returned.
> +
> +.PP
> +The
> +.I fmh_entries
> +field contains the number of elements in the
> +.I fmh_recs
> +array that contain useful information.
> +
> +.PP
> +The
> +.I fmh_reserved
> +fields must be set to zero.
> +
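As an illustration of the header fields described above, here is a minimal
sketch of allocating and initializing the structure for a query over the
whole filesystem. The helper name alloc_fsmap_head() is hypothetical and is
not part of the patch or of any header. Filling the high key with the largest
value in every key field is an assumption about how to say "no upper bound".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <linux/fsmap.h>

/*
 * Hypothetical helper (not part of the patch): allocate a zeroed
 * fsmap_head with room for nr records and keys that span the whole
 * keyspace.  Returns NULL if the allocation fails.
 */
struct fsmap_head *alloc_fsmap_head(unsigned int nr)
{
    struct fsmap_head *head;

    /* The header is followed immediately by nr struct fsmap records. */
    head = calloc(1, sizeof(*head) + nr * sizeof(struct fsmap));
    if (head == NULL)
        return NULL;

    /* calloc() has already zeroed fmh_iflags, fmh_reserved and the low key. */
    head->fmh_count = nr;

    /*
     * High key: assumed "largest possible" value in each key field.
     * fmr_length and fmr_reserved are left at zero.
     */
    head->fmh_keys[1].fmr_device = UINT32_MAX;
    head->fmh_keys[1].fmr_flags = UINT32_MAX;
    head->fmh_keys[1].fmr_physical = UINT64_MAX;
    head->fmh_keys[1].fmr_owner = UINT64_MAX;
    head->fmh_keys[1].fmr_offset = UINT64_MAX;

    return head;
}
```

Passing nr as 0 gives a header suitable for the counting mode described
under fmh_count, where only fmh_entries is filled in.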
> +.SS Keys
> +.PP
> +The two key records in
> +.B fsmap_head.fmh_keys
> +specify the lowest and highest extent records in the keyspace that the caller
> +wants returned.
> +A filesystem that can share blocks between files likely requires the tuple
> +.RI "(" "device" ", " "physical" ", " "owner" ", " "offset" ", " "flags" ")"
> +to uniquely index any filesystem mapping record.
> +Classic non-sharing filesystems might be able to identify any record with only
> +.RI "(" "device" ", " "physical" ", " "flags" ")."
> +For example, if the low key is set to (8:0, 36864, 0, 0, 0), the filesystem will
> +only return records for extents starting at or above 36KiB on disk.
> +If the high key is set to (8:0, 1048576, 0, 0, 0), only records below 1MiB will
> +be returned.
> +The format of
> +.B fmr_device
> +in the keys must match the format of the same field in the output records,
> +as defined below.
> +By convention, the field
> +.B fsmap_head.fmh_keys[0]
> +must contain the low key and
> +.B fsmap_head.fmh_keys[1]
> +must contain the high key for the request.
> +.PP
> +For convenience, if
> +.B fmr_length
> +is set in the low key, it will be added to
> +.IR fmr_physical " or " fmr_offset
> +as appropriate.
> +The caller can take advantage of this subtlety to set up subsequent calls
> +by copying
> +.B fsmap_head.fmh_recs[fsmap_head.fmh_entries - 1]
> +into the low key.
> +The function
> +.B fsmap_advance
> +provides this functionality.
> +
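The continuation scheme in the Keys section can be sketched as a loop. This
is only an illustration under stated assumptions: count_fsmap_records() and
NR_RECS are hypothetical names, the batch size is arbitrary, and the loop
stops on FMR_OF_LAST or on an empty result. It relies on fsmap_advance()
being provided by <linux/fsmap.h>, as the text above indicates.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fsmap.h>

#define NR_RECS 128    /* arbitrary batch size chosen for this sketch */

/*
 * Hypothetical helper: count every mapping record reported for the
 * filesystem behind fd.  Returns the count, or -1 with errno set.
 */
long long count_fsmap_records(int fd)
{
    struct fsmap_head *head;
    struct fsmap *last;
    long long total = 0;
    int saved_errno;

    head = calloc(1, sizeof(*head) + NR_RECS * sizeof(struct fsmap));
    if (head == NULL)
        return -1;
    head->fmh_count = NR_RECS;

    /* Low key stays all zeroes; high key is set to the maximum. */
    head->fmh_keys[1].fmr_device = UINT32_MAX;
    head->fmh_keys[1].fmr_flags = UINT32_MAX;
    head->fmh_keys[1].fmr_physical = UINT64_MAX;
    head->fmh_keys[1].fmr_owner = UINT64_MAX;
    head->fmh_keys[1].fmr_offset = UINT64_MAX;

    for (;;) {
        if (ioctl(fd, FS_IOC_GETFSMAP, head) < 0) {
            saved_errno = errno;    /* free() may clobber errno */
            free(head);
            errno = saved_errno;
            return -1;
        }
        if (head->fmh_entries == 0)
            break;
        assert(head->fmh_entries <= NR_RECS);
        total += head->fmh_entries;

        /* Stop once the filesystem flags its final record. */
        last = &head->fmh_recs[head->fmh_entries - 1];
        if (last->fmr_flags & FMR_OF_LAST)
            break;

        /* Copy the last record into the low key for the next call. */
        fsmap_advance(head);
    }

    free(head);
    return total;
}
```

A caller would open any file or directory on the filesystem of interest for
reading and pass the descriptor in. Whether the call succeeds depends on the
filesystem, so EOPNOTSUPP should be expected on filesystems without support.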
> +.SS Fields of struct fsmap
> +.PP
> +The
> +.I fmr_device
> +field uniquely identifies the underlying storage device.
> +If the
> +.B FMH_OF_DEV_T
> +flag is set in the header's
> +.I fmh_oflags
> +field, this field contains a
> +.B dev_t
> +from which major and minor numbers can be extracted.
> +If the flag is not set, this field contains a value that must be unique
> +for each unique storage device.
> +
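To make the two interpretations of fmr_device concrete, here is a small
sketch that renders the field as text. The function name
format_fsmap_device() and the "dev#" fallback format are hypothetical choices
for this example. major() and minor() come from <sys/sysmacros.h> in glibc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysmacros.h>
#include <linux/fsmap.h>

/*
 * Hypothetical helper: write a printable form of rec->fmr_device into
 * buf.  oflags is the fmh_oflags value from the header that came back
 * with the record.  Returns what snprintf() returns.
 */
int format_fsmap_device(unsigned int oflags, const struct fsmap *rec,
                        char *buf, size_t len)
{
    if (oflags & FMH_OF_DEV_T) {
        /* The field is a dev_t: split it into major:minor. */
        return snprintf(buf, len, "%u:%u",
                        major(rec->fmr_device), minor(rec->fmr_device));
    }

    /* Otherwise it is only an opaque identifier that is unique per device. */
    return snprintf(buf, len, "dev#%u", (unsigned int)rec->fmr_device);
}
```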
> +.PP
> +The
> +.I fmr_physical
> +field contains the disk address of the extent in bytes.
> +
> +.PP
> +The
> +.I fmr_owner
> +field contains the owner of the extent.
> +This is an inode number unless
> +.B FMR_OF_SPECIAL_OWNER
> +is set in the
> +.I fmr_flags
> +field, in which case the value is determined by the filesystem.
> +See the section below about owner values for more details.
> +
> +.PP
> +The
> +.I fmr_offset
> +field contains the logical address in the mapping record in bytes.
> +This field has no meaning if the
> +.BR FMR_OF_SPECIAL_OWNER " or " FMR_OF_EXTENT_MAP
> +flags are set in
> +.IR fmr_flags "."
> +
> +.PP
> +The
> +.I fmr_length
> +field contains the length of the extent in bytes.
> +
> +.PP
> +The
> +.I fmr_flags
> +field is a bitmask of extent state flags.
> +The bits are:
> +.RS 0.4i
> +.TP
> +.B FMR_OF_PREALLOC
> +The extent is allocated but not yet written.
> +.TP
> +.B FMR_OF_ATTR_FORK
> +This extent contains extended attribute data.
> +.TP
> +.B FMR_OF_EXTENT_MAP
> +This extent contains extent map information for the owner.
> +.TP
> +.B FMR_OF_SHARED
> +Parts of this extent may be shared.
> +.TP
> +.B FMR_OF_SPECIAL_OWNER
> +The
> +.I fmr_owner
> +field contains a special value instead of an inode number.
> +.TP
> +.B FMR_OF_LAST
> +This is the last record in the filesystem.
> +.RE
> +
> +.PP
> +The
> +.I fmr_reserved
> +field will be set to zero.
> +
> +.SS Owner Values
> +Generally, the value of the
> +.I fmr_owner
> +field for non-metadata extents should be an inode number.
> +However, filesystems are under no obligation to report inode numbers;
> +they may instead report
> +.B FMR_OWN_UNKNOWN
> +if the inode number cannot easily be retrieved, if the caller lacks
> +sufficient privilege, if the filesystem does not support stable
> +inode numbers, or for any other reason.
> +If a filesystem wishes to condition the reporting of inode numbers based
> +on process capabilities, it is strongly urged that the
> +.B CAP_SYS_ADMIN
> +capability be used for this purpose.
> +.PP
> +The following special owner values are generic to all filesystems:
> +.RS 0.4i
> +.TP
> +.B FMR_OWN_FREE
> +Free space.
> +.TP
> +.B FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B FMR_OWN_METADATA
> +This extent is filesystem metadata.
> +.RE
> +
> +XFS can return the following special owner values:
> +.RS 0.4i
> +.TP
> +.B XFS_FMR_OWN_FREE
> +Free space.
> +.TP
> +.B XFS_FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B XFS_FMR_OWN_FS
> +Static filesystem metadata which exists at a fixed address.
> +These are the AG superblock, the AGF, the AGFL, and the AGI headers.
> +.TP
> +.B XFS_FMR_OWN_LOG
> +The filesystem journal.
> +.TP
> +.B XFS_FMR_OWN_AG
> +Allocation group metadata, such as the free space btrees and the
> +reverse mapping btrees.
> +.TP
> +.B XFS_FMR_OWN_INOBT
> +The inode and free inode btrees.
> +.TP
> +.B XFS_FMR_OWN_INODES
> +Inode records.
> +.TP
> +.B XFS_FMR_OWN_REFC
> +Reference count information.
> +.TP
> +.B XFS_FMR_OWN_COW
> +This extent is being used to stage a copy-on-write.
> +.TP
> +.B XFS_FMR_OWN_DEFECTIVE
> +This extent has been marked defective either by the filesystem or the
> +underlying device.
> +.RE
> +
> +ext4 can return the following special owner values:
> +.RS 0.4i
> +.TP
> +.B EXT4_FMR_OWN_FREE
> +Free space.
> +.TP
> +.B EXT4_FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B EXT4_FMR_OWN_FS
> +Static filesystem metadata which exists at a fixed address.
> +This is the superblock and the group descriptors.
> +.TP
> +.B EXT4_FMR_OWN_LOG
> +The filesystem journal.
> +.TP
> +.B EXT4_FMR_OWN_INODES
> +Inode records.
> +.TP
> +.B EXT4_FMR_OWN_BLKBM
> +Block bitmap.
> +.TP
> +.B EXT4_FMR_OWN_INOBM
> +Inode bitmap.
> +.RE
> +
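The owner rules above could be applied by a caller roughly as follows. This
is a sketch, not a complete decoder: fsmap_owner_kind() is a hypothetical
name, and only the three generic owner values are recognized. Anything else
with FMR_OF_SPECIAL_OWNER set is reported as filesystem-specific, because the
XFS and ext4 codes are defined by those filesystems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/fsmap.h>

/*
 * Hypothetical helper: classify the owner of one mapping record.
 * Returns a pointer to a static string.
 */
const char *fsmap_owner_kind(const struct fsmap *rec)
{
    /* Without the flag, fmr_owner is meant to be an inode number. */
    if (!(rec->fmr_flags & FMR_OF_SPECIAL_OWNER))
        return "inode";

    /* Generic special owners shared by all filesystems. */
    if (rec->fmr_owner == FMR_OWN_FREE)
        return "free";
    if (rec->fmr_owner == FMR_OWN_UNKNOWN)
        return "unknown";
    if (rec->fmr_owner == FMR_OWN_METADATA)
        return "metadata";

    /* Anything else is defined by the individual filesystem. */
    return "filesystem-specific";
}
```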
> +.SH RETURN VALUE
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.PP
> +.SH ERRORS
> +Error codes can be one of, but are not limited to, the following:
> +.TP
> +.B EINVAL
> +The array is not long enough, the keys do not point to a valid part of
> +the filesystem, the low key points to a higher point in the filesystem's
> +physical storage address space than the high key, or a non-zero value
> +was passed in one of the fields that must be zero.
> +.TP
> +.B EFAULT
> +The pointer passed in was not mapped to a valid memory address.
> +.TP
> +.B EBADF
> +.I fd
> +is not open for reading.
> +.TP
> +.B EOPNOTSUPP
> +The filesystem does not support this command.
> +.TP
> +.B EUCLEAN
> +The filesystem metadata is corrupt and needs repair.
> +.TP
> +.B EBADMSG
> +The filesystem has detected a checksum error in the metadata.
> +.TP
> +.B ENOMEM
> +Insufficient memory to process the request.
> +
> +.SH EXAMPLE
> +.PP
> +See io/fsmap.c in the xfsprogs distribution for a sample program.
> +
> +.SH CONFORMING TO
> +This API is Linux-specific.
> +Not all filesystems support it.
> +.fi
> +.in
> +.SH SEE ALSO
> +.BR ioctl (2)
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/