linux-ext4 - Semantic newlines (was: [man-pages PATCH v3] statx.2, open.2: document STATX

Message-ID: <23596caf-db1a-0c22-70a5-6ff409282fd1@gmail.com>
Date:   Mon, 10 Oct 2022 18:15:42 +0200
From:   Alejandro Colomar <alx.manpages@...il.com>
To:     "Darrick J. Wong" <djwong@...nel.org>
Cc:     Eric Biggers <ebiggers@...nel.org>, linux-man@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, linux-xfs@...r.kernel.org,
        linux-fscrypt@...r.kernel.org, linux-block@...r.kernel.org,
        "G. Branden Robinson" <g.branden.robinson@...il.com>
Subject: Semantic newlines (was: [man-pages PATCH v3] statx.2, open.2:
 document STATX_DIOALIGN)

Hi Darrick,

On 10/10/22 17:22, Darrick J. Wong wrote:
> 
> I'm not so familiar with semantic newlines-- is there an automated

The following commit contains interesting details about them and their 
origins:

<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit?id=6ff6f43d68164f99a8c3fb66f4525d145571310c>

> reflow program that fixes these problems mechanically, or is this
> expected to be performed manually by manpage authors?

I don't know of a reflow program that fixes this.
The biggest issue is that
parsing natural language is not exactly easy.

So, it is expected to be performed manually by authors.
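
To illustrate why this is hard to automate,
here is a minimal Python sketch of a checker, not an existing tool.
The function name `find_midline_sentence_ends` and the regex heuristic are assumptions made up for this example.
It only flags source lines where a sentence appears to end in the middle of a line,
and it skips roff macro lines.

```python
import re

# Heuristic: sentence-ending punctuation, an optional closing quote or
# parenthesis, whitespace, then an uppercase letter that would start
# the next sentence.
_SENTENCE_END = re.compile(r'[.!?][")\']?\s+(?=[A-Z])')


def find_midline_sentence_ends(source: str) -> list[int]:
    """Return 1-based numbers of lines that seem to hold more than one sentence."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Skip roff requests and macros (.TP, .BR, ...), which start
        # with a control character.
        if line.startswith((".", "'")):
            continue
        if _SENTENCE_END.search(line):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = (
        "This is one sentence. This is another.\n"
        ".BR ioctl (2)\n"
        "In Linux 2.6.0,\n"
        "this was relaxed.\n"
    )
    print(find_midline_sentence_ends(sample))
```

A heuristic like this misfires on abbreviations
(a line such as "See e.g. Linux" is flagged),
and it cannot decide where clauses or phrases should break,
which is the natural-language problem mentioned above.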

> 
> If manually, do the items in a comma-separated list count as clauses?

It depends.
Pedantically, yes;
but we evaluate it case by case,
depending on the length of each sentence
and the existence of subordinate clauses.
So author taste is important there and respected.

> 
> Would the next two paragraphs of this email reformat into semantic
> newlines like so?
> 
> 	In the source of a manual page,
> 	new sentences should  be started on new lines,
> 	long sentences should be split into lines at clause breaks
> 	(commas, semicolons, colons, and so on),
> 	and long clauses should be split at phrase boundaries.
> 	This convention,
> 	sometimes known as "semantic newlines",
> 	makes it easier to see the effect of patches,
> 	which often operate at the level of individual sentences, clauses, or phrases.
> 
 >
 > --D
 >
 >>> +If none of the above is available, then direct I/O support and alignment
 >>
 >> Please use semantic newlines.
 >>
 >> See man-pages(7):
 >>     Use semantic newlines
 >>         In the source of a manual page, new sentences  should  be
 >>         started on new lines, long sentences should be split into
 >>         lines  at  clause breaks (commas, semicolons, colons, and
 >>         so on), and long clauses should be split at phrase bound‐
 >>         aries.  This convention,  sometimes  known  as  "semantic
 >>         newlines",  makes it easier to see the effect of patches,
 >>         which often operate at the level of individual sentences,
 >>         clauses, or phrases.
 >>
 >>
 >>> +restrictions can only be assumed from known characteristics of the filesystem,
 >>> +the individual file, the underlying storage device(s), and the kernel version.
 >>> +In Linux 2.4, most block device based filesystems require that the file offset
 >>> +and the length and memory address of all I/O segments be multiples of the
 >>> +filesystem block size (typically 4096 bytes).
 >>> +In Linux 2.6.0, this was relaxed to the logical block size of the block device
 >>> +(typically 512 bytes).
 >>> +A block device's logical block size can be determined using the
 >>>    .BR ioctl (2)
 >>>    .B BLKSSZGET
 >>>    operation or from the shell using the command:
 >>> diff --git a/man2/statx.2 b/man2/statx.2
 >>> index 0d1b4591f..50397057d 100644
 >>> --- a/man2/statx.2
 >>> +++ b/man2/statx.2
 >>> @@ -61,7 +61,12 @@ struct statx {
 >>>           containing the filesystem where the file resides */
 >>>        __u32 stx_dev_major;   /* Major ID */
 >>>        __u32 stx_dev_minor;   /* Minor ID */
 >>> +
 >>>        __u64 stx_mnt_id;      /* Mount ID */
 >>> +
 >>> +    /* Direct I/O alignment restrictions */
 >>> +    __u32 stx_dio_mem_align;
 >>> +    __u32 stx_dio_offset_align;
 >>>    };
 >>>    .EE
 >>>    .in
 >>> @@ -247,6 +252,8 @@ STATX_BTIME	Want stx_btime
 >>>    STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
 >>>    	It is deprecated and should not be used.
 >>>    STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
 >>> +STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
 >>> +	(since Linux 6.1; support varies by filesystem)
 >>>    .TE
 >>>    .in
 >>>    .PP
 >>> @@ -407,6 +414,28 @@ This is the same number reported by
 >>>    .BR name_to_handle_at (2)
 >>>    and corresponds to the number in the first field in one of the records in
 >>>    .IR /proc/self/mountinfo .
 >>> +.TP
 >>> +.I stx_dio_mem_align
 >>> +The alignment (in bytes) required for user memory buffers for direct I/O
 >>> +.BR "" ( O_DIRECT )
 >>
 >> .RB and remove the "".
 >>
 >>> +on this file, or 0 if direct I/O is not supported on this file.
 >>> +.IP
 >>> +.B STATX_DIOALIGN
 >>> +.IR "" ( stx_dio_mem_align
 >>
 >> .RI
 >>
 >>> +and
 >>> +.IR stx_dio_offset_align )
 >>> +is supported on block devices since Linux 6.1.
 >>> +The support on regular files varies by filesystem; it is supported by ext4,
 >>> +f2fs, and xfs since Linux 6.1.
 >>> +.TP
 >>> +.I stx_dio_offset_align
 >>> +The alignment (in bytes) required for file offsets and I/O segment lengths for
 >>> +direct I/O
 >>> +.BR "" ( O_DIRECT )
 >>> +on this file, or 0 if direct I/O is not supported on this file.
 >>> +This will only be nonzero if
 >>> +.I stx_dio_mem_align
 >>> +is nonzero, and vice versa.
 >>>    .PP
 >>>    For further information on the above fields, see
 >>>    .BR inode (7).
 >>>
 >>> base-commit: bc28d289e5066fc626df260bafc249846a0f6ae6
 >>
 >> --
 >> <http://www.alejandro-colomar.es/>
 >
 >
 >
Yes, that would be correct;
in fact,
you almost matched the actual source code of the manual page.
There are two differences:
one comma at which we don't break (but we could),
and also we break the last line before the list.

See the source code here:
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man7/man-pages.7#n612>

> Do we still line-wrap at 72^W74^W78^W80 columns?

Yes, 80 is a strong limit.

Normally,
breaking at the level of clauses
will leave very few lines passing that limit.
When there's such a case,
you can break further at the level of phrases,
and I doubt any line will pass the 80-col boundary after that.
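
The two-step process above
(break at clauses first, then look at what still exceeds 80 columns)
can be sketched in Python.
This is only an illustration under stated assumptions:
the names `break_long_line` and `overlong` are hypothetical,
and treating every comma, semicolon, or colon as a clause break is a simplification.

```python
import re

LIMIT = 80  # the column limit discussed above

# Break at whitespace that follows a comma, semicolon, or colon.
_CLAUSE_BREAK = re.compile(r'(?<=[,;:])\s+')


def break_long_line(line: str, limit: int = LIMIT) -> list[str]:
    """Split an over-long line at clause punctuation; leave short lines alone."""
    if len(line) <= limit:
        return [line]
    return _CLAUSE_BREAK.split(line)


def overlong(lines: list[str], limit: int = LIMIT) -> list[str]:
    """Return the lines that still exceed the limit after clause breaks."""
    return [line for line in lines if len(line) > limit]


if __name__ == "__main__":
    text = (
        "If none of the above is available, then direct I/O support and "
        "alignment restrictions can only be assumed from known "
        "characteristics of the filesystem, the individual file, "
        "the underlying storage device(s), and the kernel version."
    )
    parts = break_long_line(text)
    for part in parts:
        print(part)
    # Anything listed here still needs a manual break at a phrase boundary.
    print(overlong(parts))
```

For this sample sentence,
the clause containing "can only be assumed" is still longer than 80 columns after the first pass,
so it is the one that needs a further break at a phrase boundary,
chosen by hand.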

> 
> and would the proposed manpage text read:
> 
> 	If none of the above is available,
> 	then direct I/O support and alignment restrictions can only be assumed
> 	from known characteristics of the filesystem,
> 	the individual file,
> 	the underlying storage device(s),
> 	and the kernel version.
> 	In Linux 2.4,
> 	most block device based filesystems require that the file offset and the

'block device based' would need some hyphens,
as it's a compound adjective
(I don't know the exact rules in English
when more than two words form such an adjective; please check).

I would break after 'require that'.

> 	length and memory address of all I/O segments be multiples of the

And right before 'be', I think.

> 	filesystem block size
> 	(typically 4096 bytes).
> 	In Linux 2.6.0,
> 	this was relaxed to the logical block size of the block device
> 	(typically 512 bytes).
> 	A block device's logical block size can be determined using the
> 	.BR ioctl (2)
> 	.B BLKSSZGET
> 	operation or from the shell using the command:

But mostly looks good.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>
