Date: Thu, 13 Jun 2024 20:47:49 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>, viro@...iv.linux.org.uk, jack@...e.cz, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to
 be released

On Thu, Jun 13, 2024 at 8:43 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, 13 Jun 2024 at 11:13, Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > I would assume the rule is pretty much well known and instead an
> > indicator where is what (as added in my comments) would be welcome.
>
> Oh, the rule is well-known, but I think what is worth pointing out is
> the different classes of fields, and the name[] field in particular.
>
> This ordering was last really updated back in 2011, by commit
> 44a7d7a878c9 ("fs: cache optimise dentry and inode for rcu-walk"). And
> it was actually somewhat intentional at the time. Quoting from that
> commit:
>
>     We also fit in 8 bytes of inline name in the first 64 bytes, so for short
>     names, only 64 bytes needs to be touched to perform the lookup. We should
>     get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes
>     of name in there, which will take care of 81% rather than 32% of the kernel
>     tree.
>

Right. Things degrading by accident is why I suggested BUILD_BUG_ON.

> but what has actually really changed - and that I didn't even realize
> until I now did a 'pahole' on it, was that this was all COMPLETELY
> broken by
>
>        seqcount_spinlock_t        d_seq;
>
> because seqcount_spinlock_t has been entirely broken and went from
> being 4 bytes back when, to now being 64 bytes.
>
> Crazy crazy.
>
> How did that happen?
>

perhaps lockdep in your config?

this is the layout on my prod config:
struct dentry {
        unsigned int               d_flags;              /*     0     4 */
        seqcount_spinlock_t        d_seq;                /*     4     4 */
        struct hlist_bl_node       d_hash;               /*     8    16 */
        struct dentry *            d_parent;             /*    24     8 */
        struct qstr                d_name;               /*    32    16 */
        struct inode *             d_inode;              /*    48     8 */
        unsigned char              d_iname[40];          /*    56    40 */
        /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
        const struct dentry_operations  * d_op;          /*    96     8 */
        struct super_block *       d_sb;                 /*   104     8 */
        long unsigned int          d_time;               /*   112     8 */
        void *                     d_fsdata;             /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        struct lockref             d_lockref __attribute__((__aligned__(8))); /*   128     8 */
        union {
                struct list_head   d_lru;                /*   136    16 */
                wait_queue_head_t * d_wait;              /*   136     8 */
        };                                               /*   136    16 */
        struct hlist_node          d_sib;                /*   152    16 */
        struct hlist_head          d_children;           /*   168     8 */
        union {
                struct hlist_node  d_alias;              /*   176    16 */
                struct hlist_bl_node d_in_lookup_hash;   /*   176    16 */
                struct callback_head d_rcu __attribute__((__aligned__(8))); /*   176    16 */
        } d_u __attribute__((__aligned__(8)));           /*   176    16 */

        /* size: 192, cachelines: 3, members: 16 */
        /* forced alignments: 2 */
} __attribute__((__aligned__(8)));


-- 
Mateusz Guzik <mjguzik gmail.com>
