Message-ID: <CAGudoHGdWQYH8pRu1B5NLRa_6EKPR6hm5vOf+fyjvUzm1po8VQ@mail.gmail.com>
Date: Thu, 13 Jun 2024 20:47:49 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>, viro@...iv.linux.org.uk, jack@...e.cz,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to
be released
On Thu, Jun 13, 2024 at 8:43 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, 13 Jun 2024 at 11:13, Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > I would assume the rule is pretty much well known, and instead an
> > indicator of what lives where (as added in my comments) would be welcome.
>
> Oh, the rule is well-known, but I think what is worth pointing out is
> the different classes of fields, and the name[] field in particular.
>
> This ordering was last really updated back in 2011, by commit
> 44a7d7a878c9 ("fs: cache optimise dentry and inode for rcu-walk"). And
> it was actually somewhat intentional at the time. Quoting from that
> commit:
>
> We also fit in 8 bytes of inline name in the first 64 bytes, so for short
> names, only 64 bytes needs to be touched to perform the lookup. We should
> get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes
> of name in there, which will take care of 81% rather than 32% of the kernel
> tree.
>
Right. Things degrading by accident is exactly why I suggested adding
a BUILD_BUG_ON.
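
Something like the below is what I had in mind; a minimal sketch, with
the 64-byte and 192-byte bounds assumed from the layout pasted further
down (a real version would of course have to account for the debug
options discussed here):

	#include <linux/build_bug.h>
	#include <linux/dcache.h>
	#include <linux/stddef.h>

	/*
	 * Illustrative bounds taken from the non-debug layout below:
	 * keep the fields used by RCU-walk lookup in the first
	 * cacheline and catch the struct growing past 3 cachelines.
	 * Would need gating on !CONFIG_DEBUG_LOCK_ALLOC and friends.
	 */
	static_assert(offsetof(struct dentry, d_iname) < 64,
		      "lookup-hot fields pushed out of the first cacheline");
	static_assert(sizeof(struct dentry) <= 192,
		      "struct dentry grew past 3 cachelines");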
> but what has actually really changed - and what I didn't even realize
> until I just now did a 'pahole' on it - was that this was all
> COMPLETELY broken by
>
> seqcount_spinlock_t d_seq;
>
> because seqcount_spinlock_t has been entirely broken and went from
> being 4 bytes back when, to now being 64 bytes.
>
> Crazy crazy.
>
> How did that happen?
>
Perhaps lockdep in your config?
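
With lockdep the seqcount itself grows a lockdep_map, and the locked
flavors additionally carry a pointer to the backing lock. Roughly this
shape (paraphrased from include/linux/seqlock.h, not verbatim):

	typedef struct seqcount {
		unsigned sequence;
	#ifdef CONFIG_DEBUG_LOCK_ALLOC
		struct lockdep_map dep_map;	/* tens of bytes on its own */
	#endif
	} seqcount_t;

	/* SEQCOUNT_LOCKNAME() generates per-lock-type variants like this */
	typedef struct seqcount_spinlock {
		seqcount_t seqcount;
	#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
		spinlock_t *lock;	/* associated lock, for assertions */
	#endif
	} seqcount_spinlock_t;

Without those options it collapses back to the bare 4-byte counter,
which is what I see here.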
This is the layout on my prod config:
struct dentry {
        unsigned int               d_flags;              /*     0     4 */
        seqcount_spinlock_t        d_seq;                /*     4     4 */
        struct hlist_bl_node       d_hash;               /*     8    16 */
        struct dentry *            d_parent;             /*    24     8 */
        struct qstr                d_name;               /*    32    16 */
        struct inode *             d_inode;              /*    48     8 */
        unsigned char              d_iname[40];          /*    56    40 */
        /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
        const struct dentry_operations  * d_op;          /*    96     8 */
        struct super_block *       d_sb;                 /*   104     8 */
        long unsigned int          d_time;               /*   112     8 */
        void *                     d_fsdata;             /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        struct lockref             d_lockref __attribute__((__aligned__(8))); /*   128     8 */
        union {
                struct list_head   d_lru;                /*   136    16 */
                wait_queue_head_t * d_wait;              /*   136     8 */
        };                                               /*   136    16 */
        struct hlist_node          d_sib;                /*   152    16 */
        struct hlist_head          d_children;           /*   168     8 */
        union {
                struct hlist_node  d_alias;              /*   176    16 */
                struct hlist_bl_node d_in_lookup_hash;   /*   176    16 */
                struct callback_head d_rcu __attribute__((__aligned__(8))); /*   176    16 */
        } d_u __attribute__((__aligned__(8)));           /*   176    16 */

        /* size: 192, cachelines: 3, members: 16 */
        /* forced alignments: 2 */
} __attribute__((__aligned__(8)));
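
fwiw the dump came from running pahole against a vmlinux built with
debug info, along the lines of:

        pahole -C dentry vmlinux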
--
Mateusz Guzik <mjguzik gmail.com>