lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <du5vnvwygzbtal6qogmxckawwwxgbppuq5qi5aeqcs5unrlpz3@3k5degdvflzq>
Date: Wed, 3 Jul 2024 15:53:08 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>, 
	kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev, lkp@...el.com, 
	Linux Memory Management List <linux-mm@...ck.org>, linux-kernel@...r.kernel.org, ying.huang@...el.com, 
	feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput
 -33.7% regression

On Tue, Jul 02, 2024 at 03:14:54PM -0700, Linus Torvalds wrote:
> On Tue, 2 Jul 2024 at 14:15, Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > I asked you something in the previous e-mail though (with some nastiness
> > of the problem pointed out) concerning handling of slow vs fastpath,
> > here it is again:
> >
> > [..]for example did you know xfs does not honor rcu grace periods when
> > recycling inodes?
> 
> I don't think it should matter.
> 
> Yes, the vfs layer will access the inode, but then the sequence number
> checks should always verify that whatever access we did is then
> validated after-the-fact.
> 
[..]
> Or do you see something I'm missing? I'm not loving how we deal with
> dentry->d_inode, but considering that the whole point of the RCU
> lookup is that we don't hold any locks, I think it's almost
> unavoidable.
> 

Now I'm confused mate. Based on the convo so far I would expect you
would consider the xfs thing a no-go for the machinery.

You were rightfully pointing out the relationship dentry<->inode is not
stable and care is needed to grab the pointer, and even then the pointer
may be wrong by the time one finishes the work.

I presume you are also worried about callbacks not taking proper steps
when looking at the inode itself -- after all they can be racing with
teardown and have to handle it gracefully by returning an error.

This much comes with territory.

Inode changing identities adds potential trouble which does not need to
be there.

Certain things remain stable, for example the inode type (regular, fifo
etc.). This is where xfs not respecting grace periods comes in and
removes it.

It may happen to be for stat et al the way they can be implemented at
the moment this does not matter, but then it may start being a factor in
the future.

Say a type-dependent union showed up and needs to be looked at:
	union {
		fifo_pointer_t *fifo_data;
		dev_t		i_rdev;
		...;
	};

area pointed to by fifo_data is rcu-protected.

Then this:
	if (IS_FIFO(inode)) {
		fifo_pointer_t *f = rcu_dereference(inode->fifo_data);
		if (!f)
			return -ENODICE;
		/* use f here */
	}

while may look safe in terms of taking precatuions and bailing from the
fast path happens to be wrong.

Suppose the inode got reused and is now representing a device, i_rdev is
some funky value. This can race to read that int and trap dereferencing
what it found there. This would not happen if the grace period was
respected.

This is a bug class which does not need to be there and there may be
other possible bugs I did not think of.

There is also potential trouble with security modules as they
unfortunately have a hook for getattr. They presumably can be massaged
into handling a possibly dying inode (or failing with -ENODICE), I would
not put any stake into handling an inode which becomes something else as
they deal with it.

As a side note this is where I should also note the *current* LSM hooks
are racy as is.  Suppose you can stat a particular file now, but a chown
to 1234 means you can't. Then your stat call can race against chown +
other ops. You can end up in a state where LSMs gave a green light and
only then the state changed to one you are not allowed to look at. This
would be solvable with per-inode sequence counters, but is arguably
beyond the scope of this patch and I'm not interested in working on it.

[I read the rest of the mail and have no immediate commentary, we will
see after I take a stab at implementing this]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ