[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250710-geber-keimen-40cdd4bf17f7@brauner>
Date: Thu, 10 Jul 2025 14:03:05 +0200
From: Christian Brauner <brauner@...nel.org>
To: Arnd Bergmann <arnd@...db.de>
Cc: Arnd Bergmann <arnd@...nel.org>, linux-fsdevel@...r.kernel.org,
linux-block@...r.kernel.org, Anuj Gupta <anuj20.g@...sung.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>, Kanchan Joshi <joshi.k@...sung.com>,
LTP List <ltp@...ts.linux.it>, Dan Carpenter <dan.carpenter@...aro.org>,
Benjamin Copeland <benjamin.copeland@...aro.org>, rbm@...e.com, Naresh Kamboju <naresh.kamboju@...aro.org>,
Anders Roxell <anders.roxell@...aro.org>, Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>, Alexey Dobriyan <adobriyan@...il.com>,
"Darrick J. Wong" <djwong@...nel.org>, Eric Biggers <ebiggers@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] block: fix FS_IOC_GETLBMD_CAP parsing in
blkdev_common_ioctl()
On Thu, Jul 10, 2025 at 12:11:12PM +0200, Arnd Bergmann wrote:
> On Thu, Jul 10, 2025, at 10:00, Christian Brauner wrote:
> > On Wed, Jul 09, 2025 at 08:10:14PM +0200, Arnd Bergmann wrote:
>
> >> + if (_IOC_DIR(cmd) == _IOC_DIR(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_TYPE(cmd) == _IOC_TYPE(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_NR(cmd) == _IOC_NR(FS_IOC_GETLBMD_CAP) &&
> >> + _IOC_SIZE(cmd) >= LBMD_SIZE_VER0 &&
> >> + _IOC_SIZE(cmd) <= _IOC_SIZE(FS_IOC_GETLBMD_CAP))
> >
> > This part is wrong as we handle larger sizes just fine via
> > copy_struct_{from,to}_user().
>
> I feel that is still an open question. As I understand it,
> you want to make it slightly easier for userspace callers
> to use future versions of an ioctl command by allowing them in
> old kernels as well, by moving that complexity into the kernel.
>
> Checking against _IOC_SIZE(FS_IOC_GETLBMD_CAP) would keep the
> behavior consistent with the traditional model where userspace
> needs to have a fallback to previous ABI versions.
>
> > Arnd, objections to writing it as follows?:
>
> > + /* extensible ioctls */
> > + switch (_IOC_NR(cmd)) {
> > + case _IOC_NR(FS_IOC_GETLBMD_CAP):
> > + if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP))
> > + break;
> > + if (_IOC_SIZE(cmd) < LBMD_SIZE_VER0)
> > + break;
> > + if (_IOC_SIZE(cmd) > PAGE_SIZE)
> > + break;
> > + return blk_get_meta_cap(bdev, cmd, argp);
>
> The 'PAGE_SIZE' seems arbitrary here, especially since that is often
I used that as an illustration since we're capping nearly all (regular)
uapi structures at that as a somewhat reasonable upper bound. If that's
smaller that's fine.
> larger than the maximum size that can be encoded in an ioctl command
> number (8KB or 16KB depending on the architecture). If we do need
> an upper bound larger than _IOC_SIZE(FS_IOC_GETLBMD_CAP), I think it
> should be a fixed number rather than a configuration dependent
> one, and I would prefer a smaller one over a larger one. Anything
Sure, fine by me.
> over a few dozen bytes is certainly suspicious, and once it gets
> to thousands of bytes, you need a dynamic allocation to avoid stack
> overflow in the kernel.
Sure, we do that already in some cases because we have use-cases for
that.
>
> I had already updated my patch to move the checks into
> blk_get_meta_cap() itself and keep the caller simpler:
Ok.
> Regardless of what upper bound we pick, that at least limits
> the complexity to the one function that actually wants it.
Ok.
>
> > And can I ask you to please take a look at fs/pidfs.c:pidfd_ioctl() and
>
> PIDFD_GET_INFO has part of the same problem, as it still fails to
> check the _IOC_DIR() bits. I see you added a check for _IOC_TYPE()
> in commit 091ee63e36e8 ("pidfs: improve ioctl handling"), but
> the comment you added describes an unrelated issue and the fix
> was incomplete.
>
> > fs/nsfs.c:ns_ioctl()?
>
> You tried to add a similar validation in commit 7fd511f8c911
> ("nsfs: validate ioctls"), but it seems you got that wrong
> both by missing the _IOC_DIR check and by having a typo in the
> '_IOC_TYPE(cmd) == _IOC_TYPE(cmd)' line that means this is
> always true rather than comparing against 'NSIO'.
>
> Overall my feeling is similar to Christoph's, that the added
> complexity in any of these three cases was a mistake, as it's
> too easy to mess it up.
For pidfs and nsfs it will definitely be the way we're doing it.
Especially with structures we definitely want to grow. I have zero
interest in spamming userspace with either a fragmented set of 100
ioctls that never give consistent information because they need to be
called in sequence nor constantly revised V2-100 ioctl commands that all
use a new structure with a fixed layout.
The fact that the ioctl api currently lacks validation is more testament
to how unspecified the requirements for ioctls and their encoding are
and we better fix that in general because like it or not we already have
quite a few size-versioned ioctls anyway. This is nothing we cannot fix.
Tell me what work you need and we'll do it.
The alternative I see is that I will start being very liberal in adding
extensible system calls for stuff such as pidfd_info() and ns_info()
instead of making them ioctls on the file where they belong.
Powered by blists - more mailing lists