Date: Tue, 9 Apr 2024 10:19:09 +0200
From: Lennart Poettering <mzxreary@...inter.de>
To: Keith Busch <kbusch@...nel.org>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
	Christoph Hellwig <hch@....de>, linux-block@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>, Jens Axboe <axboe@...nel.dk>
Subject: Re: API break, sysfs "capability" file

On Mo, 08.04.24 16:41, Keith Busch (kbusch@...nel.org) wrote:

> On Mon, Apr 08, 2024 at 10:23:49PM +0200, Lennart Poettering wrote:
> > Not sure how this is salvageable. This is just seriously fucked
> > up. What now?
> >
> > It has been proposed to use the "ext_range" sysfs attr instead as a
> > hint if partition scanning is available or not. But it's entirely
> > undocumented. Is this something that will remain stable? (I mean,
> > whether something is documented or not apparently has no effect on the
> > stability of an API anyway, so I guess it's equally shaky as the
> > capability sysattr? Is any of the block device sysfs interfaces
> > actually stable or can they change any time?)
>
> The "ext_range" attribute does look like an appropriate proxy for the
> attribute, but indeed, it's not well documented.
>
> Looking at the history of the documentation you had been relying on, it
> appears that was submitted with good intentions (9243c6f3e012a92d), but
> it itself changed values, acknowledging the instability of this
> interface.
>
> So what to do? If documentation is all that's preventing "ext_range"
> from replacing your previous usage, then let's add it in the
> Documentation/ABI/stable/sysfs-block. It's been there since 2008, so
> that seems like a reliable attribute to put there.

Well, history so far is telling us that this doesn't stop the block layer
from changing it anyway...

AFAICS "ext_range" is kinda messy to use for this since it changed
behaviour: only since
https://github.com/torvalds/linux/commit/1ebe2e5f9d68e94c524aba876f27b945669a7879
does it directly expose GENHD_FL_NO_PART. Before that it did some
more complex stuff which did *not* take GENHD_FL_NO_PART into
consideration. It's nasty to hack against that from userspace, since
we never know which kernel we are on, and how it has been patched.

Also "ext_range" is only available on whole block devices afaics. Partition
block devices do not have it at all, which makes the check userspace
has to do even more complex.
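
To illustrate the extra case userspace has to handle, here is a minimal
Python sketch of reading "ext_range". The helper name read_ext_range and
the sysfs_root parameter are made up for illustration, and it assumes the
device shows up under /sys/class/block/. A missing attribute (as on
partition block devices) is reported as None rather than as an error.

```python
from pathlib import Path
from typing import Optional


def read_ext_range(devname: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Return the value of the "ext_range" attribute for a block device.

    Returns None if the attribute does not exist, which is what happens
    for partition block devices. (Hypothetical helper, not a systemd API.)
    """
    path = Path(sysfs_root) / "class" / "block" / devname / "ext_range"
    try:
        # The kernel prints this attribute as a decimal integer.
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None
```

Even with this, a value other than 1 says little on kernels from before
the commit above, so it cannot serve as the boolean test on its own.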

All I am looking for is a very simple test that returns me a boolean:
is there kernel-level partition scanning enabled on this device or
not. At this point it's not clear to me if I can write this at all in
a way that works reasonably correctly on any kernel since let's say
4.15 (which is systemd's "recommended baseline" right now).

I am really not sure how to salvage this mess at all. AFAICS there's
currently no way to write such a test correctly.

1. "ext_range" does not work on older kernels, and not on partition
   block devices
2. "capability" does not work on newer kernels, because it changed
   meaning and then was amputated to be zero.
3. There's no way to know if we are on an old or new kernel, as
   apparently various distros backported the amputation.

So, what now?

I think it would be nice if the "capability" thing were brought
back in a limited form. For example, it could be changed to return
0x200|0x1000 when partition scanning is off, and 0x1000 when it is on.

That would then mean we return to compatibility with Linux <= 5.15,
but the new 0x1000 bit would tell us that the information is
reliable. I.e. if userspace sees 0x1000 being set we know that the
0x200 bit is definitely correct. That would then just mean that
kernels >= 5.16 until today are left out in the cold...

That would then allow userspace to implement:

1. if "capability" has 0x200 set → definitely no partition scanning
2. if "capability" has 0x1000 set → bit 0x200 reliably tells us
   whether partition scanning is on or off
3. if DEVTYPE=partition → definitely no partition scanning
4. if "ext_range" is 1 → definitely no partition scanning
5. if LOOP_GET_STATUS64 works, then .lo_flags' LO_FLAGS_PARTSCAN flag
   indicates whether partition scanning is on or off.
6. otherwise: ??? (probably we should assume partition scanning is on?)
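
The steps above could be sketched roughly like this in Python. This is
only an illustration under assumptions: the function name
partscan_enabled and its parameters are made up, the 0x1000 bit is the
proposal from this mail and not an existing kernel flag, and the caller
is expected to have collected the inputs already (passing None for
anything that is unavailable). The value 8 for LO_FLAGS_PARTSCAN is
taken from <linux/loop.h>.

```python
from typing import Optional

NO_PART_SCAN = 0x200    # legacy "no partition scanning" bit in "capability"
RELIABLE = 0x1000       # marker bit proposed in this mail (hypothetical)
LO_FLAGS_PARTSCAN = 8   # from <linux/loop.h>


def partscan_enabled(
    capability: Optional[int],  # parsed "capability" value, or None
    devtype: Optional[str],     # udev DEVTYPE, e.g. "disk" or "partition"
    ext_range: Optional[int],   # parsed "ext_range" value, or None
    lo_flags: Optional[int],    # lo_flags from LOOP_GET_STATUS64, or None
) -> bool:
    # 1. 0x200 set: definitely no partition scanning.
    if capability is not None and capability & NO_PART_SCAN:
        return False
    # 2. 0x1000 set: the 0x200 bit is reliable, and it is clear here.
    if capability is not None and capability & RELIABLE:
        return True
    # 3. Partitions are never scanned for nested partitions.
    if devtype == "partition":
        return False
    # 4. ext_range of 1: definitely no partition scanning.
    if ext_range == 1:
        return False
    # 5. Loop devices report the setting via LO_FLAGS_PARTSCAN.
    if lo_flags is not None:
        return bool(lo_flags & LO_FLAGS_PARTSCAN)
    # 6. Nothing conclusive: tentatively assume scanning is on.
    return True
```

The order matters: the checks that give a definite answer come first,
and the guess in step 6 is only reached when everything else is
inconclusive.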

Lennart

--
Lennart Poettering, Berlin
