Message-ID: <YnZZlR7BV/cyn8xS@itl-email>
Date:   Sat, 7 May 2022 07:35:45 -0400
From:   Demi Marie Obenour <demi@...isiblethingslab.com>
To:     James Bottomley <James.Bottomley@...senpartnership.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Block Mailing List <linux-block@...r.kernel.org>,
        Linux Filesystem Mailing List <linux-fsdevel@...r.kernel.org>
Subject: Re: Race-free block device opening

On Wed, Apr 27, 2022 at 09:29:12AM -0400, James Bottomley wrote:
> On Tue, 2022-04-26 at 14:12 -0400, Demi Marie Obenour wrote:
> > Right now, opening block devices in a race-free way is incredibly
> > hard.
> 
> Could you be more specific about what the race you're having problems
> with is?  What is racing.

If I open /dev/mapper/qubes_dom0-vm--sys--net--private, it is possible
that something has destroyed the corresponding device and created a new
one with the same kernel name, *before* udev has managed to unlink the
device node.  As a result, I wind up opening the wrong device.

> > The only reasonable approach I know of is sd_device_new_from_path() +
> > sd_device_open(), and is only available in systemd git main.  It also
> > requires waiting on systemd-udev to have processed udev rules, which
> > can be a bottleneck.
> 
> This doesn't actually seem to be in my copy of systemd.

That’s because it is not in any release yet.

> >   There are better approaches in various special cases, such as using
> > device-mapper ioctls to check that the device one has opened still
> > has the name and/or UUID one expects.  However, none of them works
> > for a plain call to open(2).
> 
> Just so we're clear: if you call open on, say /dev/sdb1 and something
> happens to hot unplug and then replug a different device under that
> node, the file descriptor you got at open does *not* point to the new
> node.  It points to a dead device responder that errors everything.
> 
> The point being once you open() something, the file descriptor is
> guaranteed to point to the same device (or error).

That doesn’t help if the unplug and replug happen between the path being
passed and udev purging the now-stale symlink.

> > A much better approach would be for udev to point its symlinks at
> > "/dev/disk/by-diskseq/$DISKSEQ" for non-partition disk devices, or at
> > "/dev/disk/by-diskseq/${DISKSEQ}p${PARTITION}" for partitions.  A
> > filesystem would then be mounted at "/dev/disk/by-diskseq" that
> > provides for race-free opening of these paths.  This could be
> > implemented in userspace using FUSE, either with difficulty using the
> > current kernel API, or easily and efficiently using a new kernel API
> > for opening a block device by diskseq + partition.  However, I think
> > this should be handled by the Linux kernel itself.
> > 
> > What would be necessary to get this into the kernel?  I would like to
> > implement this, but I don’t have the time to do so anytime soon.  Is
> > anyone else interested in taking this on?  I suspect the kernel code
> > needed to implement this would be quite a bit smaller than the FUSE
> > implementation.
> 
> So it sounds like the problem is you want to be sure that the device
> doesn't change after you've called libblkid to identify it but before
> you call open?  If that's so, the way you do this in userspace is to
> call libblkid again after the open.  If the before and after id match,
> you're as sure as you can be the open was of the right device.

The devices I am working with are raw-format VM disks that contain
untrusted data.  They are identified not by their content, which the VM
has complete control over, but by various sysfs attributes such as
dm/name and dm/uuid.  And they need to be passed to interfaces, such as
libvirt and cryptsetup, that only accept device paths.

I can work around this in the case of cryptsetup by using the
libcryptsetup library and/or holding a file descriptor open, but neither
of those will work for libvirt since libvirtd is a separate process and
I cannot pass a file descriptor to it.  Furthermore, there is no way to
make libvirtd do any post-open() checking on the file descriptor it has
obtained.  While I plan to add a workaround in libxl and blkback for
loop and device-mapper devices, it is not reasonable to expect every
userspace tool to do the same.  
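
As a rough illustration of the kind of post-open() check meant here, the
following sketch opens a path, looks up the device number of the resulting
file descriptor, and compares the dm/name and dm/uuid sysfs attributes
against the expected values.  The function names, the sysfs_root parameter
and the /sys/dev/block/MAJ:MIN/dm layout used for the lookup are choices
made for this example, not an existing API, and it assumes that holding the
descriptor open keeps the device number from being reused while the check
runs.

```python
import os
import stat


def dm_attrs_for_devnum(major, minor, sysfs_root="/sys"):
    """Read dm/name and dm/uuid for a block device number from sysfs."""
    base = os.path.join(sysfs_root, "dev", "block", f"{major}:{minor}", "dm")

    def read(attr):
        with open(os.path.join(base, attr)) as f:
            return f.read().rstrip("\n")

    return read("name"), read("uuid")


def dm_attrs_for_fd(fd, sysfs_root="/sys"):
    """Return (name, uuid) for the block device behind an already-open fd."""
    st = os.fstat(fd)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("file descriptor does not refer to a block device")
    return dm_attrs_for_devnum(os.major(st.st_rdev), os.minor(st.st_rdev),
                               sysfs_root)


def open_verified(path, expected_name, expected_uuid, sysfs_root="/sys"):
    """Open path, then verify the opened device is the expected one.

    The check happens *after* open(), on the descriptor itself, so a
    stale device node is detected instead of silently used.
    """
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        if dm_attrs_for_fd(fd, sysfs_root) != (expected_name, expected_uuid):
            raise RuntimeError(f"{path} no longer refers to {expected_name}")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

This only helps a program that does the open() itself; it cannot be bolted
onto a separate process such as libvirtd, which is the limitation described
above.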

The approach I am suggesting avoids this problem entirely, because
/dev/mapper/qubes_dom0-vm--sys--net--private is now a symlink to a
device node under /dev/disk/by-diskseq/$DISKSEQ.  Those are never, ever
reused.  When the device goes away, the device node goes away too, and
so any attempt to open the symlink (without O_PATH|O_NOFOLLOW) gets
-ENOENT as it should.
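
To make the intended behaviour concrete, here is a small sketch of the
proposed naming scheme and of what a consumer would observe.  The
/dev/disk/by-diskseq directory does not exist today; diskseq_path() and
open_or_none() are hypothetical helpers written for this example, and the
usage below stands in for the real thing with ordinary files and a symlink.

```python
import os


def diskseq_path(diskseq, partition=None, root="/dev/disk/by-diskseq"):
    """Build the proposed node path: $DISKSEQ or ${DISKSEQ}p${PARTITION}."""
    name = str(diskseq) if partition is None else f"{diskseq}p{partition}"
    return os.path.join(root, name)


def open_or_none(path):
    """Open path read-only, following symlinks; None if the target is gone."""
    try:
        return os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    except FileNotFoundError:
        # A symlink whose diskseq node has been removed fails with ENOENT
        # instead of resolving to whatever device reused the kernel name.
        return None
```

A stale symlink then fails cleanly: if "mapper-name" is a symlink to
diskseq_path(42, root=some_dir) and that node is removed, open_or_none()
on the symlink returns None rather than a descriptor for a different
device.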
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
