linux-kernel - Re: [PATCH v3 1/6] block: add disk sequence number

Message-ID: <YNNLUnGMO/gNNIJK@gardel-login>
Date:   Wed, 23 Jun 2021 16:55:14 +0200
From:   Lennart Poettering <mzxreary@...inter.de>
To:     Hannes Reinecke <hare@...e.de>
Cc:     Luca Boccassi <bluca@...ian.org>,
        Matteo Croce <mcroce@...ux.microsoft.com>,
        Christoph Hellwig <hch@...radead.org>,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Jens Axboe <axboe@...nel.dk>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Damien Le Moal <damien.lemoal@....com>,
        Tejun Heo <tj@...nel.org>,
        Javier González <javier@...igon.com>,
        Niklas Cassel <niklas.cassel@....com>,
        Johannes Thumshirn <johannes.thumshirn@....com>,
        Matthew Wilcox <willy@...radead.org>,
        JeffleXu <jefflexu@...ux.alibaba.com>
Subject: Re: [PATCH v3 1/6] block: add disk sequence number

On Wed, 23.06.21 16:21, Hannes Reinecke (hare@...e.de) wrote:

> > We need this so that we can reliably correlate events to instances of a
> > device. Events alone cannot solve this problem, because events _are_
> > the problem.
> >
> In which sense?
> Yes, events can be delayed (if you listen to uevents), but if you listen to
> kernel events there shouldn't be a delay, right?

uevents are delivered to userspace via an AF_NETLINK socket. The
AF_NETLINK socket is basically an asynchronous buffer.
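
For illustration, a minimal C sketch of the listening side. The helpers open_uevent_socket() and uevent_get() are hypothetical names, not an existing API. The first subscribes to the kernel's uevent multicast group; the second looks up one field in a received datagram, which for kernel-originated uevents is an "ACTION@DEVPATH" header followed by NUL-separated KEY=VALUE strings. Nothing here is synchronous: datagrams just queue in the socket's receive buffer until the process gets around to calling recv().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a socket subscribed to kernel uevents.
 * Multicast group 1 carries the kernel's own messages. Events queue in
 * the socket's receive buffer until the process reads them. */
int open_uevent_socket(void)
{
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_CLOEXEC,
                    NETLINK_KOBJECT_UEVENT);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = 1;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return the value of KEY in one kernel uevent
 * datagram of LEN bytes, laid out as "ACTION@DEVPATH\0KEY=VALUE\0...",
 * or NULL if KEY is absent. The result points into BUF. */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen, off;

    assert(buf != NULL && key != NULL);
    klen = strlen(key);

    /* Skip the "ACTION@DEVPATH" header. */
    off = strnlen(buf, len) + 1;

    while (off < len) {
        const char *field = buf + off;
        size_t flen = strnlen(field, len - off);

        if (flen == len - off)      /* unterminated trailing field */
            break;
        if (flen > klen && strncmp(field, key, klen) == 0 &&
            field[klen] == '=')
            return field + klen + 1;
        off += flen + 1;
    }
    return NULL;
}
```

Because the events queue up, whatever happened to loop0 between bind() and your first recv() is already sitting in the buffer, and a field like DEVNAME=loop0 alone does not say which allocation of the device an event describes.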

I mean, consider what you are saying: you establish the AF_NETLINK
uevent watching socket, then you allocate /dev/loop0. Since you cannot
do that atomically, you'll first have to do one, and then the
other. But if you do it in two steps, then some other process might
get scheduled in between, quickly allocate /dev/loop0 and release it
again before your code gets to run. So now your AF_NETLINK socket
buffer holds the uevents for that other process's use of the device,
and you cannot sanely distinguish them from your own.

Of course you could do it the other way round: allocate the device
first, and only then create the AF_NETLINK uevent socket. But then you
might or might not lose the "add" event for the device you just
allocated, and you don't know whether you should wait for it or not.

This isn't even a contrived issue; it is the common case if you
have multiple processes all simultaneously trying to acquire a loopback
block device, because they will all end up eyeing /dev/loop0 at the same time.

But it gets worse in real life because of various factors. For example,
partition probing is asynchronous, so if you use LO_FLAGS_PARTSCAN and
want to watch for some partition device associated with your loopback
block device to show up, this can take *really* long, so the race
window is large. Or you actually use udev (as most userspace probably
should) because you want the metainfo it collects about the device, in
which case it takes even longer for the uevent to reach you. That
means the time window in which a previous user's uevents and your own
for the same loopback device "overlap" can be quite large, and you
cannot determine whether they are yours or the previous user's, unless
we have these new sequence numbers.
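
A minimal C sketch of how a per-attachment sequence number would settle the question. As an illustration it assumes that each block uevent carries the number as a decimal DISKSEQ=<n> variable and that the process has learned the number of its own attachment (the version of this series that was eventually merged exposes it through a BLKGETDISKSEQ ioctl); treat both details as assumptions here. field_value() and uevent_is_ours() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of KEY in a kernel uevent datagram
 * ("ACTION@DEVPATH\0KEY=VALUE\0..."), or NULL if absent. */
static const char *field_value(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = strnlen(buf, len) + 1;     /* skip the header */

    while (off < len) {
        const char *field = buf + off;
        size_t flen = strnlen(field, len - off);

        if (flen == len - off)              /* unterminated trailing field */
            break;
        if (flen > klen && strncmp(field, key, klen) == 0 &&
            field[klen] == '=')
            return field + klen + 1;
        off += flen + 1;
    }
    return NULL;
}

/* Hypothetical filter: nonzero only if the event is for DEVNAME and
 * carries OUR_SEQ, the sequence number this process obtained for its
 * own attachment. ASSUMPTION: the kernel exports the number as a
 * decimal DISKSEQ=<n> uevent variable. */
int uevent_is_ours(const char *buf, size_t len,
                   const char *devname, uint64_t our_seq)
{
    const char *name, *seq;
    char *end;
    unsigned long long v;

    assert(buf != NULL && devname != NULL);
    name = field_value(buf, len, "DEVNAME");
    seq = field_value(buf, len, "DISKSEQ");
    if (name == NULL || seq == NULL || strcmp(name, devname) != 0)
        return 0;   /* without a sequence number the event can't be attributed */
    if (*seq < '0' || *seq > '9')
        return 0;   /* reject signs and whitespace that strtoull would accept */

    errno = 0;
    v = strtoull(seq, &end, 10);
    if (errno != 0 || *end != '\0')
        return 0;
    return (uint64_t)v == our_seq;
}
```

If, say, the previous user's "remove" event for loop0 carries DISKSEQ=41 and your "add" carries DISKSEQ=42, then uevent_is_ours(..., "loop0", 42) rejects the first and accepts the second, regardless of the order in which they sit in the socket buffer.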

Lennart

--
Lennart Poettering, Berlin