linux-kernel - Design of xa

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190117200816.GP6310@bombadil.infradead.org>
Date:   Thu, 17 Jan 2019 12:08:17 -0800
From:   Matthew Wilcox <willy@...radead.org>
To:     linux-kernel@...r.kernel.org
Subject: Design of xa_alloc


As part of getting rid of the radix tree, I need to replace idr_alloc()
with xa_alloc().  xa_alloc() takes the internal xa_lock (and has
xa_alloc_irq(), xa_alloc_bh() and __xa_alloc() variations which handle
the lock the way you probably expect them to).

It would, of course, be possible to make the xa_alloc() API identical
to idr_alloc().  idr_alloc() has been a phenomenal success with around
200 callers in the kernel today.  But it could be improved, and this
seems like a good time to make the API changes.

int idr_alloc(struct idr *, void *ptr, int start, int end, gfp_t);

1. It is not possible to allocate an ID above 2^31.  This led to the
   introduction of idr_alloc_u32 (which has its own problems I'll talk
   about later)
2. It's hard to insert a fully-initialised object into an IDR in a
   way that is safe for RCU-based lookups; the object needs to be marked
   as not-fully-initialised, then inserted, then the ID updated, then the
   object marked as fully-initialised.  Many users choose an "invalid" ID
   for this purpose, or make two calls, first allocating a NULL pointer,
   then replacing it with the actual object.
3. Allocating a specific ID is a little awkward, passing 'id, id + 1' as
   start & end.
4, The meaning of the 'end' argument has been misunderstood by some users.
   It's exclusive, not inclusive, so writing 'INT_MAX' means you
   can allocate up to INT_MAX-1.  This probably doesn't cause any bugs,
   but it's not great.
5. Nobody actually uses the ability to allocate an ID between 'a' and
   'a + n' which is the rationale for how idr_alloc interprets negative
   'end'.
6. Using a non-zero 'start' is inefficient.  I made a start on solving
   this with IDR_INIT_BASE(), but this is a good opportunity to solve
   it for the whole kernel.

As for idr_alloc_u32(), it's just too easy to forget to initialise *id
before calling it, and there's no realistic way to get gcc to warn for
that mistake.  It does solve problems 2, 3 and 5, but it's a bad tradeoff.
The current xa_alloc() in the tree is modelled after idr_alloc_u32(), but
I feel it should be fixed.

I've been through a number of variations of this interface, but this is
what I've currently settled on:

/**
 * xa_alloc() - Find somewhere to store this entry in the XArray.
 * @xa: XArray.
 * @id: Pointer to ID.
 * @entry: New entry.
 * @limit: Range of ID to allocate.
 * @gfp: Memory allocation flags.
 *
 * Finds an empty entry in @xa between @limit.min and @limit.max,
 * stores the index into the @id pointer, then stores the entry at
 * that index.  A concurrent lookup will not see an uninitialised @id.
 *
 * Context: Any context.  Takes and releases the xa_lock.  May sleep if
 * the @gfp flags permit.
 * Return: 0 on success, -ENOMEM if memory could not be allocated or
 * -ENOSPC if there are no free entries in @limit.
 */
static inline __must_check int xa_alloc(struct xarray *xa, u32 *id,
                void *entry, struct xa_limit limit, gfp_t gfp)

Problems 1 & 2 are solved by passing a pointer to the ID, but using it
only as an output, avoiding the initialisation problem of idr_alloc_u32().
There are a few users who store the allocated ID in a u16, but this
isn't too hard to work around.

Problem 3 is solved by using xa_insert() instead of xa_alloc().  It does
return a different errno (-EEXIST) instead of -ENOSPC, but most users
convert the errno to something else anyway because -ENOSPC is a horrible
errno to return to userspace.

Problem 4 is solved by interpreting the 'max' element of xa_limit
inclusively.  The xa_limit is a little complicated to explain, but easy
to use.  Most users will use one of two predefined ranges; xa_limit_32b
or xa_limit_31b which correspond to the ranges [0 - UINT_MAX] and [0 -
INT_MAX] respectively.  The remaining users will pass XA_LIMIT(x, y)
as an argument.

Problem 6 is solved by having two ways of declaring an allocating XArray:

#define DEFINE_XARRAY_ALLOC(name) DEFINE_XARRAY_FLAGS(name, XA_FLAGS_ALLOC)
#define DEFINE_XARRAY_ALLOC1(name) DEFINE_XARRAY_FLAGS(name, XA_FLAGS_ALLOC1)

Supporting bases other than 0 and 1 seems like more effort than it's worth.


Here's an example conversion which is fairly typical, drivers/scsi/st.c:

-static DEFINE_SPINLOCK(st_index_lock);
-static DEFINE_IDR(st_index_idr);
+static DEFINE_XARRAY_ALLOC(st_index);

[...]

-       idr_preload(GFP_KERNEL);
-       spin_lock(&st_index_lock);
-       error = idr_alloc(&st_index_idr, tpnt, 0, ST_MAX_TAPES + 1, GFP_NOWAIT);
-       spin_unlock(&st_index_lock);
-       idr_preload_end();
+       error = xa_alloc(&st_index, &tpnt->index, tpnt,
+                       XA_LIMIT(0, ST_MAX_TAPES), GFP_KERNEL);
        if (error < 0) {
                pr_warn("st: idr allocation failed: %d\n", error);
                goto out_put_queue;
        }
-       tpnt->index = error;


Similarly, we also need an xa_alloc_cyclic, which can be found here:
http://git.infradead.org/users/willy/linux-dax.git/commitdiff/9bbcfb45052235ede806b0a84dcaacc27a5c2f66