[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b7b0c6bf-225f-dbb8-7a80-4bc9f3e78a53@linux.ibm.com>
Date: Wed, 23 Feb 2022 21:49:35 -0500
From: Stefan Berger <stefanb@...ux.ibm.com>
To: Mimi Zohar <zohar@...ux.ibm.com>, linux-integrity@...r.kernel.org
Cc: serge@...lyn.com, christian.brauner@...ntu.com,
containers@...ts.linux.dev, dmitry.kasatkin@...il.com,
ebiederm@...ssion.com, krzysztof.struczynski@...wei.com,
roberto.sassu@...wei.com, mpeters@...hat.com, lhinds@...hat.com,
lsturman@...hat.com, puiterwi@...hat.com, jejb@...ux.ibm.com,
jamjoom@...ibm.com, linux-kernel@...r.kernel.org,
paul@...l-moore.com, rgb@...hat.com,
linux-security-module@...r.kernel.org, jmorris@...ei.org,
Mehmet Kayaalp <mkayaalp@...ux.vnet.ibm.com>
Subject: Re: [PATCH v10 18/27] integrity/ima: Define ns_status for storing
namespaced iint data
On 2/23/22 21:21, Stefan Berger wrote:
>
> On 2/23/22 11:12, Mimi Zohar wrote:
>> On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
>>> From: Mehmet Kayaalp <mkayaalp@...ux.vnet.ibm.com>
>>>
>>> Add an rbtree to the IMA namespace structure that stores a namespaced
>>> version of iint->flags in ns_status struct. Similar to the
>>> integrity_iint_cache, both the iint and ns_status are looked up
>>> using the
>>> inode pointer value. The lookup, allocate, and insertion code is also
>>> similar.
>>>
>>> In subsequent patches we will have to find all ns_status entries an
>>> iint
>>> is being used in and reset flags there. To do this, connect a list of
>>> ns_status to the integrity_iint_cache and provide a reader-writer
>>> lock in the integrity_iint_cache to lock access to the list.
>>>
>>> To simplify the code in the non-namespaces case embed an ns_status in
>>> the integrity_iint_cache and have it linked into the iint's
>>> ns_status list
>>> when calling ima_get_ns_status().
>>>
>>> When getting an ns_status first try to find it in the RB tree. Here
>>> we can
>>> run into the situation that an ns_status found in the RB tree has a
>>> different iint associated with it for the same inode. In this case
>>> we need
>>> to delete the ns_status structure and get a new one.
>>>
>>> There are two cases for freeing:
>>> - when the iint is freed (inode deletion): Walk the list of ns_status
>>> entries and disconnect each ns_status from the list; take the
>>> writer lock to protect access to the list; also, take the item
>>> off the
>>> per-namespace rbtree
>>>
>>> - when the ima_namepace is freed: While walking the rbtree, remove the
>>> ns_status from the list while also holding the iint's writer lock;
>>> to be able to grab the lock we have to have a pointer to the iint on
>>> the ns_status structure.
>>>
>>> To avoid an ns_status to be freed by the two cases concurrently,
>>> prevent
>>> these two cases to run concurrently. Therefore, groups of threads
>>> deleting either inodes or ima_namespaces are allowed to run
>>> concurrently
>>> but no two threads may run and one delete an inode and the other an
>>> ima_namespace.
>> The locking involved here is really complex. I'm sure you thought
>> about it a lot, but isn't there a better alternative?
>
> I am afraid this is a difficult question and a short and concise
> answer is not possible...
>
> The complexity of the locking is driven by concurrency and the data
> structures that are involved. The data structures (existing global
> iint rbtree, ns_status structure, and per namespace rbtree for
> ns_status) and how they are organized and connected (via linked lists)
> are a consequence of the fact that we need to be able to handle files
> shared between IMA namespaces (and the host) so that re-auditing,
> re-measuring and re-appraisal of files after file modifications or
> modifications of the security.ima xattr (by any namespaces) can be
> done efficiently. Furthermore, it helps to efficiently remove all the
> status information that an IMA namespace has kept for files it
> audited/measured/appraised. The goal was to make this as scalable as
> possible by having each namespace get out of the way of other
> namespaces by preventing them from locking each other out too much.
> The single biggest problem are files shared between IMA namespaces.
>
> The best argument for the design I can come up with is the 'Big O
> notation' describing the time complexity of operations.
>
>
> The existing global iint rbtree maintains IMA status information for
> each inode. Lookup and insertion of data into the gloab iint rbtree
> is O(log(n)), thus optimal.
>
> To accommodate re-auditing/re-measurement/re-appraisal, which is
> driven by resetting status flags, I connected a list of ns_status
> structures, in which each namespace maintains its status information
> for each inode, to the iint maintained in that global rbtree. The
> resetting of status flags is fast because traversing the list after a
> lookup in the tree is O(n). Lookup + resetting the flags therefore is
> O(log(n) + n). If the list didn't exist we would have to search all
> IMA namespaces for the inode to be able to reset the flags, resulting
> in O(n * log(n)) time complexity, which is of course much worse. So,
> the list of ns_status linked to an iint has a good reason: better time
> complexity to traverse the list and reset status flags. Beside that
> it also supports fast handling of deletion of files where the iint has
> to be delete from the global rbtree and the ns_status list it holds
> must also be deleted (each ns_status also needs to be delete from a
> per IMA-namespace rbtree then)
>
>
> There's also a per-IMA namespace rbtree for each inode that serves two
> purposes:
>
> a) Fast lookup of ns_status (O(log(n)) for an IMA namespace; at least
> to insert an ns_status into this tree we need not write-lock the iint
> tree but the initial iint creation required the write-locking of the
> iint tree
>
> b) Maintaining a collection of inodes that the namespace has
> audited/measured/appraised for efficient deletion upon IMA namespace
> teardown: We can traverse this tree in O(n) time and determine which
> iints have no more namespace users and delete them from the iint tree.
>
>
> Now the dilemma with this is that an ns_status structure is connected
> to a list hanging off the iint and on top of this it is part of an
> rbtree. And this is where the 'group locking' is coming from. What we
> need to prevent is that an ns_status is deleted from its iint list
> (when a file is deleted) while it is also deleted from the per-IMA
> namespace rbtree (when the namespace is deleted). Both must not be
> done concurrently. What is possible is that a group of threads may
> tear down namespaces and the other group may act on file deletion, but
> threads from both groups must not run concurrently.
>
>
> Now we can at least look at two alternatives for the per-IMA namespace
> rbtree.
>
> 1) One alternative is to use a list instead of an rbtree. We would
> loose the fast lookup via the per IMA namespace tree and get O(n)
> lookup times but quick insertion into the list [O(1)]. We still would
> have the collection of inodes. And we would still have the dilemma
> that an ns_status would be connected to two lists, thus requiring the
> group locking. I don't think using a list instead of an rbtree is a
> solution.
>
> 2) We could try to get rid of the per-IMA namespace rbtree altogether
> and just use the global iint rbtree that exists today with a list of
> ns_status connected to its iints. If we do this we would loose the
> knowledge of which inodes a namespace has an ns_status structure for.
> The only way we would find this is by traversing the global iint tree
> (O(n)) and follow each iint list (O(m)) to see whether we find an
> ns_status holding information about the iint. The time complexity for
> this would be O(n*m) but much less than O(n^2). A downside would also
> be that we would have to keep a lock on the global iint rbtree while
> traversing it, thus locking out those that want to add inodes to the
> tree. On the upside it would allow us to get rid of the group locking.
> Lookup of an ns_status in the global iint tree would be O(n) + O(m)
> and insertion would be O(n) + O(1).
>
>
> Certainly, the alternative is 2) with its own trade-offs. My guess is
> some sort of yielding could probably also be helpful there then to
> avoid blocking higher priority operations than deleting of a namespace.
I forgot to mention: It makes a difference if one has to walk the global
iint tree to find the few ns_status for the namespace among possibly
thousands of entries in that tree than having a per-IMA namespace rbtree
that has these few ns_status right there. So walking the iint tree is
more like O(N) versus O(n) walking the per-IMA namespace rbtree.
Powered by blists - more mailing lists