[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b498e3b004bedc460991e167c154cc88d568f587.camel@huaweicloud.com>
Date: Fri, 18 Oct 2024 11:24:06 +0200
From: Roberto Sassu <roberto.sassu@...weicloud.com>
To: Paul Moore <paul@...l-moore.com>, ebpqwerty472123@...il.com,
kirill.shutemov@...ux.intel.com
Cc: zohar@...ux.ibm.com, dmitry.kasatkin@...il.com,
eric.snowberg@...cle.com, jmorris@...ei.org, serge@...lyn.com,
linux-integrity@...r.kernel.org, linux-security-module@...r.kernel.org,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org, Roberto Sassu
<roberto.sassu@...wei.com>, linux-mm@...ck.org, akpm@...ux-foundation.org,
vbabka@...e.cz, lorenzo.stoakes@...cle.com, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/3] ima: Remove inode lock
On Wed, 2024-10-16 at 11:59 +0200, Roberto Sassu wrote:
> On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote:
> > On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote:
> > > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> > > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@...l-moore.com> wrote:
> > > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > > > > <roberto.sassu@...weicloud.com> wrote:
> > > > > > >
> > > > > > > From: Roberto Sassu <roberto.sassu@...wei.com>
> > > > > > >
> > > > > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > > > > whether or not inode integrity metadata are stored in the inode.
> > > > > > >
> > > > > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > > > > the inode security blob.
> > > > > > >
> > > > > > > Move the mutex initialization and annotation in the new function
> > > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > > > >
> > > > > > > Finally, expand the critical region in process_measurement() guarded by
> > > > > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > > > >
> > > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@...wei.com>
> > > > > > > ---
> > > > > > > security/integrity/ima/ima.h | 26 ++++++++---
> > > > > > > security/integrity/ima/ima_api.c | 4 +-
> > > > > > > security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > > > > > security/integrity/ima/ima_main.c | 39 +++++++---------
> > > > > > > 4 files changed, 104 insertions(+), 42 deletions(-)
> > > > > >
> > > > > > I'm not an IMA expert, but it looks reasonable to me, although
> > > > > > shouldn't this carry a stable CC in the patch metadata?
> > > > > >
> > > > > > Reviewed-by: Paul Moore <paul@...l-moore.com>
> > > > >
> > > > > Sorry, one more thing ... did you verify this patchset resolves the
> > > > > syzbot problem? I saw at least one reproducer.
> > > >
> > > > Uhm, could not reproduce the deadlock with the reproducer. However,
> > > > without the patch I have a lockdep warning, and with I don't.
> > > >
> > > > I asked syzbot to try the patches. Let's see.
> > >
> > > I actually got a different lockdep warning:
> > >
> > > [ 904.603365] ======================================================
> > > [ 904.604264] WARNING: possible circular locking dependency detected
> > > [ 904.605141] 6.12.0-rc2+ #20 Not tainted
> > > [ 904.605697] ------------------------------------------------------
> >
> > I can reproduce by executing the syzbot reproducer and in another
> > terminal by logging in with SSH (not all the times).
> >
> > If I understood what the lockdep warning means, this is the scenario.
> >
> > A task accesses a seq_file which is in the IMA policy, so we enter the
> > critical region guarded by the iint lock. But before we get the chance
> > to measure the file, a second task calls remap_file_pages() on the same
> > seq_file.
> >
> > remap_file_pages() takes the mmap lock and, if the event matches the
> > IMA policy too, the second task waits for the iint lock to be released.
> >
> > Now, the first task starts to measure the seq_file and takes the
> > seq_file lock. I don't know if 3 processes must be involved, but I was
> > thinking that reading the seq_file from the first task can trigger a
> > page fault, which requires to take the mmap lock.
> >
> > At this point, we reach a deadlock. The first task waits for the mmap
> > lock to be released, and the second waits for the iint lock to be
> > released, which both cannot happen.
>
> + mm, mm folks
>
> (I leave the lockdep warning down for the new people included in the
> thread).
>
> I think I understood what the problem is. Any lock that is taken inside
> security_file_mmap() would cause lock inversion since:
+ Kirill
Ok, to be clear, we are talking about a regression introduced by commit
ea7e2d5e49c05 ("mm: call the security_mmap_file() LSM hook in
remap_file_pages()").
Originally, Mimi asked this patch to be included:
https://lore.kernel.org/lkml/1336963631-3541-1-git-send-email-zohar@us.ibm.com/
This was due to the commit 196f518 ("IMA: explicit IMA i_flag to remove
global lock on inode_delete"), which added an inode lock to protect the
new flag S_IMA, used to track integrity metadata only for the set of
inodes evaluated by IMA.
The problem was that, for mmapped files, the lock is taken in the
opposite order than the convention (i_mutex and then mmap_sem), causing
a lock inversion and a lockdep warning.
Linus proposed to split security_file_mmap() into security_mmap_file()
and security_mmap_addr(), so that the former can be taken out of the
mmap_sem lock and remove the lock inversion. The final commit was made
by Al Viro, 8b3ec6814c83 ("take security_mmap_file() outside of -
>mmap_sem").
Commit ea7e2d5e49c05 put again a security_mmap_file() call inside the
mmap_sem (now mmap_lock) lock, causing the recent syzbot reports.
Paul asked if the inode lock can be removed from IMA, and actually
partially can be done:
https://lore.kernel.org/linux-integrity/20241008165732.2603647-1-roberto.sassu@huaweicloud.com/
The patch is good in its own, since it merges two critical sections in
one. However, the inode lock cannot be removed completely:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/integrity/ima/ima_main.c?h=v6.12-rc3#n386
This one is required to set the security.ima xattr in
ima_appraise_measurement() -> ima_fix_xattr(), when IMA-Appraise is in
fix mode (ima_appraise=fix in the kernel command line).
I would say that my original suggestion of putting security_mmap_file()
back to the mmap_lock lock probably is not the optimal solution. Maybe
we could get around removing the remaining inode lock in IMA, but we
would also limit future LSMs to not use the inode lock if they need.
Probably it is hard, @Kirill would there be any way to safely move
security_mmap_file() out of the mmap_lock lock?
Thanks
Roberto
PS: I'm aware that Shu Han proposed a solution to the lock inversion:
https://lore.kernel.org/linux-security-module/20240928180044.50-1-ebpqwerty472123@gmail.com/
but I think anyway it boils down to either moving security_mmap_file()
to the mmap_lock lock again for all calls, or viceversa, moving
security_mmap_file() out of the remap_file_pages() system call.
> vm_mmap_pgoff():
>
> ret = security_mmap_file(file, prot, flag);
> if (!ret) {
> if (mmap_write_lock_killable(mm))
>
>
>
> SYSCALL_DEFINE5(remap_file_pages, ...):
>
> if (mmap_write_lock_killable(mm))
> return -EINTR;
>
> [...]
>
> file = get_file(vma->vm_file);
> ret = security_mmap_file(vma->vm_file, prot, flags);
> if (ret)
> goto out_fput;
>
>
> Since I didn't see a good way to change the order of the second, I
> changed the order of the first, i.e. I put security_file_mmap() under
> the mmap lock.
>
> (I don't know if this is a good idea.)
>
> I cannot reproduce the lockdep warning anymore.
>
> @mm folks, what is your opinion?
>
> Thanks
>
> Roberto
>
> > Roberto
> >
> > > [ 904.606577] systemd/1 is trying to acquire lock:
> > > [ 904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
> > > [ 904.608290]
> > > [ 904.608290] but task is already holding lock:
> > > [ 904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > > [ 904.610429]
> > > [ 904.610429] which lock already depends on the new lock.
> > > [ 904.610429]
> > > [ 904.611574]
> > > [ 904.611574] the existing dependency chain (in reverse order) is:
> > > [ 904.612628]
> > > [ 904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
> > > [ 904.613681] __mutex_lock+0xaf/0x760
> > > [ 904.614266] mutex_lock_nested+0x27/0x40
> > > [ 904.614897] ima_iint_lock+0x24/0x40
> > > [ 904.615490] process_measurement+0x176/0xef0
> > > [ 904.616168] ima_file_mmap+0x98/0x120
> > > [ 904.616767] security_mmap_file+0x408/0x560
> > > [ 904.617444] __do_sys_remap_file_pages+0x2fa/0x4c0
> > > [ 904.618194] __x64_sys_remap_file_pages+0x29/0x40
> > > [ 904.618937] x64_sys_call+0x6e8/0x4550
> > > [ 904.619546] do_syscall_64+0x71/0x180
> > > [ 904.620155] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [ 904.620952]
> > > [ 904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
> > > [ 904.621813] __might_fault+0x6f/0xb0
> > > [ 904.622400] _copy_to_iter+0x12e/0xa80
> > > [ 904.623009] seq_read_iter+0x593/0x6b0
> > > [ 904.623629] proc_reg_read_iter+0x31/0xe0
> > > [ 904.624276] vfs_read+0x256/0x3d0
> > > [ 904.624822] ksys_read+0x6d/0x160
> > > [ 904.625372] __x64_sys_read+0x1d/0x30
> > > [ 904.625964] x64_sys_call+0x1068/0x4550
> > > [ 904.626594] do_syscall_64+0x71/0x180
> > > [ 904.627188] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [ 904.627975]
> > > [ 904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
> > > [ 904.628787] __lock_acquire+0x17f3/0x2320
> > > [ 904.629432] lock_acquire+0xf2/0x420
> > > [ 904.630013] __mutex_lock+0xaf/0x760
> > > [ 904.630596] mutex_lock_nested+0x27/0x40
> > > [ 904.631225] seq_read_iter+0x62/0x6b0
> > > [ 904.631831] kernfs_fop_read_iter+0x1ef/0x2c0
> > > [ 904.632599] __kernel_read+0x113/0x350
> > > [ 904.633206] integrity_kernel_read+0x23/0x40
> > > [ 904.633902] ima_calc_file_hash_tfm+0x14e/0x230
> > > [ 904.634621] ima_calc_file_hash+0x97/0x250
> > > [ 904.635281] ima_collect_measurement+0x4be/0x530
> > > [ 904.636008] process_measurement+0x7c0/0xef0
> > > [ 904.636689] ima_file_check+0x65/0x80
> > > [ 904.637295] security_file_post_open+0xb1/0x1b0
> > > [ 904.638008] path_openat+0x216/0x1280
> > > [ 904.638605] do_filp_open+0xab/0x140
> > > [ 904.639185] do_sys_openat2+0xba/0x120
> > > [ 904.639805] do_sys_open+0x4c/0x80
> > > [ 904.640366] __x64_sys_openat+0x23/0x30
> > > [ 904.640992] x64_sys_call+0x2575/0x4550
> > > [ 904.641616] do_syscall_64+0x71/0x180
> > > [ 904.642207] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [ 904.643003]
> > > [ 904.643003] other info that might help us debug this:
> > > [ 904.643003]
> > > [ 904.644149] Chain exists of:
> > > [ 904.644149] &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
> > > [ 904.644149]
> > > [ 904.645763] Possible unsafe locking scenario:
> > > [ 904.645763]
> > > [ 904.646614] CPU0 CPU1
> > > [ 904.647264] ---- ----
> > > [ 904.647909] lock(&ima_iint_lock_mutex_key[depth]);
> > > [ 904.648617] lock(&mm->mmap_lock);
> > > [ 904.649479] lock(&ima_iint_lock_mutex_key[depth]);
> > > [ 904.650543] lock(&p->lock);
> > > [ 904.650974]
> > > [ 904.650974] *** DEADLOCK ***
> > > [ 904.650974]
> > > [ 904.651826] 1 lock held by systemd/1:
> > > [ 904.652376] #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > > [ 904.653759]
> > > [ 904.653759] stack backtrace:
> > > [ 904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
> > > [ 904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> > > [ 904.656497] Call Trace:
> > > [ 904.656856] <TASK>
> > > [ 904.657166] dump_stack_lvl+0x134/0x1a0
> > > [ 904.657728] dump_stack+0x14/0x30
> > > [ 904.658206] print_circular_bug+0x38d/0x450
> > > [ 904.658812] check_noncircular+0xed/0x120
> > > [ 904.659396] ? srso_return_thunk+0x5/0x5f
> > > [ 904.659972] ? srso_return_thunk+0x5/0x5f
> > > [ 904.660569] __lock_acquire+0x17f3/0x2320
> > > [ 904.661145] lock_acquire+0xf2/0x420
> > > [ 904.661664] ? seq_read_iter+0x62/0x6b0
> > > [ 904.662217] ? srso_return_thunk+0x5/0x5f
> > > [ 904.662886] __mutex_lock+0xaf/0x760
> > > [ 904.663408] ? seq_read_iter+0x62/0x6b0
> > > [ 904.663961] ? seq_read_iter+0x62/0x6b0
> > > [ 904.664525] ? srso_return_thunk+0x5/0x5f
> > > [ 904.665098] ? mark_lock+0x4e/0x750
> > > [ 904.665610] ? mutex_lock_nested+0x27/0x40
> > > [ 904.666194] ? find_held_lock+0x3a/0x100
> > > [ 904.666770] mutex_lock_nested+0x27/0x40
> > > [ 904.667337] seq_read_iter+0x62/0x6b0
> > > [ 904.667869] kernfs_fop_read_iter+0x1ef/0x2c0
> > > [ 904.668536] __kernel_read+0x113/0x350
> > > [ 904.669079] integrity_kernel_read+0x23/0x40
> > > [ 904.669698] ima_calc_file_hash_tfm+0x14e/0x230
> > > [ 904.670349] ? __lock_acquire+0xc32/0x2320
> > > [ 904.670937] ? srso_return_thunk+0x5/0x5f
> > > [ 904.671525] ? __lock_acquire+0xfbb/0x2320
> > > [ 904.672113] ? srso_return_thunk+0x5/0x5f
> > > [ 904.672693] ? srso_return_thunk+0x5/0x5f
> > > [ 904.673280] ? lock_acquire+0xf2/0x420
> > > [ 904.673818] ? kernfs_iop_getattr+0x4a/0xb0
> > > [ 904.674424] ? srso_return_thunk+0x5/0x5f
> > > [ 904.674997] ? find_held_lock+0x3a/0x100
> > > [ 904.675564] ? srso_return_thunk+0x5/0x5f
> > > [ 904.676156] ? srso_return_thunk+0x5/0x5f
> > > [ 904.676740] ? srso_return_thunk+0x5/0x5f
> > > [ 904.677322] ? local_clock_noinstr+0x9/0xb0
> > > [ 904.677923] ? srso_return_thunk+0x5/0x5f
> > > [ 904.678502] ? srso_return_thunk+0x5/0x5f
> > > [ 904.679075] ? lock_release+0x4e2/0x570
> > > [ 904.679639] ima_calc_file_hash+0x97/0x250
> > > [ 904.680227] ima_collect_measurement+0x4be/0x530
> > > [ 904.680901] ? srso_return_thunk+0x5/0x5f
> > > [ 904.681496] ? srso_return_thunk+0x5/0x5f
> > > [ 904.682070] ? __kernfs_iattrs+0x4a/0x140
> > > [ 904.682658] ? srso_return_thunk+0x5/0x5f
> > > [ 904.683242] ? process_measurement+0x7c0/0xef0
> > > [ 904.683876] ? srso_return_thunk+0x5/0x5f
> > > [ 904.684462] process_measurement+0x7c0/0xef0
> > > [ 904.685078] ? srso_return_thunk+0x5/0x5f
> > > [ 904.685654] ? srso_return_thunk+0x5/0x5f
> > > [ 904.686228] ? _raw_spin_unlock_irqrestore+0x5d/0xd0
> > > [ 904.686938] ? srso_return_thunk+0x5/0x5f
> > > [ 904.687523] ? srso_return_thunk+0x5/0x5f
> > > [ 904.688098] ? srso_return_thunk+0x5/0x5f
> > > [ 904.688672] ? local_clock_noinstr+0x9/0xb0
> > > [ 904.689273] ? srso_return_thunk+0x5/0x5f
> > > [ 904.689846] ? srso_return_thunk+0x5/0x5f
> > > [ 904.690430] ? srso_return_thunk+0x5/0x5f
> > > [ 904.691005] ? srso_return_thunk+0x5/0x5f
> > > [ 904.691583] ? srso_return_thunk+0x5/0x5f
> > > [ 904.692180] ? local_clock_noinstr+0x9/0xb0
> > > [ 904.692841] ? srso_return_thunk+0x5/0x5f
> > > [ 904.693419] ? srso_return_thunk+0x5/0x5f
> > > [ 904.693990] ? lock_release+0x4e2/0x570
> > > [ 904.694544] ? srso_return_thunk+0x5/0x5f
> > > [ 904.695115] ? kernfs_put_active+0x5d/0xc0
> > > [ 904.695708] ? srso_return_thunk+0x5/0x5f
> > > [ 904.696286] ? kernfs_fop_open+0x376/0x6b0
> > > [ 904.696872] ima_file_check+0x65/0x80
> > > [ 904.697409] security_file_post_open+0xb1/0x1b0
> > > [ 904.698058] path_openat+0x216/0x1280
> > > [ 904.698589] do_filp_open+0xab/0x140
> > > [ 904.699106] ? srso_return_thunk+0x5/0x5f
> > > [ 904.699693] ? lock_release+0x554/0x570
> > > [ 904.700264] ? srso_return_thunk+0x5/0x5f
> > > [ 904.700836] ? do_raw_spin_unlock+0x76/0x140
> > > [ 904.701450] ? srso_return_thunk+0x5/0x5f
> > > [ 904.702021] ? _raw_spin_unlock+0x3f/0xa0
> > > [ 904.702606] ? srso_return_thunk+0x5/0x5f
> > > [ 904.703178] ? alloc_fd+0x1ca/0x3b0
> > > [ 904.703685] do_sys_openat2+0xba/0x120
> > > [ 904.704223] ? file_free+0x8d/0x110
> > > [ 904.704729] do_sys_open+0x4c/0x80
> > > [ 904.705221] __x64_sys_openat+0x23/0x30
> > > [ 904.705784] x64_sys_call+0x2575/0x4550
> > > [ 904.706337] do_syscall_64+0x71/0x180
> > > [ 904.706863] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [ 904.707587] RIP: 0033:0x7f3462123037
> > > [ 904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
> > > [ 904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
> > > [ 904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
> > > [ 904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
> > > [ 904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
> > > [ 904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
> > > [ 904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
> > > [ 904.716877] </TASK>
> > >
> > > Roberto
> >
>
Powered by blists - more mailing lists