linux-kernel - Re: lockdep: holding locks across syscall boundaries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20231027155949.GA26550@noisy.programming.kicks-ass.net>
Date:   Fri, 27 Oct 2023 17:59:49 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     Ingo Molnar <mingo@...hat.com>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: lockdep: holding locks across syscall boundaries

On Fri, Oct 27, 2023 at 09:14:53AM -0600, Jens Axboe wrote:
> Hi,
> 
> Normally we'd expect locking state to be clean and consistent across
> syscall entry and exit, as that is always the case for sync syscalls.

> We currently have a work-around for holding a lock from aio, see
> kiocb_start_write(), which pretends to drop the lock from lockdeps
> perspective, as it's held from submission to until kiocb_end_write() is
> called at completion time.

I was not aware of this, the only such hack I knew about was the
filesystem freezer thing.

The problem with holding locks past the end of a syscall is that you'll
nest whatever random lock hierarchies possibly by every other syscall
under that lock.

> This is a bit of an ugly work-around, and defeats the purpose of
> lockdep.
> 
> Since I've now got another case where I want to hold a resource across
> syscalls, is there a better way to do this?
> 
> This is for inode_dio_start(), which increments an inode int count, and
> inode_dio_end() which decrements it. If a task is doing
> inode_dio_start() and then inode_dio_wait(), I want to trigger this. I
> have a hack that does this, but it disables lockdep_sys_exit() as
> otherwise I just get that warning rather than the more useful one.

Suppose syscall-a returns with your kiocb thing held, call it lock A
Suppose syscall-b returns with your inode thing held, call it lock B

Then userspace does:

	syscall-a
	syscall-b

while it also does:

	syscall-b
	syscall-a

and we're up a creek, no?