Message-ID: <20210417170931.hxo2vvt4532jrx7k@ast-mbp.dhcp.thefacebook.com>
Date: Sat, 17 Apr 2021 10:09:31 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: "David S. Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next 11/15] bpf: Add bpf_sys_close() helper.
On Sat, Apr 17, 2021 at 04:48:53PM +0000, Al Viro wrote:
> On Sat, Apr 17, 2021 at 07:36:39AM -0700, Alexei Starovoitov wrote:
>
> > The kernel will perform the same work with FDs. The same locks are held
> > and the execution conditions are the same in both cases. The LSM hooks,
> > fsnotify, etc. will be called the same way.
> > It's no different than if a new syscall "sys_foo(int num)" were introduced
> > that would do { return close_fd(num); }.
> > It would operate in the same user context.
>
> Hmm... unless I'm misreading the code, one of the call chains would seem to
> be sys_bpf() -> bpf_prog_test_run() -> ->test_run() -> ... -> bpf_sys_close().
> OK, as long as you make sure bpf_prog_get() does fdput() (i.e. that we
> don't have it restructured so that fdget()/fdput() pair would be lifted into
> bpf_prog_test_run(), with fdput() moved in place of bpf_prog_put()).
Got it. There is no fdget/fdput bracketing in the code.
On the way to test_run we call __bpf_prog_get(), which does fdget(), increments
the refcnt of the prog, and then immediately does fdput().
I believe this pattern is consistent everywhere in kernel/bpf/*.
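For readers following along, here is a minimal userspace sketch of the "get the prog, drop the fd reference right away" pattern described above. It is not the kernel code: every toy_* name, the table layout and the bracket counter are assumptions made up for illustration. The point it tries to show is only that the caller leaves with a counted reference to the prog and with no fdget() still outstanding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's real types. */
struct toy_prog { int refcnt; };
struct toy_file { struct toy_prog *prog; };

/* fdget() calls not yet matched by fdput(); 0 whenever no lookup is in flight. */
static int toy_open_brackets;

/* Look up a descriptor; the result is only borrowed until toy_fdput(). */
static struct toy_file *toy_fdget(struct toy_file **table, size_t nfds, size_t fd)
{
	toy_open_brackets++;
	return fd < nfds ? table[fd] : NULL;
}

/* End of the borrow started by toy_fdget(). */
static void toy_fdput(struct toy_file *f)
{
	(void)f;
	assert(toy_open_brackets > 0);
	toy_open_brackets--;
}

/*
 * Take a counted reference on the prog behind fd, then close the
 * fdget()/fdput() bracket before returning, so nothing the caller does
 * later (including closing descriptors) runs inside the bracket.
 */
static struct toy_prog *toy_prog_get(struct toy_file **table, size_t nfds, size_t fd)
{
	struct toy_file *f = toy_fdget(table, nfds, fd);
	struct toy_prog *prog = NULL;

	if (f) {
		prog = f->prog;
		prog->refcnt++;	/* the reference the caller keeps */
	}
	toy_fdput(f);		/* bracket closed here, not in the caller */
	return prog;
}
```

In this sketch the caller would later drop prog->refcnt itself (the counterpart of bpf_prog_put()); the file lookup plays no further role.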
> Note that we *really* cannot allow close_fd() on anything to be bracketed
> by an fdget()/fdput() pair; we had bugs of that sort and, as a matter of fact,
> still have one in autofs_dev_ioctl().
>
> The trouble happens if you have file F with 2 references, held by descriptor
> tables of different processes. Say, process A has descriptor 6 referring to
> it, while B has descriptor 42 doing the same. Descriptor tables of A and B
> are not shared with anyone.
>
> A: fdget(6)    -> returns a reference to F, refcount _not_ touched
> A: close_fd(6) -> rips the reference to F from descriptor table, does fput(F)
>                   refcount drops to 1.
> B: close(42)   -> rips the reference to F from B's descriptor table, does fput(F)
>                   This time refcount does reach 0 and we use task_work_add() to
>                   make sure the destructor (__fput()) runs before B returns to
>                   userland. sys_close() returns and B goes off to userland.
>                   On the way out __fput() is run, and among other things,
>                   ->release() of F is executed, doing whatever it wants to do.
>                   F is freed.
> And at that point A, which presumably is using the guts of F, gets screwed.
Thanks for these details. That's really helpful.
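The sequence quoted above can be replayed in a small userspace model. This is only a sketch under stated assumptions: the toy_* types and functions are hypothetical, the "free" is represented by a flag so the example itself stays memory-safe, and the deferred task_work_add()/__fput() step is collapsed into the final fput.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file: a refcount plus a "destructor ran" flag. */
struct toy_file {
	int refcount;
	bool released;	/* stands in for __fput(): ->release() ran, F is gone */
};

#define TOY_NFDS 64
struct toy_fdtable { struct toy_file *fd[TOY_NFDS]; };

/* Unshared descriptor table: hand back the pointer, refcount _not_ touched. */
static struct toy_file *toy_fdget(struct toy_fdtable *t, int fd)
{
	return t->fd[fd];
}

/* Drop one reference; the last one triggers the destructor. */
static void toy_fput(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->released = true;
}

/* Rip the reference out of the table, then fput() it. */
static void toy_close_fd(struct toy_fdtable *t, int fd)
{
	struct toy_file *f = t->fd[fd];

	t->fd[fd] = NULL;
	if (f)
		toy_fput(f);
}

/* Returns true if A is left holding a pointer to a released file. */
static bool toy_replay(void)
{
	struct toy_file F = { .refcount = 2, .released = false };
	struct toy_fdtable a = { { NULL } }, b = { { NULL } };
	struct toy_file *borrowed;

	a.fd[6] = &F;
	b.fd[42] = &F;

	borrowed = toy_fdget(&a, 6);	/* A: borrowed, refcount stays 2 */
	toy_close_fd(&a, 6);		/* A: refcount drops to 1 */
	toy_close_fd(&b, 42);		/* B: refcount hits 0, F released */

	return borrowed->released;	/* A is still "using the guts of F" */
}
```

In the model toy_replay() reports that A's borrowed pointer outlives the file, which is the failure the bracketing rule is meant to exclude.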
> So please, mark all call sites with "make very sure you never get
> here with unpaired fdget()".
Good point. Will add this comment.
> BTW, if my reading (re ->test_run()) is correct, what limits the recursion
> via bpf_sys_bpf()?
Glad you asked! Code review questions like this are much appreciated.
The recursion is limited by an allowlist of permitted commands in bpf_sys_bpf().
'case BPF_PROG_TEST_RUN:' is not there for this exact reason.
I'll add a comment to make it more obvious.
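A rough userspace sketch of how such an allowlist stops the recursion follows. The command names, their numbering and the choice of which commands are allowed are assumptions for illustration only; the one property taken from this thread is that the test-run command is absent from the list.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command set; the real values live in the kernel's UAPI headers. */
enum toy_cmd {
	TOY_MAP_CREATE,
	TOY_MAP_UPDATE_ELEM,
	TOY_PROG_LOAD,
	TOY_PROG_TEST_RUN,
};

static int toy_test_run_depth;	/* how many test runs are currently nested */

static int toy_dispatch(enum toy_cmd cmd);

/* Model of the helper a running program can call: allowlisted commands only. */
static int toy_sys_bpf_helper(enum toy_cmd cmd)
{
	switch (cmd) {
	case TOY_MAP_CREATE:
	case TOY_MAP_UPDATE_ELEM:
	case TOY_PROG_LOAD:
		break;
	/* TOY_PROG_TEST_RUN is deliberately not listed: no recursive test_run. */
	default:
		return -EINVAL;
	}
	return toy_dispatch(cmd);
}

/* Model of the syscall side: a test run executes a program that calls the helper. */
static int toy_dispatch(enum toy_cmd cmd)
{
	int ret = 0;

	if (cmd == TOY_PROG_TEST_RUN) {
		toy_test_run_depth++;
		assert(toy_test_run_depth == 1);	/* nesting would mean the allowlist failed */
		ret = toy_sys_bpf_helper(TOY_PROG_TEST_RUN);	/* program tries to recurse */
		toy_test_run_depth--;
	}
	return ret;
}
```

Because the default branch rejects anything unlisted, a command added later stays blocked from program context until someone adds it to the list on purpose.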