[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6285f19-01aa-49c8-8fef-4b5842136215@kernel.dk>
Date: Fri, 3 May 2024 13:35:09 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Kees Cook <keescook@...omium.org>
Cc: Bui Quang Minh <minhquangbui99@...il.com>,
Al Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>,
syzbot <syzbot+045b454ab35fd82a35fb@...kaller.appspotmail.com>,
io-uring@...r.kernel.org, jack@...e.cz, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org, Laura Abbott <laura@...bott.name>
Subject: Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?]
general protection fault in __ep_remove)
On 5/3/24 1:22 PM, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:49:11PM -0600, Jens Axboe wrote:
>> On 5/3/24 12:26 PM, Kees Cook wrote:
>>> Thanks for doing this analysis! I suspect at least a start of a fix
>>> would be this:
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 8fe5aa67b167..15e8f74ee0f2 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
>>>
>>> if (events & EPOLLOUT) {
>>> /* Paired with fput in dma_buf_poll_cb */
>>> - get_file(dmabuf->file);
>>> -
>>> - if (!dma_buf_poll_add_cb(resv, true, dcb))
>>> + if (!atomic_long_inc_not_zero(&dmabuf->file) &&
>>> + !dma_buf_poll_add_cb(resv, true, dcb))
>>> /* No callback queued, wake up any other waiters */
>>
>> Don't think this is sane at all. I'm assuming you meant:
>>
>> atomic_long_inc_not_zero(&dmabuf->file->f_count);
>
> Oops, yes, sorry. I was typed from memory instead of copy/paste.
Figured :-)
>> but won't fly as you're not under RCU in the first place. And what
>> protects it from being long gone before you attempt this anyway? This is
>> sane way to attempt to fix it, it's completely opposite of what sane ref
>> handling should look like.
>>
>> Not sure what the best fix is here, seems like dma-buf should hold an
>> actual reference to the file upfront rather than just stash a pointer
>> and then later _hope_ that it can just grab a reference. That seems
>> pretty horrible, and the real source of the issue.
>
> AFAICT, epoll just doesn't hold any references at all. It depends,
> I think, on eventpoll_release() (really eventpoll_release_file())
> synchronizing with epoll_wait() (but I don't see how this happens, and
> the race seems to be against ep_item_poll() ...?)
>
> I'm really confused about how eventpoll manages the lifetime of polled
> fds.
epoll doesn't hold any references, and it's got some ugly callback to
deal with that. It's not ideal, nor pretty, but that's how it currently
works. See eventpoll_release() and how it's called. This means that
epoll itself is supposedly safe from the file going away, even though it
doesn't hold a reference to it.
Except that in this case, the file is already gone by the time
eventpoll_release() is called. Which presumably is some interaction with
the somewhat suspicious file reference management that dma-buf is doing.
But I didn't look into that much, outside of noting it looks a bit
suspect.
>>> Due to this issue I've proposed fixing get_file() to detect pathological states:
>>> https://lore.kernel.org/lkml/20240502222252.work.690-kees@kernel.org/
>>
>> I don't think this would catch this case, as the memory could just be
>> garbage at this point.
>
> It catches it just fine! :) I tested it against the published PoC.
Sure it _may_ catch the issue, but the memory may also just be garbage
at that point. Not saying it's a useless addition, outside of the usual
gripes of making the hot path slower, just that it won't catch all
cases. Which I guess is fine and expected.
> And for cases where further allocations have progressed far enough to
> corrupt the freed struct file and render the check pointless, nothing
> different has happened than what happens today. At least now we have a
> window to catch the situation across the time frame before it is both
> reallocated _and_ the contents at the f_count offset gets changed to
> non-zero.
Right.
--
Jens Axboe
Powered by blists - more mailing lists