Message-ID: <6f92b7d6-7d3c-4830-a591-75dc4d55c46c@redhat.com>
Date: Mon, 23 Jun 2025 18:59:19 +0200
From: David Hildenbrand <david@...hat.com>
To: Pavel Begunkov <asml.silence@...il.com>, Jens Axboe <axboe@...nel.dk>,
 Alexander Potapenko <glider@...gle.com>
Cc: syzbot <syzbot+1d335893772467199ab6@...kaller.appspotmail.com>,
 akpm@...ux-foundation.org, catalin.marinas@....com, jgg@...pe.ca,
 jhubbard@...dia.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 peterx@...hat.com, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in sanity_check_pinned_pages

On 23.06.25 18:48, Pavel Begunkov wrote:
> On 6/23/25 16:11, David Hildenbrand wrote:
>> On 23.06.25 16:58, Jens Axboe wrote:
>>> On 6/23/25 6:22 AM, David Hildenbrand wrote:
>>>> On 23.06.25 12:10, David Hildenbrand wrote:
>>>>> On 23.06.25 11:53, Alexander Potapenko wrote:
>>>>>> On Mon, Jun 23, 2025 at 11:29 AM 'David Hildenbrand' via
>>>>>> syzkaller-bugs <syzkaller-bugs@...glegroups.com> wrote:
>>>>>>>
> ...
>>>> When only pinning a single tail page (iovec.iov_len = pagesize), it works as expected.
>>>>
>>>> So we pinned two tail pages but end up calling io_release_ubuf()->unpin_user_page()
>>>> on the head page, meaning that "imu->bvec[i].bv_page" points at the wrong folio page
>>>> (IOW, one we never pinned).
>>>>
>>>> So it's related to the io_coalesce_buffer() machinery.
>>>>
>>>> And in fact, in there, we have this weird logic:
>>>>
>>>> /* Store head pages only*/
>>>> new_array = kvmalloc_array(nr_folios, sizeof(struct page *), GFP_KERNEL);
>>>> ...
>>>>
>>>>
>>>> Essentially discarding the subpage information when coalescing tail pages.
>>>>
>>>>
>>>> I am afraid the whole io_check_coalesce_buffer + io_coalesce_buffer() logic might be
>>>> flawed (we can -- in theory -- coalesce different folio page ranges in
>>>> a GUP result?).
>>>>
>>>> @Jens, not sure if this only triggers a warning when unpinning or if we actually mess up
>>>> imu->bvec[i].bv_page, to end up pointing at (reading/writing) pages we didn't even pin in the first
>>>> place.
>>>>
>>>> Can you look into that, as you are more familiar with the logic?
>>>
>>> Leaving this all quoted and adding Pavel, who wrote that code. I'm
>>> currently away, so can't look into this right now.
> 
> Chenliang Li did, but not like it matters
> 
>> I did some more digging, but ended up being all confused about io_check_coalesce_buffer() and io_imu_folio_data().
>>
>> Assuming we pass a bunch of consecutive tail pages that all belong to the same folio, then the loop in io_check_coalesce_buffer() will always
>> run into the
>>
>> if (page_folio(page_array[i]) == folio &&
>>       page_array[i] == page_array[i-1] + 1) {
>>       count++;
>>       continue;
>> }
>>
>> case, making the function return "true" ... in io_coalesce_buffer(), we then store the head page ... which seems very wrong.
>>
>> In general, storing head pages when they are not the first page to be coalesced seems wrong.
> 
> Yes, it stores the head page even if the range passed to
> pin_user_pages() doesn't cover the head page.
>
> It should be converted to unpin_user_folio(), which doesn't seem
> to do sanity_check_pinned_pages(). Do you think that'll be enough
> (conceptually)? Nobody is actually touching the head page in those
> cases apart from the final unpin, and storing the head page is
> more convenient than keeping folios. I'll take a look if it can
> be fully converted to folios w/o extra overhead.

Assume GUP returned

nr_pages = 2
pages[0] = folio_page(folio, 1)
pages[1] = folio_page(folio, 2)

After io_coalesce_buffer() we have

nr_pages = 1
pages[0] = folio_page(folio, 0)
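
To make the example above concrete, here is a small userspace model of it.
It is only a sketch and not the io_uring code: struct model_page,
model_can_coalesce(), model_coalesce() and model_was_pinned() are
hypothetical names, and a page is reduced to a folio id plus an index
within that folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a page is (folio id, index in folio); index 0 is the head. */
struct model_page {
	int folio_id;
	size_t index;
};

/* Same shape as the quoted check: same folio and consecutive pages. */
bool model_can_coalesce(const struct model_page *pages, size_t nr)
{
	assert(nr > 0);
	for (size_t i = 1; i < nr; i++) {
		if (pages[i].folio_id != pages[0].folio_id ||
		    pages[i].index != pages[i - 1].index + 1)
			return false;
	}
	return true;
}

/* Model of "store head pages only": returns the new nr_pages. */
size_t model_coalesce(struct model_page *pages, size_t nr)
{
	if (!model_can_coalesce(pages, nr))
		return nr;
	pages[0].index = 0;	/* the subpage information is discarded here */
	return 1;
}

/* Was page p part of what was originally pinned? */
bool model_was_pinned(const struct model_page *pinned, size_t nr,
		      struct model_page p)
{
	for (size_t i = 0; i < nr; i++) {
		if (pinned[i].folio_id == p.folio_id &&
		    pinned[i].index == p.index)
			return true;
	}
	return false;
}
```

Feeding it pages 1 and 2 of one folio leaves a single entry with index 0,
which model_was_pinned() reports as not being among the pinned pages.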


Using unpin_user_folio() in all places where we could see something like 
that would be the right thing to do. The sanity checks are not in 
unpin_user_folio() for exactly that reason: we don't know which folio 
pages we pinned.
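
A rough userspace model of the difference between per-page and per-folio
release, under stated assumptions: all model_* names are hypothetical, and
the page_pinned[] array is only a stand-in for whatever per-page state a
sanity check would look at (the kernel does not keep such a "pinned" bit
per page). The point is only that a folio-level release drops a count and
has no page to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FOLIO_PAGES 4

/* Hypothetical model of a folio with per-folio pin accounting. */
struct model_folio {
	int pincount;				/* pins are counted per folio */
	bool page_pinned[MODEL_FOLIO_PAGES];	/* stand-in per-page state */
};

void model_pin_page(struct model_folio *f, size_t idx)
{
	assert(idx < MODEL_FOLIO_PAGES);
	f->pincount++;
	f->page_pinned[idx] = true;
}

/*
 * Per-page release: it is handed a specific page, so it can check it.
 * Returns false where the model's sanity check would fire.
 */
bool model_unpin_page(struct model_folio *f, size_t idx)
{
	assert(idx < MODEL_FOLIO_PAGES);
	if (!f->page_pinned[idx])
		return false;
	f->pincount--;
	return true;
}

/* Per-folio release: drops npages pins, there is no page to check. */
void model_unpin_folio(struct model_folio *f, int npages)
{
	assert(f->pincount >= npages);
	f->pincount -= npages;
}
```

Pinning pages 1 and 2 and then releasing page 0 fails the per-page check in
this model, while model_unpin_folio(&f, 2) balances the count.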

But now I wonder where you make sure that "Nobody is actually touching 
the head page"?

How do you get back the "which folio range" information after 
io_coalesce_buffer() ?


If you rely on alignment in the virtual address space for that, combined with 
imu->folio_shift, that might not work reliably ...
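
One scenario where that could go wrong, as an assumption on my side rather
than a statement about what the io_uring code does: a large folio mapped at
a user address that is not aligned to the folio size. The sketch below
compares an offset derived from the address and a folio_shift with the real
offset into the folio. Both helpers and the folio_vstart parameter (the
address where the folio's first byte is mapped) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset inferred from the user address alone, assuming natural alignment. */
size_t model_offset_from_vaddr(uintptr_t uaddr, unsigned int folio_shift)
{
	return (size_t)(uaddr & (((uintptr_t)1 << folio_shift) - 1));
}

/* Real offset, given where the folio's first byte is actually mapped. */
size_t model_offset_actual(uintptr_t uaddr, uintptr_t folio_vstart)
{
	return (size_t)(uaddr - folio_vstart);
}
```

With folio_shift = 14 (a 16 KiB folio) mapped at 0x10000, both helpers agree
for uaddr = 0x11000. Mapped at 0x11000 instead, uaddr = 0x12000 yields
0x2000 from the address but 0x1000 in reality.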

-- 
Cheers,

David / dhildenb

