Message-ID: <3f675a0f-45c1-41bd-887a-fe6e6d793ecf@grimberg.me>
Date: Thu, 18 Jul 2024 01:51:50 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Ilya Dryomov <idryomov@...il.com>, Ofir Gal <ofir.gal@...umez.com>
Cc: davem@...emloft.net, linux-block@...r.kernel.org,
 linux-nvme@...ts.infradead.org, netdev@...r.kernel.org,
 ceph-devel@...r.kernel.org, dhowells@...hat.com, edumazet@...gle.com,
 pabeni@...hat.com, kbusch@...nel.org, xiubli@...hat.com
Subject: Re: [PATCH v4 4/4] libceph: use sendpages_ok() instead of
 sendpage_ok()



On 17/07/2024 23:26, Ilya Dryomov wrote:
> On Tue, Jul 16, 2024 at 2:46 PM Ofir Gal <ofir.gal@...umez.com> wrote:
>> Xiubo/Ilya please take a look
>>
>> On 6/11/24 09:36, Ofir Gal wrote:
>>> Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() in
>>> order to enable MSG_SPLICE_PAGES. They check only the first page of the
>>> iterator, but the iterator may represent multiple contiguous pages.
>>>
>>> MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
>>> pages it sends with sendpage_ok().
>>>
>>> When ceph_tcp_sendpage() or do_try_sendpage() sends an iterator whose
>>> first page is sendable but one of the other pages isn't,
>>> skb_splice_from_iter() warns and aborts the data transfer.
>>>
>>> Using the new helper sendpages_ok() to decide whether to enable
>>> MSG_SPLICE_PAGES solves the issue.
>>>
>>> Signed-off-by: Ofir Gal <ofir.gal@...umez.com>
>>> ---
>>>   net/ceph/messenger_v1.c | 2 +-
>>>   net/ceph/messenger_v2.c | 2 +-
>>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
>>> index 0cb61c76b9b8..a6788f284cd7 100644
>>> --- a/net/ceph/messenger_v1.c
>>> +++ b/net/ceph/messenger_v1.c
>>> @@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
>>>         * coalescing neighboring slab objects into a single frag which
>>>         * triggers one of hardened usercopy checks.
>>>         */
>>> -     if (sendpage_ok(page))
>>> +     if (sendpages_ok(page, size, offset))
>>>                msg.msg_flags |= MSG_SPLICE_PAGES;
>>>
>>>        bvec_set_page(&bvec, page, size, offset);
>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>> index bd608ffa0627..27f8f6c8eb60 100644
>>> --- a/net/ceph/messenger_v2.c
>>> +++ b/net/ceph/messenger_v2.c
>>> @@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
>>>                 * coalescing neighboring slab objects into a single frag
>>>                 * which triggers one of hardened usercopy checks.
>>>                 */
>>> -             if (sendpage_ok(bv.bv_page))
>>> +             if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
>>>                        msg.msg_flags |= MSG_SPLICE_PAGES;
>>>                else
>>>                        msg.msg_flags &= ~MSG_SPLICE_PAGES;
> Hi Ofir,
>
> Ceph should be fine as is -- there is an internal "cursor" abstraction
> that is limited to PAGE_SIZE chunks, using bvec_iter_bvec() instead
> of mp_bvec_iter_bvec(), etc.  This means that both do_try_sendpage() and
> ceph_tcp_sendpage() should be called only with
>
>    page_off + len <= PAGE_SIZE
>
> being true even if the pages are contiguous (and that we lose out on the
> potential performance benefit, of course...).
>
> That said, if the plan is to remove sendpage_ok() so that it doesn't
> accidentally grow new users who are unaware of this pitfall, consider
> this
>
> Acked-by: Ilya Dryomov <idryomov@...il.com>

Which tree should this go through? We can take it via the nvme tree,
unless someone else wants to queue it up...
