lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6b2c19b-c2a6-400c-bbf1-bf0469138777@grimberg.me>
Date: Thu, 30 May 2024 20:58:06 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Ofir Gal <ofir.gal@...umez.com>, davem@...emloft.net,
 linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
 netdev@...r.kernel.org, ceph-devel@...r.kernel.org
Cc: dhowells@...hat.com, edumazet@...gle.com, pabeni@...hat.com,
 kbusch@...nel.org, axboe@...nel.dk, hch@....de, philipp.reisner@...bit.com,
 lars.ellenberg@...bit.com, christoph.boehmwalder@...bit.com,
 idryomov@...il.com, xiubli@...hat.com
Subject: Re: [PATCH 0/4] bugfix: Introduce sendpages_ok() to check
 sendpage_ok() on contiguous pages

Hey Ofir,

On 30/05/2024 16:26, Ofir Gal wrote:
> skb_splice_from_iter() warns on !sendpage_ok() which results in nvme-tcp
> data transfer failure. This warning leads to hanging IO.
>
> nvme-tcp using sendpage_ok() to check the first page of an iterator in
> order to disable MSG_SPLICE_PAGES. The iterator can represent a list of
> contiguous pages.
>
> When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
> it requires all pages in the iterator to be sendable.
> skb_splice_from_iter() checks each page with sendpage_ok().
>
> nvme_tcp_try_send_data() might allow MSG_SPLICE_PAGES when the first
> page is sendable, but the next one are not. skb_splice_from_iter() will
> attempt to send all the pages in the iterator. When reaching an
> unsendable page the IO will hang.

Interesting. Do you know where this buffer came from? I find it strange
that a we get a bvec with a contiguous segment which consists of non slab
originated pages together with slab originated pages... it is surprising 
to see
a mix of the two.

I'm wandering if this is something that happened before david's splice_pages
changes. Maybe before that with multipage bvecs? Anyways it is strange, 
never
seen that.

David,  strange that nvme-tcp is setting a single contiguous element 
bvec but it
is broken up into PAGE_SIZE increments in skb_splice_from_iter...

>
> The patch introduces a helper sendpages_ok(), it returns true if all the
> continuous pages are sendable.
>
> Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
> this helper to check whether the page list is OK. If the helper does not
> return true, the driver should remove MSG_SPLICE_PAGES flag.
>
>
> The bug is reproducible, in order to reproduce we need nvme-over-tcp
> controllers with optimal IO size bigger than PAGE_SIZE. Creating a raid
> with bitmap over those devices reproduces the bug.
>
> In order to simulate large optimal IO size you can use dm-stripe with a
> single device.
> Script to reproduce the issue on top of brd devices using dm-stripe is
> attached below.

This is a great candidate for blktests. would be very beneficial to have 
it added there.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ