linux-kernel - Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes

Message-ID: <20251115222850.183b8557@pumpkin>
Date: Sat, 15 Nov 2025 22:28:50 +0000
From: David Laight <david.laight.linux@...il.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>, Christoph
 Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-nvme@...ts.infradead.org
Subject: Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle
 larger sizes

On Sat, 15 Nov 2025 20:05:47 +0200
Leon Romanovsky <leon@...nel.org> wrote:

> On Sat, Nov 15, 2025 at 05:33:41PM +0000, David Laight wrote:
> > On Sat, 15 Nov 2025 18:22:45 +0200
> > Leon Romanovsky <leon@...nel.org> wrote:
> >   
> > > From: Leon Romanovsky <leonro@...dia.com>
> > > 
> > > This patch changes the length variables from unsigned int to size_t.
> > > Using size_t ensures that we can handle larger sizes, as size_t is
> > > always equal to or larger than the previously used u32 type.  
> > 
> > Where are requests larger than 4GB going to come from?  
> 
> The main goal is to reuse phys_vec structure. It is going to represent PCI
> regions exposed through VFIO DMABUF interface. Their length is more than u32.

Unless you actually need to have the same structure (because some function
is used in both places) there isn't really any need to have a single structure
for a phys_addr:length pair.
Indeed keeping them separate can even remove bugs.

For instance (I think) blk_map_iter_next() returns an addr:len pair
that is only used for the following sg_set_page() call, which
has separate parameters for phys_to_page(addr) and len.
So unless it is used in other places, it doesn't need to be
the same structure at all.
(Other people might disagree...)
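
A rough user-space sketch of what keeping the two pairs separate could look
like. The struct and function names here (blk_seg, region_vec,
seg_len_for_sg) are hypothetical and are not taken from the patch or the
kernel tree; uint64_t stands in for phys_addr_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block-layer segment: the length is bounded, so 32 bits suffice. */
struct blk_seg {
	uint64_t paddr;
	uint32_t len;
};

/* Hypothetical descriptor for a large PCI region: the length may exceed 4GB. */
struct region_vec {
	uint64_t paddr;
	size_t len;
};

/*
 * Stand-in for a consumer such as sg_set_page() that takes an
 * unsigned int length: the field already has that width, so there is
 * nothing to truncate, cast, or check.
 */
static unsigned int seg_len_for_sg(const struct blk_seg *seg)
{
	return seg->len;
}
```

With a separate narrow type the scatterlist path never sees a 64-bit length,
so neither the overflow check nor the cast arises there.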

> 
> >   
> > > Originally, u32 was used because blk-mq-dma code evolved from
> > > scatter-gather implementation, which uses unsigned int to describe length.
> > > This change will also allow us to reuse the existing struct phys_vec in places
> > > that don't need scatter-gather.
> > > 
> > > Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> > > ---
> > >  block/blk-mq-dma.c      | 14 +++++++++-----
> > >  drivers/nvme/host/pci.c |  4 ++--
> > >  2 files changed, 11 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> > > index e9108ccaf4b0..cc3e2548cc30 100644
> > > --- a/block/blk-mq-dma.c
> > > +++ b/block/blk-mq-dma.c
> > > @@ -8,7 +8,7 @@
> > >  
> > >  struct phys_vec {
> > >  	phys_addr_t	paddr;
> > > -	u32		len;
> > > +	size_t		len;
> > >  };
> > >  
> > >  static bool __blk_map_iter_next(struct blk_map_iter *iter)
> > > @@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
> > >  		struct phys_vec *vec)
> > >  {
> > >  	enum dma_data_direction dir = rq_dma_dir(req);
> > > -	unsigned int mapped = 0;
> > >  	unsigned int attrs = 0;
> > > +	size_t mapped = 0;
> > >  	int error;
> > >  
> > >  	iter->addr = state->addr;
> > > @@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
> > >  	blk_rq_map_iter_init(rq, &iter);
> > >  	while (blk_map_iter_next(rq, &iter, &vec)) {
> > >  		*last_sg = blk_next_sg(last_sg, sglist);
> > > -		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
> > > -				offset_in_page(vec.paddr));
> > > +
> > > +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));  
> > 
> > I'm not at all sure you need that test.
> > blk_map_iter_next() has to guarantee that vec.len is valid.
> > (probably even less than a page size?)
> > Perhaps this code should be using a different type for the addr:len pair?  
> 
> I added this test for future proof, this is why it doesn't "return" on
> overflow, but prints dump stack and continues. It can't happen.

No, on a large number of installed systems it prints the stack and panics.
Were it to continue the effect would be all wrong anyway.
But blk_map_iter_next() guarantees to return a sane length.

> 
> >   
> > > +		sg_set_page(*last_sg, phys_to_page(vec.paddr),
> > > +			    (unsigned int)vec.len, offset_in_page(vec.paddr));  
> > 
> > You definitely don't need the explicit cast.  
> 
> We degrade type from u64 to u32. Why don't we need cast?

Because pretty much no integer conversion needs an explicit cast.
Any warnings compilers might output for such assignments really are best
disabled.
The more casts you add to code to remove 'silly' compiler warnings the
harder it is to find the ones that actually have a desired effect and/or
unwanted effects that are actually bugs.

I'm busy trying to fix a load of min_t(u32, a, b) calls which mask off the
high significant bits of u64 values.
The casts got added (implicitly by using min_t() instead of min()) because
min() required the types match - and in a lot of cases the programmer
picked the type of the result not that of the larger parameter.
Others are just cut&paste of another line.
But the effect is the same, the casts add bugs rather than making the
code better.
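
A minimal user-space sketch of that kind of bug. MIN_T below is a simplified
stand-in for the kernel's min_t() (it casts both operands to the named type
before comparing, but unlike the real macro it evaluates its arguments more
than once), and the two function names are made up for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for min_t(): cast both sides to 'type', then compare. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Buggy pattern: the type of the result (u32) was picked rather than
 * the type of the larger parameter, so the high bits of 'len' are
 * masked off before the comparison happens.
 */
static uint32_t clamp_len_min_t(uint64_t len, uint32_t limit)
{
	return MIN_T(uint32_t, len, limit);
}

/*
 * Compare in the wider type instead. The result can never exceed
 * 'limit', so the implicit conversion back to 32 bits loses nothing.
 */
static uint32_t clamp_len_wide(uint64_t len, uint32_t limit)
{
	return len < limit ? len : limit;
}
```

For len = 0x100000010 (4GB + 16) and limit = 4096, clamp_len_min_t()
returns 16 while clamp_len_wide() returns 4096.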

I've even seen:
	uchar_buf[0] = (unsigned char)(int_val & 0xff);
(Presumably written to avoid compiler warnings.)
and looked at the object code to find the compiler (not gcc) anded the
value with 0xff for the '& 0xff', anded it with 0xff again for the cast
and then did a memory write of the low bits.
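
A small sketch of why neither the mask nor the cast is needed there, assuming
an 8-bit unsigned char (CHAR_BIT == 8). The function names are illustrative
only.

```c
/* The style quoted above: mask, then cast. */
static unsigned char low_byte_cast(int int_val)
{
	return (unsigned char)(int_val & 0xff);
}

/*
 * Plain conversion: C defines conversion to an unsigned type as
 * reduction modulo 2^N, so with an 8-bit unsigned char this keeps
 * exactly the low 8 bits, for negative values too.
 */
static unsigned char low_byte_plain(int int_val)
{
	return int_val;
}
```

Both functions return the same value for every int, so the extra operations
only add noise for the reader (and, with a poor compiler, for the CPU).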

Casts could easily be the next 'bug'...

	David

> 
> Thanks