Message-ID: <CAO8a2SjWXbVxDy4kcKF6JSesB=_QEfb=ZfPbwXpiY_GUuwA8zQ@mail.gmail.com>
Date: Wed, 27 Nov 2024 15:47:02 +0200
From: Alex Markuze <amarkuze@...hat.com>
To: Luis Henriques <luis.henriques@...ux.dev>
Cc: Goldwyn Rodrigues <rgoldwyn@...e.de>, Xiubo Li <xiubli@...hat.com>,
Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] ceph: ceph: fix out-of-bound array access when doing a file read

Hi, Folks.
AFAIK there is no side effect from this fix that could affect the MDS.
This crash started happening after commit
1065da21e5df9d843d2c5165d5d576be000142a6 ("ceph: stop copying to iter
at EOF on sync reads").
Luis, your fix seems to address only the case where i_size goes to
zero, but the problem can occur any time `i_size` drops below `off`.
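
To make the failure mode concrete, here is a minimal userspace sketch
(made-up values and simplified types, not the actual kernel code) of
what happens to an unsigned 'left' once i_size falls below off:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
        unsigned long long i_size = 0;  /* file truncated behind our back */
        unsigned long long off = 4096;  /* read offset already past EOF */
        size_t left;

        /* mirrors "left = i_size - off" with an unsigned type: the
         * subtraction wraps around instead of going negative */
        left = i_size - off;

        /* prints a huge value, so a loop bounded by "while (left > 0)"
         * would keep indexing the page array far past its end */
        printf("left = %zu\n", left);
        return 0;
}
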
I propose fixing it this way:
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..19b084212fee 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
         if (ceph_inode_is_shutdown(inode))
                 return -EIO;

-        if (!len)
+        if (!len || !i_size)
                 return 0;
         /*
          * flush any page cache pages in this range. this
@@ -1200,12 +1200,11 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
                 }
                 idx = 0;
-                if (ret <= 0)
-                        left = 0;
-                else if (off + ret > i_size)
-                        left = i_size - off;
+                if (off + ret > i_size)
+                        left = (i_size > off) ? i_size - off : 0;
                 else
-                        left = ret;
+                        left = (ret > 0) ? ret : 0;
+
                 while (left > 0) {
                         size_t plen, copied;

On Thu, Nov 7, 2024 at 1:09 PM Luis Henriques <luis.henriques@...ux.dev> wrote:
>
> (CC'ing Alex)
>
> On Wed, Nov 06 2024, Goldwyn Rodrigues wrote:
>
> > Hi Xiubo,
> >
> >> BTW, so in the following code:
> >>
> >> 1202                 idx = 0;
> >> 1203                 if (ret <= 0)
> >> 1204                         left = 0;
> >> 1205                 else if (off + ret > i_size)
> >> 1206                         left = i_size - off;
> >> 1207                 else
> >> 1208                         left = ret;
> >>
> >> The 'ret' should be larger than '0', right?
> >>
> >> If so, why don't we check and fix it in the 'else if' branch instead?
> >>
> >> Because currently the read path code won't exit directly, and will
> >> keep retrying the read if it finds that the real content length is
> >> longer than the local 'i_size'.
> >>
> >> Again, I am afraid your current fix will break the MIX filelock
> >> semantics?
> >
> > Do you think changing left to ssize_t instead of size_t will
> > fix the problem?
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 4b8d59ebda00..f8955773bdd7 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >          if (ceph_inode_is_shutdown(inode))
> >                  return -EIO;
> >
> > -        if (!len)
> > +        if (!len || !i_size)
> >                  return 0;
> >          /*
> >           * flush any page cache pages in this range. this
> > @@ -1087,7 +1087,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >                  size_t page_off;
> >                  bool more;
> >                  int idx;
> > -                size_t left;
> > +                ssize_t left;
> >                  struct ceph_osd_req_op *op;
> >                  u64 read_off = off;
> >                  u64 read_len = len;
> >
>
> I *think* (although I haven't tested it) that your patch should work as
> well. But I also think it's a bit more hacky: the overflow will still be
> there:
>
>         if (ret <= 0)
>                 left = 0;
>         else if (off + ret > i_size)
>                 left = i_size - off;
>         else
>                 left = ret;
>         while (left > 0) {
>                 // ...
>         }
>
> If 'i_size' is '0', 'left' (which is now signed) will end up with a
> negative value in the 'else if' branch, and the loop that follows will
> not be executed. My version simply sets 'ret' to '0' before this 'if'
> construct.
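
Just to make that concrete, a minimal userspace sketch (made-up values
and simplified types, not the actual kernel code; strictly speaking the
conversion of the wrapped value to ssize_t is implementation-defined,
but on Linux it comes out negative):

#include <stdio.h>
#include <stddef.h>
#include <sys/types.h>

int main(void)
{
        unsigned long long i_size = 0;  /* truncated file */
        unsigned long long off = 4096;  /* read offset past the new EOF */

        size_t  u_left = i_size - off;  /* size_t:  wraps to a huge value */
        ssize_t s_left = i_size - off;  /* ssize_t: ends up negative */

        /* "while (left > 0)" keeps iterating (and walking off the end of
         * the page array) in the unsigned case, but is skipped entirely
         * in the signed case, even though the underflow itself remains */
        printf("unsigned left = %zu\n", u_left);
        printf("signed left   = %zd\n", s_left);
        return 0;
}
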
>
> So, in my opinion, what needs to be figured out is whether this will cause
> problems on the MDS side or not. Because on the kernel client, it should
> be safe to ignore reads to an inode that has size set to '0', even if
> there's already data available to be read. Eventually, the inode metadata
> will get updated and by then we can retry the read.
>
> Unfortunately, the MDS continues to be a huge black box for me, and the
> locking code in particular is very tricky. I'd rather defer this to
> someone who is familiar with the code.
>
> Cheers,
> --
> Luís
>