Message-ID: <CAO8a2SjHq0hi22QdmaTH2E_c1vP2qHvy7JWE3E1+y3VhEWbDaw@mail.gmail.com>
Date: Thu, 28 Nov 2024 20:19:31 +0200
From: Alex Markuze <amarkuze@...hat.com>
To: Luis Henriques <luis.henriques@...ux.dev>
Cc: Goldwyn Rodrigues <rgoldwyn@...e.de>, Xiubo Li <xiubli@...hat.com>,
Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] ceph: ceph: fix out-of-bound array access when
doing a file read

I didn't discard it though :).
I folded it into the `if` statement. I find the if/else construct
overly verbose and cumbersome.

+ left = (ret > 0) ? ret : 0;

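
To make the comparison concrete, here is a minimal userspace sketch, not
the kernel code. The helper names `left_old`, `left_new` and
`check_left_helpers` are made up for illustration, and I'm assuming plain
signed 64-bit arithmetic for `off`, `ret` and `i_size`; the types in
fs/ceph/file.c may differ, and unsigned arithmetic could change what
happens for a negative `ret`. `left` itself is kept unsigned, as in the
current code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Original if/else chain. 'left' is unsigned, so i_size - off wraps to a
 * huge value whenever i_size < off.
 * ASSUMPTION: off, ret and i_size are modelled as signed 64-bit values.
 */
uint64_t left_old(int64_t off, int64_t ret, int64_t i_size)
{
	if (ret <= 0)
		return 0;
	else if (off + ret > i_size)
		return (uint64_t)(i_size - off);
	else
		return (uint64_t)ret;
}

/* Folded version: both branches clamp to zero instead of wrapping. */
uint64_t left_new(int64_t off, int64_t ret, int64_t i_size)
{
	if (off + ret > i_size)
		return (i_size > off) ? (uint64_t)(i_size - off) : 0;
	else
		return (ret > 0) ? (uint64_t)ret : 0;
}

/* Hypothetical usage: a few cases worth checking by hand. */
void check_left_helpers(void)
{
	/* i_size dropped below off: the old chain wraps, the new one clamps. */
	assert(left_old(100, 20, 50) > ((uint64_t)1 << 62));
	assert(left_new(100, 20, 50) == 0);

	/* Error return (negative ret) still yields nothing to copy. */
	assert(left_new(100, -5, 50) == 0);
	assert(left_new(100, -5, 200) == 0);

	/* Normal reads are unchanged. */
	assert(left_new(0, 10, 100) == 10);
	assert(left_new(90, 20, 100) == 10);
}
```

Under those assumptions a negative `ret` never reaches the copy loop with
the folded version, and `i_size < off` no longer wraps.
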
On Thu, Nov 28, 2024 at 7:43 PM Luis Henriques <luis.henriques@...ux.dev> wrote:
>
> Hi Alex,
>
> [ Thank you for looking into this. ]
>
> On Wed, Nov 27 2024, Alex Markuze wrote:
>
> > Hi, Folks.
> > AFAIK there is no side effect that can affect MDS with this fix.
> > The crash appeared after commit
> > 1065da21e5df9d843d2c5165d5d576be000142a6 ("ceph: stop copying to iter
> > at EOF on sync reads").
> >
> > Luis, your fix seems to address only the case where `i_size` goes
> > to zero, but the crash can happen any time `i_size` drops below `off`.
> > I propose fixing it this way:
>
> Hmm... you're probably right. I didn't see this happening, but I guess it
> could indeed happen.
>
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 4b8d59ebda00..19b084212fee 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >  	if (ceph_inode_is_shutdown(inode))
> >  		return -EIO;
> >
> > -	if (!len)
> > +	if (!len || !i_size)
> >  		return 0;
> >  	/*
> >  	 * flush any page cache pages in this range. this
> > @@ -1200,12 +1200,11 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >  		}
> >
> >  		idx = 0;
> > -		if (ret <= 0)
> > -			left = 0;
>
> Right now I don't have any means for testing this patch. However, I don't
> think this is completely correct. By removing the above condition you're
> discarding cases where an error has occurred (i.e. where ret is negative).
>
> Why not simply modify my patch and do:
>
>         if (i_size < off)
>                 ret = 0;
>
> instead of:
>
>         if (i_size == 0)
>                 ret = 0;
>
> ?
>
> (Again, totally untested!)
>
> Cheers,
> --
> Luís
>
> > -		else if (off + ret > i_size)
> > -			left = i_size - off;
> > +		if (off + ret > i_size)
> > +			left = (i_size > off) ? i_size - off : 0;
> >  		else
> > -			left = ret;
> > +			left = (ret > 0) ? ret : 0;
> > +
> >  		while (left > 0) {
> >  			size_t plen, copied;
> >
> >
> > On Thu, Nov 7, 2024 at 1:09 PM Luis Henriques <luis.henriques@...ux.dev> wrote:
> >>
> >> (CC'ing Alex)
> >>
> >> On Wed, Nov 06 2024, Goldwyn Rodrigues wrote:
> >>
> >> > Hi Xiubo,
> >> >
> >> >> BTW, so in the following code:
> >> >>
> >> >>         idx = 0;
> >> >>         if (ret <= 0)
> >> >>                 left = 0;
> >> >>         else if (off + ret > i_size)
> >> >>                 left = i_size - off;
> >> >>         else
> >> >>                 left = ret;
> >> >> The 'ret' should be larger than '0', right?
> >> >>
> >> >> If so, why do we not check and fix it in the 'else if' branch instead?
> >> >>
> >> >> Because currently the read path code won't exit directly and will keep
> >> >> retrying the read if it finds that the real content length is longer than
> >> >> the local 'i_size'.
> >> >>
> >> >> Again, I am afraid your current fix will break the MIX filelock semantics?
> >> >
> >> > Do you think changing left to ssize_t instead of size_t will
> >> > fix the problem?
> >> >
> >> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> > index 4b8d59ebda00..f8955773bdd7 100644
> >> > --- a/fs/ceph/file.c
> >> > +++ b/fs/ceph/file.c
> >> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >  	if (ceph_inode_is_shutdown(inode))
> >> >  		return -EIO;
> >> >
> >> > -	if (!len)
> >> > +	if (!len || !i_size)
> >> >  		return 0;
> >> >  	/*
> >> >  	 * flush any page cache pages in this range. this
> >> > @@ -1087,7 +1087,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >  		size_t page_off;
> >> >  		bool more;
> >> >  		int idx;
> >> > -		size_t left;
> >> > +		ssize_t left;
> >> >  		struct ceph_osd_req_op *op;
> >> >  		u64 read_off = off;
> >> >  		u64 read_len = len;
> >> >
> >>
> >> I *think* (although I haven't tested it) that your patch should work as
> >> well. But I also think it's a bit more hacky: the overflow will still be
> >> there:
> >>
> >>         if (ret <= 0)
> >>                 left = 0;
> >>         else if (off + ret > i_size)
> >>                 left = i_size - off;
> >>         else
> >>                 left = ret;
> >>         while (left > 0) {
> >>                 // ...
> >>         }
> >>
> >> If 'i_size' is '0', 'left' (which is now signed) will now have a negative
> >> value in the 'else if' branch and the loop that follows will not be
> >> executed. My version will simply set 'ret' to '0' before this 'if'
> >> construct.
> >>
> >> So, in my opinion, what needs to be figured out is whether this will cause
> >> problems on the MDS side or not. Because on the kernel client, it should
> >> be safe to ignore reads to an inode that has size set to '0', even if
> >> there's already data available to be read. Eventually, the inode metadata
> >> will get updated and by then we can retry the read.
> >>
> >> Unfortunately, the MDS continues to be a huge black box for me and the
> >> locking code in particular is very tricky. I'd rather defer this to
> >> anyone who is familiar with the code.
> >>
> >> Cheers,
> >> --
> >> Luís
> >>
>
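
For reference, the difference between an unsigned and a signed `left`
discussed in the quoted thread can be sketched in userspace like this.
Again, this is only a sketch under assumptions: `loop_entered_unsigned`,
`loop_entered_signed` and `check_loop_helpers` are hypothetical helpers,
and int64_t/uint64_t stand in for the kernel's ssize_t/size_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if `while (left > 0)` would be entered when 'left' is
 * unsigned. If i_size < off the subtraction wraps to a huge value, so
 * the loop runs even though there is nothing valid to copy.
 */
int loop_entered_unsigned(int64_t off, int64_t i_size)
{
	uint64_t left = (uint64_t)(i_size - off);

	return left > 0;
}

/*
 * Same check with a signed 'left'. The value is negative when
 * i_size < off, so the loop is skipped, but the bogus value is still
 * computed.
 */
int loop_entered_signed(int64_t off, int64_t i_size)
{
	int64_t left = i_size - off;

	return left > 0;
}

/* Hypothetical usage: i_size truncated to 0 while reading at off 4096. */
void check_loop_helpers(void)
{
	assert(loop_entered_unsigned(4096, 0) == 1);
	assert(loop_entered_signed(4096, 0) == 0);
	assert(loop_entered_signed(0, 4096) == 1);
}
```

The signed variant only hides the wrapped value from the loop; clamping
to zero before the loop avoids computing it in the first place.
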