linux-kernel - Re: [RFC PATCH v2] ceph: ceph: fix out-of-bound array access when doing a file read

Message-ID: <CAO8a2SjHq0hi22QdmaTH2E_c1vP2qHvy7JWE3E1+y3VhEWbDaw@mail.gmail.com>
Date: Thu, 28 Nov 2024 20:19:31 +0200
From: Alex Markuze <amarkuze@...hat.com>
To: Luis Henriques <luis.henriques@...ux.dev>
Cc: Goldwyn Rodrigues <rgoldwyn@...e.de>, Xiubo Li <xiubli@...hat.com>, 
	Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2] ceph: ceph: fix out-of-bound array access when
 doing a file read

I didn't discard it though :).
I folded it into the `if` statement. I find the if else construct
overly verbose and cumbersome.

+                       left = (ret > 0) ? ret : 0;
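
The clamping can be checked in user space. This is only a sketch: clamp_left() is a hypothetical helper, not a function in fs/ceph/file.c, and it assumes `off` and `i_size` are unsigned 64-bit values while `ret` is signed. Unlike the one-line fold, it tests `ret <= 0` before any unsigned arithmetic, so a negative `ret` is never added to `off`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the 'left' computation.
 * off    - read offset (assumed unsigned, like a kernel u64)
 * ret    - bytes returned by the read, or a negative errno
 * i_size - current file size (assumed unsigned)
 */
static size_t clamp_left(uint64_t off, int64_t ret, uint64_t i_size)
{
	if (ret <= 0)
		return 0;			/* error or EOF: nothing to copy */
	if (off >= i_size)
		return 0;			/* size dropped to or below the offset */
	if ((uint64_t)ret > i_size - off)
		return (size_t)(i_size - off);	/* trim the copy at EOF */
	return (size_t)ret;			/* whole read is within i_size */
}

/* Exercises the four cases discussed in the thread. */
void clamp_left_selfcheck(void)
{
	assert(clamp_left(0, -5, 4096) == 0);		/* negative ret */
	assert(clamp_left(8192, 4096, 100) == 0);	/* i_size < off */
	assert(clamp_left(0, 4096, 100) == 100);	/* read crosses EOF */
	assert(clamp_left(0, 50, 100) == 50);		/* read inside file */
}
```

Writing the third test as `ret > i_size - off` rather than `off + ret > i_size` avoids the addition entirely. The subtraction is safe because `off < i_size` has already been established.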

On Thu, Nov 28, 2024 at 7:43 PM Luis Henriques <luis.henriques@...ux.dev> wrote:
>
> Hi Alex,
>
> [ Thank you for looking into this. ]
>
> On Wed, Nov 27 2024, Alex Markuze wrote:
>
> > Hi, Folks.
> > AFAIK there is no side effect that can affect MDS with this fix.
> > This crash appeared after commit 1065da21e5df9d843d2c5165d5d576be000142a6
> > ("ceph: stop copying to iter at EOF on sync reads").
> >
> > Luis, your fix seems to address only the case where i_size goes to
> > zero, but the crash can happen any time `i_size` drops below `off`.
> > I propose fixing it this way:
>
> Hmm... you're probably right.  I didn't see this happening, but I guess it
> could indeed happen.
>
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 4b8d59ebda00..19b084212fee 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >         if (ceph_inode_is_shutdown(inode))
> >                 return -EIO;
> >
> > -       if (!len)
> > +       if (!len || !i_size)
> >                 return 0;
> >         /*
> >          * flush any page cache pages in this range.  this
> > @@ -1200,12 +1200,11 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >                 }
> >
> >                 idx = 0;
> > -               if (ret <= 0)
> > -                       left = 0;
>
> Right now I don't have any means for testing this patch.  However, I don't
> think this is completely correct.  By removing the above condition you're
> discarding cases where an error has occurred (i.e. where ret is negative).
>
> Why not simply modify my patch and do:
>
>                 if (i_size < off)
>                         ret = 0;
>
> instead of:
>                 if (i_size == 0)
>                         ret = 0;
>
> ?
>
> (Again, totally untested!)
>
> Cheers,
> --
> Luís
>
> > -               else if (off + ret > i_size)
> > -                       left = i_size - off;
> > +               if (off + ret > i_size)
> > +                       left = (i_size > off) ? i_size - off : 0;
> >                 else
> > -                       left = ret;
> > +                       left = (ret > 0) ? ret : 0;
> > +
> >                 while (left > 0) {
> >                         size_t plen, copied;
> >
> >
> > On Thu, Nov 7, 2024 at 1:09 PM Luis Henriques <luis.henriques@...ux.dev> wrote:
> >>
> >> (CC'ing Alex)
> >>
> >> On Wed, Nov 06 2024, Goldwyn Rodrigues wrote:
> >>
> >> > Hi Xiubo,
> >> >
> >> >> BTW, so in the following code:
> >> >>
> >> >>                 idx = 0;
> >> >>                 if (ret <= 0)
> >> >>                         left = 0;
> >> >>                 else if (off + ret > i_size)
> >> >>                         left = i_size - off;
> >> >>                 else
> >> >>                         left = ret;
> >> >>
> >> >> The 'ret' should be larger than '0', right ?
> >> >>
> >> >> If so, why do we not check and fix it in the 'else if' branch instead?
> >> >>
> >> >> Because currently the read path code won't exit directly and will keep
> >> >> retrying the read if it finds that the real content length is longer
> >> >> than the local 'i_size'.
> >> >>
> >> >> Again I am afraid your current fix will break the MIX filelock semantic ?
> >> >
> >> > Do you think changing left to ssize_t instead of size_t will
> >> > fix the problem?
> >> >
> >> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> > index 4b8d59ebda00..f8955773bdd7 100644
> >> > --- a/fs/ceph/file.c
> >> > +++ b/fs/ceph/file.c
> >> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >       if (ceph_inode_is_shutdown(inode))
> >> >               return -EIO;
> >> >
> >> > -     if (!len)
> >> > +     if (!len || !i_size)
> >> >               return 0;
> >> >       /*
> >> >        * flush any page cache pages in this range.  this
> >> > @@ -1087,7 +1087,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >               size_t page_off;
> >> >               bool more;
> >> >               int idx;
> >> > -             size_t left;
> >> > +             ssize_t left;
> >> >               struct ceph_osd_req_op *op;
> >> >               u64 read_off = off;
> >> >               u64 read_len = len;
> >> >
> >>
> >> I *think* (although I haven't tested it) that your patch should work as
> >> well.  But I also think it's a bit more hacky: the overflow will still be
> >> there:
> >>
> >>                 if (ret <= 0)
> >>                         left = 0;
> >>                 else if (off + ret > i_size)
> >>                         left = i_size - off;
> >>                 else
> >>                         left = ret;
> >>                 while (left > 0) {
> >>                         // ...
> >>                 }
> >>
> >> If 'i_size' is '0', 'left' (which is now signed) will now have a negative
> >> value in the 'else if' branch and the loop that follows will not be
> >> executed.  My version will simply set 'ret' to '0' before this 'if'
> >> construct.
> >>
> >> So, in my opinion, what needs to be figured out is whether this will cause
> >> problems on the MDS side or not.  Because on the kernel client, it should
> >> be safe to ignore reads to an inode that has size set to '0', even if
> >> there's already data available to be read.  Eventually, the inode metadata
> >> will get updated and by then we can retry the read.
> >>
> >> Unfortunately, the MDS continues to be a huge black box for me and the
> >> locking code in particular is very tricky.  I'd rather defer this to
> >> anyone who is familiar with the code.
> >>
> >> Cheers,
> >> --
> >> Luís
> >>
>
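
The overflow discussed in the quoted thread can be reproduced outside the kernel. The helpers below are hypothetical and only model the `left = i_size - off` assignment for the two candidate types of `left`. They assume `off` and `i_size` are unsigned 64-bit values, and the signed variant assumes both fit in an int64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'left' declared as size_t: i_size - off wraps when off > i_size. */
static size_t left_as_size_t(uint64_t off, uint64_t i_size)
{
	return (size_t)(i_size - off);
}

/*
 * 'left' declared as a signed type: the same difference is negative.
 * The subtraction is done in the signed domain here so the result is
 * well defined in portable C.
 */
static int64_t left_as_signed(uint64_t off, uint64_t i_size)
{
	return (int64_t)i_size - (int64_t)off;
}

/* i_size has dropped to 0 while the read offset is 4096. */
void left_type_selfcheck(void)
{
	/* Unsigned: a huge count, so 'while (left > 0)' would run. */
	assert(left_as_size_t(4096, 0) == SIZE_MAX - 4095);
	/* Signed: negative, so 'while (left > 0)' is skipped. */
	assert(left_as_signed(4096, 0) == -4096);
}
```

With the signed type the loop is skipped, but `left` still holds a value that is not a byte count, which is the "hacky" aspect raised in the thread.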