linux-kernel - Re: [PATCH v4 6/6] io_uring: add support for zone-append

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+1E3rJAa3E2Ti0fvvQTzARP797qge619m4aYLjXeR3wxdFwWw@mail.gmail.com>
Date:   Tue, 28 Jul 2020 00:46:28 +0530
From:   Kanchan Joshi <joshiiitr@...il.com>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     Kanchan Joshi <joshi.k@...sung.com>, viro@...iv.linux.org.uk,
        bcrl@...ck.org, Matthew Wilcox <willy@...radead.org>,
        Christoph Hellwig <hch@...radead.org>,
        Damien Le Moal <Damien.LeMoal@....com>,
        asml.silence@...il.com, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-aio@...ck.org,
        io-uring@...r.kernel.org, linux-block@...r.kernel.org,
        linux-api@...r.kernel.org, SelvaKumar S <selvakuma.s1@...sung.com>,
        Nitesh Shetty <nj.shetty@...sung.com>,
        Javier Gonzalez <javier.gonz@...sung.com>
Subject: Re: [PATCH v4 6/6] io_uring: add support for zone-append

On Fri, Jul 24, 2020 at 10:00 PM Jens Axboe <axboe@...nel.dk> wrote:
>
> On 7/24/20 9:49 AM, Kanchan Joshi wrote:
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 7809ab2..6510cf5 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -1284,8 +1301,15 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
> >       cqe = io_get_cqring(ctx);
> >       if (likely(cqe)) {
> >               WRITE_ONCE(cqe->user_data, req->user_data);
> > -             WRITE_ONCE(cqe->res, res);
> > -             WRITE_ONCE(cqe->flags, cflags);
> > +             if (unlikely(req->flags & REQ_F_ZONE_APPEND)) {
> > +                     if (likely(res > 0))
> > +                             WRITE_ONCE(cqe->res64, req->rw.append_offset);
> > +                     else
> > +                             WRITE_ONCE(cqe->res64, res);
> > +             } else {
> > +                     WRITE_ONCE(cqe->res, res);
> > +                     WRITE_ONCE(cqe->flags, cflags);
> > +             }
>
> This would be nice to keep out of the fast path, if possible.

I was thinking of keeping a function-pointer (in io_kiocb) during
submission. That would have avoided this check......but argument count
differs, so it did not add up.

> > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > index 92c2269..2580d93 100644
> > --- a/include/uapi/linux/io_uring.h
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -156,8 +156,13 @@ enum {
> >   */
> >  struct io_uring_cqe {
> >       __u64   user_data;      /* sqe->data submission passed back */
> > -     __s32   res;            /* result code for this event */
> > -     __u32   flags;
> > +     union {
> > +             struct {
> > +                     __s32   res;    /* result code for this event */
> > +                     __u32   flags;
> > +             };
> > +             __s64   res64;  /* appending offset for zone append */
> > +     };
> >  };
>
> Is this a compatible change, both for now but also going forward? You
> could randomly have IORING_CQE_F_BUFFER set, or any other future flags.

Sorry, I didn't quite understand the concern. CQE_F_BUFFER is not
used/set for write currently, so it looked compatible at this point.
Yes, no room for future flags for this operation.
Do you see any other way to enable this support in io-uring?

> Layout would also be different between big and little endian, so not
> even that easy to set aside a flag for this. But even if that was done,
> we'd still have this weird API where liburing or the app would need to
> distinguish this cqe from all others based on... the user_data? Hence
> liburing can't do it, only the app would be able to.
>
> Just seems like a hack to me.

Yes, only user_data to distinguish. Do liburing helpers need to look
at cqe->res (and decide something) before returning the cqe to
application?
I see that happening at once place, but not sure when it would hit
LIBURING_DATA_TIMEOUT condition.
__io_uring_peek_cqe()
{
           do {
                io_uring_for_each_cqe(ring, head, cqe)
                        break;
                if (cqe) {
                        if (cqe->user_data == LIBURING_UDATA_TIMEOUT) {
                                if (cqe->res < 0)
                                        err = cqe->res;
                                io_uring_cq_advance(ring, 1);
                                if (!err)
                                        continue;
                                cqe = NULL;
                        }
                }
                break;
        } while (1);
}



-- 
Joshi