linux-kernel - Async IO idea

Message-ID:  <loom.20070225T040733-711@post.gmane.org>
Date:	Sun, 25 Feb 2007 03:08:38 +0000 (UTC)
From:	Pierre Baillargeon <pierrebai@...mail.com>
To:	linux-kernel@...r.kernel.org
Subject:  Async IO idea

I'm an app programmer, not a kernel hacker. With that caveat...

I've been reading the LWN article about AIO and the description of Linus' solution
and the following realization dawned on me: at its heart, the idea is to fork
when blocking. So let's make it explicit with a single new function call:

#define MAYBE_FORK_END    0
#define FORK_ON_BLOCKING  1
#define FORK_ON_SOMETHING 2 /* Other ideas to reuse this? */
int maybe_fork(jmp_buf *, int flags);

Conceptually, this call is a setjmp(), and from then on any syscall that
would block would conceptually do fork()+longjmp(). To end the potential
forking sequence of calls, one simply calls maybe_fork() with the
MAYBE_FORK_END flag. This solution takes advantage of the knowledge and
coding style already accumulated by programmers.

Demonstration:

/* Prepare async call: save current execution state. */
jmp_buf buffer;
int childpid = maybe_fork(&buffer, FORK_ON_BLOCKING);
if(!childpid)
{
   /* OK, we're at the initial sequence after FORK_ON_BLOCKING. */
   /* No fork has taken place yet. */
   /* Any blocking syscall from here on may cause a fork. */
   read();
   /* Stop the fork potential. */
   int our_new_pid = maybe_fork(0, MAYBE_FORK_END);
   /* Work that depends on read() and may be done in child, who knows? */
   /* But it *won't* cause a fork if it blocks */
   bar();
   /* Check if we're in child. */
   if(our_new_pid)
   {
      /* Oh my! We blocked in read() and forked there! */
      /* Of course, we're not *forced* to exit() or anything... */
      exit(0);
   }
}
/* Work potentially done in parallel to async read(). */
foo();
/* Check if we had forked and are in parent. */
if(childpid)
{
   /* Oh my! We blocked and really are a parent! */
   /* Wait for async ops to finish. */
   int status;
   waitpid(childpid, &status, 0);
}
/* Work that depends on read() but must be done after foo(). */
qat();

/*
 * Non-blocking case:
 *    - maybe_fork(), read(), maybe_fork(), bar(), foo(), qat().
 *
 * Blocking case:
 *    - maybe_fork(), read() [Blocks and forks there.]
 *       - In child:
 *          - maybe_fork(), bar(), exit()
 *       - In parent:
 *          - first maybe_fork() returns child pid.
 *          - foo(), waitpid(), qat()
 */
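
For anyone who wants to experiment, the flow above can be approximated in
user space today. What follows is only a rough sketch under several
assumptions: read_maybe_fork(), maybe_fork_end(), demo() and the three static
variables are hypothetical names that are not part of the proposal; "would
block" is approximated with poll() and a zero timeout; and setjmp() is called
directly in the caller instead of inside maybe_fork(), because a jmp_buf
cannot outlive the function that filled it. The child reports the number of
bytes it read through its exit status.

```c
#include <assert.h>
#include <poll.h>
#include <setjmp.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical user-space stand-ins for state the kernel would keep. */
static jmp_buf *fork_point;  /* non-NULL while "fork on blocking" is armed */
static pid_t forked_pid;     /* child pid, set in the parent before longjmp() */
static int in_child;         /* set in the child once a fork has happened */

/* Emulates "read() forks if it would block" (assumption: poll() as the test). */
static ssize_t read_maybe_fork(int fd, void *buf, size_t len)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    if (fork_point != NULL && poll(&p, 1, 0) == 0) {
        pid_t pid = fork();
        if (pid > 0) {
            /* Parent: disarm and resume at the setjmp() point. */
            jmp_buf *jb = fork_point;
            fork_point = NULL;
            forked_pid = pid;
            longjmp(*jb, 1);
        }
        if (pid == 0)
            in_child = 1;
        /* pid < 0: fork failed, so fall through and block in place. */
    }
    return read(fd, buf, len);
}

/* Emulates maybe_fork(0, MAYBE_FORK_END): nonzero when running in the child. */
static int maybe_fork_end(void)
{
    fork_point = NULL;
    return in_child;
}

/* Returns the child's exit status if a fork happened, -1 if none did. */
static int demo(int rfd, int wfd)
{
    jmp_buf jb;
    char buf[16];

    forked_pid = 0;
    in_child = 0;
    fork_point = &jb;
    if (setjmp(jb) == 0) {
        ssize_t n = read_maybe_fork(rfd, buf, sizeof buf);
        if (maybe_fork_end())
            _exit(n > 0 ? (int)n : 0);  /* child: bar() would run here */
    }
    /* foo() would run here, in parallel with the child's read(). */
    if (forked_pid > 0) {
        int status;
        ssize_t w = write(wfd, "hello", 5);  /* unblocks the child */
        assert(w == 5);
        waitpid(forked_pid, &status, 0);
        assert(WIFEXITED(status));
        return WEXITSTATUS(status);
    }
    return -1;
}
```

With a pipe that already holds data, demo() never forks and returns -1. With
an empty pipe it forks at the read, the parent writes five bytes, and the
child exits with status 5.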

Some non-issues with the idea, which are in reality just a re-hash
of longjmp():

- A pointer to the jmp_buf must be kept in the process structure to
  be able to (conceptually) longjmp() there.

     This isn't much of an issue. Keeping the jmp_buf valid is the
     duty of the caller, just as it is with setjmp(). It could be a
     security risk if the longjmp() were done in kernel space, but
     arranging for it to be done in user space isn't hard (I would think).

- If process-wide state is changed in the potentially asynchronous calls
  (say, by an open() in the middle of a sequence of calls), then when/if
  there is a fork, that change will be visible in the parent process.
  IOW, if you write your code naively, you could leak, say, file descriptors.

     Again, this is only a user-space issue. All that is needed is for
     that state to be visible in the potential parent process, say by
     putting the file descriptor in a variable that is visible in the
     context of the 1st maybe_fork(). This is also equivalent to the
     coding issues of setjmp()/longjmp(), so it's nothing new.
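
     To make the file descriptor point concrete, here is a small sketch
     that uses plain setjmp()/longjmp() as a stand-in for the proposed
     fork-and-jump. The function name open_then_jump() is hypothetical.
     The descriptor lives in a variable declared in the outer context, and
     it is volatile because it is modified between setjmp() and longjmp().

```c
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Opens `path` inside the "async" region, jumps back out, and closes the
 * descriptor from the outer context. Returns 0 if the descriptor was
 * opened and closed, -1 otherwise.
 */
static int open_then_jump(const char *path)
{
    jmp_buf jb;
    volatile int fd = -1;  /* visible in the context of the setjmp() */

    assert(path != NULL);
    if (setjmp(jb) == 0) {
        fd = open(path, O_RDONLY);
        /* Stands in for the fork()+longjmp() taken on a blocking call. */
        longjmp(jb, 1);
    }
    /* Outer context: fd is still known here, so nothing leaks. */
    if (fd >= 0)
        return close(fd);
    return -1;
}
```

     Had fd been declared inside the if block, the code after the jump
     would have no name for it and the descriptor would stay open.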

The great things are:

- You can do as many syscalls as you wish in the async portion.

- No forking in the non-blocking case.

- Very light setup work.

- Reuses known structures, calls and concepts.

- You can have many styles for looping cases:

     * A single maybe_fork(), with all work potentially done in a child.
     * A limited number of maybe_fork() (i.e. a statically declared
       array of jmp_buf, on exhaustion the last child does it all).
     * A first stack-based jmp_buf kept in a pointer, creating further
       ones as needed.

- Supports all kinds of blocking code.

- Supports new kinds of conditional forking by using new values for the flags.

Just throwing the idea around.

