Date:   Wed, 13 Jul 2022 03:40:10 -0400
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Dominique Martinet <asmadeus@...ewreck.org>
Cc:     Christian Schoenebeck <linux_oss@...debyte.com>,
        linux-kernel@...r.kernel.org, v9fs-developer@...ts.sourceforge.net,
        Greg Kurz <groug@...d.org>,
        Eric Van Hensbergen <ericvh@...il.com>,
        Latchesar Ionkov <lucho@...kov.net>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [RFC PATCH] 9p: forbid use of mempool for TFLUSH

On 7/13/22 03:12, Dominique Martinet wrote:
> Kent Overstreet wrote on Wed, Jul 13, 2022 at 02:39:06AM -0400:
>> On 7/13/22 00:17, Dominique Martinet wrote:
>>> TFLUSH is called while the thread still holds memory for the
>>> request we're trying to flush, so mempool alloc can deadlock
>>> there. With the p9_msg_buf_size() rework the flush allocation is
>>> small, so just make it fail if the allocation fails; all that does
>>> is potentially leak the request we're flushing until its reply
>>> finally comes... or, if it never does, until umount.
>>
>> Why not just add separate mempools for flushes? We don't have to allocate
>> memory for big payloads so they won't cost much, and then the IO paths will
>> be fully mempool-ified :)
> 
> I don't think it really matters either way -- I'm much more worried
> about the two points I gave in the commit comment section: mempools not
> being shared leading to increased memory usage when there are many
> mostly-idle mounts (I know users who need that), and more importantly
> that mempool waiting being uninterruptible/non-failable might be "nice"
> from the mempool-using side, but I'd really prefer users to be able to
> ^C out of a mount to a bad server that's stuck in mempool_alloc, at least.

We should never get stuck allocating memory - if that happens it's game 
over, the system can no longer make forward progress.

(Oh, that does give me an idea: Suren just implemented a code tagging 
mechanism for tracking memory allocations by callsite, and I was talking 
about using it for tracking latency. Memory allocation latency would be 
a great thing to measure; it's something we care about, and we haven't 
had a good way of measuring it before.)
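To make the latency idea concrete, here's a rough, hypothetical sketch - 
the struct and wrapper names are invented for illustration and are not 
from the code tagging series - of timing a single allocation per callsite:

#include <linux/ktime.h>
#include <linux/slab.h>
#include <linux/atomic.h>

/* Hypothetical per-callsite stats; not part of any posted patch. */
struct alloc_latency {
	atomic64_t total_ns;	/* time spent allocating at this callsite */
	atomic64_t max_ns;	/* worst case seen (racy update, fine for a sketch) */
	atomic64_t calls;
};

static void *kmalloc_timed(size_t size, gfp_t gfp, struct alloc_latency *stat)
{
	u64 start = ktime_get_ns();
	void *p = kmalloc(size, gfp);
	u64 delta = ktime_get_ns() - start;

	atomic64_add(delta, &stat->total_ns);
	atomic64_inc(&stat->calls);
	if (delta > (u64)atomic64_read(&stat->max_ns))
		atomic64_set(&stat->max_ns, delta);
	return p;
}

Something along those lines, aggregated per callsite by the tagging 
mechanism, could show e.g. how long direct reclaim stalls a given 
allocation site under memory pressure.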

> It looked good before I realized all the ways this could hang, but now I
> just think for something like 9p it makes more sense to fail the
> allocation and the IO than to bounce forever trying to allocate memory
> we don't have.

A filesystem that randomly fails IOs is, fundamentally, not a filesystem 
that _works_. This whole thing started because 9pfs failing IOs has been 
breaking my xfstests runs - and 9pfs isn't the thing I'm trying to test!

Local filesystems and the local IO stack have always had this 
understanding - that IO needs to _just work_ and be able to make forward 
progress without allocating additional memory, otherwise everything 
falls over because memory reclaim requires doing IO. It's fundamentally 
no different with network filesystems; the cultural expectation just 
hasn't been there historically, and not for any good technical reason - 
in -net land dropping packets when you have to is generally fine, but in 
filesystem land it really isn't, not if you want to make something 
that's reliable under memory pressure!
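For reference, the forward-progress pattern local IO paths rely on looks 
roughly like this - a generic sketch, not the 9p code; the io_req 
structure and the reserve size are made up for illustration:

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/gfp.h>

struct io_req {			/* hypothetical request structure */
	void	*payload;
	size_t	len;
};

static mempool_t *req_pool;

static int req_pool_init(void)
{
	/* Reserve a handful of requests up front so writeback can always
	 * get one, even when the allocator has nothing left to give. */
	req_pool = mempool_create_kmalloc_pool(4, sizeof(struct io_req));
	return req_pool ? 0 : -ENOMEM;
}

static struct io_req *req_alloc(void)
{
	/* GFP_NOIO so reclaim doesn't recurse back into this IO path;
	 * with a waiting mask mempool_alloc() never fails - it sleeps
	 * until a previously allocated element is returned to the pool. */
	return mempool_alloc(req_pool, GFP_NOIO);
}

The flip side, as Dominique points out above, is that that wait is 
uninterruptible, so a mount stuck against a dead server can't be ^C'd 
out of mempool_alloc().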

> Let's first see if you still see high-order allocation failures when
> these are made much less likely with Christian's patch.

Which patch is that? Unless you're talking about my mempool patch?

> What I intend to push this cycle is in
> https://github.com/martinetd/linux/commits/9p-test
> up to 'net/9p: allocate appropriate reduced message buffers'; if you can
> easily produce them I'd appreciate if you could confirm if it helps.
> 
> (just waiting for Christian's confirmation + adjusting the strcmp for
> rdma before I push it to 9p-next)
> --
> Dominique
> 
