Message-ID: <9924e31b-9f31-bf81-7a10-c95b93e2999e@auristor.com>
Date:   Thu, 4 Nov 2021 08:15:25 -0700
From:   Jeffrey E Altman <jaltman@...istor.com>
To:     "Matthew Wilcox (willy@...radead.org)" <willy@...radead.org>,
        David Howells <dhowells@...hat.com>
Cc:     marc.dionne@...istor.com, linux-afs@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jeff Layton <jlayton@...nel.org>
Subject: Re: [PATCH] afs: Fix ENOSPC, EDQUOT and other errors to fail a write
 rather than retrying

On 11/3/2021 8:22 PM, Matthew Wilcox (willy@...radead.org) wrote:
> On Wed, Nov 03, 2021 at 11:43:20PM +0000, David Howells wrote:
>> Currently, at the completion of a storage RPC from writepages, the errors
>> ENOSPC, EDQUOT, ENOKEY, EACCES, EPERM, EKEYREJECTED and EKEYREVOKED cause
>> the pages involved to be redirtied and the write to be retried by the VM at
>> a future time.
>>
>> However, this is probably not the right thing to do, and, instead, the
>> writes should be discarded so that the system doesn't get blocked (though
>> unmounting will discard the uncommitted writes anyway).
> umm.  I'm not sure that throwing away the write is the best answer
> for some of these errors.  Our whole story around error handling in
> filesystems, the page cache and the VFS is pretty sad, but I don't think
> that this is the right approach.
>
> Ideally, we'd hold onto the writes in the page cache until (eg for ENOSPC
> / EDQUOT), the user has deleted some files, then retry the writes.

Hi Matthew,

I agree that it would be desirable to avoid discarding user data, but in
practice that is hard to do.  The proposed behavior change is consistent
with other Unix AFS/AuriStorFS cache manager implementations.  There
are many situations that can result in an out-of-quota or out-of-space
error where the end user has no ability to do anything about it.

An EDQUOT error might occur because the AFS volume has reached its
quota while the writer holds only insert privilege and cannot delete
anything.  The user might not even be able to list the contents of the
volume.

An ENOSPC error might be the result of the backing store for AFS vice
partitions filling due to data being written to other AFS volumes that
the writer has no ability to access or manage.

AFS cache managers frequently implement write-on-close semantics and
will flush dirty content to the fileserver only when the file is closed
or the local cache is out of space.  Holding onto dirty data that
cannot be flushed to the server on a multi-user timeshare system can
have unwanted negative impacts on other users of the system.

Another risk is that, if dirty data persists locally, the
EDQUOT/ENOSPC errors will be replaced by EACCES or EPERM errors when the
associated authentication credentials expire.

If a back-off strategy is to be implemented in the future, AFS does
provide RPCs that can be used to query the volume's online status, the
maximum quota in 1 KiB blocks, the blocks in use, the available blocks
in the partition, and the maximum number of blocks in the partition.
Querying RXAFS_GetVolumeStatus or RXYFS_GetVolumeStatus can avoid the
overhead of issuing a StoreData operation that is likely to fail.
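
To make the idea concrete, here is a rough sketch of the kind of check a
back-off path could apply to the values returned by a volume status
query before retrying a store.  The struct and function names are
hypothetical and do not match the real RPC reply layout or any existing
kernel code, and treating a maximum quota of 0 as "no quota" is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of the values described above.  This is
 * NOT the real volume status reply layout.  Block counts are in 1 KiB units.
 */
struct vol_status_sketch {
	bool     online;            /* volume's online status */
	uint32_t max_quota;         /* assumption: 0 means no quota is set */
	uint32_t blocks_in_use;     /* blocks charged against the quota */
	uint32_t part_blocks_avail; /* free blocks in the partition */
	uint32_t part_max_blocks;   /* partition size; unused in this check */
};

/*
 * Return true if a store of `bytes` looks likely to succeed given the
 * last known status, false if it would probably hit ENOSPC or EDQUOT.
 */
static bool store_likely_to_succeed(const struct vol_status_sketch *vs,
				    uint64_t bytes)
{
	/* Round the request up to whole 1 KiB blocks without overflowing. */
	uint64_t need = bytes / 1024 + (bytes % 1024 != 0);

	if (!vs->online)
		return false;

	/* Partition full: the store would probably fail with ENOSPC. */
	if (need > vs->part_blocks_avail)
		return false;

	/* Quota exhausted: the store would probably fail with EDQUOT. */
	if (vs->max_quota != 0) {
		uint64_t left = vs->max_quota > vs->blocks_in_use ?
				(uint64_t)(vs->max_quota - vs->blocks_in_use) : 0;
		if (need > left)
			return false;
	}

	return true;
}
```

A check like this is only advisory.  Other writers can consume the
remaining quota or partition space between the status query and the
store, so the store itself still has to handle ENOSPC and EDQUOT.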

Jeffrey Altman


