All about fsync

The trouble with file writes

Take a trivial example of writing to a file in PHP:

function writeFile() {
    $success = false;
    $file = fopen('sample.txt', 'wb');
    if ($file === false) {
        return false; // could not open the file at all
    }
    $data = "Hello\nthis is a test file!";
    if (fwrite($file, $data) !== false) {
        $success = true;
    }
    fclose($file);
    return $success;
}

You call this function somewhere and it returns true - hooray! Your file's been successfully written to disk, right? Even if you had a power cut the same millisecond that function returned, you're good, the write already happened, yes?

No, sadly not.

In PHP (or in other programming languages exposing a similar interface), when you write to a file there is no guarantee your changes have actually been written to disk. What really happens is your write goes into a buffer inside the PHP process and at some point (of PHP's choosing) that buffer gets flushed out to the operating system.

But wait! I hear you say. Doesn't PHP have an fflush function for exactly this purpose? Surely in that example above, if I add in fflush($file) immediately after that fwrite, the buffers will be flushed and the file immediately committed to disk?

No, sadly not.

The operating system in turn, which likes to be efficient with your disk drive, will also save up some writes in its own kernel buffer and physically write them to disk later.

Now "later" may be mere milliseconds, maybe even less - and this is all well and good for the majority of file writes where we don't particularly care if there's some one-off event like a crash or power-cut and we lose a line of a log, or whatever. But there are times when we do care.
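To make those hops concrete, here is a minimal sketch of the furthest a write can get with fwrite and fflush alone. The helper name writeAndFlush is made up for illustration; the point is only that a true return value tells you the data reached the operating system, not the disk.

```php
<?php
// Hypothetical helper: write $data and push it as far as fflush() can.
function writeAndFlush(string $path, string $data): bool
{
    $file = fopen($path, 'wb');
    if ($file === false) {
        return false;
    }
    // Hop 1: the data is handed to PHP's stream layer.
    $written = fwrite($file, $data);
    // Hop 2: fflush() pushes anything PHP is still holding to the operating system.
    $flushed = fflush($file);
    // Hop 3 (kernel buffer to physical disk) has NOT necessarily happened yet.
    fclose($file);
    return $written === strlen($data) && $flushed;
}
```

Reading the file straight back will show the new contents, because the read is served from the same kernel buffers. That is exactly why this problem is so easy to miss in testing.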

If you're familiar with databases, you've probably heard of the ACID acronym - atomicity, consistency, isolation, durability. These are the properties a DBMS must provide in order to guarantee transactional behaviour. The durability part means that once you successfully commit a database transaction, those changes have been stored permanently - even if your system immediately crashes after. In other words, your changes have been physically written to disk.

Now I'm not suggesting it's a good idea to write your own DBMS from scratch in PHP, but there are definitely occasions where you might want to ensure the durability of a file write. Mission-critical system logs, audit logs, maybe something not web-related; PHP remains web-first but it is no longer web-only. With PHP 8 we are starting to see the language features and adoption for more general purpose programming.

Enter fsync

fsync is a system call on POSIX-ish systems which synchronizes a file with storage. In simple terms, it instructs the operating system to take any writes to a file that are still sitting in its buffers and persist them to disk, and it does not return until that has happened, so the data would be recoverable even after a system crash or power loss. It is the durability in ACID.

In C, fsync() is declared in <unistd.h> as part of POSIX (it is not in the ISO C standard library itself). In Java, we have java.io.FileDescriptor.sync(); in Python we have os.fsync(). Until now, PHP remained the only major programming language I've used which did not provide an interface to the fsync system call.

fsync in PHP

PHP 8.1 will introduce the fsync and fdatasync functions for the first time. This is the first significant change (and first language feature) I've added to PHP and it's a very small thing, but 1) it was a great exercise in familiarising myself with PHP's internals and source code and 2) it's nice to give back something, even little, to an open source product I've been using for years.

fsync() is by its very nature a file-system operation, so you'll only be able to call it on plain file handles (that is, you cannot implement your own stream-wrapper for fsync, nor will it work on resources which are not regular files or able to be treated as regular files). The function will take a file handle and attempt to commit changes to disk, returning true on success, false on failure, or raising a warning if the resource is not a file.

function writeFile() {
    $success = false;
    $file = fopen('sample.txt', 'wb');
    if ($file === false) {
        return false; // could not open the file at all
    }
    $data = "Hello\nthis is a test file!";
    if (fwrite($file, $data) !== false) {
        $success = fsync($file); // that's better!
    }
    fclose($file);
    return $success;
}
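For the audit-log case mentioned earlier, the same idea might look something like the sketch below. The function name appendDurably and the log line format are invented for the example, and refusing to write on PHP versions without fsync is just one possible policy, not the only sensible one.

```php
<?php
// Hypothetical helper for an audit log: append one line and only report
// success once fsync() says the data reached storage.
function appendDurably(string $path, string $line): bool
{
    if (!function_exists('fsync')) {
        // PHP older than 8.1: the durability promise cannot be kept here.
        return false;
    }
    $file = fopen($path, 'ab');
    if ($file === false) {
        return false;
    }
    $payload = $line . "\n";
    // Treat a short write as a failure, then ask the OS to commit to disk.
    $ok = fwrite($file, $payload) === strlen($payload) && fsync($file);
    fclose($file);
    return $ok;
}
```

A caller would treat a false return as "this entry may not survive a crash" and react accordingly, for example by retrying or refusing to continue.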

fdatasync

PHP 8.1 also introduces fdatasync which is (in theory) a slightly faster sync, since fsync() will attempt to fully synchronize both the file's data changes and meta-data about the file (last modified time, etc.), which is technically two disk writes. The idea of fdatasync() is it only synchronizes the data itself.

My advice: don't worry about this too much. In practice, on modern Linux file systems the two often end up doing much the same work, since fdatasync still has to write any meta-data needed to read the data back (a changed file size, for example). On Windows there is no native fsync implementation; the PHP function is a wrapper around the FlushFileBuffers API, which does the same job. PHP's fsync() and fdatasync() both end up in that same call on Windows, so it doesn't matter which you use.
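If you do want to pick between the two at runtime, a small wrapper along these lines would do it. The name syncHandle and its $dataOnly flag are assumptions made for this sketch; only fsync() and fdatasync() themselves come from PHP 8.1.

```php
<?php
// Hypothetical helper: commit an open file handle to storage, preferring
// fdatasync() when asked to and when the running PHP version provides it.
function syncHandle($file, bool $dataOnly = true): bool
{
    if ($dataOnly && function_exists('fdatasync')) {
        return fdatasync($file);
    }
    if (function_exists('fsync')) {
        return fsync($file);
    }
    return false; // PHP < 8.1: neither function exists
}
```

Passing false for $dataOnly forces the full fsync(), which is the safer default if you are unsure whether the meta-data matters to you.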

Words of warning

Caveat 1: fsync() is not suitable for high-throughput, intensive file writes. If you need to make hundreds or thousands of writes a second, you shouldn't try to fsync every one, or your I/O performance will grind to a halt. The only solid way to deal with that kind of situation is to use unbuffered, direct-to-disk writes, and that is simply too low-level for PHP.
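A cheaper compromise, and to be clear this is batching rather than the direct-to-disk approach described above, is to write a group of records and sync once at the end. The sketch below assumes PHP 8.1+ and uses a made-up function name, writeBatch.

```php
<?php
// Hypothetical helper: write many records, then pay for a single fsync().
// Assumes PHP 8.1+ so that fsync() exists.
function writeBatch(string $path, array $lines): bool
{
    $file = fopen($path, 'ab');
    if ($file === false) {
        return false;
    }
    $ok = true;
    foreach ($lines as $line) {
        $payload = $line . "\n";
        if (fwrite($file, $payload) !== strlen($payload)) {
            $ok = false; // short or failed write: stop and report failure
            break;
        }
    }
    // One sync for the whole batch instead of one per line.
    $ok = $ok && fsync($file);
    fclose($file);
    return $ok;
}
```

The trade-off is that nothing in the batch is durable until that final call returns, so this only suits cases where losing the most recent few records is acceptable.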

Caveat 2: If you want to really, really ensure durability of a file on a Linux system, you should also open a handle to the directory containing the file and fsync() that too. Otherwise there's a tiny chance you'll end up in a situation where the file changes have been sync'd successfully but the directory entry hasn't, meaning your data would ultimately be recoverable, but not necessarily attached to the file name you'd expect after a restart. This is not necessary on Windows.
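In PHP that might be sketched as below. This leans on an assumption worth stating loudly: that on Linux fopen() will hand back a read-only handle to a directory and that fsync() accepts it. Treat that as unverified for your PHP build; the helper name syncParentDirectory is also invented here, and it simply returns false whenever any step is refused.

```php
<?php
// Hypothetical helper: fsync the directory that contains $filePath.
// ASSUMPTION: fopen() can open a directory read-only on Linux and fsync()
// accepts that handle. Unverified across PHP builds and platforms.
function syncParentDirectory(string $filePath): bool
{
    if (!function_exists('fsync')) {
        return false; // PHP < 8.1
    }
    // Warnings are suppressed because failure is an expected outcome here.
    $dir = @fopen(dirname($filePath), 'r');
    if ($dir === false) {
        return false; // e.g. Windows, where this step is not needed anyway
    }
    $ok = @fsync($dir);
    fclose($dir);
    return $ok;
}
```

You would call this after the file's own fsync() has succeeded, and treat a false result as "the directory entry may not be durable yet" rather than as a fatal error.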

Caveat 3: These days, even the disk drives themselves have internal buffers. It's buffers all the way down. Operating systems are smart and they know a lot of disks will lie about having successfully written data to permanent storage, so the fsync() implementation on most systems will also tell the drive to flush its own buffers. But some drives - USB flash drives are notorious for this - will just lie to the OS about having completed writes and there's not really anything that can be done about it. So be careful: it is still technically possible in some circumstances to get back true from fsync and find your data hasn't been persisted at that point. Something of an edge case, but it can happen.

Further reading

PHP Watch did a better job of writing up my own changes for PHP 8.1 fsync than I did.

The Linux man page for fsync gives a technical description of the system call.

The RFC on php.net contains more details and a link to the implementation.

