The values are crazy high by default (on modern hardware anyway): 10% of memory for dirty_background_ratio and 20% for dirty_ratio. I wonder why no distro touches these.
To me this seems great and, if anything, modest. As RAM is dirt-cheap, this can significantly improve performance (especially when external or remote drives are involved, not only NVMe SSDs) and prolong SSD life, which saves not just money but also the hassle of replacing drives (and money too, when it's a Mac and you can't replace the SSD at all).
I wish I could configure Windows the same way: whenever it can use RAM to avoid an extra disk write/read - it should.
In the case you describe it may indeed decrease performance (or it may not; I'm not a disk I/O or caching expert, and I know things can get weird), but it can still increase performance in other scenarios. Copying files from one physical disk to another is not the only kind of operation that involves the RAM cache for disk I/O.
> Because people complain their system is "slow" if it blocks on disk I/O.
Yeah, I/O blocks drive me mad every time. They are more noticeable on Windows, though. Perhaps that's because Windows doesn't cache enough in RAM and I do a lot of USB I/O (USB NICs, USB drives).
> Another set of people also complain Linux takes too long to safely unplug USB drives.
If only it had an API to see how much data for a specific device is currently cached in RAM and to visualize the progress of flushing that cache... Unvisualized long I/O operations (including cache flushes), let alone ones that freeze the UI, indeed feel bad and are a UX bug.
One of the key reasons I prefer Linux over Windows is that Linux freezes far more rarely, no matter the workload.
I've also thought it would be nice if Linux's dirty page handling was more granular. But at the same time, whenever dirty pages are a concern it's usually one large file or one NFS mount or one USB device. The system otherwise doesn't have a great deal of dirty pages to bother reporting on. Also programs have access to selective file flushing with fsync() so there is at least that.
This isn't just about NFS timeouts. Try playing a movie from a rotational disk while simultaneously doing high-volume writes. You will get frequent pauses in your video because the write buffer size is so large that a single writeback will cause the video buffer to drain empty.
On my desktop with 32GB RAM, I can even get audio to skip when ripping DVDs to disk. That's because practically the entire movie fits into RAM before Linux decides to start the writeback process, and that writeback process will hog the disk for almost a minute. Or it used to, until I reduced the buffer size by a full order of magnitude.
This is just another sad example of buffer bloat: the inability to tune data buffers to the capacity of the underlying stream.
That's another thing I can't understand: Why does NFS timeout when the data transfer is still on? Shouldn't it timeout only when the server is no longer ACK-ing packets?
> Shouldn't it timeout only when the server is no longer ACK-ing packets?
That's exactly what happens. The server ACKs data until it fills its write buffer, and then stalls unresponsive until the entire buffer is flushed to disk. If it takes longer to flush the buffer to disk than the client's timeout, it gives up.
I have personally watched this happen in Wireshark, where the server didn't ACK for more than 10 minutes.
That's not it. I only had this problem on a fast-ethernet connection (because I had to share the cable for two connections). The server could write ~ 50 MB/s, but it still timed out on the 10MB/s upload.
It's possible you were seeing another problem, but this issue is more likely to appear with a faster network connection, because the network transfer happens faster than the disk writes.
You can confirm by watching /proc/meminfo and watching the Dirty and Writeback numbers.
Changing up the vm.dirty* settings can help as described here:
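For reference, a sysctl fragment of the kind usually suggested; the byte values here are purely illustrative, not a recommendation (note that setting a `_bytes` knob zeroes its `_ratio` counterpart, and vice versa):

```ini
# /etc/sysctl.d/99-writeback.conf -- illustrative values only.
# Start background writeback once 64 MB is dirty, instead of 10% of RAM.
vm.dirty_background_bytes = 67108864
# Block writers once 256 MB is dirty, instead of 20% of RAM.
vm.dirty_bytes = 268435456
```

Apply with `sysctl --system` or reboot.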
Holy shit, I think you and the above comment, along with this thread, may have finally given me the answer to one of the few problems I was never able to solve.
About 4-5 years ago I was working on a project, and part of that was copying big amounts of data to a system via NFS. At 30 minutes exactly, NFS would croak and the transfer failed.
I think this buffer fill-and-empty flow was fucking killing it. It's a shame I don't work there anymore; I'd definitely want to try tweaking these settings and see if I could solve it.
Yeah that does sound like the symptoms of the problem I discovered. If you ever witness it again, the trick is watching /proc/meminfo for the Dirty and Writeback numbers.
Is there a global cap yet, or per-disk limits? IIRC those would trigger writeback. I always thought wiping a slow USB disk shouldn't consume all available RAM, but it used to. Maybe it still does.
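Per-device limits do exist via the backing-device-info knobs in sysfs, if I recall correctly; a sketch (the `8:16` major:minor pair for a USB disk is a hypothetical example):

```shell
# Cap the share of the global dirty threshold one slow device may use.
echo 5 > /sys/class/bdi/8:16/max_ratio    # at most ~5% of the dirty limit
cat /sys/class/bdi/8:16/min_ratio         # guaranteed floor, usually 0
```

So a slow USB stick can be stopped from monopolizing the page cache, but nothing sets this by default.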
https://docs.kernel.org/admin-guide/sysctl/vm.html#dirty-bac...