Using rsync to upload files to Amazon S3 over s3fs? You might be paying double, or even triple, the S3 fees.
I was observing the file upload progress on the transcoder server this morning, curious how it was moving along, and I noticed something: the currently uploading file had an odd name.
My file, CAT5TV-265-Writing-Without-Distractions-With-Free-Software-HD.m4v, was being uploaded as .CAT5TV-265-Writing-Without-Distractions-With-Free-Software-HD.m4v.f100q3.
I use rsync to upload the files to the S3 folder over S3FS on Debian, because it offers good bandwidth control. I can restrict how much of our upstream bandwidth is dedicated to the upload and prevent it from slowing down our other services.
Knowing how rsync works, I recognized that odd name right away: rsync writes to a temporary file with a random suffix, then renames it to the real filename the instant the transfer is complete.
In a normal disk-to-disk operation, or when rsync’ing over something such as SSH, that’s fine, because a mv this that doesn’t use any resources, and certainly doesn’t cost anything: it’s a simple rename operation. So why did my antennae go up this morning? Because I also know how S3FS works.
A rename operation over S3FS means the file is first downloaded to a file in /tmp, renamed, and then re-uploaded. So what rsync is effectively doing is:
- Uploading the file to S3 with a random filename, with bandwidth restrictions.
- Downloading the file to /tmp with no bandwidth restrictions.
- Renaming the /tmp file.
- Re-uploading the file to S3 with no bandwidth restrictions.
- Deleting the temp files.
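To put rough numbers on the steps above, here is a small shell sketch. The 500 MB file size is made up for illustration, and the three-transfer breakdown simply follows the S3FS rename behavior as I described it, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```shell
#!/bin/sh
# Hypothetical file size in megabytes (an assumption, not a real episode size).
size_mb=500

# Default rsync over S3FS, per the steps listed above:
upload_temp=$size_mb          # upload under the random temporary name
download_for_rename=$size_mb  # S3FS pulls the file back to /tmp to rename it
reupload_final=$size_mb       # S3FS pushes it up again under the final name
total_default=$((upload_temp + download_for_rename + reupload_final))

# Writing straight to the final filename needs only the one upload.
total_direct=$size_mb

echo "default: ${total_default} MB moved, direct: ${total_direct} MB moved"
```

With these made-up numbers, the default path moves three times the data of a direct write, and two of those three transfers ignore the bandwidth limit.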
Fortunately, this is 2013 and not 2002. The developers of rsync realized at some point that direct uploading may be desired in some cases. I don’t think they had S3FS in mind, but it certainly fits the bill.
The option is --inplace.
Here is what the manpage says about --inplace:
This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the update data directly to the destination file.
It’s that simple! Adding --inplace to your rsync command will cut your Amazon S3 transfer fees by as much as 2/3 for future rsync transactions!
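For anyone who wants a starting point, here is a minimal sketch of what such a command might look like. This is not my exact command line: the source directory, the S3FS mount point, and the 500 KB/s cap are all hypothetical placeholders, and the function name upload_inplace is just for illustration. The --bwlimit and --inplace options themselves are standard rsync options.

```shell
#!/bin/sh
# Hypothetical paths and rate limit; adjust these to your own setup.
SRC_DIR="${SRC_DIR:-/srv/transcoded/}"      # assumed local source directory
DEST_DIR="${DEST_DIR:-/mnt/s3/episodes/}"   # assumed S3FS mount point
BWLIMIT_KBPS="${BWLIMIT_KBPS:-500}"         # assumed cap; 0 means no limit

upload_inplace() {
    # -r: recurse into the source directory
    # -t: preserve modification times so later runs can skip unchanged files
    # --bwlimit: cap the transfer rate (the bandwidth control mentioned earlier)
    # --inplace: write directly to the final filename, no temp file and rename
    rsync -rt --bwlimit="$BWLIMIT_KBPS" --inplace "$SRC_DIR" "$DEST_DIR"
}
```

Calling upload_inplace runs the transfer. While it runs, the file on the S3 side should show up under its real name rather than a dot-prefixed temporary one.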
I’m glad I caught this before the transcoders transferred all 314 episodes of Category5 Technology TV to S3. I just saved us a boatload of cash.