Tuesday, 10 January 2012

Linux Fileserver and ClamFS

I recently needed to provide a file server for a client that would work with Windows and OS X clients. For reasons of cost and maintenance we decided to use Ubuntu LTS Server. We also wanted anti-virus scanning, as customer files are introduced to this server regularly. I decided to use the popular open-source ClamAV engine, with ClamFS providing the on-access scanning. I want to talk briefly about ClamFS in general, because I can't find much comment on it, and then about a specific problem I had, because the solution is not necessarily obvious and uses an interesting feature of samba.

ClamFS seems to be the most straightforward way to provide on-access scanning with ClamAV. It's a FUSE-based daemon that mirrors one part of the file system to a mount point elsewhere, providing on-access protection for reads and writes to the mirrored version. I discovered the following about it:

  1. The version I installed from the Ubuntu repository doesn't include an init.d script; adding a line to rc.local seems to be the preferred way to start it at boot. You can, of course, write your own init.d script.
  2. The config file is written in XML, rather than the familiar plain-text format that pretty much every other Unix config file uses, which is more readable and easier to edit (certainly on a GUI-less server). You need to pass the config filename when starting ClamFS.
  3. There is apparently no way to stop the process other than using kill and then manually unmounting the FUSE mount associated with it.
  4. Lack of permissions caused a bit of difficulty – the ClamAV user might need some additional permissions before your users can read and write protected files
  5. There is little documentation; a tutorial taking new users through the steps of installation and configuration would make its use clearer
  6. Once set up, it seems to work fine: I've had no problems with it.
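
Points 1 to 3 can be sketched as a pair of shell functions. This is only a rough sketch, not something that ships with ClamFS: the config path /etc/clamfs/clamfs.xml, the mount point /srv/clamfs and the function names are all assumptions made for illustration. It relies on pkill and on fusermount, the usual FUSE unmount tool.

```shell
#!/bin/sh
# Hypothetical paths: adjust both to match your own installation.
CLAMFS_CONF=/etc/clamfs/clamfs.xml
CLAMFS_MOUNT=/srv/clamfs

# Start ClamFS; the config filename has to be passed as the argument.
clamfs_start() {
    clamfs "$CLAMFS_CONF"
}

# Stop ClamFS: kill the daemon, then release the FUSE mount by hand.
clamfs_stop() {
    pkill -x clamfs
    fusermount -u "$CLAMFS_MOUNT"
}
```

The body of clamfs_start is the same single command you would put in rc.local to start ClamFS at boot.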

My configuration is as follows: TrueCrypt (TC) volumes, which are normal files stored at a point we'll call location A, are mounted at another point in the filesystem (location B), and ClamFS mounts a copy of B to a third point (location C). Location C is then used for the samba share path.

I wondered if having ClamFS start at boot time and mount a copy of B elsewhere would prevent TC (which doesn't start at boot time) from mounting a volume to B later on, but it turns out that mounting volumes "underneath" an existing ClamFS mount works fine.

I had another problem though. Because I have more than one share and more than one encrypted volume, I configured ClamFS to protect the directory above the one in which all the TC drives were mounted. Because of this (or maybe because of some other aspect of the redirection), the free space reported by samba was not that of the individual drives mounted within the ClamFS-protected directory, but that of the drive containing those mount points (or the point ClamFS was mounting to; I'm not sure which, as they are on the same partition).

This can be more than an annoyance because Windows systems from Vista onwards actually check this free space before attempting to write a file. If there isn't room, you can't write. In my case, the reported size was that of a partition that was almost full of TC volumes, so the reported free space (and therefore the maximum file size that could be written by Windows 7 clients) was severely curtailed.

There are two possible ways round this. The most obvious is to only allow ClamFS to mount to and from points inside any TC volumes you want to share. This will cause you headaches if either you have many shares and only want to have ClamFS configured to protect one directory or ClamFS needs to be started before TC mounts its volumes (common, because manual intervention is usually needed on TC mounts for security reasons).

The second solution is to use a feature of samba which allows you to override the internal free space code with a method of your design. The smb.conf man page explains the details – essentially you need to provide a command (writing a script seems to be the most common solution) that will return two numbers. These give the total number of 1K blocks in the filesystem and the number that are free, respectively. The man page makes a suggestion which I tailored slightly:

#!/bin/sh
# Print total and free 1K blocks for the path in $1 (-k forces 1K units).
df -Pk "$1" | tail -1 | awk '{print $2,$4}'

The "-P" switch (added to the df command) forces the results for each drive onto a single line. If you don't do this and the name reported for the partition in the first column is longer than about 20 characters, a line break is inserted and the field numbers used in awk will be incorrect.
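
To see why the line break matters, the same tail and awk steps can be run against canned df output. The sample text below is made up for illustration (hypothetical device name, sizes and mount point); it imitates the wrapped layout you get without -P and the single-line layout you get with it. The helper name parse_df is also just for this sketch.

```shell
#!/bin/sh
# Same parsing as the dfree script, but reading df-style text from stdin.
parse_df() {
    tail -1 | awk '{print $2,$4}'
}

# Made-up output without -P: the long device name pushes the numbers
# onto a second line, so every field number shifts down by one.
wrapped='Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/truecrypt1
                      10321208   6291456   4029752  61% /mnt/tc1'

# Made-up output with -P: everything for the drive stays on one line.
single='Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/truecrypt1 10321208   6291456   4029752      61% /mnt/tc1'

printf '%s\n' "$wrapped" | parse_df   # wrong fields: prints "6291456 61%"
printf '%s\n' "$single"  | parse_df   # total and free: prints "10321208 4029752"
```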

You then need to make sure the definition in smb.conf for each affected share contains the following:

[Sharename]
   …
   # loc C
   path = /path/to/share
   # loc B
   dfree command = /path/to/script.sh /path/to/TC/mount

A quick side note: samba calls the script with the location whose size it is trying to ascertain as the first parameter. We've supplied a first parameter of our own here, which simply pushes the samba-appended one into second position, where it is ignored. I have read that samba may call the script with the parameter "/", having chrooted to the share point before executing the script. I haven't investigated exactly what is happening in my test or production installations, but both work with the procedure I have outlined and this would not be the case if any chrooting were going on. I can only conclude that this is not the behaviour of current versions of samba (I'm using 3.4.7, courtesy of Ubuntu 10.04 LTS) or something else about my environments is altering that behaviour. I'd be interested to hear about different experiences.
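
One way to check the argument handling before involving samba is to run the pipeline by hand with an extra trailing argument, much as samba would append one. This is a minimal sketch under a few assumptions: the function name dfree is hypothetical, "/" stands in for the TC mount point (location B) so that it runs anywhere, /some/other/path is a placeholder for whatever samba passes, and -k is added to df to force 1K blocks.

```shell
#!/bin/sh
# The script's pipeline wrapped in a function so it can be tried by hand.
# Only "$1" is used; anything appended after it lands in "$2" and is ignored.
dfree() {
    df -Pk "$1" | tail -1 | awk '{print $2,$4}'
}

# First argument: the path we chose ourselves.
# Second argument: a stand-in for the parameter samba appends.
dfree / /some/other/path
```

The output should be two numbers on one line, total blocks then free blocks, and they should describe the filesystem of the first argument regardless of what follows it.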