Thursday, 2 February 2017

VBox shared folders: making them work in a LAMP stack

The Setup

This is one of those “tutorial” posts that is really written for me to remember how I did something. It’s for a specific set of circumstances, but I won’t be the only person to be in these circumstances and there is generally applicable info too.

My problem was that I wanted to edit a PHP project on my Windows desktop, but have it served from a LAMP environment similar to the one in which it would run. In the past, I’ve used a VM on a remote server for this. It works fine, but you need a technique to synchronise files on the dev machine with the server, and you need the server to be available. This time, to keep everything local, I decided to use a VM on my dev machine, using VirtualBox. As well as the lower admin overhead of keeping everything on my dev machine, it seemed a major advantage would be using “shared folders”: the LAMP machine could see my local dev folder and serve from it directly. This means that switching to a different project is just a tweak of a setting in the VBox settings panel, and I don’t have to worry about the availability of a VMware instance somewhere else.

I started with an Ubuntu 16.04 LTS server that had OpenSSH and LAMP added as part of the initial install process (through Ubuntu’s handy task packages), installed Guest Additions and set networking to “bridged” so the server would be available across the local network with its own routable IP address. I also shared my project folder on the host using a permanent “shared folder” addition to the settings for my LAMP VM. All this was fine.
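Incidentally, the same shared-folder setup can be scripted rather than clicked through; something like the following should be equivalent (the VM name, share name and host path are just examples):

VBoxManage sharedfolder add "LAMP" --name hosthtml --hostpath "C:\projects\mysite"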

Then I hit a snag…

The Problem

By default, Apache looks at /var/www/html for web pages. I assumed I could just mount the shared folder device over the top of this and everything would work - but it didn’t:

  • I started by adding a traditional fstab entry, but the VBox shared folder kernel module isn’t loaded by the time fstab is parsed.
  • I read a suggestion that vboxsf should be added to /etc/modules, which should load the module early enough in the boot process for fstab parsing, but my mount point still wasn’t established.
  • I put noauto in the options for the fstab entry and then added a command to mount it in /etc/rc.local. Nothing was mounted.
  • I removed the fstab entry altogether and put a complete mount command in /etc/rc.local. Once again, nothing happened.

It’s worth noting that the mount commands I was using worked fine if used from a terminal after logging in; both a complete mount command and one that referenced and activated an entry in fstab. But I couldn’t get them to work at any point in the boot process and there were no particularly useful log messages.
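For reference, the sort of commands I mean looked like this (my share is called hosthtml; yours will differ):

# works fine when run from a terminal after logging in
mount -t vboxsf hosthtml /var/www/html

# the equivalent fstab entry, which never took effect at boot for me
hosthtml   /var/www/html   vboxsf   defaults   0   0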

The Fix

Now, there is an option in the “shared folder” creation process in VBox, “Auto-mount”. Unless I had this checked, I was receiving a “Protocol Error” (no further explanation) wherever and whenever I tried to mount the share. So I had this ticked and was going to worry about removing it later. What it does is cause a mount point to be created automatically (/media/sf_<share name>). I wanted control of the mount point so I was trying to avoid this, but with nothing else working, the obvious solution was to simply use this point. I created a symlink:

# (the existing /var/www/html directory needs moving or removing first)
ln -s /media/sf_hosthtml /var/www/html

and this solved the mount problem immediately. I cursed myself for not taking the obvious, expedient route much sooner. I have since read that using the auto-generated mount point is the “preferred” option for mounting shared folders and if this is the case, I can see why.

The last problem was simply permissions. The original /var/www/html/ folder’s contents are world readable, but the VBox replacement is not:

-rw-r--r-- 1 root root 12K Feb 1 12:55 index.html
vs.
-rwxrwx--- 1 root vboxsf 1.4K Feb 1 17:00 index.php

As there is no obvious way to alter the permissions that Windows files are given by VBox’s shared folders module, I solved this by adding the Apache user, www-data, to the vboxsf group:

usermod -a -G vboxsf www-data
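The new group membership is only picked up when a process starts, so Apache needs restarting (or the VM rebooting) before www-data can actually read the share:

systemctl restart apache2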

Granting access via the group may not be ideal: unlike the original permissions on the default index.html file, the group permissions now allow Apache to write to these files. But it’s a closed test system and it’s good enough for me.

Friday, 16 September 2016

Bash on Ubuntu on Windows

I ran some Linux commands today. I do that a lot, but it’s the first time I’ve run most of them from a command prompt I opened in Windows, and had them act on that Windows machine.

This isn’t unheard of, of course. The Cygwin project has provided Win32 builds of Linux tools for years, and virtual machines provide another route to these things, but both have their problems. Cygwin requires that binaries are recompiled for Windows and, while it provides a DLL containing much of the standard POSIX API, there’s a substantial amount of friction in getting anything new to work. Sure, that’s been done for you in a lot of cases, but even then, the results aren’t always perfect.

VMs have different problems: you have to allocate memory and disk space in large chunks and then put up with the resource overhead that running a whole OS within an OS creates. After that, you have to access them – typically you’re looking at a new IP address for the VM and ssh, VNC or some kind of proprietary access method specific to the hosting software is necessary to overcome the rigid separation between Windows and the VM. Operating on or moving data from one to the other is hard work.

This is new.

With the Anniversary Update (version 1607) of Windows 10 (x64 only) comes Ubuntu, which is not something I’d have put much money on. More specifically, the user-space root filesystem (including all the binaries) from Ubuntu 14.04 LTS “Trusty” can be installed inside Windows. It is quite simply a command-line only (no X) Ubuntu installation, bit for bit, with just the Linux kernel and the boot code taken out.

To make it useful, MS (with Canonical’s help) have created a new Windows subsystem – the Windows Subsystem for Linux – that allows the NT kernel to service all the calls that would normally be made to the Linux kernel. And it’s that simple. All the Ubuntu software can behave as it normally does, making the syscalls it normally makes and getting the responses it expects to get, so nothing needs to be recompiled even though there is no Linux kernel running.

You get apt, so you can install anything you like from the Ubuntu software repository. Software (including servers) gets access to Windows’ own ports and the Windows file system. You can write C in Visual Studio Code and compile it with the Ubuntu version of gcc. You get bash, but if that’s not your thing, apt-get install zsh or something more esoteric. In short, this is the perfect way to run grep and sed on your Windows files.
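To give a flavour of that (the username and path below are invented), Windows drives appear under /mnt inside the Linux environment, so the usual tools work directly on Windows files – this, for instance, lists the names of the files under a Windows folder that contain the string TODO:

grep -rl TODO /mnt/c/Users/me/projects | sed 's|.*/||' | sort -u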

Now, it’s not all rosy. It is currently a beta release and not everything works as it should yet. There are apparently about 350 syscalls in the API being emulated and not all of them are used, so the devs have implemented a subset based on what they believe to be useful and what has been possible, given the inevitable restrictions. There is no hardware access and no graphical implementation beyond what you get in a normal terminal. Also, the talk back in March was that 16.04 was just around the corner and, now in September, it hasn’t arrived yet.

Furthermore, everything Linuxy is done in the bash shell and you can’t run Win32 exes from there, nor can you run Linux ELFs from a normal Windows command prompt, unless you invoke the bash shell as a wrapper (and if you can pipe output from one to the other, I’ve yet to work out how). For me though, there’s so much you can do that the restrictions don’t feel like restrictions and this is still a pre-release version.
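For what it’s worth, the “wrapper” approach is just a matter of handing a command to bash with -c from an ordinary Windows command prompt (the path is only an example):

bash -c "ls -l /mnt/c/Users/me/Downloads"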

Apart from the technical details, the other aspect of this that intrigued me was the political side. MS and Canonical have never been obvious bedfellows and although the stance MS has taken on Linux and open source in general has significantly mellowed under Satya Nadella (cf Steve Ballmer making SMB changes to deliberately stop Samba working), collaboration like this is still a surprising step. In essence, The Windows Way is being side-lined for a rival. I came to the conclusion that this does make sense for both parties, but it’s really Canonical who are riding the tiger.

From Microsoft’s perspective, they’ve come to realise that Linux isn’t going to go away. Whether antagonism towards it helps or hinders their own cause is an interesting and probably nuanced question but given that people are using it, devs especially, it makes some sense for them to offer tools to those devs inside the Windows marquee that they used to have to go elsewhere for – people using Linux (and at risk of moving to Linux exclusively) now have less of a reason to do so. Is there a downside? Well, yes: devs who haven’t been exposed to Linux previously may feel more comfortable with it when server OS options are being considered. Big deal? Probably not massive.

Canonical’s position is more interesting and it may take a while to see how it’s going to play out. Their stuff is freely available, of course, and MS could have just come and taken it, but they’ve been actively engaged in this project (even showing up on panels at BUILD) and as leaders in a world where MS is frequently seen as the enemy, “selling out” is a charge that might come their way.

What’s the upside? Well, Windows devs have often dabbled in Linux for various reasons – usually server related, but also for tools that work really well at solving problems that have never been fully solved on Windows (or where the tools exist but are less at home). Those devs are now going to be doing that with Ubuntu rather than Fedora or SUSE. So what are they going to pick, familiar as they will be with Ubuntu, when they need a server for something (or even choose to use a Linux desktop)? It’s not rocket science to see this as a really, really effective advertising campaign. But what are they actually helping Microsoft do?

In the past, MS was known for its strategy of taking things like “standards” and building on them, bending them to their own way of doing things and forcing the de facto standard that resulted onto everyone else, regardless of whether it benefited the rest of the world or not. They can’t alter the Ubuntu code directly (although they could start offering pull requests for modifications), especially since Ubuntu aren’t themselves responsible for most of the most commonly used utilities.

What they could do, ultimately, is threaten a different freedom: the freedom from Windows in the server market. This "freedom" is forced (welcome or not) on projects who choose (or need) Linux tools. Soon, perhaps, there will be a server OS that can run software from both camps and at that point, it’s Linux that is being side-lined as Windows is suddenly there, wrapped around it, begging to be given some jobs to do.

So the balancing act is this: Canonical are stealing a march on their competitors and more selflessly, pushing Linux into the Windows world in an astoundingly direct way, but are they also greasing the way for Windows in spaces which would have been solely Linux enclaves previously?

Tuesday, 23 April 2013

It sounds great… but what does it actually do?

I read an article this morning on TechCrunch by the co-founder of a mobile app company (they make “Bump”, which shares contacts and other data between phones). He was talking about the difference in understanding of technology between the creators of that technology and their target market. It’s not a bad article, although the points are reasonably well rehearsed. But I think he’s misunderstood the problem slightly.

According to David Lieb, the article's author, for the mass market, things should be very, very simple, at the expense of features. Bump’s creators apparently discovered this idea and helpfully named it for us, “cognitive simplicity”, although I doubt the notion will be a surprise to many. The other side of the coin is “cognitive overhead”, a more widely used term for the brain power necessary for the uninitiated to understand a point or operate a tool. With delightful irony, a number of the comments below the article (as well as complaining about the rehashing of well-known ideas) pointed out the “cognitive overhead” of giving jargon names to straightforward concepts.

The article implies that technology producers often simply do not understand that non-techy users could struggle to comprehend how something they’ve designed operates. This is almost certainly correct in some cases: it’s easy to overlook the learning curve of an item you’ve been intimately familiar with since its inception. By and large though, I think it’s a problem of scale rather than intent – it’s actually quite hard to make something simple without losing the essence of the product or what differentiates it from its competition. And this is where people fall down – it’s a much bigger job than people often think and it’s compounded by the fact that you can stop at any point and still have a perfectly functional product.

And one of the difficulties is going too far. Albert Einstein said that things should be made as simple as possible, but no simpler. Several article commentators who use Bump complained about vanishing features in the latest version, presumably removed to increase simplicity. It’s obviously going to annoy a certain proportion of your users if you take functions away and perhaps the trade-off here is the right one, but it makes it clear that it is a balancing act.

In my opinion, a neglected counterpart to having an easily understandable product is having an easily describable product, or at least making an effort to describe it. Distressingly common is the inability of software companies (and to some extent providers of other forms of technology) to properly explain something they want to sell to me (or sometimes give to me, for which I can’t be quite so belligerent). This goes all the way from small open source projects to massive product suites from big companies. In the latter case, it’s usually because they have a marketing department who feel it’s their duty to talk up the product without worrying too much about whether it will actually fit the needs of potential customers. Examples can be found across the whole gamut in between.

I don’t want to know how I’ll feel using your gadget. I don’t care (initially, at least) whether it’s been designed with solid engineering principles or by throwing a bag of C++ keywords in the air and seeing how they land. I especially don’t want to be greeted with a list of the minor functionality tweaks between versions 2.4.2.17 and 2.4.2.18, which seems particularly common on open source product pages. I want to know what it does. I think that should be more obvious than it would appear to be.

So I’d like to make a plea for simplicity myself, in the sales pitch: if the first page an interested visitor will see on your product’s website or information brochure does not contain a brief, clear description of what the program / gadget / vegetable can be used for (and I don’t mean, “Use the Big Bright Green Pleasure Machine and it will make your TCO lower / your life better / your partner sexier”, unless one of these purposes is its sole function) then your marketing is rubbish.

Wednesday, 9 May 2012

Excel: Conditional Formatting of Formulae


Yesterday, I was working with an Excel sheet and wanted to calculate a table of values, but be able to override some of the values. To make it obvious which values were calculated and which typed in, I decided to make the calculated values grey and leave the overridden values as the default black text.

Excel is massively flexible, so I didn't think I'd have much trouble – something in the Conditional Formatting arena would do what I needed. To my surprise, although I could do it, the only technique I found (on the j-walk website) relies on an obscure part of Excel that has almost been lost to memory: the XLM macro language, which was superseded by VBA in 1993.

I'm not entirely sure which version of Excel is in the picture on that page (97?), but it's old enough so that some of the technique has changed. For Excel 2007 (and probably later versions):
  1. On the "Formulas" ribbon, click "Define Name"
  2. In the "Name:" box, type "CellHasFormula"
  3. In the "Refers to:" box, type, "=GET.CELL(48,INDIRECT("rc",FALSE))"
  4. Click "OK"
  5. Select the cells for which you want the conditional formatting rule to apply
  6. On the "Home" ribbon, click "Conditional Formatting" and select "New Rule…" from the menu.
  7. Select rule type "Use a formula…"
  8. In the rule description formula box, type, "=CellHasFormula"
  9. Change your formatting to the desired style using the "Format…" button
  10. Press "OK"
In all cases, the surrounding inverted commas are not part of what you type (although the quotes around "rc" inside the GET.CELL formula are). Be careful of typos: if you make a mistake you probably won't get any error messages (it just won't work).

For the curious, there is a description of the mechanics on the source site. The important point is that "48" is a magic number which instructs the GET.CELL() function to identify cells which contain formulae. Unfortunately, "=GET.CELL(…)" is not a valid argument to the Conditional Formatting rules engine but is to the Define Name engine. It's worth noting that Conditional Formatting rules also don't allow searching for '=' at the start of a cell (anywhere but the start is fine), which would have made this a lot easier.

A complication with using Excel 2007 upwards is that the standard file type (with the .xlsx extension) does not support macros, because of potential security issues with passing around files in which they're used. You'll need to use the macro-enabled .xlsm format instead. Template files in the .xltm format are fine. When reloading, you may also be told that macros have been disabled. If so, click "Options" on the message to remedy the situation and hit F9 to refresh the formatting once you've done so.

Does anyone know of a better way? Please let me know!

Friday, 9 March 2012

Squeezebox: Longer Timeout for Alarms

At home, we use Logitech Squeezeboxes for all our listening pleasures, including our bedroom where we use the alarm function to wake us up in the morning. The alarm has never been outstanding (a few years ago, a separate plugin was necessary to get reasonable functionality) and although it’s improved in fits and starts over the years, it’s still not a paragon of outstanding design.

For us, the worst problem is that the length of time the Squeezebox plays for after the alarm has been triggered is restricted to 90 minutes or less (you can have it play indefinitely, but if you want it to switch itself off at all, 90 minutes is the cut-off). As the Today programme is on for two hours after we wish to be woken up, two hours is the length I want the alarm to run for.

It turns out the restriction is purely down to the user interface and not a limitation of the system itself. I seem to recall posting a feature request for the slider to become logarithmic, which would allow fine control at the 5 / 10 minute end of the scale and longer periods at the other, but it’s never been implemented. I can’t really complain: it’s open source so I should have done it myself. One reason I haven’t is that it’s relatively easy to get around just by editing the relevant prefs file.

On my Ubuntu system, this is /var/lib/squeezeboxserver/prefs/server.prefs; Windows users will probably find it in C:\Documents and Settings\All Users\Application Data\Squeezebox\prefs\server.prefs (according to Logitech's documentation, although newer Windows versions may have a slightly different path). It seems wise to stop the server while you make your edits: I found that changes can occasionally be overwritten.

The section for each player is headed by its MAC address (look in the "Player" tab of the settings pages of the web interface: it's listed in the "Basic Settings" section). Incidentally, the all-zeros MAC address at the start of the file is used as a template when new players are added. Skip through all the _ts stuff and you’ll find several entries starting “alarm”. As you might guess, alarmTimeoutSeconds is the value of interest and this needs to be set accordingly. There are 3600 seconds in an hour, so I have 7200 set. Save, restart the server and you’re done.
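For orientation, the relevant chunk of server.prefs looks roughly like this (MAC address invented, the many other keys omitted; only alarmTimeoutSeconds is the value discussed above):

00:04:20:12:34:56:
  alarmTimeoutSeconds: 7200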

A couple of notes. Firstly, this is obviously only applicable to those running Logitech Media Server* at home: if your alarm is set using Logitech’s online system (MySqueezebox.com*) you’re out of luck. I have no idea how the online system works and it may not be subject to the same restrictions anyway. Secondly, be careful about changing the settings using the web interface afterwards – the UI pages will obviously overwrite what’s in the prefs file and you may need to reset the timeout value if you change anything else on the alarm page (other settings changes are fine). As a result, if you edit your alarm times frequently, you may find this process is not worth the trouble.

You can set the alarm time here too (“alarms: <number>: _time”), which makes things easier if you don’t want to disturb the timeout you’ve set, but you will need to calculate the time you’re after in seconds from midnight if you do (a 06:30 alarm, for example, is 6 × 3600 + 30 × 60 = 23400).

* Correct this week, but nomenclature changes frequently.

Friday, 24 February 2012

Apache mod_rewrite & CodeIgniter


This article isn’t really about CodeIgniter. I’m getting to grips with that at the moment, so I might write some more about it in the future. It is about Apache’s mod_rewrite module and trying to get it to work in a way that’s useful on a dev server for the way CodeIgniter (and other PHP) projects are set out.

What I wanted was to have a single server (i.e. one virtual host) with space for several different projects, or branches of a project. In my opinion, the easiest way to access each project is just to use http://server/project/ in the browser (there are other ways – notably virtual hosts – but they usually require configuration for each new project and / or each new dev machine). With simple websites, it’s fine to put each project in a sub-folder and access them as suggested. However, that ignores a recommendation for CI projects – one that I think should be followed on any web project – which is to move code that does not need to be publicly accessible outside the browsable section of your file system (in this case, CI’s “system” and “application” folders should be outside “webroot”, or whatever you want to call it).

My goal was to have the dev server set up so projects could be moved between it and a production server without modification and to have each project wholly contained in its own folder, which means that each project needs its own “webroot” and its own space outside “webroot”. I therefore want every request to //server/project/index.php to be rewritten to //server/project/webroot/index.php (and of course similar for other files in other folders below webroot): in essence, “webroot” needs to be injected after every project name. This means that files and folders other than "webroot" in the project folder become inaccessible to the browser, which isn’t just a matter of convenience for the developer, it means that no resources can be accidentally accessed outside the correct area of the web server’s file system and all relative links (stylesheets, images, etc.) must be properly located.
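To make that concrete, the layout I was aiming for per project looks something like this (only the CodeIgniter folder names are real; everything else is illustrative):

/var/www/                  # the virtual host's document root
    project1/
        application/       # CodeIgniter code, not browsable
        system/
        webroot/           # the only folder Apache should serve
            index.php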

The first thing I learned is to put the rules directly in the <VirtualHost> section and not in a <Directory> tag wherever possible. There are two reasons for this. Firstly, it’s more efficient – Apache deals with rewriting much faster if it’s not done on a per-folder basis. That is partly because of the second reason: <Directory> entries (and .htaccess files in particular directories, which are equivalent) can be parsed multiple times as the request is processed. This can cause major headaches for the unwary because there’s nothing to stop Apache deciding it needs to run through the rules again (in fact, it always seems to do so if the URL has been rewritten) and, rather than starting with the original URL, you get the modified one. This means that you can get into an infinite loop if you, say, simply add something on to the end of whatever URL comes in.
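In other words, the rules that follow sit directly inside the virtual host definition, roughly like this (server name and paths are placeholders):

<VirtualHost *:80>
    ServerName dev.local
    DocumentRoot /var/www
    RewriteEngine On
    # RewriteCond / RewriteRule lines go here,
    # not in a <Directory> block or a .htaccess file
</VirtualHost>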

The difference between behaviour for rules located in different sections of the config file is not limited to multiple passes, unfortunately. The other thing that changes is the content of some of the variables that you can make use of in the rules. For this reason it is important to check (and potentially modify) any rules you see suggested unless you’re sure that the rules were designed to go in the same place that you want to put them.

I ended up with the following, the second part of which adds index.php after project names (if not present), whilst retaining the rest of the URL as parameters. It’s based on examples in the CodeIgniter documentation:
# Inject 'webroot/' if request starts with a valid folder
# and '/webroot' is not already 2nd folder
RewriteCond %{DOCUMENT_ROOT}$1 -d
RewriteCond $2 !/webroot
RewriteRule ^(/[^/]+)(/?[^/]*)(.*) $1/webroot$2$3

# Rewrite any */webroot/* file request to index.php
# Don't rewrite if file exists OR it's already
# index.php (even if 404)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !/index\.php$
RewriteRule ^(/[^/]+/webroot)/?(.*)$ $1/index.php/$2

I've used %{REQUEST_FILENAME} in conditions for the second rule. Although there are several other variables with similar content, be careful which you choose to use in situations like that above: not only do the values of some of them change depending on the location of the rules within the Apache config files, but I found that some of them had their contents rewritten by earlier rules and some did not (and I found no reference to this in the mod_rewrite documentation).

Tuesday, 10 January 2012

Linux Fileserver and ClamFS

I recently needed to provide a file server for a client that would work with Windows and OS X clients. For reasons of cost and maintenance we decided to use Ubuntu LTS Server. We also wanted anti-virus scanning as customer files are introduced to this server regularly. I decided to use the popular, open source ClamAV engine, with ClamFS providing the on-access scanning. I want to talk briefly about ClamFS in general, because there isn't much comment on it that I can find and then about a specific problem I had, because the solution is not necessarily obvious and uses an interesting feature of samba.

ClamFS seems to be the most straightforward way to provide on-access scanning with ClamAV. It's a FUSE-based daemon that mirrors one part of the file system to a mount point elsewhere, providing on-access protection for reads and writes to the mirrored version. I discovered the following about it:

  1. The version I installed from the Ubuntu repository doesn't include an init.d script – adding a line to rc.local seems to be the preferred method of starting it at boot (see the sketch after this list). You can, of course, write your own init.d script
  2. The config file is written in XML rather than the familiar format that pretty much every other Unix-based config file uses, which would be more readable and more easily editable (certainly on a GUI-less server). You need to include the config filename when starting ClamFS
  3. There is apparently no way to stop the process other than using kill and then manually unmounting the FUSE mount associated with it
  4. Lack of permissions caused a bit of difficulty – the ClamAV user might need some additional permissions before your users can read and write protected files
  5. There is little documentation; a tutorial taking new users through the steps of installation and configuration would make its use clearer
  6. Once set up, it seems to work fine: I've had no problems with it.
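As a sketch of points 1 and 2 (the config path and filename are only examples), the rc.local line is just an invocation of the daemon with its XML config file as the argument:

/usr/bin/clamfs /etc/clamfs/clamfs.xml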

My configuration is as follows: Truecrypt volumes (which are normal files, stored at a point we'll call location A) are mounted at another point in the filesystem (location B) and ClamFS mounts a copy of B to a third point (location C). Location C is then used for the samba share path.
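Or, as a sketch (all paths invented for illustration):

/srv/tc/files.tc        # location A: the Truecrypt volume file
/mnt/tc/files           # location B: where the Truecrypt volume is mounted
/srv/scanned/files      # location C: ClamFS mirror of B, used as the samba share path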

I wondered if having ClamFS start at boot time and mounting a copy of B elsewhere would prevent TC (which doesn't start at boot time) mounting a volume to B later on, but it turns out mounting volumes "underneath" an existing ClamFS mount works fine.

I had another problem though. Because I have more than one share and more than one encrypted volume, I configured ClamFS to protect the directory above the one in which all the TC drives were mounted. Because of this (or maybe because of some other aspect of the redirection), the free space reported by samba was not that of the individual drives mounted within the ClamFS-protected directory, but the space on the drive that contained those mount points (or the point to which ClamFS was mounting, I'm not sure which as they are on the same partition).

This can be more than an annoyance because Windows systems from Vista onwards actually check this free space before attempting to write a file. If there isn't room, you can't write. In my case, the reported size was that of a partition almost full of TC volumes, so the reported free space (and therefore the maximum file size that could be written by Windows 7 clients) was severely curtailed.

There are two possible ways round this. The most obvious is to only allow ClamFS to mount to and from points inside any TC volumes you want to share. This will cause you headaches if you have many shares and only want ClamFS configured to protect one directory, or if ClamFS needs to be started before TC mounts its volumes (common, because manual intervention is usually needed on TC mounts for security reasons).

The second solution is to use a feature of samba which allows you to override the internal free space code with a method of your design. The smb.conf man page explains the details – essentially you need to provide a command (writing a script seems to be the most common solution) that will return two numbers. These give the total number of 1K blocks in the filesystem and the number that are free, respectively. The man page makes a suggestion which I tailored slightly:

#!/bin/sh
# print the total and available 1K blocks for the filesystem containing $1
df -P "$1" | tail -1 | awk '{print $2,$4}'

The "-P" switch (added to the df command) forces the results for each drive onto a single line. If you don't do this and the path reported for the partition is longer than 20 characters, a line break is inserted and the positional parameters to awk will be incorrect.

You then need to make sure the definition in smb.conf for each affected share contains the following:

[Sharename]
   …
   # 'path' below is location C; the path passed to the script is location B
   path = /path/to/share
   dfree command = /path/to/script.sh /path/to/TC/mount

A quick side note: samba calls the script with the location it is trying to ascertain the size of as a first parameter. We've included a first parameter here, which simply pushes the samba-appended one into second position (which is then ignored). I have read that samba may call the script with the parameter "/", having chrooted to the share point before executing the script. I haven't investigated exactly what is happening in my test or production installations, but both work with the procedure I have outlined and this would not be the case if any chrooting were going on. I can only conclude that this is not the behaviour of current versions of samba (I'm using 3.4.7, courtesy of Ubuntu 10.04 LTS) or something else about my environments is altering that behaviour. I'd be interested to hear about different experiences.