On Clojure

March 23, 2010

Computing with units and dimensions

Filed under: Libraries — khinsen @ 11:28 am

Many computer programs work with data that represents quantities. Examples are numerous: the age of a person, the weight of a parcel, the duration of a video clip, the distance between two cities, etc. Usually quantities are simply represented by numbers, because numbers are very easy to handle in popular programming languages. However, quantities are not numbers: two years is not the same as the number two. A quantity is defined by a magnitude (which is a number) and a unit. The same quantity can be represented by different magnitude-unit pairs. For example, one minute is the same quantity as sixty seconds. The quality being measured by a unit is called its dimension. Time, length, and weight are examples of dimensions.

There are a few good reasons to represent quantities by magnitude-unit pairs rather than by plain numbers:

  • When quantities are represented by numbers, the units become a matter of convention, written down in a comment (if at all) rather than in the program code. This makes mistakes rather likely, with possibly serious consequences: NASA’s Mars Climate Orbiter crashed because of different units being used in different parts of the software that was used to calculate its flight trajectory.
  • With just numbers, it is not even possible to verify that a quantity passed into a function has the right dimension. With an additional unit, such a check is very easy to do.
  • The unit and dimension information provides additional documentation to the human reader, and aids in debugging.

A number of libraries for various programming languages therefore implement units, dimensions, and quantities, with the associated arithmetic and comparison operators and sometimes also mathematical functions. Clojure recently joined the crowd: the units library is available at Clojars.org and the source code is hosted by Google Code. In this post, I describe how the library works and give a few examples.

First, a simple example for illustration. Like any Clojure script, the first thing to do is to set up the namespace with all the stuff we need:

(clojure.core/use 'nstools.ns)
(ns+ unit-demo
  (:clone nstools.generic-math)
  (:from units dimension? in-units-of)
  (:require [units.si :as si]))

This looks rather complicated, so it deserves some explanation. We will want to be able to calculate with quantities, units, and dimensions, in particular do arithmetic (+ – * /) and comparisons ( min max) on quantities. Clojure’s built-in arithmetic and comparison functions work only on numbers, so they are not useful here. In clojure.contrib.generic, there are generic versions of these operations, meaning that they can be defined for any datatype for which they make sense. To achieve this goal, they are implemented as multimethods, which implies some bookkeeping overhead that reduces performance. In fact, it is for performance reasons that Clojure’s standard arithmetic functions are not generic.

Constructing a nice namespace for generic arithmetic using Clojure’s standard namespace management tools is a bit cumbersome: we’d have to use an explicit :refer-clojure clause in ns in order to exclude the standard arithmetic functions, and then have a
lengthy :use clause for adding the generic versions from the various submodules of clojure.contrib.generic. An easier way is to use the nstools library which defines a suitable namespace template that we can simply clone. We then add the dimension-checking predicate dimension? and the conversion function in-units-of from the units library and the shorthand si for referring to the namespace that defines the SI unit system that we will use.

Now we can start doing something useful. The following function calculates the force exerted by a spring of force constant k that has been compressed or extended by a displacement x:

(defn spring
  {:pre [(dimension? (/ si/force si/length) k)]}
  (fn [x]
    {:pre [(si/length? x)]}
    (- (* k x))))

The basic code looks just as if we had written it for use with plain numbers. The only difference are the preconditions that verify that the arguments k and x have the right dimensions: length for x, force constant for k. The test for length is simpler, because for all dimensions that have been assigned a name in the definition of the unit system, there is a direct test predicate, such as si/length?. There is no predefined dimension for “force divided by length”, so we have to use the generic predicate dimension? and construct the dimension arithmetically. The only operations defined on dimensions are multiplication and division, the rest (addition/subtraction, comparison) would not make sense.

Let’s use our function spring:

(def a-spring (spring (/ (* 5 si/N) si/cm)))
(prn (a-spring (si/cm 1/2)))

The first line defines a spring with a force constant of 5 N/cm. You can see in the expression that calculates it that units can be used like quantities in artithmetic. The unit “Newton” behaves just like the quantity “1 Newton”. However, these two values are represented differently internally, for a good reason that I will explain a bit later. The second line evaluates the force exerted by the spring when elongated by 1/2 cm. It shows another way to construct a quantity from unit an magnitude: units can be called as functions, with the magnitude supplied as the argument, returning a quantity.

The last line produces the output

#:force{-5/2 N}

The result thus has the dimension “force”, the magnitude “-5/2″ and the unit “Newton”. The dimension can be shown because it is a named dimension defined in the SI unit system. Otherwise the computer could not have guessed the name of the dimension. Let’s see what happens when we print a force constant:

(prn (/ (* 5 si/N) si/cm))

The output is

#:quantity{5 100.kg.s-2}

No dimension name, no unit name: the magnitude is 5, the unit is 100 kg/s^2, and it is expressed in SI base units plus a prefactor.

Let’s look at some more examples of unit arithmetic in the following REPL protocol:

unit-demo> (+ (si/m 1) (si/km 3))
#:length{3001 m}
unit-demo> (+ (si/km 3) (si/m 1))
#:length{3001/1000 km}
unit-demo> (= (+ (si/m 1) (si/km 3)) (+ (si/km 3) (si/m 1)))

This shows how units are converted in arithmetic: the result has the unit of the first argument. However, exchanging the argument still yields a result that is equal to the original one, as indeed “1 km” and “1000 m” are the same quantity.

Next, some more complicated examples: we calculate the kinetic energy of a car:

unit-demo> (/ (si/km 100) si/h)
#:velocity{100 5/18.m.s-1}
unit-demo> (let [v (/ (si/km 100) si/h)
		 m (si/kg 800)]
		 (* 1/2 m v v))
#:energy{4000000 25/324.m2.kg.s-2}
unit-demo> (let [v (/ (si/km 100) si/h)
		 m (si/kg 800)]
		 (in-units-of si/J (* 1/2 m v v)))
#:energy{25000000/81 J}

The last line shows how to convert a quantity to a different unit. Note that the result is always equal to the input quantity, only the representation changes.

At some point, one inevitable has to communicate with the number-only world, usually for I/O, or for plotting. So how do we convert a quantity to a number? It should be clear that this operation implies the choice of a unit. The simplest solution is to divide the quantity by the desired unit: the result will be dimensionless and thus a plain number:

unit-demo> (/  (a-spring (si/cm 1/2))  si/mN)

Another approach would be to convert to the desired units using in-units-of and then extracting the magnitude using the function magnitude from the units library:

unit-demo> (units/magnitude (in-units-of si/mN (a-spring (si/cm 1/2))))

At this point it should be clear that the units library defines three datatypes: dimensions, units, and quantities. It is less obvious that dimensions and units (and thus indirectly quantities) refer to a unit system. Without a unit system, the computer could not know that the quotient of a length and a time is a velocity, for example. Nor could it know that “Newton” is just a convenient name for “m kg/s^2″. A unit system defines base dimensions and base units. The SI system (SI = Système International) that is today used all over the world in science and engineering, as well as in daily life in most countries, defines seven base dimensions and associated units:

  • length (meter, m)
  • mass (kilogram, kg)
  • time (second, s)
  • electric current (ampere, A)
  • temperature (kelvin, K)
  • luminous intensity (candela, cd)
  • amount of substance (mole, mol)

Neither the choice of these particular dimensions nor even the choice of seven base dimensions is obvious. One could very well use the electric charge instead of the electric current, for example. And one could very well not have the dimension “amount of substance” at all. The choices made for the SI system reflect the state of the art in metrology, taking into account what can and what cannot be measured with high accuracy.

All dimensions other than the base dimensions are expressed as products of powers of the base units. For example, velocity is length^1 time^-1, and volume is length^3. The SI system is constructed to make sure that all powers are integers, but this is not true e.g. for the older cgs system, which has fractional powers for dimensions related to electricity. According to the principles of dimensional analysis, a dimension is in fact nothing else but a name for a collection of powers (seven integers for the SI system). Metrological reality is a bit more complicated because there can be multiple dimensions with the same set of exponents. For example, in the SI system, both frequency (measured in cycles per second) and radioactivity (measured in decays per second) are equivalent to time^-1, because neither “cycle” nor “decay” has its own dimension. The units library takes this into account and makes a distinction between frequency, radioactivity, and 1/time. The first two are not compatible with each other, meaning that you can’t add 1 Bq and 5 Hz. However, either one is compatible with 1/s, so you can add 1 Bq and 5/s. This feature requires that dimensions be represented by a specific data type; otherwise a list of exponents would be sufficient.

Units are handled much like dimensions: each base dimension has a base unit, and each non-base unit is defined as a product of powers of base units, plus a numerical prefactor. Quantities are then made up of a unit and a magnitude, which is typically a number. It is not strictly necessary to make the distinction between units and quantities, as in fact any quantity can be used as a unit. There are libraries around that use a single representation for both. However, there are two advantages to keeping the distinction:

  1. The units library permits magnitudes of quantities to be values of any type that implements generic arithmetic, whereas unit prefactors must be numbers. It it thus possible to use matrices as magnitudes, provided all elements have the same unit. This permits efficient implementations of many algorithms while still profiting from dimension checking and unit conversion.
  2. Without specific unit objects, every quantity would be represented as a prefactor with respect to a product of powers of the base units. The information of what unit the quantity was initially represented in is lost. While this doesn’t matter from the point of view of dimensional analysis, it does matter from a numerical point of view. For example, quantities at the atomic scale would have very small prefactors when expressed in terms of SI base units. With magnitudes expressed as floating-point values, there is thus a risk of underflow in unit arithmetic. It is in general preferable to keep quantities in their original units and apply conversion only when requested or when inevitable (such as in addition of two quantities).

To close this brief description of the design decisions behind the units library, a few words about temperatures. I have decided not to include support for temperature conversion in the initial versions of the library, and I am not sure if I will ever add it. Temperature is special in that the scales we use in daily life (nowadays mostly centigrades and Fahrenheit) have an arbitrarily chosen zero point that does not coincide with the “natural” zero point of temperature, which corresponds to the lowest possible energetic state of a system. Allowing for such units defined with an offset implies enormous complications: a distinction must be made between “differential” and “absolute” units, and arithmetic must be defined carefully to make sure that absolute units can be used only in addition with a differential unit or in subtraction. I don’t think that introducing that amount of complexity is justified, considering that daily-life temperatures are rarely combined in computations with quantities of other dimensions.

March 5, 2010

Pre- and post-conditions: a quest for a nicer syntax

Filed under: Uncategorized — khinsen @ 6:15 pm

When pre- and post-conditions appeared in Clojure (since version 1.1 they are “official”), I though that was a pretty neat feature and that I ought to use it. I added conditions to a couple of functions and was satisfied. But rather soon I noticed that I used conditions very sparingly, only where I expected wrong data to be fed into a function. Considering that bugs also happen where we don’t expect them, and that conditions are great documentation as well as error-checking code, I should use them more. So why don’t I?

Something I didn’t like about pre- and post-conditions is the rather heavy syntax. For short functions, the conditions take up more space than the code itself. Moreover, parsing them visually takes some effort as well, as much as reading the code itself. Wouldn’t it be nice if preconditions could be written somehow right in the argument list, and postconditions at the level of the function definition?

Well, this is Lisp, so if you don’t like some syntax, you just roll your own. I didn’t come up with a better general syntax though, but I think that what I describe below is much nicer and suitable for 90% of pre- and post-conditions used in practice. The main limitation is that each condition can depend on only one argument, or on the return value. For the other cases, there is still Clojure’s general syntax, which is perfectly compatible with my extension. For those who want to play with this themselves, here is the code.

As a first example, here’s a pretty stupid algorithm to calculate integer powers of a number:

(defn (number?) power
  [(number?) x
   (integer?) (pos?) n]
  (apply * (repeat n x)))

The preconditions check that the first argument is a number and that the second one is a positive integer. The postcondition checks that the result is a a number – the utility of this test is a bit dubious, but it serves as an illustration. Note that you can have multiple conditions per argument, and also multiple postconditions. The full form representing the condition is constructed by inserting the argument to be tested in the second position of the supplied list. The above function definition actually expands to

(defn power
  ([x n]
   {:pre [(number? x) (integer? n) (pos? n)], :post [(number? %)]}
   (apply * (repeat n x))))

One precondition in the above example is actually too strict. The argument n needn’t be positive, just non-negative. There is not simple test function for “non-negative” in Clojure, but with the above rule we can write this as:

(defn (number?) power
  [(number?) x
   (integer?) (>= 0) n]
  (apply * (repeat n x)))

Another possibility is to use the -> macro:

(defn (number?) power
  [(number?) x
   (integer?) (-> neg? not) n]
  (apply * (repeat n x)))

Preconditions can be combined with destructuring. Here is a variant of Clojure’s function second that actually verifies that its argument has at least two element:

(defn my-second
  [[f & (seq) r]]
  (first r))

There is however one limitation: I couldn’t find a way to use my new syntax with map destructuring. So for now at least it works only with vector destructuring.

Comments on this syntax are welcome. Do you like it? Can you come up with something better? Or do you think that Clojure’s standard syntax is just fine?

March 1, 2010

Anonymous function literals and syntax-quote don’t work well together

Filed under: Uncategorized — khinsen @ 10:43 am

Clojure has a couple of macro-like features built right into the reader, providing shorthand notation for commonly needed constructs. However, unlike real macros, which are built on a sound conceptual framework that guarantees arbitrary composability (macro calls can contain macros and expand into other macros), the different macro-like features in the reader don’t always play well together.

Consider the following piece of code:

(defmacro foo [x]
  `(map #(identity %) [~x]))

At first sight, nothing looks wrong with this, other than it doesn’t do anything useful. But any use of this macro causes an error message:

(foo [:a :b])
java.lang.Exception: Can't use qualified name as parameter: user/p1__3328

What’s going on here? The error message hints at a problem with a function argument. The only function being defined is here #(identity %), which uses a shorthand notation for function literals expanded by the reader. Let’s see what the macro call expands to:

(macroexpand-1 '(foo [:a :b]))


(clojure.core/map (fn* [user/p1__3328] (clojure.core/identity user/p1__3328)) [[:a :b]])

So here’s the problem: #(identity %) is expanded by the reader into (fn* [p1__3328] (identity p1__3328)). This is a perfectly valid function literal, but the other reader feature used here, syntax-quote, doesn’t know about function literals. It takes the expanded function literal as an arbitrary form and does namespace resolution on all symbols. This leads to a namespace-qualified symbol for a function parameter, which is not legal Clojure syntax.

Moral: use reader features sparingly, ideally one at a time. Except for syntax-quote, they only save you a couple of keystrokes, a convenience that you may end up paying a high price for in terms of debugging time.

The Shocking Blue Green Theme Blog at WordPress.com.


Get every new post delivered to your Inbox.