Sunday 18 November 2012

Statically typed, non-static internationalisation with C#

Microsoft Visual Studio provides reasonable support for internationalising an application. Add a resource file (resx) containing a string table and attach the custom tool ResXFileCodeGenerator (which the IDE does by default). For a C# project, ResXFileCodeGenerator produces C# code that defines a class providing static access to the strings:
  /// <summary>
  ///   Looks up a localized string similar to Invalid Username.
  /// </summary>
  internal static string ErrorInvalidUsername {
    get {
      return ResourceManager.GetString("ErrorInvalidUsername",
          resourceCulture);
    }
  }
This provides an easy way to reference the strings throughout your application, such that if a string is removed from the string table you'll get a compile-time error rather than a problem at run-time. If you want your strings accessible from another assembly you can use PublicResXFileCodeGenerator instead. The class generated by ResXFileCodeGenerator also contains a static Culture property that allows the application to set a CultureInfo instance that will be used for resolving the strings.

There are a few problems with this approach.

Non-static access

The static approach employed by (Public)ResXFileCodeGenerator is not appropriate for a multi-threaded application that needs to switch between cultures. A common case where this arises is a web-application. Each request is processed in a separate thread, potentially requiring a different culture for each request. If one thread needs to resolve a string in one culture while another thread needs to resolve a string in another culture, then access to the shared static Culture property has to be serialised to avoid race conditions. That's just too much complexity. A better solution is to use a non-static class with each thread holding an instance of the class, and each instance holding its own resourceCulture field. For a recent project I implemented my own custom tool to achieve this. It started out modelling the ResXFileCodeGenerator directly, just without all the static keywords. But I soon discovered there was much more I could do with this custom tool.
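As a rough sketch of the non-static shape described above (not the output of the actual custom tool), the class might look something like the following. The class name Strings matches the one used later in this post, but the resource base name "MyApp.Strings" and the constructor signature are assumptions made purely for illustration.

```csharp
using System.Globalization;
using System.Resources;

// Hypothetical non-static counterpart of the generated resource class.
// "MyApp.Strings" is an assumed resource base name, not one from this post.
public class Strings
{
    // ResourceManager is documented as thread safe, so it can be shared;
    // only the culture needs to be held per instance.
    private static readonly ResourceManager _resourceManager =
        new ResourceManager("MyApp.Strings", typeof(Strings).Assembly);

    private readonly CultureInfo _culture;

    public Strings(CultureInfo culture)
    {
        _culture = culture;
    }

    public CultureInfo Culture
    {
        get { return _culture; }
    }

    public ResourceManager ResourceManager
    {
        get { return _resourceManager; }
    }

    // Instance accessor: the culture is read from this instance rather
    // than from shared static state, so no locking is required.
    public string ErrorInvalidUsername
    {
        get
        {
            return _resourceManager.GetString("ErrorInvalidUsername", _culture);
        }
    }
}
```

Each request would then create its own instance, for example new Strings(CultureInfo.GetCultureInfo("fr-FR")), and two requests in different cultures never touch each other's state.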

Format fields

In many cases the string contained in a resource file is a template that has place-holders replaced at run-time. Typically the string.Format method is used to achieve this:
  string.Format("The {0} is invalid: {1}", "username",
      "Only alphanumeric characters are permitted");
Placing the templated string in a resx file separates the definition of the string from its actual use. The compiler cannot check that the number of arguments supplied in the call to string.Format matches the number of template place-holders, so a mismatch only shows up at run-time (as a FormatException when too few arguments are supplied). I figured that since I was using a Custom Tool to generate code for accessing the strings it would be easy enough to turn those accessor properties into methods with arguments matching the template place-holders. Even better, those arguments could be statically typed. To make this work I used the comment column of the resource string table to define the expected types. For example, the template "Task failed at {0}% with error {1}" would have the following in the string comment:
  $params(int percent, string error)
The Custom Tool then generates a method as follows:
  public string TaskFailed(int percent, string error) { ... }
The rule I employed was that if a string contains place-holders then it must have a corresponding $params(...) comment with the same number of arguments; otherwise a warning is issued and no method is generated. Similarly, if a string has a $params(...) comment that does not match the number of place-holders in the string, a warning is issued and no method is generated. No comment is required for strings that contain no place-holders. Also, to recover the original ResXFileCodeGenerator behaviour, a string containing place-holders can have a comment of $params() to have a simple accessor created.

Just for fun I extended the comment syntax to optionally include documentation comments for the arguments and a summary comment for the method (ignore the string wrapping):
  $params(int percent /* The task completion percent */, 
           string error /* Error message */) 
           /// Task percent failure
For strings without any place-holders a documentation summary comment can be used without $params().
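The matching rule above could be sketched along the following lines. This is not the code from the Custom Tool; the names ParamsRule, AccessorKind and Classify are hypothetical, and the parsing is deliberately simplistic (it splits the argument list on commas, so generic types with several type arguments would be miscounted).

```csharp
using System;
using System.Text.RegularExpressions;

// What the (hypothetical) generator would emit for one resource string.
public enum AccessorKind { Property, Method, Warning }

public static class ParamsRule
{
    // Matches {0}, {1,10} and {2:N2} style place-holders.
    private static readonly Regex PlaceHolder =
        new Regex(@"\{(\d+)(?:,[^}:]*)?(?::[^}]*)?\}");

    private static readonly Regex ParamsComment =
        new Regex(@"\$params\((.*?)\)", RegexOptions.Singleline);

    // Number of arguments string.Format needs: highest index plus one.
    public static int CountPlaceHolders(string template)
    {
        // Drop escaped braces so "{{0}}" is not mistaken for a place-holder.
        string unescaped = template.Replace("{{", "").Replace("}}", "");
        int count = 0;
        foreach (Match m in PlaceHolder.Matches(unescaped))
            count = Math.Max(count, int.Parse(m.Groups[1].Value) + 1);
        return count;
    }

    // Returns -1 when the comment has no $params(...), otherwise the
    // number of declared arguments (0 for "$params()").
    public static int CountDeclaredParams(string comment)
    {
        // Strip /* ... */ documentation comments before counting.
        string stripped = Regex.Replace(comment ?? "", @"/\*.*?\*/", "",
            RegexOptions.Singleline);
        Match m = ParamsComment.Match(stripped);
        if (!m.Success)
            return -1;
        string args = m.Groups[1].Value.Trim();
        return args.Length == 0 ? 0 : args.Split(',').Length;
    }

    public static AccessorKind Classify(string template, string comment)
    {
        int holders = CountPlaceHolders(template);
        int declared = CountDeclaredParams(comment);
        if (declared < 0)   // no $params(...) comment at all
            return holders == 0 ? AccessorKind.Property : AccessorKind.Warning;
        if (declared == 0)  // $params() asks for a simple accessor
            return AccessorKind.Property;
        return declared == holders ? AccessorKind.Method : AccessorKind.Warning;
    }
}
```

With this sketch, the template "Task failed at {0}% with error {1}" with the comment $params(int percent, string error) is classified as a method, while the same template with $params(int percent) produces a warning.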

Returning objects instead of strings

For a web-application, localising error messages presents a challenge for logging. The localised error message needs to be generated and presented to the user, but it is typically desirable to record a non-localised message in the log file. I wanted to avoid the overhead of having to maintain two sets of error messages and having additional code to select both the localised and non-localised error message.

The approach I settled on was for the string accessor methods and properties generated by my Custom Tool to return instances of a Result class rather than the string itself:
  public class Result
  {
    private Strings _parent;    // resource class instance that created this
    private string _name;       // name of the string resource
    private object[] _params;   // place-holder arguments, possibly empty

    internal Result(Strings parent, string name,
        params object[] parms)
    {
      _parent = parent;
      _name = name;
      _params = parms;
    }

    // Resolve using the culture of the parent resource class instance
    public static implicit operator String(Result result)
    {
      return result.Value(result._parent.Culture);
    }

    // Resolve against an explicit culture; the lookup is deferred until here
    public string Value(System.Globalization.CultureInfo culture)
    {
      string value;
      value = _parent.ResourceManager.GetString(_name, culture);
      if (_params.Length > 0)
        value = string.Format(value, _params);
      return value;
    }

    public string Name
    {
      get { return _name; }
    }
  }
The string property and method accessors each just construct and return a Result object:
  public Result ErrorInvalidUsername(string username)
  {
    return new Result(this, "ErrorInvalidUsername", username);
  }
The Result object records the string resource name and any place-holder parameters supplied. The string resource lookup is deferred until actually required and can be performed against an explicit culture using the Value method. An implicit conversion to string is included so that the Result object can be treated as a string and resolved using the default culture set for the parent resource class instance.

A nice side-effect of this is that the Result class can be used to enforce the use of string resources by writing functions that accept a Result as a parameter rather than a string. In particular, the function used for reporting errors (or perhaps the constructor of an exception object) can accept a Result so that it is possible to use the implicit string resolution to report the error to the user in the appropriate culture, as well as resolving the string to a common culture for logging.
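To illustrate the idea, here is a self-contained sketch of an exception type that only accepts a Result. It is not the code from the project: the Result below is a simplified stand-in that reads from a hypothetical in-memory table instead of a ResourceManager, and the LocalisedException name, the sample French text and the choice of "en" as the logging culture are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Simplified stand-in for the generated Result class. A hypothetical
// in-memory table replaces the ResourceManager to keep this runnable.
public class Result
{
    private static readonly Dictionary<string, string> Templates =
        new Dictionary<string, string>
        {
            { "en:ErrorInvalidUsername", "Invalid username: {0}" },
            { "fr:ErrorInvalidUsername", "Nom d'utilisateur invalide : {0}" },
        };

    private readonly CultureInfo _defaultCulture;
    private readonly string _name;
    private readonly object[] _params;

    public Result(CultureInfo defaultCulture, string name,
        params object[] parms)
    {
        _defaultCulture = defaultCulture;
        _name = name;
        _params = parms;
    }

    public string Name
    {
        get { return _name; }
    }

    // Deferred lookup against an explicit culture.
    public string Value(CultureInfo culture)
    {
        string key = culture.TwoLetterISOLanguageName + ":" + _name;
        return string.Format(culture, Templates[key], _params);
    }

    // Implicit resolution in the culture the Result was created with.
    public static implicit operator string(Result result)
    {
        return result.Value(result._defaultCulture);
    }
}

// Hypothetical exception that can only be built from a string resource.
public class LocalisedException : Exception
{
    // Assumed common culture for log files.
    private static readonly CultureInfo LogCulture =
        CultureInfo.GetCultureInfo("en");

    private readonly Result _result;

    // Exception.Message carries the non-localised text for logging.
    public LocalisedException(Result result)
        : base(result.Value(LogCulture))
    {
        _result = result;
    }

    // Text for the user, via the implicit conversion to string.
    public string UserMessage
    {
        get { return _result; }
    }

    // Log line that includes the resource name for easy searching.
    public string LogMessage
    {
        get { return _result.Name + ": " + Message; }
    }
}
```

Because the constructor takes a Result rather than a string, a caller cannot pass an ad-hoc literal, and a single Result yields both the user-facing and the log text without maintaining two sets of messages.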

Sunday 4 November 2012

Pointers to member functions in C#

I found myself wanting to be able to create a table of pointers to member functions in C#. For C++ this is routine functionality:
#include <iostream>

class context {};

class A {
public:
        A(context* ctx) {}
        void a1() { std::cout << "a1" << std::endl; }
        void a2() { std::cout << "a2" << std::endl; }
};

class B {
public:
        B(context* ctx) {}
        void b1() { std::cout << "b1" << std::endl; }
        void b2() { std::cout << "b2" << std::endl; }
};

class route {
public:
        route(const char* path) : _path(path) {}
        virtual void invoke(context* ctx) = 0;
private:
        const char* _path;
};

template<class T>
class troute : public route {
public:
        typedef void (T::*F)();
        troute(const char* path, F f) : route(path), _f(f) {}
        void invoke(context* ctx) { (T(ctx).*_f)(); }
private:
        F _f;
};
 
int main()
{
        troute<A> r1("A/a1", &A::a1);
        troute<A> r2("A/a2", &A::a2);
        troute<B> r3("B/b1", &B::b1);
        troute<B> r4("B/b2", &B::b2);

        route* routes[] = { &r1, &r2, &r3, &r4 };

        context ctx;

        for (int i = 0; i < 4; i++)
                routes[i]->invoke(&ctx);

        return 0;
}
C# has delegates and generics in place of pointers to member functions and templates. My first attempt to port this to C# went as follows:
class context { };

class A {
        public A(context ctx) { }
        public void a1() { Console.WriteLine("a1"); }
        public void a2() { Console.WriteLine("a2"); }
}

class B {
        public B(context ctx) { }
        public void b1() { Console.WriteLine("b1"); }
        public void b2() { Console.WriteLine("b2"); }
}

class route {
        public delegate void F(context ctx);
        public route(string path, F f) {
                _path = path; _f = f; 
        }
        public void invoke(context ctx) { _f(ctx); }
        private string _path;
        private F _f;
}

static void Main(string[] args)
{
        route[] routes = {
                new route("A/a1", (c)=>new A(c).a1()),
                new route("A/a2", (c)=>new A(c).a2()),
                new route("B/b1", (c)=>new B(c).b1()),
                new route("B/b2", (c)=>new B(c).b2()),
        };

        context ctx = new context();

        for (int i = 0; i < 4; i++)
                routes[i].invoke(ctx);
}
A couple of notable differences. The first is that the C++ pointer to member function syntax is much cleaner than the corresponding delegate syntax, to my mind at least. The C# syntax is noisy. The second is that the A/B objects are instantiated by the delegate rather than within the route class. A nice side-effect of that is the route class itself has no need for polymorphism. Moving the instantiation into the delegate may be OK in some cases, but in others possibly not. This can be remedied as follows:
class context { };

class initable {
        public void Init(context ctx) { }
}

class A : initable {
        public void a1() { Console.WriteLine("a1"); }
        public void a2() { Console.WriteLine("a2"); }
}

class B : initable {
        public void b1() { Console.WriteLine("b1"); }
        public void b2() { Console.WriteLine("b2"); }
}
 
abstract class route {
        public route(string path) { _path = path; }
        public abstract void invoke(context ctx);
        private string _path;
} 

class troute<T> : route where T : initable, new() {
        public delegate void F(T t);
        public troute(string path, F f) : base(path) {
                _f = f;
        }
        public override void invoke(context ctx) {
                T t = new T(); t.Init(ctx); _f(t); 
        }
        private F _f;
} 

static void Main(string[] args)
{
        route[] routes = {
                new troute<A>("A/a1", (t)=>t.a1()),
                new troute<A>("A/a2", (t)=>t.a2()),
                new troute<B>("B/b1", (t)=>t.b1()),
                new troute<B>("B/b2", (t)=>t.b2()),
        };

        context ctx = new context();

        for (int i = 0; i < 4; i++)
                routes[i].invoke(ctx);
}
This approach reproduces the C++ pointer to member function technique more faithfully, in the sense of preserving the structure of the code and keeping the responsibilities in corresponding places. In this case the delegate provides precisely the same functionality as a C++ pointer to member function, although the C# syntax is still uglier. The trade-off here is that C# generic constraints only allow for a parameterless constructor (the new() constraint), so it's necessary to construct using a parameterless constructor and then initialise after the fact. This in turn necessitated adding the initable base class to achieve polymorphism between classes A and B. Because of this the C# above is more convoluted than my original attempt, but at least it's a more faithful port. Furthermore, although not inherently required by the syntax, I suspect this polymorphism is resolved dynamically at run-time, which highlights the run-time dynamic nature of C# generics compared to the completely compile-time static nature of the C++ template system.
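For comparison, the closest thing to a raw pointer to member function in .NET is probably an "open instance" delegate, where the target object is passed as the first argument at call time. The sketch below builds one with Delegate.CreateDelegate; the MemberPointer helper is a hypothetical name, and the class A here is a cut-down stand-in that records the last call in a field rather than writing to the console.

```csharp
using System;
using System.Reflection;

// Stand-in for class A above; Last records which method was called.
public class A
{
    public string Last = "";
    public void a1() { Last = "a1"; }
    public void a2() { Last = "a2"; }
}

// Hypothetical helper, not part of the original code.
public static class MemberPointer
{
    // Returns a delegate that is not bound to any particular object:
    // calling f(obj) invokes obj.methodName().
    public static Action<T> Of<T>(string methodName) where T : class
    {
        // Look up a public, parameterless instance method by name.
        MethodInfo method = typeof(T).GetMethod(methodName, Type.EmptyTypes);
        if (method == null)
            throw new ArgumentException("No such method: " + methodName);
        return (Action<T>)Delegate.CreateDelegate(typeof(Action<T>), method);
    }
}
```

A call such as MemberPointer.Of<A>("a2") then plays the role of &A::a2. The catch is that the method name is a string checked at run-time, so the lambda form used above keeps the advantage of compile-time checking.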

Stroustrup's own thoughts on C# generics reflect precisely what is going on here:
generics are primarily syntactic sugar for abstract classes; that is, with generics (whether Java or C# generics), you program against precisely defined interfaces and typically pay the cost of virtual function calls and/or dynamic casts to use arguments.  
[C++] Templates supports generic programming, template metaprogramming, etc. through a combination of features such as integer template arguments, specialization, and uniform treatment of built-in and user-defined types. The result is flexibility, generality, and performance unmatched by "generics".
I can't help but wonder why Microsoft stripped out such useful parts of C++ when producing C#. Are they trying to protect us lowly developers from complexity? If so, I'm not sure it has worked because, as this discussion shows, you have to do convoluted things to recover routine C++ functionality. A programming language is all about providing the tools to create a solution to a problem. Dumbing down tools on the grounds of simplification is not helpful. Possibly the real motivation for removing some of the C++ features from C# was more to simplify the implementation of the language rather than the language itself.

Saturday 3 November 2012

OpenBSD Xen DomU

It's no secret that Theo is no fan of virtualisation. Nonetheless I've been running OpenBSD as a Xen DomU for years and it works pretty well for the most part.

I understand that Theo's primary criticism is that people try to make a strained case that running multiple VMs increases security, whereas the fact is that adding a hypervisor just adds more complexity with more bugs and more opportunities for exploitation. Hard to argue with that point of view, but for me it's about the convenience of being able to run many instances of OpenBSD, some i386, some amd64, for hacking or for hosting my own little services on my LAN behind my firewall (which is a physical Soekris net4801 also running OpenBSD). Xen gives me convenience, makes it easy to maintain backups of my images, and cuts down on the noise, power and cost of running lots of individual physical machines.

For a long time I've been running a slightly patched i386 GENERIC kernel. I found that under heavy load (building OpenBSD release sets) it would occasionally panic. I recently enabled the QEMU APIC and found that the i386 GENERIC.MP kernel worked fine with two CPUs. So far no crashes.

I've also been running the amd64 GENERIC kernel and never had a crash with that, with or without the QEMU APIC enabled. However the GENERIC.MP kernel fails to boot with more than one CPU. I haven't made much progress understanding what the cause of the problem is yet.

When I have some time, I'd like to dedicate it to creating PV drivers for OpenBSD/Xen to improve performance. The first thing I'd like is the ability to force time to resync after the guest has been restored from a saved state.