Writing your very own Ruby extension with C

22 January 2007

Ruby is an extremely powerful and flexible language, but Ruby is also slow. According to performance zealots, when you use Ruby, you have signed a pact with the devil in which you agreed to trade pure efficiency for simplicity. It's a bit more complex than that, though, because most of the time your Ruby script will either be waiting for I/O or executing API functions written in C.

Still, there are many reasons why you would want to write C extensions for Ruby. Say, for instance, there is this wonderful library that does exactly what you need, but there is no Ruby binding available for it. You could write your very own binding for Ruby with C! Or maybe you have a block of code that requires a lot of computation, is executed millions of times, or for some other reason has to run very fast; in that case it could be useful to write this part of your code in a faster language like C and call it from your Ruby script. Or perhaps you already have a perfectly good implementation of a certain algorithm in C, and you don't want to convert it to Ruby.

If you're a serious ruby user, chances are that one day you'll need to write an extension in C, so you'd better read ahead.

What you need

Obviously, you will need a Ruby interpreter and a C compiler. I will be using the gcc and ruby packages from Ubuntu Edgy's repositories throughout this article. At the time of writing, these are gcc 4.1.2 and ruby 1.8.4. On Windows(R), I will be using Microsoft Visual Studio 6.0 Professional Edition. If you're using Visual Studio, you should replace make with nmake in the examples below and make the proper modifications to your %PATH% variable.

The secret ingredient needed to make your C code interact with Ruby is ruby.h. On Ubuntu it is included in the ruby1.8-dev package. On Windows(R), it's automatically installed if you use the One-Click Installer.

(Note that it is best to use the same C compiler as the one that was used to compile Ruby, to avoid compatibility issues. The Windows binaries on Ruby-lang.org are compiled with MSVC 6.0, which is almost compatible with MinGW, but not with newer versions of MSVC.)

Defining functions, modules, classes and methods from C

For starters, we will try to call a simple hello world function from Ruby. Here is the code we want to call:

    #include <ruby.h>
    #include <stdio.h>

    /* self is the object the method was called on; Ruby always passes it */
    VALUE hello_world(VALUE self)
    {
        puts("Hello World!");
        return Qnil;
    }

There are a few things that you won't recognize if you haven't written a ruby extension before:

VALUE: VALUE is defined by ruby.h and represents a Ruby object. Every method or function that can be called from Ruby has to return a VALUE, and every parameter passed from Ruby will be a VALUE, including the receiver of the call (self), which Ruby always passes as the first argument. I will discuss the use of VALUEs in more detail below.
Qnil: This is a constant representing Ruby's nil object. Other constants like this are Qtrue, Qfalse and Qundef.

To call this code from Ruby, you should decide how and where you want it to be available: as a global function, as a module function (or a class method, which is essentially the same thing to Ruby), or as an instance method. Examples of all of these are included in the download at the bottom of the page, but I'll only show how to create an instance method here, because this is the most common technique and we will expand the example later.

So we need to create a module and a class, and then define an instance method for that class. To do this, we need to write an Init function. When Ruby loads the extension, it automatically calls the function Init_name, where name is the name of the extension library being loaded; in this tutorial, that is simply the name of the source file without its extension (test). Here is what our Init function looks like:

    void Init_test(void)
    {
        hw_mMyModule = rb_define_module("MyModule");
        hw_cMyClass = rb_define_class_under(hw_mMyModule, "MyClass", rb_cObject);

        rb_define_method(hw_cMyClass, "hello_world", hello_world, 0);
    }

rb_define_module takes a single parameter: a string with the name of the new module. rb_define_class_under is a bit more complex: the first parameter should be a VALUE referring to the module your class is part of, the second is a string containing the name of your new class, and the third is the superclass of your class. In this case, we just want to inherit from the Object class, so we pass rb_cObject.

We also need to define 2 variables to hold the references to our class and module, outside any function. These are, of course, VALUEs:

    VALUE hw_mMyModule, hw_cMyClass;

Modules defined by Ruby are usually named rb_mXXX, where XXX is the name of the module, while classes are named rb_cXXX and errors and exceptions rb_eXXX. When you write your own modules and classes, however, it is recommended that you come up with your own naming scheme, to distinguish your code from Ruby's internals. I'll be using hw_yXXX, for 'Hello World', throughout this tutorial.

To define an instance method, you need to call rb_define_method(VALUE klass, const char* methodName, functionPtr, int parameterCount), where klass is the class to which you want to add a new method, methodName is the name of the new method in Ruby, functionPtr is a pointer to the C function that implements the new Ruby method, and parameterCount is the number of arguments the method takes, not counting self. (A parameterCount of -1 means the method accepts a variable number of arguments; the C function then receives int argc, VALUE* argv and VALUE self.)

Now that we have an Init function, we should be all set to build our first ruby extension!

Building and running your code

To compile a ruby extension, you need to create a ruby script that creates a Makefile for you, and then run make. Here is what such a script would look like:

    require 'mkmf'

    abort 'need stdio.h' unless have_header("stdio.h")

    dir_config('test')
    create_makefile('test')

First, you need to require mkmf, which gives us a set of functions that make it easier to set up a Makefile. In the example script, I have added a pointless check that stdio.h is available to the compiler. This is always the case if you have a compiler installed, but I included it just to show you how it works: have_header returns false if the header can't be found, and abort then stops the script with an error message.

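There is also a have_library('libraryName', 'libraryFunction') function that takes a library name and a function that should be provided by that library. If the check succeeds, the library is added to the linker flags of the generated Makefile; if it fails, have_library returns false, so you can handle it the same way as have_header.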

After all the checks have passed, we call dir_config('test') and create_makefile('test') to create the actual Makefile. Obviously, you should replace test with the name of your C source code file.

Save the file as extconf.rb and then run ruby extconf.rb and make.

Now, you can use your code as you would with any ruby script. For instance, with irb you could try:

    irb(main):001:0> require 'test'
    => true
    irb(main):002:0> foo = MyModule::MyClass.new
    => #<MyModule::MyClass:0xb7d12de4>
    irb(main):003:0> foo.hello_world
    Hello World!
    => nil
    irb(main):004:0>

Data conversion from and to Ruby

Now that we have a working piece of code, let's make it do something a bit more useful. We want it to print a hello message for every value in an array, which we will pass as a parameter. The Ruby code will look like this:

    greeter = MyModule::MyClass.new
    greeter.greet ['Nick', 'Wim', 'everyone']

and the output should be:

    hello Nick, hello Wim, hello everyone, I'm a machine.

You're going to see a lot of new stuff in the next snippet, but don't worry, I'll explain every last bit of it later. For now, just read through it. I included the whole file so you can keep track of what has changed:

    #include <ruby.h>
    #include <stdio.h>

    VALUE hw_mMyModule, hw_cMyClass;

    VALUE greet(VALUE self, VALUE names)
    {
        int i;
        struct RArray *names_array;

        Check_Type(names, T_ARRAY); /* raise TypeError unless names is an Array */
        names_array = RARRAY(names);

        for(i = 0; i < names_array->len; i++)
        {
            VALUE current = names_array->ptr[i];
            if(rb_respond_to(current, rb_intern("to_s")))
            {
                VALUE name = rb_funcall(current, rb_intern("to_s"), 0);
                printf("hello %s, ", StringValuePtr(name));
            }
        }
        printf("I'm a machine.\n");

        return Qnil;
    }

    void Init_test(void)
    {
        hw_mMyModule = rb_define_module("MyModule");
        hw_cMyClass = rb_define_class_under(hw_mMyModule, "MyClass", rb_cObject);

        rb_define_method(hw_cMyClass, "greet", greet, 1);
    }

First thing to notice is the function header.

    VALUE greet(VALUE self, VALUE names)

Note the two parameters in the signature. Two? Yes, two. Even though the Ruby method takes only one argument, the C function needs an extra VALUE for the self-reference. Because C is not object oriented, it has no way of knowing which object the function is being called on, or even whether it's an instance method at all. So, to support its object oriented approach, Ruby always passes the instance that the method is called on, the self-reference, as the first parameter.

Next, you'll see struct RArray and RARRAY(...) which are used to handle Ruby's arrays. I will discuss those later on.

Before we move on to data conversion, let's take a brief look at

    if(rb_respond_to(current, rb_intern("to_s")))

and

    VALUE name = rb_funcall(current, rb_intern("to_s"), 0);

The three functions I used here, rb_intern, rb_respond_to and rb_funcall, are very important if you want to interact with Ruby. So far, we have only written Ruby code that calls C functions. If we want C code to call Ruby methods, we need rb_intern and rb_funcall. rb_intern takes a null-terminated string and returns the corresponding ID, the value Ruby uses internally to identify a name (the same thing a Symbol stands for). IDs aren't only used for method names, either: instance variables and constants are looked up by ID, too.

rb_funcall calls a ruby method for you. It takes 3 or more parameters:

The first is the object you want to call the method for, which is of course a VALUE.
The second is the ID of the method you want to call. Most of the time, you will get this with rb_intern.
The third is the number of parameters you need to pass to the method you're calling. This also defines how many extra parameters you need to pass to rb_funcall.
If you pass a value larger than 0 as the third parameter, parameters 4 and up correspond to parameters 1 and up of the Ruby method. Obviously, these all need to be VALUEs, as they will be passed directly to your Ruby code.

rb_funcall raises a NoMethodError (a subclass of NameError) when you try to call a method that doesn't exist. Because that's not what we want in our example, I used rb_respond_to. This function takes two parameters, a VALUE containing the object you want to test and the ID of a method, and is the direct counterpart of Object#respond_to? in Ruby. It returns 0 when the object does not respond to the method, and a nonzero value if it does. This allows us to gracefully skip objects that can't be converted into strings, rather than raising an error.

Numbers

Numbers are easy to convert. There are simple macros to convert numbers between Ruby and C. NUM2INT() and NUM2UINT() take a VALUE and return the signed int or unsigned int representation of that value. If the object cannot be converted to an integer, a TypeError is raised, and if the value doesn't fit in an int, a RangeError is raised. To convert a signed int or unsigned int to a Ruby Integer, you can use INT2NUM() and UINT2NUM(). Isn't that convenient?

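There are also NUM2LONG, LONG2NUM, NUM2ULONG and ULONG2NUM macros for long integers, and NUM2DBL for double precision floating point numbers (the reverse conversion is done with rb_float_new(), or the DBL2NUM macro where your Ruby version provides it). They work in exactly the same way. Since C doubles are always signed, there are no unsigned variants for them.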

Strings

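There is a STR2CSTR macro as well, but it has been deprecated, and it is recommended to use the newer StringValuePtr macro instead. StringValuePtr takes a variable holding a VALUE and returns a char* pointing to the contents of the string. If the value isn't a String yet, it is converted with to_str (not to_s) and the variable is updated to hold the result, which is why you must pass a variable rather than an arbitrary expression. StringValueCStr works the same way, but additionally guarantees a null-terminated result, which is what functions like printf expect.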

I've used it in the greet function above:

    printf("hello %s, ", StringValuePtr(name));

Converting a char* to a String VALUE requires either rb_str_new(const char*, long) or rb_str_new2(const char*). Typically, you'd use the second, as it is the simplest version: it just takes your null-terminated string and makes a new String object from it. The first takes an extra parameter, the length of your string. This is useful when you already know the length and don't want to waste time letting rb_str_new2() calculate it again, or when your data may contain null bytes.

Arrays

Handling arrays is just as easy, but it requires an extra data structure, struct RArray. Again, there is a simple macro for converting a VALUE that holds an array to an RArray pointer: RARRAY(). Its use is demonstrated in the example above:

    struct RArray *names_array;

    names_array = RARRAY(names);

The RArray structure has two important fields: len and ptr. len is the number of elements in the array, while ptr points to the VALUEs stored inside the array. Pretty straightforward, isn't it? (In Ruby 1.9 and later these fields are no longer directly accessible, and you should use the RARRAY_LEN() and RARRAY_PTR() macros instead.) Here's some code that shows how you can iterate over an array:

    for(i = 0; i < names_array->len; i++)
    {
        VALUE current = names_array->ptr[i];
        /* ... */
    }

Wrapping your C data in Ruby objects

Garbage collection

Before I tell you how to wrap a C struct in a VALUE, you need to know more about Ruby's memory management. As you probably know, Ruby uses garbage collection (GC), which takes care of allocating and freeing memory for you. Your C code, of course, still needs to manage its own memory correctly, and to do so it has to cooperate with the garbage collector, so you need to know how the collector works.

Ruby uses a mark-and-sweep algorithm to determine if memory is still in use or if it can be freed by the garbage collector. To write correctly working Ruby extensions, you need to understand how this mechanism works. Mark and sweep, as the name implies, consists of two phases:

During the mark phase, all memory that can be reached by the interpreter is marked as reachable, leaving unreachable objects unmarked.
During the sweep phase, all memory that has not been marked as reachable is freed.

This simple algorithm requires a notion of 'reachable' objects, which is defined recursively: Ruby marks all objects that can be reached directly (for instance, from variables in the current scope), then those objects mark the objects they can reach, and so on. Usually, Ruby takes care of all this by itself, but when you're using C to extend Ruby, Ruby has no idea which objects your C data can reach, so you have to mark them for Ruby.

So you need to write a mark method for your class. But that's not all. Since you want to create custom Ruby objects, Ruby has no idea how these objects should be allocated or freed, either! So we need an allocate function and a free function as well.
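To make the two phases concrete, here is a toy model in plain C. It is not how Ruby's collector is implemented; it only shows the idea. Objects reachable from the roots get marked recursively (cycles are fine, because an already-marked object is skipped), and the sweep frees everything left unmarked and clears the marks on the survivors. All names here (obj, toy_alloc, toy_mark, toy_sweep, toy_gc) are made up for the illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* A toy heap object: a mark bit and up to two outgoing references. */
typedef struct obj {
    bool marked;
    struct obj *refs[2];
    struct obj *next;          /* links every allocated object together */
} obj;

static obj *heap = NULL;       /* all objects, reachable or not */

static obj *toy_alloc(void)
{
    obj *o = calloc(1, sizeof *o);
    if (o == NULL)
        abort();
    o->next = heap;
    heap = o;
    return o;
}

/* Mark phase: flag o and everything reachable from it. */
static void toy_mark(obj *o)
{
    if (o == NULL || o->marked)
        return;                /* nothing there, or already visited */
    o->marked = true;
    toy_mark(o->refs[0]);
    toy_mark(o->refs[1]);
}

/* Sweep phase: free unmarked objects, reset marks on survivors.
 * Returns how many objects were freed. */
static int toy_sweep(void)
{
    int freed = 0;
    obj **link = &heap;
    while (*link != NULL) {
        obj *o = *link;
        if (!o->marked) {
            *link = o->next;   /* unlink, then free */
            free(o);
            freed++;
        } else {
            o->marked = false;
            link = &o->next;
        }
    }
    return freed;
}

/* One full collection starting from the given roots. */
static int toy_gc(obj **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        toy_mark(roots[i]);
    return toy_sweep();
}
```

If a C structure held a pointer to one of these objects without the collector knowing, the object would be swept while still in use, which is exactly why an extension has to mark its VALUEs itself.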

Storing our C struct in a ruby object

Now that you understand how Ruby's garbage collector works, let's write some code! Let's say we want to store the name of our greeter in a structure. We have a lot to take care of, so I'm going to guide you through it step by step.

First, we create a structure to hold our name,

    typedef struct greeter_s {
        VALUE name;
    } greeter_t;

and an allocate function:

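    /* greeter_mark and greeter_free (below) must be declared before this function */
    VALUE greeter_allocate(VALUE klass)
    {
        greeter_t *g = malloc(sizeof(greeter_t)); /* malloc needs <stdlib.h> */
        if(g == NULL)
            rb_raise(rb_eNoMemError, "failed to allocate greeter");
        g->name = Qnil;
        return Data_Wrap_Struct(klass, greeter_mark, greeter_free, g);
    }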

Data_Wrap_Struct turns any structure into a VALUE. It takes four parameters: the class of the object (klass), a function pointer to our mark function (greeter_mark), a function pointer to our free function (greeter_free), and our data structure (g). The allocate function always receives the class of the new object as a parameter, so we don't have to use our hw_cMyClass variable here. The return value is our freshly created object.

Please note: the allocate function is NOT a constructor. It's only in charge of allocating the memory required by a certain kind of object and correctly initializing it. If you want to do something more, like setting certain fields to a custom value, you should define an initialize method for your object, just like you would in Ruby. We'll do that later to set the name of our greeter to a more meaningful value.

Now we have to implement our greeter_mark and greeter_free functions. These are pretty simple:

    void greeter_mark(greeter_t* self)
    {
        rb_gc_mark(self->name);
    }

    void greeter_free(greeter_t* self)
    {
        free(self);
    }

The mark function should just call rb_gc_mark for every VALUE field in the struct. This marks those objects as reachable, and the garbage collector will in turn mark whatever they refer to. You should never call another object's mark function directly.

The free function should free the memory used by the C structure and any memory that has been allocated for its fields. It should also close file handles or sockets, and take care of all other resources used by the object. It should not do anything special for VALUE fields in the structure, because you don't know whether they are still being used by other objects. The garbage collector will take care of them in due time.

We also need to register the allocate function, so Ruby knows where to find it. To do this, we have to put the following code in our Init function:

    rb_define_alloc_func(hw_cMyClass, greeter_allocate);

Getting your data back

Now that we have stored our data in a VALUE, it's time to learn how to get it out of its wrapping. Of course, in Ruby's spirit of simplicity, there is a simple Data_Get_Struct macro that can convert a VALUE to a C structure. It takes three parameters: the Ruby object, the C type of the data stored in the object, and a pointer variable that will point to the C structure after the call. Here's our new and improved greet method:

    VALUE greet(VALUE self, VALUE names)
    {
        int i;
        struct RArray *names_array;
        greeter_t* greeter;
        VALUE my_name;

        Check_Type(names, T_ARRAY); /* raise TypeError unless names is an Array */
        Data_Get_Struct(self, greeter_t, greeter);
        my_name = rb_funcall(greeter->name, rb_intern("to_s"), 0);
        names_array = RARRAY(names);

        for(i = 0; i < names_array->len; i++)
        {
            VALUE current = names_array->ptr[i];
            if(rb_respond_to(current, rb_intern("to_s")))
            {
                VALUE name = rb_funcall(current, rb_intern("to_s"), 0);
                printf("hello %s, ", StringValuePtr(name));
            }
        }
        printf("I'm %s.\n", StringValuePtr(my_name));

        return Qnil;
    }

Now, since you can't give a name to your greeter yet, it will always print an empty string as its name. We should write an initialize method to set the name to a more meaningful value. But to do that, I would like to use another feature we haven't discussed yet.

Raising errors

That's right, the last thing I'm going to teach you is what to do when everything goes wrong. Ruby has its raise/rescue error handling system, but in C, there is no such thing! However, it can still be useful to raise an error from your C code, and ruby.h provides a function that does just that: rb_raise.

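But before we go into any more details, let's write our initialize method without raising errors. Like any other method, it has to be registered in our Init function, with rb_define_method(hw_cMyClass, "initialize", greeter_initialize, 1):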

    VALUE greeter_initialize(VALUE self, VALUE name)
    {
        greeter_t* greeter;
        Data_Get_Struct(self, greeter_t, greeter);
        greeter->name = name;
        return self;
    }
   58}

Now, how could we possibly use errors here? What could possibly cause an error in our simple hello world example? We could pass an invalid value as a name, a value that cannot be converted into a string. This would not cause an error right away, but would raise one whenever the user tried to call our greet method. Not really the behaviour you'd expect from an object that was corrupted by a constructor parameter, don't you think? It would be much better if we just raised the error once, when the object is created. Here's how we do it:

    VALUE greeter_initialize(VALUE self, VALUE name)
    {
        greeter_t* greeter;

        if(!rb_respond_to(name, rb_intern("to_s")))
            rb_raise(rb_eArgError, "name should respond to to_s");

        Data_Get_Struct(self, greeter_t, greeter);
        greeter->name = name;
        return self;
    }

rb_raise takes the kind of error we want to raise (rb_eArgError) and an error message. The message is actually a printf-style format string, so you can pass extra arguments to fill it in. Here we want to raise an ArgumentError, because an invalid argument was passed to our method. Other popular choices are rb_eTypeError, rb_eNotImpError (Not Implemented Error, my personal favorite), rb_eNameError and rb_eIOError. For a complete list of standard error classes, you should check ruby.h. Note that rb_raise never returns: control jumps straight to the nearest rescue in Ruby, so the code after it won't run.

The Dark Side of mixing C and Ruby

Now that you have seen how easy it is to mix C and Ruby code, you can probably think of a few places where you could put your new knowledge to good use. You can now use the power of C with the flexibility of Ruby! Great! But before you do so, think at least twice about it: do you really need to use C code in your Ruby script?

Using C comes with an array of new problems: you have to get your C code to run on every platform you target. Unlike Ruby, C code is not completely portable, especially not after it has been compiled. You need to provide binaries for all those platforms, or have the end user compile your C extension himself, which is not always a good idea, or even a possibility (for instance, on Windows this would require the user to have a copy of Visual Studio 6.0). This means that you will need to spend a lot of time building and packaging your script, valuable time you could have spent working on the next version of your project!

And even if that doesn't scare you away, writing C code requires a lot more care than writing Ruby code. Your Ruby code runs inside the interpreter, and the worst thing that could happen is that you raise an error which is not caught and causes your script to exit. Big deal. In C, on the other hand, errors are harder to detect and can have more serious consequences. They can crash the whole Ruby process, for instance, or, even worse, let it keep running after a portion of its memory has been corrupted.

Writing a C extension usually takes a lot more time than writing the same functionality in plain Ruby, and saving development time is the main reason you're using Ruby in the first place, remember? So don't be too hasty when you decide to 'optimize' your code, and think about whether you really need to.

Conclusion

Extending Ruby with C is both easy and fun. A lot of Ruby's spirit can still be felt throughout its C API. Using C can speed up your ruby code, or connect your script with libraries that it couldn't use before. However, writing C code still requires a lot more time and gives the developer more responsibility to ensure the correct behaviour of the application, so you shouldn't use it if you don't really need it.

Where to go from here

This article is only an introduction to Ruby's C interface. A lot of the interface is undocumented, but it is easy to figure out how things work most of the time. The best way to learn more about mixing C and Ruby is by using your knowledge and playing around with the API. If you need to do something and you don't know how, try looking through ruby.h. Another valuable resource is the source code of Ruby's core classes. They often show you how things should be done. If you can't find it there, and even the almighty Google doesn't know the answer, you could try asking on the Ruby mailing lists or forums about software development and Ruby on the web.