Ruby is an extremely powerful and flexible language, but Ruby is also slow. According to performance zealots, when you use Ruby, you have signed a pact with the devil in which you agreed to trade raw efficiency for simplicity. It's a bit more complex than that, though, because most of the time your Ruby script will be either waiting for I/O or executing API functions written in C.
Still, there are many reasons why you would want to write C extensions for Ruby. Say, for instance, there is this wonderful library that does exactly what you need, but there is no Ruby binding available for it. You could write your very own binding for Ruby with C! Or maybe you have a block of code that requires a lot of calculation power, is executed millions of times, or for some reason has to run very fast. In that case it could be useful to write this part of your code in a faster language, like C, and call that code from your Ruby script. Or perhaps you already have a perfectly good implementation of a certain algorithm in C, and you don't want to convert it to Ruby.
If you're a serious Ruby user, chances are that one day you'll need to write an extension in C, so you'd better read on.
Obviously, you will need a ruby interpreter and a C compiler. I will be using the gcc and ruby packages from Ubuntu/Edgy's repositories throughout this article. At the time of writing, this is gcc 4.1.2 and ruby 1.8.4. On Windows(R), I will be using Microsoft Visual Studio 6.0 Professional Edition. If you're using Visual Studio, you should replace make by nmake in the examples below and make the proper modifications to your %PATH% variable.
The secret ingredient needed to make your C code interact with Ruby is ruby.h. On Ubuntu it is included in the ruby1.8-dev package. On Windows(R), it's automatically installed if you use the One-Click Installer.
(Note that it is best to use the same C compiler as the one that was used to compile ruby, to avoid compatibility issues. The Windows binaries on Ruby-lang.org are compiled with MSVC 6.0, which is almost compatible with MinGW, but not with newer versions of MSVC)
For starters, we will try and call a simple hello world. Here is the code we want to call:
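#include <ruby.h>
#include <stdio.h>

/* self is passed by Ruby but not declared here because it is not used;
   a stricter signature would be hello_world(VALUE self) */
VALUE hello_world()
{
    puts("Hello World!");
    return Qnil;
}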
There are a few things that you won't recognize if you haven't written a ruby extension before:
- VALUE is defined by ruby.h, and represents a ruby object. Every method or function that can be called from ruby has to return a VALUE. Every parameter passed from ruby will be a VALUE. I will discuss the use of VALUEs in more detail below.
- Qnil is the C representation of Ruby's nil object. Other variables like this are Qtrue, Qfalse and Qundef.
To call this code from ruby you should decide how and where you want to call the code: do you want it to be a global function, a module function (or a static function of a class, which is actually the same to ruby), or an instance method? Examples for all of these are included in the download at the bottom of the page, but I'll only show how to create an instance method here because this is the most used technique and we will expand the example later.
So we need to create a module, a class, and then define an instance method for that class. To do this, we need to write an Init method. When Ruby loads the extension, it automatically calls the Init_filename method, where filename is the name of your source file, without its extension. Here is what our Init method looks like:
void Init_test()
{
    hw_mMyModule = rb_define_module("MyModule");
    hw_cMyClass = rb_define_class_under(hw_mMyModule, "MyClass", rb_cObject);

    rb_define_method(hw_cMyClass, "hello_world", hello_world, 0);
}
rb_define_module takes a single parameter: a string with the name of the new module. rb_define_class_under is a bit more complex though: the first parameter should be a VALUE pointing to the module your class is part of, the second is the string containing the name of your new class, and the third is the superclass of your class. In this case, we just want to inherit from the Object class, so we pass rb_cObject.
We also need to define 2 variables to hold the references to our class and module, outside any function. These are, of course, VALUEs:
VALUE hw_mMyModule, hw_cMyClass;
Modules defined by Ruby are usually named rb_mXXX where XXX is the name of the module, while classes are often named rb_cXXX. Errors and exceptions are named rb_eXXX. When you write your own modules and classes, however, it is recommended that you find your own naming scheme, to distinguish your code from Ruby's internals. I'll be using hw_yXXX, with hw for 'Hello World' and y for the kind of object (m or c), throughout this tutorial.
To define an instance method, you need to call rb_define_method(VALUE class, char* methodName, functionPtr, int parameterCount), where class is the class to which you want to add a new method, methodName is the name for the new method in ruby, functionPtr is the function pointer to the C function that corresponds to the new ruby method, and parameterCount is the number of parameters the method takes, not counting self.
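For comparison, the module, class and method that Init_test registers correspond roughly to the plain Ruby below. This is only a sketch of the equivalent Ruby-level structure, not something the extension generates; it assumes hello_world prints its message and returns nil, as the C function does.

```ruby
# Plain-Ruby counterpart of the definitions made in Init_test (illustrative only).
module MyModule                 # rb_define_module("MyModule")
  class MyClass < Object        # rb_define_class_under(hw_mMyModule, "MyClass", rb_cObject)
    def hello_world             # rb_define_method(hw_cMyClass, "hello_world", hello_world, 0)
      puts "Hello World!"
      nil                       # the C function returns Qnil
    end
  end
end

MyModule::MyClass.new.hello_world   # prints "Hello World!"
```

Seen this way, the Init function is simply the C spelling of the module and class definitions you would normally write in Ruby.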
Now that we have an Init function, we should be all set to build our first ruby extension!
To compile a ruby extension, you need to create a ruby script that creates a Makefile for you, and then run make. Here is what such a script would look like:
require 'mkmf'

abort 'need stdio.h' unless have_header("stdio.h")

dir_config('test')
create_makefile('test')
First, you need to require mkmf, which gives us a set of functions that make it easier to set up a Makefile. In the example script I have included a redundant check that stdio.h is available to the compiler. This is always the case if you have a compiler installed, but I included it just to show you how it works. There is also a have_library('libraryName', 'libraryFunction') function that takes a library name and a function that should be provided by the library.
After all the checks have passed, we call dir_config('test') and create_makefile('test') to create the actual Makefile. Obviously, you should replace test with the name of your C source code file.
Save the file as extconf.rb and then run ruby extconf.rb and make.
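Now, you can use your code as you would with any ruby script. For instance, in irb you could require 'test', create an object with MyModule::MyClass.new and call hello_world on it, which prints "Hello World!".
Now that we have a working piece of code, let's make it do something a bit more useful. We want it to print out a hello message for every value in an array, which we will be passing as a parameter. The ruby code will call greet on a MyModule::MyClass object with an array of names as its argument, and the output should be a "hello <name>, " for every element of the array, followed by "I'm a machine.".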
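To make the intended behaviour concrete without compiling anything, here is a pure-Ruby stand-in for the greet method developed in this section. The class body and the names in the array are illustrative assumptions; only the output format is taken from the C code that follows.

```ruby
# Pure-Ruby stand-in for the C greet method (assumption: mirrors the C logic).
module MyModule
  class MyClass
    def greet(names)
      names.each do |current|
        # Skip anything that cannot be turned into a string.
        print "hello #{current.to_s}, " if current.respond_to?(:to_s)
      end
      puts "I'm a machine."
      nil
    end
  end
end

MyModule::MyClass.new.greet(["Alice", :bob, 3])
# prints: hello Alice, hello bob, hello 3, I'm a machine.
```

Once the extension is built, the same call should produce the same line, with the work done in C.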
You're going to see a lot of new stuff in the next snippet, but don't worry, I'll explain every last bit of it later. For now, just read through it. I included the whole file so you can keep track of what has changed:
#include <ruby.h>
#include <stdio.h>

VALUE hw_mMyModule, hw_cMyClass;

VALUE greet(VALUE self, VALUE names)
{
    int i;
    struct RArray *names_array;

    names_array = RARRAY(names); /* view the VALUE as an array structure */

    for(i = 0; i < names_array->len; i++)
    {
        VALUE current = names_array->ptr[i];
        if(rb_respond_to(current, rb_intern("to_s"))) /* skip objects without to_s */
        {
            VALUE name = rb_funcall(current, rb_intern("to_s"), 0); /* current.to_s */
            printf("hello %s, ", StringValuePtr(name));
        }
    }
    printf("I'm a machine.\n");

    return Qnil;
}

void Init_test()
{
    hw_mMyModule = rb_define_module("MyModule");
    hw_cMyClass = rb_define_class_under(hw_mMyModule, "MyClass", rb_cObject);

    rb_define_method(hw_cMyClass, "greet", greet, 1);
}
First thing to notice is the function header.
VALUE greet(VALUE self, VALUE names)
We've added two parameters to the signature. Two? Yes, two. Even though we only want one parameter, the function needs an extra VALUE for the self-reference. Because C is not object oriented, it has no way of knowing what object the function is being called for, or even if it's an instance method or not. So, to be able to use the object oriented approach Ruby uses, we need an extra parameter containing the instance that the method is called for, the self-reference.
Next, you'll see struct RArray and RARRAY(...), which are used to handle Ruby's arrays. I will discuss those later on.
Before we move on to data conversion, let's take a brief look at
if(rb_respond_to(current, rb_intern("to_s")))
and
VALUE name = rb_funcall(current, rb_intern("to_s"), 0);
The three functions I used here, rb_intern, rb_respond_to and rb_funcall, are very important if you want to interact with Ruby.
So far, all we have done is write C functions that Ruby code can call. If we want C code to call Ruby methods, we need to use rb_intern and rb_funcall. rb_intern takes a null terminated string and returns the corresponding ID, the C-side handle for a Ruby symbol. You can use this for the names of other things than just methods, too.
rb_funcall calls a ruby method for you. It takes 3 or more parameters:
1. The object to call the method on, as a VALUE.
2. An ID that describes the method you want to call. Most of the time, you will fetch this with rb_intern.
3. The number of arguments you are passing to the method.
4. The arguments themselves, if there are any. These have to be VALUEs, as they will be fed directly to your Ruby code.
rb_funcall raises a NoMethodError (a subclass of NameError) when you try to call a method that doesn't exist. Because that's not what we want in our example, I used rb_respond_to. This function takes two parameters, a VALUE containing the object you want to test and an ID that describes a method, and is the direct counterpart of Object#respond_to? from Ruby. It returns 0 when the method does not exist for the object, and a non-zero value if it does. This allows us to gracefully skip objects that can't be converted into strings, rather than raising an error.
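In Ruby terms, these three functions map onto things you already know: rb_intern is comparable to turning a string into a symbol, rb_funcall to Object#send, and rb_respond_to to Object#respond_to?. The sketch below shows that correspondence in plain Ruby; the comments name the C call each line stands in for, and the values used are arbitrary examples.

```ruby
id   = "to_s".to_sym                # like rb_intern("to_s")
text = 42.send(id)                  # like rb_funcall(obj, id, 0)
sum  = 10.send(:+, 5)               # like rb_funcall(ten, rb_intern("+"), 1, five)
puts text                           # prints 42
puts sum                            # prints 15

# Calling a method that does not exist raises NoMethodError,
# which is a subclass of NameError.
begin
  42.send(:no_such_method)
rescue NameError => e
  puts e.class                      # prints NoMethodError
end

# Checking first avoids the exception, as rb_respond_to does in the C code.
puts 42.respond_to?(:no_such_method)  # prints false
```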
Numbers are easy to convert. There are simple macros to convert numbers between Ruby and C. NUM2INT() and NUM2UINT() take a VALUE and return the signed int or unsigned int representation of that value. If it is not possible to convert the object to an integer, the appropriate error will be raised. To convert a signed int or unsigned int to a Ruby number, you can use INT2NUM() and UINT2NUM(). Isn't that convenient?
There are also NUM2LONG, LONG2NUM, NUM2ULONG, ULONG2NUM and NUM2DBL, DBL2NUM macros for dealing with long integers and double precision floating point numbers, which work in exactly the same way (if your ruby.h does not define DBL2NUM, rb_float_new() does the same job). Note that there are no UDBL variations.
There is a STR2CSTR macro as well, but that one has been declared deprecated, and it is recommended to use a newer macro called StringValuePtr instead. The StringValuePtr macro also takes a VALUE and returns a char* pointing to a C string representation of that parameter. I've used it in the example above, in the first printf call:
printf("hello %s, ", StringValuePtr(name));
Converting a char* to a String VALUE requires the use of either rb_str_new(const char*, long) or rb_str_new2(const char*). Typically, you'd use the second, as it is the simplest version: it just takes your null terminated string and makes a new String object from it. The first takes an extra parameter, which should be the length of your string. This is useful when you already know the length and don't want to waste time letting rb_str_new2() calculate it again.
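The explicit length also matters for data that contains NUL bytes: a Ruby String can hold them, but anything that measures a C string stops at the first one. The plain-Ruby sketch below illustrates the difference; the unpack directive Z* is used here only to imitate how C reads a null terminated string, and the sample data is made up.

```ruby
data = "abc\0def"                    # a Ruby String may contain NUL bytes
puts data.bytesize                   # prints 7

# "Z*" reads up to the first NUL, the way strlen-based C code would.
c_view = data.unpack("Z*").first
puts c_view                          # prints abc
puts c_view.bytesize                 # prints 3
```

So for binary data, passing the real length is the safer choice of the two.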
Handling arrays is just as easy, but it requires an extra data structure, struct RArray. Again, there is a simple macro for converting a VALUE that stores an array to an RArray pointer: RARRAY(). Its use is demonstrated in the example above:
struct RArray *names_array;

names_array = RARRAY(names);
The RArray structure has two important fields: len and ptr. len is the number of elements in the array, while ptr is the pointer to the VALUEs that are stored inside the array. Pretty straightforward, isn't it? Here's some code that shows how you can iterate over an array (this relies on the Ruby 1.8 layout; later versions of Ruby hide these fields and offer RARRAY_LEN() and rb_ary_entry() instead):
for(i = 0; i < names_array->len; i++)
{
    VALUE current = names_array->ptr[i];
    ...
}
Before I tell you how to wrap a C struct into a VALUE, you need to know more about Ruby's memory management. As you probably know, Ruby uses garbage collection (GC), which takes care of allocating and freeing memory for you. Of course, your C code still needs to manage its own memory correctly, and to do that you have to know what happens behind the scenes when Ruby code runs.
Ruby uses a mark-and-sweep algorithm to determine if memory is still in use or if it can be freed by the garbage collector. To write correctly working Ruby extensions, you need to understand how this mechanism works. Mark and sweep, as the name implies, consists of two phases:
1. Mark: every object that is still reachable is marked.
2. Sweep: every object that was not marked is freed.
This simple algorithm requires a notion of 'reachable' objects. This is done recursively: Ruby marks all objects that can be reached directly in the current scope, and those objects mark the objects they can reach, and so on. Usually, Ruby takes care of this all by itself, but when you're using C to extend Ruby, Ruby has no idea of what objects you can reach, so you have to mark them for Ruby.
So you need to write a mark function for your class. But that's not all. Since you want to create custom Ruby objects, Ruby has no idea how these objects should be allocated or freed, either! So we need an allocate function and a free function as well.
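To see the two phases in action, here is a toy mark-and-sweep collector in plain Ruby. It is a simplified model for illustration only, not how Ruby's collector is implemented; the Node struct, the heap array and the roots list are all made up for this example.

```ruby
# Toy heap object: refs lists the objects this one can reach directly.
Node = Struct.new(:name, :refs, :marked)

# Mark phase: flag a node and, recursively, everything reachable from it.
def mark(node)
  return if node.marked              # already visited, which also handles cycles
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Sweep phase: keep marked nodes, drop the rest, and reset the flags.
def sweep(heap)
  survivors = heap.select { |node| node.marked }
  survivors.each { |node| node.marked = false }
  survivors
end

a = Node.new("a", [], false)
b = Node.new("b", [a], false)        # b references a
c = Node.new("c", [], false)         # nothing references c

heap  = [a, b, c]
roots = [b]                          # only b is directly reachable

roots.each { |root| mark(root) }
heap = sweep(heap)
puts heap.map(&:name).inspect        # prints ["a", "b"]
```

The mark function of a C extension plays the role of the recursive step here: it tells the collector which other objects your structure keeps alive.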
Now that you understand how Ruby's garbage collector works, let's write some code! Let's say we want to store the name of our greeter in a structure. We have a lot to take care of, so I'm going to guide you through it step by step.
First, we create a structure to hold our name,
typedef struct greeter_s {
    VALUE name;
} greeter_t;
and an allocate function:
VALUE greeter_allocate(VALUE klass)
{
    greeter_t *g = malloc(sizeof(greeter_t)); /* malloc needs <stdlib.h> */
    g->name = Qnil;
    return Data_Wrap_Struct(klass, greeter_mark, greeter_free, g);
}
Data_Wrap_Struct turns any structure into a VALUE. It takes four parameters: the class of the object (klass), a function pointer to our mark function (greeter_mark), a function pointer to our free function (greeter_free), and our data structure (g). The allocate function always receives the class of the object as a parameter, so we don't have to use our hw_cMyClass variable here. The return value should be our freshly created object.
Please note: the allocate function is NOT a constructor. It's only in charge of allocating the memory required by a certain kind of object and correctly initializing it. If you want to do something more, like setting certain fields to a custom value, you should define an initialize method for your object, just like you would in Ruby. We'll do that later to set the name of our greeter to a more meaningful value.
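The same split between allocation and initialization is visible from Ruby itself: Class#new first calls allocate and then initialize. The sketch below uses a hypothetical pure-Ruby Greeter class, which is not part of the extension, to show an object that has been allocated but not yet initialized.

```ruby
# Hypothetical pure-Ruby class, used only to show allocate versus initialize.
class Greeter
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

raw = Greeter.allocate               # memory only: initialize has not run
puts raw.name.inspect                # prints nil

raw.send(:initialize, "Ruby")        # what Greeter.new would do next
puts raw.name                        # prints Ruby
```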
Now we have to implement our greeter_free and greeter_mark methods. These are pretty simple:
void greeter_mark(greeter_t* self)
{
    rb_gc_mark(self->name);
}

void greeter_free(greeter_t* self)
{
    free(self);
}
The mark method should just call rb_gc_mark for all the VALUE fields in the struct. This causes their mark method to be called and marks them as reachable. You should never call another object's mark method directly.
The free function should free the memory used by the C structure and any memory that has been allocated for its fields. It should also close file handles or sockets, and take care of all other resources used by the object. It should not do anything special for the VALUE fields in the structure: you don't know if they are still being used by other objects. The garbage collector will take care of them in due time.
We also need to register the allocate function, so Ruby knows where to find it. To do this, we have to put the following code in our Init function:
rb_define_alloc_func(hw_cMyClass, greeter_allocate);
Now that we have stored our data in a VALUE, it's time to learn how to get it out of its wrapping. Of course, in Ruby's spirit of simplicity, there is a simple macro, Data_Get_Struct, that can convert a VALUE to a C structure. It takes three parameters: the ruby object, the kind of data stored in the object, and a pointer that will point to the C structure after the call. Here's our new and improved greet method:
VALUE greet(VALUE self, VALUE names)
{
    int i;
    struct RArray *names_array;
    greeter_t* greeter;
    VALUE my_name;

    Data_Get_Struct(self, greeter_t, greeter); /* unwrap self into greeter */
    my_name = rb_funcall(greeter->name, rb_intern("to_s"), 0);
    names_array = RARRAY(names);

    for(i = 0; i < names_array->len; i++)
    {
        VALUE current = names_array->ptr[i];
        if(rb_respond_to(current, rb_intern("to_s")))
        {
            VALUE name = rb_funcall(current, rb_intern("to_s"), 0);
            printf("hello %s, ", StringValuePtr(name));
        }
    }
    printf("I'm %s.\n", StringValuePtr(my_name));

    return Qnil;
}
Now, since you can't give a name to your greeter yet, it will always print an empty string as its name. We should write an initialize method to set the name to a more meaningful value. But to do that, I would like to use another feature we haven't discussed yet.
That's right, the last thing I'm going to teach you is what to do when everything goes wrong. Ruby has its raise-rescue error handling system, but in C, there is no such thing! However, it could still be useful to raise an error from your C code. There is a function in ruby.h that does just that: rb_raise.
But before we go into any more details, let's write our initialize method without raising errors. Like greet, it takes one parameter and has to be registered in the Init function with rb_define_method, under the name "initialize":
VALUE greeter_initialize(VALUE self, VALUE name)
{
    greeter_t* greeter;
    Data_Get_Struct(self, greeter_t, greeter);
    greeter->name = name;
    return self;
}
Now, how could we possibly use errors here? What could possibly cause an error in our simple hello world example? We could pass an invalid value as a name, a value that cannot be converted into a string. This would not cause an error right away, but would raise one whenever the user tried to call our greet method. Not really the behaviour you'd expect from an object that was corrupted by a constructor parameter, is it? It would be much better if we just raised the error once, when the object is created. Here's how we do it:
VALUE greeter_initialize(VALUE self, VALUE name)
{
    greeter_t* greeter;

    if(!rb_respond_to(name, rb_intern("to_s")))
        rb_raise(rb_eArgError, "name should respond to to_s");

    Data_Get_Struct(self, greeter_t, greeter);
    greeter->name = name;
    return self;
}
rb_raise takes at least two parameters: one that indicates what kind of error we want to raise (rb_eArgError) and one containing an error message, which is a printf-style format string that may be followed by further arguments. Here we want to raise an ArgumentError, because an invalid argument was being passed to our method. Other popular choices are rb_eTypeError, rb_eNotImpError (Not Implemented Error, my personal favorite), rb_eNameError and rb_eIOError. For a complete list of standard error classes, you should check ruby.h.
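For comparison, the check in greeter_initialize corresponds to the plain Ruby below. It is a sketch only: Greeter and Nameless are hypothetical classes, and Nameless removes to_s on purpose so that there is something to reject.

```ruby
class Greeter
  def initialize(name)
    # Same test as rb_respond_to followed by rb_raise(rb_eArgError, ...).
    unless name.respond_to?(:to_s)
      raise ArgumentError, "name should respond to to_s"
    end
    @name = name
  end
end

# An object that cannot be converted into a string.
class Nameless
  undef_method :to_s
end

begin
  Greeter.new(Nameless.new)
rescue ArgumentError => e
  puts "rejected: #{e.message}"      # prints rejected: name should respond to to_s
end
```

On the Ruby side, an error raised with rb_raise is rescued exactly like this one.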
Now that you have seen how easy it is to mix C and Ruby code, you can probably think of a few places where you could put your new knowledge to good use. You can now use the power of C with the flexibility of Ruby! Great! But before you do so, think at least twice about it: do you really need to use C code in your Ruby script?
Using C comes with an array of new problems: you have to get your C code to run on every platform you target. Unlike Ruby, C code is not completely portable, especially not after it has been compiled. You need to provide binaries for all those platforms, or have the end users compile your C extension themselves, which is not always a good idea, or even a possibility (for instance, on Windows this would require the user to have a copy of Visual Studio 6.0). This means that you will need to spend a lot of time building and packaging your script, valuable time you could have spent working on the next version of your project!
And even if that doesn't scare you away, writing C code is something that requires a lot more care than writing Ruby code. Your Ruby code is run from inside the interpreter, and the worst thing that could happen is that you raise an error which is not caught and causes your script to exit. Big deal. In C on the other hand, errors are harder to detect and can have more serious consequences. A bug can crash the whole interpreter, for instance, or even worse, the process can keep running after a portion of its memory has been corrupted.
Writing C extensions usually takes a lot more time than writing the same functionality in plain ruby code. Saving development time is the main reason why you're using Ruby, remember? So don't be too hasty when you decide to 'optimize' your code, and think about whether you really need to.
Extending Ruby with C is both easy and fun. A lot of Ruby's spirit can still be felt throughout its C API. Using C can speed up your ruby code, or connect your script with libraries that it couldn't use before. However, writing C code still requires a lot more time and gives the developer more responsibility to ensure the correct behaviour of the application, so you shouldn't use it if you don't really need it.
This article is only an introduction to Ruby's C interface. A lot of the interface is undocumented, but it is easy to figure out how things work most of the time. The best way to learn more about mixing C and Ruby is by using your knowledge and playing around with the API. If you need to do something and you don't know how, try looking through ruby.h. Another valuable resource is the source code of Ruby's core classes. They often show you how things should be done. If you can't find it there, and even the almighty Google doesn't know the answer, you could try asking on the Ruby mailing lists or forums about software development and Ruby on the web.