Scribus & GNU FriBidi

NOTE: Please do not use this as a reference or tutorial for FriBidi. It contains incorrect information, and I’m keeping it unchanged just for the historical record. Some stuff here is just plain wrong.

In my previous post I talked briefly about Scribus’ problem.

The results I have so far depend solely on GNU FriBidi. No HarfBuzz yet.

It’s true that HarfBuzz is the library to use for text layout, but:

http://mces.blogspot.com/2009/11/pango-vs-harfbuzz.html

HarfBuzz only does shaping (….) [it] doesn’t provide:

  • An itemizer
  • A Unicode Bidirection Algorithm implementation
  • A Unicode Line Breaking implementation
  • Glyph rasterization
  • Glyph metrics information
  • etc

So it will get us the shaping and stuff, but not the bidi ordering and line breaking.

The GNU FriBidi API is quite simple, though not obviously so, at least if you’re studying it for the first time without prior exposure to bidi issues and the Unicode Bidirectional Algorithm.

The “core” of the API is the fribidi_get_par_embedding_levels function. Embedding levels are used to determine directional runs.

Here’s the setup code I used (roughly):

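    // inputString is an array of FriBidiChar (UTF-32), inputLength characters long
    embeddingLevels = new FriBidiLevel[inputLength];
    FriBidiCharType *bidi_types = new FriBidiCharType[inputLength];
    fribidi_get_bidi_types(inputString, inputLength, bidi_types);
    // base direction of the paragraph, taken from its first strong character
    baseDir = fribidi_get_par_direction(bidi_types, inputLength);
    // returns the maximum level found plus one, or zero on failure
    FriBidiLevel maxLevel = fribidi_get_par_embedding_levels(bidi_types, inputLength, &baseDir, embeddingLevels);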

I’m not entirely sure whether calling fribidi_get_par_direction is actually needed. That aside, the embedding levels let you determine whether a given character is part of an RTL or an LTR run: if the embedding level is even, the character is part of an LTR run; if it’s odd, it’s part of an RTL run.

    /**
        Does character at index have an RTL embedding level?
     */
    bool BidiInfo::isRtlEmbedding(int index)
    {
        return (embeddingLevels[index] & 1) != 0; // odd embedding levels are part of an RTL run
    }

Then we want to get the ranges of the runs, so we just scan the text until the direction changes, and we have the start and end of a run. The way I did that is simple: nextRun(start, limit) searches for the start of the next run, beginning the search at start and stopping at limit. The intended usage is something like this:

    int start = 0;
    while (start < length) {
        int end = nextRun(start, length);
        // [start, end) is now a run, do something with it
        start = end;
    }
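
For illustration, here is a minimal sketch of what such a nextRun could look like. It is not the code from my patch: the free function nextRunSketch, the Level alias (a stand-in for FriBidiLevel) and the isRtl helper are hypothetical names, and I’m assuming a run is a maximal stretch of characters whose levels share the same parity. Comparing the levels themselves instead would give finer-grained level runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for FriBidiLevel (a small signed integer type).
using Level = signed char;

// Odd levels are RTL, even levels are LTR.
inline bool isRtl(Level level) { return (level & 1) != 0; }

// Returns the index one past the end of the run that begins at `start`,
// never looking beyond `limit`. Returns `limit` if `start` is out of range,
// so a loop of the form "start = nextRunSketch(...)" always terminates.
std::size_t nextRunSketch(const std::vector<Level>& levels,
                          std::size_t start, std::size_t limit)
{
    if (limit > levels.size())
        limit = levels.size();
    if (start >= limit)
        return limit;
    const bool rtl = isRtl(levels[start]);
    std::size_t i = start + 1;
    while (i < limit && isRtl(levels[i]) == rtl)
        ++i;
    return i;
}
```

The return value doubles as the start of the following run, which is what makes the loop above work without any extra bookkeeping.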

With that, we check whether each run is RTL, and if so, we reverse the characters in that run to get a bidirectional display of the text. How I did the reversing is too much of a detail to include here.
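
As a rough idea of the reversing step, here is a sketch that flips every odd-level run inside one line. It is not the Scribus code: reverseRtlRunsSketch is a hypothetical name, and I’m assuming the text is held as a std::u32string with one embedding level per character. It only covers the flat case described above. Rule L2 of the Unicode Bidirectional Algorithm reverses from the highest level down to the lowest odd level, which matters once levels nest (a number inside RTL text, for example), and FriBidi offers fribidi_reorder_line for that.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reverses every RTL (odd-level) run inside [lineStart, lineEnd) in place.
// `levels` holds one embedding level per character of `text`.
void reverseRtlRunsSketch(std::u32string& text,
                          const std::vector<signed char>& levels,
                          std::size_t lineStart, std::size_t lineEnd)
{
    // Clamp the range so we never read past either container.
    lineEnd = std::min({lineEnd, text.size(), levels.size()});
    std::size_t start = lineStart;
    while (start < lineEnd) {
        const bool rtl = (levels[start] & 1) != 0;
        // Extend the run while the direction stays the same.
        std::size_t end = start + 1;
        while (end < lineEnd && ((levels[end] & 1) != 0) == rtl)
            ++end;
        if (rtl)
            std::reverse(text.begin() + start, text.begin() + end);
        start = end;
    }
}
```

Passing the line boundaries in explicitly keeps a run from being reversed across a line break.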

I only do this stuff after the textframe layout method has done its work, and that’s for a good reason: the reordering has to be done on a per-line basis, otherwise you get problems. So we need to find out where lines start and end. What I did was “watch” the layout process as it happens, and whenever a new line occurs, record it. In other words, I injected some code everywhere I saw code that handles line breaks, which was about 4 places. This of course resulted in some duplicate code, but I tried to keep it to a minimum: 1 line.
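
To give an idea of the “watching”, here is a sketch of a small recorder. The class LineBreakLogSketch and its methods are hypothetical, not the names used in Scribus, and I’m assuming line starts are reported as character indices in ascending order. The one line injected at each line-break site would be the call to record.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: collects the positions where the layout code
// starts a new line.
class LineBreakLogSketch {
public:
    // The single call to drop in wherever the layout code breaks a line.
    // `pos` is the index of the first character of the new line.
    // Repeated or out-of-order positions are ignored.
    void record(std::size_t pos)
    {
        if (starts.empty() || pos > starts.back())
            starts.push_back(pos);
    }

    // Turns the recorded starts into [start, end) ranges that together
    // cover `length` characters.
    std::vector<std::pair<std::size_t, std::size_t>> lines(std::size_t length) const
    {
        std::vector<std::pair<std::size_t, std::size_t>> result;
        std::size_t begin = 0;
        for (std::size_t s : starts) {
            if (s == 0 || s >= length)
                continue; // no line break strictly inside the text here
            result.emplace_back(begin, s);
            begin = s;
        }
        if (begin < length)
            result.emplace_back(begin, length);
        return result;
    }

private:
    std::vector<std::size_t> starts;
};
```

Once layout is finished, each range returned by lines can be reordered on its own.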