Erlang Bit Syntax
dmitriid wrote in en_dmitriid
I decided to check whether Erlang's bit syntax is really as good as advertised.

A friend of mine has been struggling with Sony's OMA format, which is used in Sony's players. The only feature that distinguishes this format from "normal" formats is its lack of documentation, and whatever documentation exists differs significantly from what actual files contain :)

Some good folks out there have reverse engineered the format and written a Java program to manipulate it. The program is available here: http://dmitriid.com/files/projects/erlang/OMA.zip. The program is accompanied by documentation on the Sony players' directory structure and, more importantly, on the OMA file format as well.

You can see the OMA header format in this file: http://dmitriid.com/files/projects/erlang/OMA.html. It doesn't look too horrible, does it? Let's see how Erlang handles it.

The first obstacle comes from the sample file that comes with the application (trunk/dist/OMGAUDIO/10F00/10000001.OMA). The spec says it has to start with
"E" "A" "3" 3 0 0 0 0 17  76 "T" "I" "T" "2" 0 0
You wish. In reality it starts with
"e" "a" "3" 3 0 0 0 0 17 "v" "G" "E" "O" "B" 0 0
And there's lots of info before we reach TIT2. Oh well, hex editor to the rescue. The file does contain the tags shown in the spec. However, they are placed differently, and there are a bunch of other, unknown tags. Whatever shall we do? I propose moving byte by byte until we reach a tag we know or the end of the header.

I'll tell you from the start that I cheated. I don't read the codec info from the header; I stop as soon as I reach
"E" "A" "3" 2 0 60 ff ff
So, the code

Open the file and read data from it
parse_header(File) ->
    case file:open(File, [read, binary, raw]) of
        {ok, S} ->
            {ok, Header} = file:pread(S, 0, 16#0c60),
            H = case read_header(Header) of
                error ->
                    {error, invalid_header};
                Data ->
                    Data
            end,
            file:close(S),
            H;
        _ ->
            {error, file_cannot_be_opened}
    end.
Well, that was easy. Open the file. Read it. Call the internal read_header function to actually parse the header, then return whatever it gives back (or {error, invalid_header} if it reports an error).
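One detail of the open/read/close flow is worth a sketch: file:pread/3 can also return eof (for an empty file) or {error, Reason}, and a failed match on {ok, Header} would leave the file open. Below is a minimal variant that closes the file in every case using try ... after. The module name oma_read_sketch, the function read_prefix/2 and the empty_file atom are made up for illustration and are not part of the original oma.erl.

```erlang
-module(oma_read_sketch).
-export([read_prefix/2]).

%% Read up to Size bytes from the start of File.
%% Returns {ok, Binary} | {error, Reason}.
read_prefix(File, Size) ->
    case file:open(File, [read, binary, raw]) of
        {ok, S} ->
            try file:pread(S, 0, Size) of
                {ok, Data}      -> {ok, Data};
                eof             -> {error, empty_file};
                {error, Reason} -> {error, Reason}
            after
                %% Runs whether pread succeeded, failed, or crashed.
                file:close(S)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

If the file is shorter than Size, pread simply returns the bytes that are there, so asking for 16#0c60 bytes from a small file is not an error.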

The read_header function couldn't be simpler
read_header(<<$e, $a, $3, Rest/binary>>) ->
    decode_header(Rest, []);

read_header(_) ->
    error.
Pattern matching is what keeps it this simple. If the incoming chunk of data starts with
"e" "a" "3"
the first clause matches and passes the remaining bytes on to decode_header. Otherwise the second clause matches and returns error.
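To see the prefix match in isolation, here is a tiny stand-alone sketch. It only checks the three magic bytes and hands back what follows them. The names oma_magic_sketch and check_magic/1 are hypothetical; the real code calls decode_header instead of returning the rest.

```erlang
-module(oma_magic_sketch).
-export([check_magic/1]).

%% First clause: the binary starts with the bytes "e", "a", "3".
%% Rest is bound to everything after those three bytes.
check_magic(<<$e, $a, $3, Rest/binary>>) ->
    {ok, Rest};
%% Second clause: anything else, including binaries shorter than 3 bytes.
check_magic(_) ->
    error.
```

The match is on raw bytes, so it is case-sensitive: check_magic(<<"ea3", 3, 0>>) gives {ok, <<3, 0>>}, while check_magic(<<"EA3", 3, 0>>) gives error.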

The actual parsing lives in several decode_header clauses, each of which parses a single tag. Here's an example:
% Title
decode_header(<<$T, $I, $T, $2, _:2/binary,
      Var:2/integer-unit:8, _:4/binary,
      Rest/binary>>, L) ->

        TitleLength = Var - 2,
        <<Title:TitleLength/binary, Rest2/binary>> = Rest,
        % Prepend the parsed tag to the accumulator list
        decode_header(Rest2, [{title, Title} | L]);
What happens here? Keep the spec before your eyes.

Here we go. The song title is stored as follows
T I T 2 0 0 Var1 Var1 0 0 2 TitleString
where TitleString is [0x0 String]. That's exactly what we've specified in the pattern:
T        $T
I        $I
T        $T
2        $2
0        _:2/binary            simply skip the two zeroes
0
Var1     Var:2/integer-unit:8  a number contained in two bytes
Var1
0        _:4/binary            simply skip 0 0 2 0
0
2
0
TitleString Rest/binary
All we have to do now is get the title from Rest. Empirically, I've found that the length of the title in our case is Var - 2. Having settled on that, we can extract the title with pattern matching again:
        TitleLength = Var - 2,
        <<Title:TitleLength/binary, Rest2/binary>> = Rest,
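A worked example may help here. The sketch below applies the same pattern to a hand-made binary (the bytes are invented to follow the layout above, not taken from a real OMA file) and returns an error tuple instead of crashing when the length doesn't fit. The module name oma_title_sketch, the function take_title/1 and the error atoms are assumptions for illustration only.

```erlang
-module(oma_title_sketch).
-export([take_title/1]).

%% Match one TIT2 tag laid out as in the post:
%% "TIT2", two skipped bytes, a 16-bit Var, four skipped bytes, then the title.
take_title(<<$T, $I, $T, $2, _:2/binary,
             Var:2/integer-unit:8, _:4/binary,
             Rest/binary>>) when Var >= 2 ->
    TitleLength = Var - 2,
    case Rest of
        <<Title:TitleLength/binary, Rest2/binary>> ->
            {ok, Title, Rest2};
        _ ->
            %% Var promises more bytes than the binary holds.
            {error, truncated}
    end;
take_title(_) ->
    {error, not_a_title}.
```

With Frame = <<"TIT2", 0, 0, 6:16, 0, 0, 2, 0, "Song", "more">>, Var is 6, so TitleLength is 4 and take_title(Frame) returns {ok, <<"Song">>, <<"more">>}.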
From here on, all we have to do is parse the remaining tags according to the same rules and test them against a real file. The real file may give us a headache because it doesn't conform to the (admittedly informal) spec and adds several new, unknown tags. That is why we need to augment decode_header with:
decode_header(<<>>, L) ->
    L;

decode_header(Bin, L) ->
    {_, NewBin} = split_binary(Bin, 1),
    decode_header(NewBin, L).
We move forward byte by byte. If we hit a known tag, it is caught by the corresponding decode_header clause. If we've reached the end of the header, that is matched by decode_header(<<>>, L). Otherwise we move forward by one more byte and repeat the steps once again.
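Putting the pieces together, here is a self-contained sketch of the whole scan loop: one known tag (TIT2 only), the end-of-data clause, and the skip-one-byte fallback. It is a simplified stand-in for the real oma.erl, under a few assumptions of mine: the names oma_scan_sketch, scan/1 and skip/2 are invented, the byte is dropped with a <<_, Rest/binary>> match rather than split_binary/2, and the result list is reversed at the end so tags come out in file order.

```erlang
-module(oma_scan_sketch).
-export([scan/1]).

%% Walk a binary and collect every tag we recognise, in file order.
scan(Bin) ->
    lists:reverse(scan(Bin, [])).

%% End of data: return what we have collected.
scan(<<>>, Acc) ->
    Acc;
%% A known tag (only TIT2 in this sketch), laid out as in the post.
scan(<<$T, $I, $T, $2, _:2/binary, Var:16, _:4/binary, Rest/binary>> = Bin, Acc)
  when Var >= 2 ->
    TitleLength = Var - 2,
    case Rest of
        <<Title:TitleLength/binary, Rest2/binary>> ->
            scan(Rest2, [{title, Title} | Acc]);
        _ ->
            %% Length does not fit: treat this "T" as an unknown byte.
            skip(Bin, Acc)
    end;
%% Anything else: drop one byte and try again.
scan(Bin, Acc) ->
    skip(Bin, Acc).

%% Only ever called with a non-empty binary.
skip(<<_, Rest/binary>>, Acc) ->
    scan(Rest, Acc).
```

For example, scan(<<1, 2, 3, "TIT2", 0, 0, 6:16, 0, 0, 2, 0, "Song", 9, 9>>) skips the three leading bytes, picks up the title, skips the two trailing bytes and returns [{title, <<"Song">>}].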

The entire file is available here: http://erltag.googlecode.com/svn/trunk/src/oma.erl. If you want to practice, download this file: http://erltag.googlecode.com/files/erltag-release.zip (6.1 MB). You can find the sample audio file in the test folder.

By the way, this code doesn't pretend to be the most efficient or correct way of writing Erlang. Most likely it's quite the opposite :)
Synatx = Syntax

(Anonymous)
cool, but too much error handling in parse_header and read_header. code for great success!

Re: Synatx = Syntax

Thank you!

As I said, this code is quite opposite of what real code should do :)
