Bitstring pattern matching is hugely powerful for parsing fixed-width data. It does lead to considerable repetition, though, and depending on the data, converting different fields in different ways adds more repetition, shadowed variables, and so on. For my particular situation it seemed worth the time to try a macro. In the process I developed an approach that made writing the macro easier and more interactive.



First let's start with what I'm trying to alleviate:

  def line2kw(
    <<
    gtin_prefix :: binary-size(1),
    ean :: binary-size(13),
    upc :: binary-size(14),
    isbn10 :: binary-size(10),
    onhands :: binary-size(56), # 7*8
    onorders :: binary-size(56), # 7*8
    sugg_price :: binary-size(7),
    pub_price :: binary-size(7),
    fill :: binary-size(1),
    discount_level :: binary-size(3),
    fill2 :: binary-size(10),
    pub_status :: binary-size(2),
    stock_flags :: binary-size(8),
    publication_date :: binary-size(8),
    on_sale_date :: binary-size(8),
    returnable_indicator :: binary-size(1),
    return_date :: binary-size(8),
    filler_3 :: binary-size(5),
    backorder_only_indicator :: binary-size(1),
    media_mail_indicator :: binary-size(1),
    product_type :: binary-size(1),
    imprintable_indicator :: binary-size(1),
    indexable_indicator :: binary-size(1),
    filler_4 :: binary-size(15),
    weight :: binary-size(6), #lbs_stv
    filler_5 :: binary-size(11),
    ingram_publisher_number :: binary-size(4),
    filler_6 :: binary-size(5),
    restricted_code :: binary-size(1),
    discount_category_code :: binary-size(5),
    filler_7 :: binary-size(1),
    product_availability_code :: binary-size(2),
    ingram_title_code :: binary-size(9),
    product_classification_type :: binary-size(2),
    filler_8 :: binary
    >> ) do
    [    gtin_prefix: gtin_prefix,
         ean: ean,
         upc: upc,
         isbn10: isbn10,
         onhands: onhands,
         onorders: onorders,
         sugg_price: sugg_price,
         pub_price: pub_price,
         fill: fill,
         discount_level: discount_level,
         fill2: fill2,
         pub_status: pub_status,
         stock_flags: stock_flags,
         publication_date: publication_date,
         on_sale_date: on_sale_date,
         returnable_indicator: returnable_indicator,
         return_date: return_date,
         filler_3: filler_3,
         backorder_only_indicator: backorder_only_indicator,
         media_mail_indicator: media_mail_indicator,
         product_type: product_type,
         imprintable_indicator: imprintable_indicator,
         indexable_indicator: indexable_indicator,
         filler_4: filler_4,
         weight: weight,
         filler_5: filler_5,
         ingram_publisher_number: ingram_publisher_number,
         filler_6: filler_6,
         restricted_code: restricted_code,
         discount_category_code: discount_category_code,
         filler_7: filler_7,
         product_availability_code: product_availability_code,
         ingram_title_code: ingram_title_code,
         product_classification_type: product_classification_type,
         filler_8: filler_8
    ]
    |> Enum.map( fn { k, v } -> { k, String.trim(v) } end )
  end

So that's long and ugly, and it repeats every name three times. I also have many files that need to be parsed this way. And it doesn't do anything to convert numbers; it just trims everything.

So let's start with a macro skeleton:

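defmodule FixedRead do
  # default_conversion sits in the middle: defline(name, columns) uses :none
  defmacro defline( name, default_conversion \\ :none, columns ) do
    quote do
      # The function's argument is a bitstring pattern built from columns
      def unquote(name)( unquote( { :<<>>, [], to_pat( columns ) } ) ) do
        # to_kw/2 builds the keyword-list body from the same columns
        unquote( to_kw( columns, default_conversion ) )
      end
    end
  end
end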
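One detail in the skeleton is easy to trip over: the default argument sits between two required ones. Elixir allows this; a two-argument call fills the first and last parameters and uses the default for the middle one. The hypothetical DefaultsDemo.shape/3 below mirrors defline's argument layout (using def rather than defmacro, which handles defaults the same way) so you can check the behavior in a script:

```elixir
defmodule DefaultsDemo do
  # Hypothetical function with the same argument layout as defline:
  # a default value in the middle position
  def shape(name, default_conversion \\ :none, columns) do
    {name, default_conversion, columns}
  end
end

# Two arguments: the default fills the middle slot
{:line2kw, :none, [first: 2]} = DefaultsDemo.shape(:line2kw, first: 2)

# Three arguments: everything is explicit
{:line2kw, :trim, [first: 2]} = DefaultsDemo.shape(:line2kw, :trim, first: 2)
```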

There's one thing here that's not obvious, not something you'd normally write in Elixir, and not simple to look up: that { :<<>>, [], to_pat( columns ) }. Where did it come from? If you look back at the long code, the binary pattern match boils down to something like this:


def something( << var1 :: binary-size(4) >> ) do
  var1
end

How do you figure out what this needs to look like in the macro world? It's quite simple: just quote it!

iex(25)> quote do def something( << var1 :: binary-size(4) >> ), do: var1 end
{:def, [context: Elixir, import: Kernel],
 [
   {:something, [context: Elixir],
    [
      {:<<>>, [],
       [
         {:"::", [],
          [
            {:var1, [], Elixir},
            {:-, [context: Elixir, import: Kernel],
             [{:binary, [], Elixir}, {:size, [], [4]}]}
          ]}
       ]}
    ]},
   [do: {:var1, [], Elixir}]
 ]}

This is Elixir's AST. You would not want to write it by hand, but quote will write it for you. The AST is in prefix notation, like Lisp (arguably the ultimate AST-as-a-language). You can see that :<<>> is the prefix form of the <<>> special form, which both builds and matches bitstrings. If you're not familiar with the AST, each node is a three-element tuple:

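  • the name of the call, operator, or variable (usually an atom)
  • metadata (a keyword list, such as [context: Elixir, import: Kernel])
  • the arguments (a list), or, for a variable, its context atom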
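A quick way to confirm the three-element shape yourself is to quote a few expressions and match on the result. This is a small sketch; the patterns use _ for metadata and context because the exact metadata quote emits can vary between Elixir versions. Note that literals such as atoms, integers, two-element tuples, and keyword lists are not wrapped in nodes at all: they quote to themselves, which is why plain values can be dropped straight into hand-built AST.

```elixir
# A variable: {name, metadata, context_atom}
{:var1, _meta, _context} = quote(do: var1)

# A call: {name, metadata, argument_list}
{:size, _meta, [4]} = quote(do: size(4))

# An operator is just a call whose name is the operator atom
{:-, _meta, [{:binary, _, _}, {:size, _, [4]}]} = quote(do: binary - size(4))

# Literals quote to themselves, with no wrapping node
4 = quote(do: 4)
:first = quote(do: :first)
{:first, 2} = quote(do: {:first, 2})
[first: 2, second: 4] = quote(do: [first: 2, second: 4])
```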

I don't consider myself an AST expert, but you don't need to be one as long as you can roughly follow what quote generates from what you give it. To write macro code that generates a bitstring match, just repeat this pattern.
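To make "repeat this pattern" concrete, you can let a match do the comparison instead of reading the AST by eye. This sketch quotes a two-field bitstring pattern (the field names and sizes are made up for illustration) and destructures it. Each element of the :<<>> node's argument list is one :: node, so a pattern with N fields is just a list of N such nodes; pulling them apart recovers exactly the kind of keyword list the macro takes as input.

```elixir
ast = quote do: <<first::binary-size(2), second::binary-size(4)>>

# The bitstring node: {:<<>>, metadata, list_of_segments}
{:<<>>, _meta, segments} = ast

# Every segment has the same shape: {:"::", meta, [variable_node, type_spec]}
fields =
  for {:"::", _, [{name, _, _ctx}, {:-, _, [{:binary, _, _}, {:size, _, [size]}]}]} <- segments do
    {name, size}
  end

IO.inspect(fields)
# → [first: 2, second: 4]
```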

Let's go back to FixedRead and follow the to_pat function call. If the simplified input looks like this:

columns = [
  first: 2,
  second: 4,
  third: 1
]

Then to_pat is just going to be a map:

  defp to_pat( columns ) do
    Enum.map( columns, &pate/1 )
  end

So, on a per-tuple basis, what does pate need to emit?

  defp pate( { k, s } ) do
    {:"::", [],
     [
       {k, [], Elixir},
       {:-, [context: Elixir, import: Kernel],
        [{:binary, [], Elixir}, {:size, [], [s]}]}
     ]}
  end

Look back at the AST for something and you'll see this exact pattern. The only difference is that instead of the literal :var1 and 4, this code uses the variables k and s, which come from the column-definition keyword list we passed in.
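Since pate only returns data, you can exercise it interactively before wiring it into a macro. This sketch copies to_pat and pate into a hypothetical PatDemo module (made public so a script can call them), builds a <<...>> = line match plus a keyword list that reuses the same variable nodes, and runs it with Code.eval_quoted/1. The sample line and the hand-built body are assumptions for illustration.

```elixir
defmodule PatDemo do
  # Same as the article's to_pat/pate, but public so a script can call them
  def to_pat(columns), do: Enum.map(columns, &pate/1)

  def pate({k, s}) do
    {:"::", [],
     [
       {k, [], Elixir},
       {:-, [context: Elixir, import: Kernel],
        [{:binary, [], Elixir}, {:size, [], [s]}]}
     ]}
  end
end

columns = [first: 2, second: 4, third: 1]

# Pattern: <<first::binary-size(2), second::binary-size(4), third::binary-size(1)>>
pattern = {:<<>>, [], PatDemo.to_pat(columns)}

# Body: [first: first, second: second, third: third], using the same variable nodes
body = Enum.map(columns, fn {k, _size} -> {k, {k, [], Elixir}} end)

# One block: match the pattern against a sample line, then return the body
expr = {:__block__, [], [{:=, [], [pattern, "ab1234c"]}, body]}

IO.puts(Macro.to_string(expr))
{result, _binding} = Code.eval_quoted(expr)
IO.inspect(result)
# → [first: "ab", second: "1234", third: "c"]
```

Printing Macro.to_string(expr) is a cheap way to see the generated code as source before trusting it inside a macro.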

This is an incomplete example of how to use quote-generated AST as a template for macros. But the approach is complete, which means you can (if you want) finish this macro to do fixed-length decoding. You now have the tools.
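For reference, here is one possible way to finish it. This is a sketch, not the author's implementation: to_kw and convert are hypothetical and support only two conversions, :none (keep the raw binary) and :trim (apply String.trim/1, as the original line2kw did). The Catalog module, the parse_line name, and the sample input are made up. Because to_pat and to_kw run at expansion time inside the macro, they can stay private, and name must be passed as an atom since it is spliced in as the function name.

```elixir
defmodule FixedRead do
  defmacro defline(name, default_conversion \\ :none, columns) do
    quote do
      def unquote(name)(unquote({:<<>>, [], to_pat(columns)})) do
        unquote(to_kw(columns, default_conversion))
      end
    end
  end

  # One `name :: binary-size(n)` segment per column
  defp to_pat(columns), do: Enum.map(columns, &pate/1)

  defp pate({k, s}) do
    {:"::", [], [var(k), {:-, [], [{:binary, [], Elixir}, {:size, [], [s]}]}]}
  end

  # Hypothetical: builds the AST for [k1: conv(k1), k2: conv(k2), ...]
  defp to_kw(columns, conversion) do
    Enum.map(columns, fn {k, _size} -> {k, convert(var(k), conversion)} end)
  end

  # Hypothetical conversions: leave the binary alone, or trim it
  defp convert(v, :none), do: v
  defp convert(v, :trim), do: quote(do: String.trim(unquote(v)))

  # Same variable node in pattern and body, so both refer to one variable
  defp var(k), do: {k, [], Elixir}
end

defmodule Catalog do
  require FixedRead

  # Generates parse_line(<<first::binary-size(2), second::binary-size(4), third::binary-size(1)>>)
  FixedRead.defline(:parse_line, :trim, first: 2, second: 4, third: 1)
end

IO.inspect(Catalog.parse_line("a 12  c"))
# → [first: "a", second: "12", third: "c"]
```

Adding a numeric conversion would mean another convert/2 clause (or a per-column conversion in the keyword list), which is where the default_conversion argument starts to pay off.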