An interactive approach to writing Elixir macros
Bitstring pattern matching is hugely powerful for parsing fixed-width data. It does lead to considerable repetition, though, and depending on the data, converting different fields in different ways leads to even more repetition, shadowed variables, and so on. For my particular situation it seemed worth the time to try a macro. In the process I developed an approach that made writing the macro easier and more interactive.
First let's start with what I'm trying to alleviate:
def line2kw(
  <<
    gtin_prefix :: binary-1,
    ean :: binary-13,
    upc :: binary-14,
    isbn10 :: binary-10,
    onhands :: binary-56, # 7*8
    onorders :: binary-56, # 7*8
    sugg_price :: binary-7,
    pub_price :: binary-7,
    fill :: binary-1,
    discount_level :: binary-3,
    fill2 :: binary-10,
    pub_status :: binary-2,
    stock_flags :: binary-8,
    publication_date :: binary-size(8),
    on_sale_date :: binary-size(8),
    returnable_indicator :: binary-size(1),
    return_date :: binary-size(8),
    filler_3 :: binary-size(5),
    backorder_only_indicator :: binary-size(1),
    media_mail_indicator :: binary-size(1),
    product_type :: binary-size(1),
    imprintable_indicator :: binary-size(1),
    indexable_indicator :: binary-size(1),
    filler_4 :: binary-size(15),
    weight :: binary-size(6), # lbs_stv
    filler_5 :: binary-size(11),
    ingram_publisher_number :: binary-size(4),
    filler_6 :: binary-size(5),
    restricted_code :: binary-size(1),
    discount_category_code :: binary-size(5),
    filler_7 :: binary-size(1),
    product_availability_code :: binary-size(2),
    ingram_title_code :: binary-size(9),
    product_classification_type :: binary-size(2),
    filler_8 :: binary
  >> ) do
  [ gtin_prefix: gtin_prefix,
    ean: ean,
    upc: upc,
    isbn10: isbn10,
    onhands: onhands,
    onorders: onorders,
    sugg_price: sugg_price,
    pub_price: pub_price,
    fill: fill,
    discount_level: discount_level,
    fill2: fill2,
    pub_status: pub_status,
    stock_flags: stock_flags,
    publication_date: publication_date,
    on_sale_date: on_sale_date,
    returnable_indicator: returnable_indicator,
    return_date: return_date,
    filler_3: filler_3,
    backorder_only_indicator: backorder_only_indicator,
    media_mail_indicator: media_mail_indicator,
    product_type: product_type,
    imprintable_indicator: imprintable_indicator,
    indexable_indicator: indexable_indicator,
    filler_4: filler_4,
    weight: weight,
    filler_5: filler_5,
    ingram_publisher_number: ingram_publisher_number,
    filler_6: filler_6,
    restricted_code: restricted_code,
    discount_category_code: discount_category_code,
    filler_7: filler_7,
    product_availability_code: product_availability_code,
    ingram_title_code: ingram_title_code,
    product_classification_type: product_classification_type,
    filler_8: filler_8
  ]
  |> Enum.map( fn { k, v } -> { k, String.trim(v) } end )
end
So that's long and ugly, and it repeats every name three times. I also have many files that need to be parsed this way. It doesn't do anything about converting numbers either; it just trims everything.
So let's start with a Macro skeleton:
defmodule FixedRead do
  defmacro defline( name, default_conversion \\ :none, columns ) do
    quote do
      def unquote(name)( unquote( { :<<>>, [], to_pat( columns ) } ) ) do
        unquote( to_kw( columns, default_conversion ) )
      end
    end
  end
end
One thing here is not obvious, does not look like everyday Elixir, and is not simple to look up: the { :<<>>, [], to_pat( columns ) } tuple. Where did it come from? If you look back at the long code, you'll see the binary pattern matching looks like:
def something( << var1 :: binary-size(4) >> ) do
  var1
end
How to figure out what this needs to look like in the macro world? It's quite simple! Let's just quote it:
iex(25)> quote do def something( << var1 :: binary-size(4) >> ), do: var1 end
{:def, [context: Elixir, import: Kernel],
 [
   {:something, [context: Elixir],
    [
      {:<<>>, [],
       [
         {:"::", [],
          [
            {:var1, [], Elixir},
            {:-, [context: Elixir, import: Kernel],
             [{:binary, [], Elixir}, {:size, [], [4]}]}
          ]}
       ]}
    ]},
   [do: {:var1, [], Elixir}]
 ]}
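This is Elixir's AST. You would not want to write it by hand, but quote will write it for you. The AST is in prefix notation, like Lisp, the ultimate AST-as-a-language. You can see that :<<>> is the prefix form of the bitstring special form. If you're not familiar with the AST, each node is a three-element tuple:
- atom (the name of the call or variable)
- metadata (a keyword list, such as the context and import entries above)
- children (the list of arguments; for a variable, the context atom instead)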
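To make the node shape concrete, here is a minimal sketch that quotes two small expressions and destructures them. The variable names (call, name, meta, args and so on) are only for illustration; quote and Macro.to_string/1 are standard Elixir.

```elixir
# Quote a tiny call and take the resulting node apart.
call = quote do: size(4)

# A call node is {name, metadata, arguments}.
{name, meta, args} = call
IO.inspect({name, meta, args})

# A variable is also a three-element tuple, but its third element is an
# atom naming the context rather than a list of arguments.
{var_name, _var_meta, var_context} = quote do: var1
IO.inspect({var_name, is_atom(var_context)})

# Macro.to_string/1 turns a node back into source text, which is a quick
# way to check hand-built AST.
IO.puts(Macro.to_string(call))
```

Going back and forth between quote and Macro.to_string/1 in iex is a cheap way to explore any piece of syntax before writing macro code for it.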
I don't consider myself an AST expert, but that's not required if you can generally understand what quote generates from what you give it. To write macro code that generates a bitstring match, just repeat this pattern.
Let's go back to FixedRead and follow the to_pat function call. If the simplified input looks like this:
columns = [ first: 2, second: 4, third: 1 ]
Then to_pat is just going to be a map:
defp to_pat( columns ) do
  Enum.map( columns, &pate/1 )
end
So, on a per-tuple basis, here is what pate needs to emit:
defp pate( { k, s } ) do
  {:"::", [],
   [
     {k, [], Elixir},
     {:-, [context: Elixir, import: Kernel], [{:binary, [], Elixir}, {:size, [], [s]}]}
   ]}
end
Look back at the AST from something and you'll see that this exact pattern is there. The difference is that instead of the literals :var1 and 4, this code uses the variables k and s, which come from the column definition keyword list we gave.
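One way to check hand-built nodes like these interactively is to render them back to source and compare against the same pattern written directly inside quote. The sketch below assumes a throwaway module named PateCheck (hypothetical) that exposes the two helpers publicly so they can be called from a script or from iex.

```elixir
defmodule PateCheck do
  # Same node shape as pate/1 in the article, made public here so it can
  # be called from outside the module.
  def pate({k, s}) do
    {:"::", [],
     [
       {k, [], Elixir},
       {:-, [context: Elixir, import: Kernel], [{:binary, [], Elixir}, {:size, [], [s]}]}
     ]}
  end

  def to_pat(columns), do: Enum.map(columns, &pate/1)
end

# Wrap the generated fields in the :<<>> node, as the macro skeleton does.
built = {:<<>>, [], PateCheck.to_pat(first: 2, second: 4, third: 1)}

# The same pattern written by hand and handed to quote.
quoted =
  quote do
    <<first::binary-size(2), second::binary-size(4), third::binary-size(1)>>
  end

# Metadata can differ, so compare the rendered source rather than the tuples.
IO.puts(Macro.to_string(built))
IO.inspect(Macro.to_string(built) == Macro.to_string(quoted))
```

If the two rendered strings differ, the hand-built template has drifted from what quote produces, and the printed source usually shows where.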
This is an incomplete example of how to use quote-generated AST to build templates for macros. But the approach is complete, and that means you can (if you want) finish this macro to do fixed-length decoding. You have the tools now.
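For readers who want a starting point, here is one possible way the macro could be finished. It is a sketch under assumptions, not the author's code: the module name FixedReadSketch, the helpers to_kw/2, var/1 and convert/2, the `name: {size, conversion}` column form, and the conversion names :none, :trim and :integer are all invented for illustration. It also uses Macro.var/2 for the generated variables and empty metadata on the size node, in place of the literal tuples shown earlier.

```elixir
defmodule FixedReadSketch do
  # Hypothetical completion of the article's skeleton. to_kw/2, var/1,
  # convert/2 and the conversion names are assumptions for illustration.
  defmacro defline(name, default_conversion \\ :none, columns) do
    quote do
      def unquote(name)(unquote({:<<>>, [], to_pat(columns)})) do
        unquote(to_kw(columns, default_conversion))
      end
    end
  end

  # A column is either `name: size` or `name: {size, conversion}`.
  defp to_pat(columns) do
    Enum.map(columns, fn
      {k, {s, _conv}} -> pate(k, s)
      {k, s} -> pate(k, s)
    end)
  end

  # Same node shape as pate/1 in the article, with empty metadata.
  defp pate(k, s) do
    {:"::", [], [var(k), {:-, [], [{:binary, [], Elixir}, {:size, [], [s]}]}]}
  end

  # Build the keyword list returned by the generated function.
  defp to_kw(columns, default) do
    Enum.map(columns, fn
      {k, {_s, conv}} -> {k, convert(var(k), conv)}
      {k, _s} -> {k, convert(var(k), default)}
    end)
  end

  # One variable per column, shared by the pattern and the body.
  defp var(k), do: Macro.var(k, __MODULE__)

  defp convert(v, :none), do: v
  defp convert(v, :trim), do: quote(do: String.trim(unquote(v)))
  defp convert(v, :integer), do: quote(do: String.to_integer(String.trim(unquote(v))))
end

defmodule Demo do
  require FixedReadSketch

  # Three columns: 4 bytes trimmed, 3 bytes as an integer, 1 byte trimmed.
  FixedReadSketch.defline(:parse, :trim, code: 4, qty: {3, :integer}, flag: 1)
end

IO.inspect(Demo.parse("ab  007Y"))
```

Each column name now appears once at the call site instead of three times. Unlike the long function above, this sketch has no trailing catch-all field, so the input must be exactly as long as the declared columns.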