Mateusz Loskot :: hacking on, working out, living up

Moving posts from WordPress to Hakyll

06 Jan 2013 | mloskot

In my Powered by Hakyll announcement, I promised to explain the simple procedure I used to migrate my blog posts from WordPress to Markdown format for use with Hakyll.

Shortly after I learned about Hakyll and decided to replace my WordPress-based website with it, I found an old thread in the Hakyll discussion group titled "Converting from WordPress to Hakyll?". I bumped it and discussed a few possible options.

I eventually decided to use exitwp, a utility written in Python. There are a few reasons why I chose exitwp:

The exitwp is simple to use and the few steps outlined in its README just work.

However, I had to solve a few problems I encountered. They were mostly related to particular features and plugins I used with WordPress, or to my idea of what the final migration output should look like.

First, I had to deal with the WordPress syntax highlighting tags [text], [sourcecode] and their language-specific variants. I simply configured exitwp to replace them with the <pre> tag. Without this conversion, exitwp exited prematurely with a translation error. The replacement can be set up using the body_replace property in the YAML configuration file used by exitwp.
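To illustrate the kind of substitution involved, here is a minimal standalone sketch in Python. It is not exitwp's own code, and it does not reproduce the body_replace syntax from the real configuration file. The BODY_REPLACE table, the replace_shortcodes function and the list of shortcode names are all hypothetical, chosen only to show opening and closing shortcodes being mapped to <pre> and </pre>.

```python
import re

# Hypothetical replacement table in the spirit of a body_replace setting.
# The shortcode names listed here are assumptions for illustration only.
SHORTCODES = r'(?:sourcecode|text|cpp|python)'

BODY_REPLACE = [
    # Opening tag, optionally with attributes such as language="cpp".
    (re.compile(r'\[' + SHORTCODES + r'(?:\s[^\]]*)?\]'), '<pre>'),
    # Matching closing tag.
    (re.compile(r'\[/' + SHORTCODES + r'\]'), '</pre>'),
]


def replace_shortcodes(body):
    """Replace WordPress-style highlighting shortcodes with <pre> tags."""
    for pattern, replacement in BODY_REPLACE:
        body = pattern.sub(replacement, body)
    return body
```

Anything in square brackets that is not one of the listed shortcodes is left untouched, so ordinary bracketed text in a post should survive the conversion.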

Next, I noticed that during translation some titles were wrapped in extra apostrophes. I haven’t debugged it any further, but it seems the html2text engine used by exitwp adds an extra apostrophe around titles that contain certain punctuation marks, such as a colon or a hash. So, I modified exitwp by adding these lines:

# Replace spaces with underscores
s_title = s_title.replace(' ', '_')
# Strip surrounding whitespace and stray apostrophes
s_title = s_title.strip(' \t\n\r\'')
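As a quick way to see the effect of those two lines, they can be wrapped in a small function and tried on a sample title. The function name clean_title and the sample title are made up for this sketch and are not part of exitwp.

```python
def clean_title(title):
    """Apply the two title fix-ups shown above to a single title string."""
    # Replace spaces with underscores.
    s_title = title.replace(' ', '_')
    # Strip surrounding whitespace and stray apostrophes.
    s_title = s_title.strip(' \t\n\r\'')
    return s_title


# Hypothetical title wrapped in the extra apostrophes described above.
print(clean_title("'Boost: tips #1'"))  # prints Boost:_tips_#1
```

Only the leading and trailing apostrophes are removed. An apostrophe in the middle of a title stays where it is.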

I also had to modify the Markdown metadata output to:

Finally, I couldn’t decide how I wanted to structure posts on my new website and what URLs I wanted to generate, so I updated exitwp to support two types of build output:

After some experiments, I went for the second option. Later, I restructured my blog further and moved posts into folders using a new Bash script.

I forked exitwp on GitHub and created a dedicated branch to maintain all the modifications I’ve made. Here is my Git repository with the exitwp-hakyll branch, which contains my version of the exitwp scripts, a bit of documentation in README-hakyll.markdown and a Hakyll-specific configuration in config-hakyll.yaml.

It’s worth noting that my fork of exitwp does not make any major difference to the conversion workflow or functionality. The changes are mostly minor fixes and custom cosmetic tweaks, though helpful ones.
