Importing Blogger to Jekyll, Part 2
While waiting on PVP queues to pop in World of Warcraft, I decided to muck around with importing my blog again, and ended up closing the game because this was a far more interesting problem. I had started out with this script to import from Blogger’s “export” feature, but had to make a few changes. First of all, Jekyll was choking on several page titles because of various invalid characters. Replacing these with HTML entities cleared up most of it, except for one entry that had a backslash in its title, which needed a substitution of its own.
Next I decided to try editing the script to add the geotagging data in a sensible format that the mapping plugin can use. This is probably an atrocious hack, but it’s my first time ever poking around with Ruby and I was mostly too lazy to go look up what any of the operators mean:
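For anyone wanting to try the title clean-up without installing an extra gem, here is a minimal sketch of the same idea. It swaps in Ruby's standard-library `CGI.escapeHTML` instead of the `htmlentities` gem, and `safe_title` is a made-up helper name, not something from the import script. I am assuming the trouble comes from characters like quotes and backslashes landing inside a double-quoted `title:` value.

```ruby
require 'cgi'

# Hypothetical helper (not part of the import script): make a post title
# safe to drop into a double-quoted front-matter value.
def safe_title(raw)
  # CGI.escapeHTML turns & < > " ' into HTML entities.
  escaped = CGI.escapeHTML(raw)
  # Backslashes are not touched by escapeHTML, so replace them afterwards.
  # Doing this after the escape keeps the "&" of the entity from being
  # escaped a second time.
  escaped.gsub('\\', '&#92;')
end

puts safe_title('Say "hi" to C:\\temp')
# prints: Say &quot;hi&quot; to C:&#92;temp
```

The order matters: replacing the backslash first and escaping second would turn `&#92;` into `&amp;#92;`.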
--- import.rb.old 2014-04-11 22:12:40.000000000 +1000
+++ import.rb 2014-04-25 17:24:22.000000000 +1000
@@ -3,6 +3,7 @@
require 'fileutils'
require 'date'
require 'uri'
+require 'htmlentities'
# usage: ruby import.rb my-blog.xml
# my-blog.xml is a file from Settings -> Basic -> Export in blogger.
@@ -39,9 +40,9 @@
end
def write(post, path='_posts')
- puts "Post [#{post.title}] has #{post.comments.count} comments"
+# puts "Post [#{post.title}] has #{post.comments.count} comments"
- puts "writing #{post.file_name}"
+# puts "writing #{post.file_name}"
File.open(File.join(path, post.file_name), 'w') do |file|
file.write post.header
file.write "\n\n"
@@ -81,12 +82,27 @@
end
def title
- @title ||= @node.at_css('title').content
+ @title ||= HTMLEntities.new.encode(@node.at_css('title').content).gsub('\\', '&#92;')
end
def content
@content ||= @node.at_css('content').content
end
+
+ def location
+ out = ''
+ point ||= @node.css('georss|point')[0]
+ unless point.nil?
+ out = out + "mapping:\n latitude: " + point.content.split[0]
+ out = out + "\n longitude: " + point.content.split[1] + "\n"
+
+ loc ||= @node.css('georss|featurename')[0]
+ unless loc.nil?
+ out = out + 'location: ' + loc.content
+ end
+ end
+ @location ||= out
+ end
def creation_date
@creation_date ||= creation_datetime.strftime("%Y-%m-%d")
@@ -122,6 +138,7 @@
%{title: "#{title}"},
%{date: #{creation_datetime}},
%{comments: false},
+ location,
categories,
'---'
].compact.join("\n")
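With the benefit of hindsight, the location logic could probably be tidied up along these lines. This is only a sketch: `location_front_matter` is a hypothetical standalone helper that takes plain strings rather than Nokogiri nodes, and the `mapping:` / `latitude:` / `longitude:` layout is simply what my patch emits, so check it against what your mapping plugin expects. It returns `nil` when there is no point, which the existing `.compact` in the header would then drop.

```ruby
# Hypothetical helper: build front-matter lines from the text of a
# georss:point element ("lat lon") and an optional georss:featurename.
def location_front_matter(point, feature_name = nil)
  return nil if point.nil?

  latitude, longitude = point.split
  # Bail out if the point did not contain two whitespace-separated values.
  return nil if latitude.nil? || longitude.nil?

  lines = [
    'mapping:',
    "  latitude: #{latitude}",
    "  longitude: #{longitude}"
  ]
  # The place name is optional; it is written unquoted, as in the patch.
  lines << "location: #{feature_name}" unless feature_name.nil? || feature_name.empty?
  lines.join("\n")
end

puts location_front_matter('-27.47 153.02', 'Brisbane')
```

Building an array and joining it avoids the repeated string concatenation, and splitting the point once avoids calling `split` twice.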
It works so far. These are the plugins I’ve set up to get it to almost-Blogger functionality:
- Jekyll Mapping
- generate_sitemap (the one linked on the Jekyll plugins page is broken)
- Archive Generator (linked from Jekyll plugins page)
I’ve still got a bit more to do - I need to set up category pages, and then a bunch of template work (I will effectively have to throw away my very simple site layout because Blogger’s template engine is an absolute mess). There are a few other things I’d like to do simply because I never could with Blogger, like a better calendar view, having the day’s entries listed in chronological order, and so on.
It probably won’t get finished today…