Using Pandoc to Convert html Files

A few days ago, I wrote about batch converting video files using ffmpeg. A few days later, I faced a similar problem: I needed to convert a directory of .html files. “Needed” is perhaps too strong a word; I was experimenting with how to save pages from a PBworks wiki.

PBworks allows the user to download a .zip file of all of the pages from a wiki.[1] My downloaded backup contained 44 .html files, many of which were nested in subfolders. Instead of figuring out how to loop recursively through the subfolders, I used a find command, which searches subfolders by default. In my script below, the output of find is fed to the loop using command substitution. The converted files are saved to the same subdirectory as the originals, keeping .html in the filename but adding .md as the file extension, so a file named page.html would become page.html.md.
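To see how find walks into subfolders on its own, a quick check like the following can be run against the unzipped backup. This is only a sketch: count_html is a hypothetical helper name, the example path is made up, and the count assumes no filename contains a newline.

```shell
#!/bin/sh
# Hypothetical helper: count .html files under a directory, at any depth.
count_html() {
    # find descends into subfolders without being asked; wc -l counts
    # one line per match, and tr strips the padding some wc versions add.
    find "$1" -type f -name '*.html' | wc -l | tr -d ' '
}

# Example (made-up path): count_html ~/Downloads/wiki-backup
if [ "$#" -gt 0 ]; then
    count_html "$1"
fi
```

If the number matches what the backup should contain, the same find expression can safely drive the conversion loop.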

I tried out two tools to do the text conversion. First, I tried html2text, which worked great. Out of curiosity, I also tried Pandoc, and I ended up preferring how Pandoc formatted the final Markdown text. One feature of html2text I did like was the --ignore-links option, since most of the links were relative to the PBworks domain and would be broken when used offline. In the end, I decided it might be useful to see where the original links pointed, so I skipped --ignore-links.

Here is the script I created:

#!/bin/bash

# Usage: html2md /path/to/file

# Set $IFS so that filenames with spaces don't break the loop
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

# Loop through path provided as argument
for x in $(find "$@" -name '*.html')
do
    pandoc -f html -t markdown -o "$x.md" "$x"
done

# Restore original $IFS
IFS=$SAVEIFS

The IFS assignment is necessary so that the script will work with filenames that contain spaces. By default, the shell splits the output of find on spaces, tabs and newlines, so a name like "My Page.html" would be treated as two separate items. The trick, as suggested in a Linux forum, is to set the internal field separator so that it no longer includes the space character.[2]
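For comparison, here is a sketch of a different approach that leaves IFS alone by letting find hand the filenames to an inline shell script as separate arguments. It is not the script I used, just an alternative under some assumptions: pandoc is on the PATH, and html2md_all is a hypothetical function name.

```shell
#!/bin/sh
# Sketch of an alternative that leaves IFS alone (not the original script).
# Assumes pandoc is on the PATH; html2md_all is a hypothetical name.

html2md_all() {
    # find passes the matching paths to a small inline shell script as
    # separate arguments, so spaces in names are never re-split.
    find "$@" -name '*.html' -exec sh -c '
        for x do
            # Output sits beside the source, e.g. page.html.md
            pandoc -f html -t markdown -o "$x.md" "$x"
        done
    ' sh {} +
}

# Convert only when paths are given, e.g.: html2md_all /path/to/backup
if [ "$#" -gt 0 ]; then
    html2md_all "$@"
fi
```

Because the names never pass through word splitting, this version should also cope with filenames that contain tabs or newlines, which the IFS trick does not fully cover.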

  1. For a paid account, PBworks allows the user to download all pages, past revisions and files, but I was using a free account.

  2. A discussion at Stack Overflow suggests a similar fix using IFS=$'\n', but I found I still needed \b at the end for my script to work.

Using ffmpeg to Convert Video Files

Recently, my wife had students in her library create Photostory projects. This wasn’t her first choice of applications for a student project, but the Mac lab was in use for testing. Photostory outputs .wmv files, but my wife wanted to be able to merge the files using iMovie so that teachers could cue up one movie on their classroom presentation stations, which are Macs.

My wife thought she would need to use a service such as Zamzar to convert the files from .wmv into a format that iMovie could import, which seemed like a tedious, impractical task. I thought that perhaps ffmpeg, a command line tool, could help.

I found a Stack Exchange article that suggested this syntax to convert .wmv to .mp4:

ffmpeg -i input.wmv -c:v libx264 -crf 23 -c:a libfaac -q:a 100 output.mp4  

I then created the following script to automate the process:


#!/bin/bash
for x in "$@"; do
    ffmpeg -i "$x" -c:v libx264 -crf 23 -c:a libfaac -q:a 100 "${x}.mp4"
done

Thus, running wmv-convert * in a directory of .wmv files loops through all of them, converting each one to .mp4 and keeping the original filename with .mp4 appended.
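The loop above names each output by appending .mp4 to the whole input name, so a file called clip.wmv would become clip.wmv.mp4. If clip.mp4 is preferred, shell parameter expansion can drop the old extension first. The following is a small sketch rather than the script I ran: mp4_name is a hypothetical helper, and the ffmpeg line is commented out so the naming can be tried without converting anything.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name for a .wmv input.
mp4_name() {
    # ${1%.wmv} removes a trailing ".wmv" if present, then .mp4 is added.
    printf '%s\n' "${1%.wmv}.mp4"
}

for x in "$@"; do
    # Skip anything that is not a .wmv file, since * matches everything.
    case "$x" in
        *.wmv) ;;
        *) continue ;;
    esac
    echo "would convert: $x -> $(mp4_name "$x")"
    # ffmpeg -i "$x" -c:v libx264 -crf 23 -c:a libfaac -q:a 100 "$(mp4_name "$x")"
done
```

Once the printed names look right, uncommenting the ffmpeg line turns the dry run into the real conversion.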

Each file took several minutes to convert, but I was able to run the loop during dinner. Then my wife was able to merge the files using iMovie later that evening.

Recommended Reading: How to Count

Image of the book cover for How to Count
The cover image for "How to Count". The hand is showing the number 28 in binary.

Steven Frank, the co-founder of Panic Software, creators of Transmit, my favorite ftp program, recently published an ebook titled How to Count: Programming for Mere Mortals, Vol. 1. I highly recommend buying a copy as an unprotected pdf or epub for ¢299. While there are many introductory articles available for free that explain how to count in binary, this book quickly moves on to more advanced topics such as hexadecimal, signed integers and floats.
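As a taste of the subject, the 28 on the cover can be checked with a few lines of shell. This is my own sketch and is not taken from the book; to_binary is a hypothetical helper that builds the binary digits by repeatedly dividing by two.

```shell
#!/bin/sh
# Hypothetical helper: print a non-negative integer in binary.
to_binary() {
    n=$1
    bits=
    if [ "$n" -eq 0 ]; then
        bits=0
    fi
    while [ "$n" -gt 0 ]; do
        # The remainder is the next digit, added on the left.
        bits=$((n % 2))$bits
        n=$((n / 2))
    done
    printf '%s\n' "$bits"
}

to_binary 28        # prints 11100, i.e. 16 + 8 + 4
printf '%x\n' 28    # prints 1c, the same number in hexadecimal
```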

Converting Flash Animations with Swiffy

Earlier in my teaching career, I experimented with using Flash to create animations for my students. Now that I’m an avid iOS user, I’ve given up on that platform. I have hours invested into projects that I can’t run on my phone or iPad, but fortunately Google has created a utility named Swiffy that will convert .swf files into html5 and javascript that will run in modern web browsers. Below are two projects I originally created in 2004 that I just converted using Swiffy.

Title screen of the Bering Land Bridge Theory animation
Click image to play the Bering Land Bridge Theory animation.


Title page for the Triangular Trade animation
Click image to play the interactive Triangular Trade animation.