Using Pandoc to Convert html Files

A few days ago, I wrote about batch converting video files using ffmpeg. A few days later, I faced a similar problem of needing to convert a directory of .html files. “Need” is perhaps too strong of a word. I was experimenting with how to save pages from a PBworks wiki.

PBworks allows the user to download a .zip file of all of the pages from a wiki.[1] My downloaded backup contained 44 .html files, many of which were nested in subfolders. Instead of figuring out how to recursively loop through the subfolders, I used a find command, which searches subfolders by default. In my script below, the find command is inserted using command substitution. The converted files are saved to the original subdirectory, keeping .html in the filename, but adding .md as the file extension.

I tried out two tools to do the text conversion. First, I tried html2text, which worked great. Out of curiosity, I also tried using Pandoc. I ended up preferring how Pandoc formatted the final Markdown text. However, one feature of html2text I liked was the --ignore-links option, since most of the links were relative to the PBworks domain and would be broken when used offline. In the end, I decided it might be useful to see where the original links pointed, so I skipped the --ignore-links option.

Here is the script I created:

 1  #!/bin/bash
 2  
 3  # Usage: html2md /path/to/file
 4  
 5  # Set $IFS so that filenames with spaces don't break the loop
 6  SAVEIFS=$IFS
 7  IFS=$(echo -en "\n\b")
 8  
 9  # Loop through path provided as argument
10  for x in $(find "$@" -name '*.html')
11  do
12      pandoc -f html -t markdown -o "$x.md" "$x"
13  done
14  
15  # Restore original $IFS
16  IFS=$SAVEIFS  

Lines 6 and 7 are necessary so that the script will work with filenames that contain spaces. The trick, as suggested in a Linux forum, is to set the internal field separator (IFS) so that it no longer includes the space character.[2]
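As a minimal sketch of an alternative that avoids changing IFS altogether, find can emit NUL-delimited filenames and a while loop can read them back one at a time. The function name html2md_tree is hypothetical and not part of the original script; the pandoc options and the .html.md naming are the same as above.

```shell
#!/bin/bash

# Hypothetical helper: convert every .html file under the given paths.
html2md_tree() {
    local src
    # -print0 and read -d '' separate filenames with NUL bytes,
    # so spaces (and even newlines) in names are preserved.
    while IFS= read -r -d '' src; do
        # Same naming as the original script: page.html -> page.html.md
        pandoc -f html -t markdown -o "$src.md" "$src"
    done < <(find "$@" -type f -name '*.html' -print0)
}

# Run only when at least one path is given on the command line
if [ "$#" -gt 0 ]; then
    html2md_tree "$@"
fi
```

Because IFS is only overridden for the read command itself, there is nothing to save and restore afterwards. Process substitution and read -d are Bash features, so this sketch assumes the script is run with bash rather than sh.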


  1. For a paid account, PBworks allows the user to download all pages, past revisions and files, but I was using a free account.

  2. A discussion at Stack Overflow suggests a similar fix using IFS=$'\n', but I found I still needed \b at the end for my script to work.

Posted in Articles

Using ffmpeg to Convert Video Files

Recently, my wife had students in her library create Photostory projects. This wasn’t her first choice of applications for a student project, but the Mac lab was in use for testing. Photostory outputs .wmv files, but my wife wanted to be able to merge the files using iMovie so that teachers could cue up one movie on their classroom presentation stations, which are Macs.

My wife thought she would need to use a service such as Zamzar to convert the files from .wmv into a format that iMovie could import, which seemed like a tedious, impractical task. I thought that perhaps ffmpeg, a command line tool, could help.

I found a Stack Exchange article that suggested this syntax to convert .wmv to mp4:

ffmpeg -i input.wmv -c:v libx264 -crf 23 -c:a libfaac -q:a 100 output.mp4  

I then created the following script to automate the process:

#!/bin/bash

for x in "$@"
do
    ffmpeg -i "$x" -c:v libx264 -crf 23 -c:a libfaac -q:a 100 "${x}.mp4"
done  

Thus, running wmv-convert * in a directory of .wmv files would loop through all of them, converting each to .mp4 and appending .mp4 to the original filename (clip.wmv becomes clip.wmv.mp4).
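As a hedged sketch of a variation, Bash parameter expansion can swap the extension instead of appending to it, so that a hypothetical clip.wmv would yield clip.mp4. The function names mp4_name and convert_wmv are illustrative and not part of the original script. The encoder options are copied unchanged from the command above and may need adjusting, since some ffmpeg builds do not include libfaac.

```shell
#!/bin/bash

# Hypothetical helper: swap a trailing .wmv for .mp4
mp4_name() {
    printf '%s\n' "${1%.wmv}.mp4"
}

# Hypothetical wrapper around the ffmpeg call from the post
convert_wmv() {
    local src
    for src in "$@"; do
        # Skip any argument that is not a .wmv file
        case "$src" in
            *.wmv) ;;
            *) continue ;;
        esac
        ffmpeg -i "$src" -c:v libx264 -crf 23 -c:a libfaac -q:a 100 "$(mp4_name "$src")"
    done
}

# With no arguments the loop simply does nothing
convert_wmv "$@"
```

Quoting "$src" and the output name keeps filenames with spaces intact, and the case filter means that passing * no longer hands non-video files to ffmpeg.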

Each file took several minutes to convert, but I was able to run the loop during dinner. Then my wife was able to merge the files using iMovie later that evening.

Posted in Articles

Simple English Wikipedia

Hacker News has a discussion about the Simple English Wikipedia, which uses easier vocabulary and less complex sentence constructions than the regular English Wikipedia. This is great for mid-level readers, and this year I’m going to start directing students to use this version during research projects.

Posted in Links

Keyboard-Type Input

In a recent interview with The Chronicle of Higher Education, Bill Gates was asked how tablet computers can make a difference in education and responded:

Just giving people devices has a really horrible track record. You really have to change the curriculum and the teacher. And it’s never going to work on a device where you don’t have a keyboard-type input. Students aren’t there just to read things. They’re actually supposed to be able to write and communicate. And so it’s going to be more in the PC realm—it’s going to be a low-cost PC that lets them be highly interactive.

I agree with the first part of his statement because the curriculum and pedagogy need to change to make any technology worthwhile. However, he is wrong about the need for keyboards. My middle school students have done complex work on our iPod Touches, such as creating documents that use desktop publishing skills involving typing, creating charts and inserting images. Due to Apple’s intuitive software design, students are quickly able to get past the lack of a hardware keyboard. Moreover, they don’t necessarily see the lack of a keyboard as a limitation. Since a third of their lives has been dominated by touch devices, they don’t see keyboards as a prerequisite for using a computer.

Posted in Articles

Technology and Standardized Tests in the New York Times

The New York Times published a lengthy, must-read article examining how investments in technology by school districts have not resulted in higher test scores. The article focuses on the Kyrene School District in Arizona, which has invested considerably in technology but has stagnant test scores when compared to Arizona as a whole.

This finding doesn’t surprise me, but I also don’t think it is a problem. I have long thought that standardized tests focus on a very narrow range of skills, not including 21st century skills such as applying technology to solve problems. Thus, technology use in schools has great value beyond teaching just basic skills. The article addresses this as well, quoting Karen Cator, Director of the Office of Educational Technology:

“In places where we’ve had a large implementing of technology and scores are flat, I see that as great,” she said. “Test scores are the same, but look at all the other things students are doing: learning to use the Internet to research, learning to organize their work, learning to use professional writing tools, learning to collaborate with others.”

The article also explores other factors affecting student performance in fair detail, such as economic factors and the difficulty of doing long-term studies of student performance related to technology.

Posted in Links

Recommended Reading: How to Count

Image of the book cover for How to Count

The cover image for "How to Count". The hand is showing the number 28 in binary.

Steven Frank, the co-founder of Panic Software, creators of Transmit, my favorite FTP program, recently published an ebook titled How to Count: Programming for Mere Mortals, Vol. 1. I highly recommend buying a copy as an unprotected PDF or EPUB for ¢299. While there are many introductory articles available for free that explain how to count in binary, this book quickly moves on to more advanced topics such as hexadecimal, signed integers and floats.
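For a quick taste of the topic, the number 28 from the cover can be checked at a Bash prompt. This is only an illustrative sketch and is not taken from the book; the function names to_decimal and to_hex are hypothetical.

```shell
#!/bin/bash

# Bash arithmetic accepts base#digits, so 2#11100 means binary 11100
to_decimal() {
    echo "$(( 2#$1 ))"
}

# printf with %x writes a decimal number in hexadecimal
to_hex() {
    printf '%x\n' "$1"
}

to_decimal 11100   # prints 28 (16 + 8 + 4)
to_hex 28          # prints 1c
```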

Posted in Links

Converting Flash Animations with Swiffy

Earlier in my teaching career, I experimented with using Flash to create animations for my students. Now that I’m an avid iOS user, I’ve given up on that platform. I have hours invested in projects that I can’t run on my phone or iPad, but fortunately Google has created a utility named Swiffy that will convert .swf files into HTML5 and JavaScript that will run in modern web browsers. Below are two projects I originally created in 2004 that I just converted using Swiffy.

Title screen of the Bering Land Bridge Theory animation

Click image to play the Bering Land Bridge Theory animation.

 

Title page for the Triangular Trade animation

Click image to play the interactive Triangular Trade animation.

Posted in Articles

iPads in Kindergarten

The Seattle Times recently ran a story about a school in Maine that is supplying iPads in kindergarten classes. There is little mention about how they will be used other than references to “apps for phonics, building words, letter recognition and letter formation”. I hope they end up pushing the capabilities of the devices far beyond these uses.

Posted in Links

When Technology Gets Out of the Way…

I love this new iPad 2 commercial. From my experience with iPods in the classroom, getting technology out of the way increases time on educational tasks. My students don’t spend ten minutes watching the computers log in, then another 30 seconds waiting for an application to launch. Instead, they get right to work with a device that is barely noticeable in a room full of learning.

Posted in Articles

Passing the Laura Ingalls Test

Writing for Slate.com, Linda Perlstein recently proposed the Laura Ingalls test. Imagine if this prairie girl were to time travel to the present day and consider how she would respond to modern-day technology. If you brought her to an Apple Store or handed her a cell phone, she wouldn’t know what to make of it. Yet, if you brought her to the nearest 5th grade classroom, she would immediately recognize it as a school, something nearly unchanged from her time. Perlstein then asks her readers to describe the ideal modern-day classroom. Their ideas are recorded as comments to her post.


Posted in Articles