The other day I was tasked with creating a list of links for a reference project. The content for the list was a folder containing files with numeric filenames like 2344567786.html.
The client wanted to search the list with his browser's find feature, and the numeric links didn't give him much to search on. So I figured I'd build new links that used each document's title as the link text, which would give a clue to the file's contents. Great idea, huh?
Did I mention that there were over 1,700 document files in the folder? Seems like a good place to bring in some automation!
Here's what I did. First, create a Ruby file to hold the code. I'll call it linkJob.rb. Then we start adding code:
Dir.chdir('C:\rubyTest')
resultsfile = File.open('C:\rubyTest\results.html', "w")
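These first two lines change the working directory to the folder that holds the data files, then create and open a new file to receive the results. Note the single quotes around the Windows paths: inside double quotes Ruby would read \r as a carriage return, so writing the paths with forward slashes ('C:/rubyTest') is a safer habit on Windows.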
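Another option is the block form of File.open, which closes (and therefore flushes) the file automatically when the block ends, so there is no close call to forget. Here is a minimal sketch. The write_results helper is hypothetical, and a temporary folder stands in for C:\rubyTest so it runs anywhere:

```ruby
require 'tmpdir'

# Write each fragment on its own line. The block form of File.open closes
# (and flushes) the file when the block exits, even if an exception is
# raised part-way through.
def write_results(path, fragments)
  File.open(path, 'w') do |resultsfile|
    fragments.each { |fragment| resultsfile.puts fragment }
  end
end

# Usage, with a throwaway folder standing in for C:\rubyTest.
Dir.mktmpdir do |workdir|
  results_path = File.join(workdir, 'results.html')
  write_results(results_path, ['<li>first</li>', '<li>second</li>'])
  puts File.read(results_path)
end
```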
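Dir["data/*.html"].each do |thisfile|
  # processing of the files will go here
end
resultsfile.close
The each block loops over every .html file in the data directory, passing each path (for example data/2344567786.html) to the block as thisfile. A pattern like data/** would match exactly the same entries as data/*, subfolders included, because Ruby treats ** as recursive only when it is followed by a slash (as in data/**/*.html). Narrowing the pattern to *.html keeps subfolders and stray non-HTML files out of the loop. The last line closes the results file once the loop has finished, which makes sure everything we wrote is flushed to disk.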
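To see how the glob patterns behave, here is a small sketch that builds a throwaway folder layout and prints what each pattern picks up. The file names and the matches helper are hypothetical; the results are sorted so the order is predictable.

```ruby
require 'tmpdir'
require 'fileutils'

# Return the paths matched by pattern, relative to root, in sorted order.
def matches(root, pattern)
  Dir.chdir(root) { Dir.glob(pattern).sort }
end

Dir.mktmpdir do |root|
  # Hypothetical sample layout: two pages, one text file, one subfolder.
  FileUtils.mkdir_p(File.join(root, 'data', 'sub'))
  %w[data/1.html data/2.html data/notes.txt data/sub/3.html].each do |f|
    File.write(File.join(root, f), '')
  end

  p matches(root, 'data/**')        # same as data/*, so the sub folder is included
  p matches(root, 'data/*.html')    # only .html files directly inside data
  p matches(root, 'data/**/*.html') # .html files in data and all its subfolders
end
```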
  myfile = File.read(thisfile)
This line goes inside the each block. File.read opens the file, reads its entire contents into the variable myfile, and closes it again, so no file handles are left open waiting for the garbage collector. We want to extract the main heading from the document to use as the text for our link, so we'll look for the contents of the h1 tag.
mysub = myfile.split("</h1>")
mysub2 = mysub[0].split("<h1>")
mytitle = mysub2[1]
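OK, this is a little weird. We use the split method to break the document into an array, using the closing h1 tag as the delimiter, and keep only the first element, which discards everything after the first closing tag. Then we split that piece on the opening h1 tag and keep the second element, which discards everything before it. What we're left with is the tag contents! The trick relies on the heading being written exactly as <h1>, with no attributes; if a file has no such tag, mytitle ends up nil and the chomp call in the next step raises a NoMethodError.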
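If some pages put attributes on the heading (<h1 class="title">), use upper-case tags, or break the heading across lines, a regular expression is more forgiving than split. A sketch, assuming the title is the first h1 on the page; extract_h1 is a hypothetical helper name. For badly formed HTML, a real parser such as Nokogiri would be the sturdier choice.

```ruby
# Return the text inside the first <h1>...</h1> pair, or nil if the page
# has none. The i flag makes the match case-insensitive; in Ruby the m flag
# lets . match newlines, so multi-line headings work too.
def extract_h1(html)
  match = html.match(%r{<h1\b[^>]*>(.*?)</h1>}im)
  match && match[1].strip
end

page = "<html><body>\n<h1 class=\"title\">\n  Widget Reference\n</h1>\n<p>Body text</p>"
p extract_h1(page)               # prints "Widget Reference"
p extract_h1('<p>no heading</p>') # prints nil
```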
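  mylink = "<li><a href=\"/data/#{File.basename(thisfile)}\">#{mytitle.chomp}</a></li>"
  resultsfile.puts mylink
Next, we construct a list item with our link inside and assign it to the mylink variable. Because thisfile already starts with the folder name (for example data/2344567786.html), File.basename strips that prefix; otherwise the href would come out as /data/data/2344567786.html. The chomp method removes a trailing line ending if there is one. The last line appends the new HTML fragment to our results file.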
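So that a file without a usable heading doesn't stop the whole run, the link-building step could fall back to the file name. A sketch with a hypothetical link_item helper, where title is whatever the heading extraction returned (possibly nil); the /data/ prefix is the same one used above.

```ruby
# Build one list item for a path such as "data/2344567786.html".
# When title is nil (no heading found), the file name becomes the link text.
def link_item(thisfile, title)
  name = File.basename(thisfile)
  text = title ? title.chomp : name
  "<li><a href=\"/data/#{name}\">#{text}</a></li>"
end

puts link_item('data/2344567786.html', "Widget Reference\n")
# prints <li><a href="/data/2344567786.html">Widget Reference</a></li>
puts link_item('data/2344567787.html', nil)
# prints <li><a href="/data/2344567787.html">2344567787.html</a></li>
```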
To run the script, I open linkJob.rb in SciTE and choose Tools / Go. You could also run it from the Windows command prompt with ruby linkJob.rb. Then I open C:\rubyTest\results.html, and there is the list of links!
All that is left now is to wrap the list items in a <ul> element, plus whatever page markup surrounds it, and we're done.
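The wrapping can also happen in the script itself. This is a sketch that pulls the steps together, not the exact script used here: build_link_page is a hypothetical helper, the page title "Document index" is a placeholder, and it assumes the same data folder layout and /data/ link prefix. It swaps the split trick for a regular-expression heading match with a file-name fallback.

```ruby
require 'tmpdir'

# Write results.html in root: one <li> per .html file in root/data, wrapped
# in a <ul> inside a minimal HTML page. Assumes each page's title is in its
# first <h1>; files without one fall back to their file name.
def build_link_page(root)
  Dir.chdir(root) do
    File.open('results.html', 'w') do |resultsfile|
      resultsfile.puts '<!DOCTYPE html>'
      resultsfile.puts '<html><head><title>Document index</title></head><body>'
      resultsfile.puts '<ul>'
      Dir['data/*.html'].sort.each do |thisfile|
        name = File.basename(thisfile)
        match = File.read(thisfile).match(%r{<h1\b[^>]*>(.*?)</h1>}im)
        title = match ? match[1].strip : name
        resultsfile.puts "<li><a href=\"/data/#{name}\">#{title}</a></li>"
      end
      resultsfile.puts '</ul>'
      resultsfile.puts '</body></html>'
    end
  end
end

# Usage with a throwaway folder; in the setup above, root would be 'C:/rubyTest'.
Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'data'))
  File.write(File.join(root, 'data', '2344567786.html'), '<h1>Widget Reference</h1>')
  build_link_page(root)
  puts File.read(File.join(root, 'results.html'))
end
```

Because the whole page is generated in one pass, rerunning the script after new files land in the data folder rebuilds the complete list.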
Automation processes do take some time to set up and fine-tune. But in most cases, the investment of time has a pretty good return.