One-liner for a URL’s Last-Modified header as Date

I’m working on a project where I want to look at the last modified date of a URL for comparative purposes, and decided to come up with a brief method of fetching the headers and parsing the Last-Modified value as a java.util.Date.

As with many HTTP-related operations, the go-to project here was the awesome Dispatch project, which is backed by the trusty (and proven) Apache HttpClient but throws on all sorts of Scala-y goodness through a nifty DSL of verbs for executors, requests, and handlers that wrap a number of common HTTP operations.

Without further ado, the “one-liner” (plus a couple of imports and a bit of line wrapping for less scrolly): (Note: See edit at bottom for enhanced version)

import dispatch._
import java.text.SimpleDateFormat
Http(:/("dispatch.databinder.net").HEAD >:> {h =>
  (new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz"))
    .parse(h("Last-Modified").head)
})

If this gives you what you’re looking for, you can stop here, but for educational purposes let’s break it down further.

The Http(...) bit defines a Dispatch executor. In this case, I’m using the singleton executor that blocks the current thread and can be safely shared across threads. Other types of executors can be used via direct constructors that offer options including executing on background threads or using NIO.

Within the executor, we essentially define a request and its handler. The request portion is :/("dispatch.databinder.net").HEAD, where :/ is a verb used to define the host, one of many ways to specify the URL. The HEAD verb modifies the request so that we issue an HTTP HEAD request, which returns only the response headers, rather than a conventional GET request. Other verbs are available to, among other things, add path elements, append query strings, or POST data to the URL.

After this we have our handler, in this case the most verbose portion of the code: >:> {h => (new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz")).parse(h("Last-Modified").head)}. The first portion of this is the handler verb, >:>, which passes the response headers as a Map to the function block that follows. Within this block, we hardcode a java.text.SimpleDateFormat that we know can parse the date format used for “Last-Modified” dates. If this code were going to be run repeatedly we’d obviously define the date format object once and reuse it (see also edit at bottom for version using DateUtils). The h("Last-Modified").head expression pulls the value out of the Map of headers. The .head is required because the values of the Map are Sets of String, to accommodate headers that can appear with multiple values, e.g. “Set-Cookie”. Since we know that we will only get a single “Last-Modified” date, we can confidently take just the first (and in this case only) element of the Set.

Technically the return of this is a dispatch.Http.HttpPackage[Date], but this can be interacted with directly as if it were a java.util.Date.

This could easily be adapted to process other header values at the whim of the coder, and all with much less effort than coding directly against HttpClient.
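For anyone who wants to poke at the handler logic without hitting the network, here is a minimal sketch that runs the same parsing step against a hand-built header Map. The LastModifiedSketch object, the lastModified helper, and the sample header values are all made up for illustration; the Map shape (header name to a Set of String values) is assumed from the description above and is not a call into Dispatch itself.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

object LastModifiedSketch {
  // Hypothetical stand-in for the header Map handed to the >:> handler.
  val headers: Map[String, Set[String]] = Map(
    "Last-Modified" -> Set("Wed, 21 Oct 2015 07:28:00 GMT"),
    "Set-Cookie"    -> Set("a=1", "b=2")
  )

  // Returns None instead of throwing when the header is absent or empty.
  def lastModified(h: Map[String, Set[String]]): Option[Date] = {
    // Locale.US keeps the English day and month names parseable everywhere.
    val format = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US)
    h.get("Last-Modified").flatMap(_.headOption).map(s => format.parse(s))
  }
}
```

Returning an Option sidesteps the exception that h("Last-Modified").head would throw if the server left the header out, and a fresh SimpleDateFormat per call avoids sharing one across threads.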

Edit:
The HttpClient library offers a DateUtils class that is able to parse the various date formats that can appear in the Last-Modified header. This is preferable to the method used above because (a) it tries each of the date formats that HTTP permits (RFC 1123, RFC 1036, and ANSI C asctime) rather than assuming a single one and (b) SimpleDateFormat isn’t thread-safe so if you’re reusing it from multiple threads you will need to synchronize on it. Plus it’s less verbose. Given that, our one-liner can be rewritten as:

import dispatch._
import org.apache.http.impl.cookie.DateUtils
Http(:/("dispatch.databinder.net").HEAD >:> {h => 
  DateUtils.parseDate(h("Last-Modified").head)
})
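As a rough illustration of what a multi-format parser does, the sketch below tries the three date layouts that HTTP allows, in order, using only the JDK. It is not HttpClient's implementation; HttpDateSketch and parseHttpDate are hypothetical names, and a new SimpleDateFormat is built for every attempt so that no formatter is ever shared between threads.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}
import scala.util.Try

object HttpDateSketch {
  private val patterns = List(
    "EEE, dd MMM yyyy HH:mm:ss zzz", // RFC 1123
    "EEEE, dd-MMM-yy HH:mm:ss zzz",  // RFC 1036
    "EEE MMM d HH:mm:ss yyyy"        // ANSI C asctime()
  )

  private def newFormat(pattern: String): SimpleDateFormat = {
    val format = new SimpleDateFormat(pattern, Locale.US)
    // asctime() dates carry no zone, so interpret them as GMT.
    format.setTimeZone(TimeZone.getTimeZone("GMT"))
    format
  }

  // First pattern that parses wins; None if nothing matches.
  def parseHttpDate(value: String): Option[Date] =
    patterns.iterator
      .map(p => Try(newFormat(p).parse(value)).toOption)
      .collectFirst { case Some(date) => date }
}
```

In real code I would still reach for DateUtils.parseDate; the point here is only to show why a single hardcoded pattern is the fragile part of the original one-liner.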

Configuring Scala compile options in sbt 0.10.x versus previous versions

Upgrading a few existing sbt-based projects from 0.7.7 to 0.10.0 was relatively straightforward, but came with a few hurdles. One of these was the enabling of Scala compiler options such as -unchecked and -deprecation. Previously these could be declared by creating (or editing) a project definition file in project/build, something like the following:

class MyProject(info: ProjectInfo) extends DefaultProject(info) {
  override def compileOptions = super.compileOptions ++ Seq(Unchecked, Deprecation)
}

This method no longer works in 0.10.x, where instead compiler options are declared in the build.sbt file of the project root (along with many other properties formerly defined in the project definition) using the scalacOptions setting:

scalacOptions ++= Seq("-unchecked", "-deprecation")

This configuration parameter is of course well-documented on the sbt GitHub page, but the change in names hung me up a bit. On the positive side, the new convention allows arbitrary compiler options more succinctly than previously, where non-built-in compiler options required use of the compileOptions(options: String*) method.
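To illustrate the "arbitrary options" point, here is a small build.sbt fragment that mixes the two flags above with an option that takes an argument. The choice of -encoding UTF-8 is just an example on my part and not something the projects in question necessarily need.

```scala
// build.sbt (sbt 0.10.x): each flag and each flag argument is its own String
scalacOptions ++= Seq("-unchecked", "-deprecation", "-encoding", "UTF-8")
```

Since the options are plain Strings handed to scalac, anything the compiler accepts on the command line can go in the Seq.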

Lightweight type erasure matching

Scala’s pattern matching is much touted as a killer feature, and is generally easy and convenient to use, but due to type erasure one needs to be somewhat careful with typed collections.

Check out the following bit of code:

scala> List("You say goodbye", "I say hello") match {
     |   case list: List[String] => list foreach println
     | }
warning: there were unchecked warnings; re-run with -unchecked for details
You say goodbye
I say hello

It’s tempting to say, “this worked,” and move along, but consider the following similar code where we instead provide a list of Ints and add a matcher for that type:

scala> List(27, 35, 82) match {
     |   case strList: List[String] => println("List of Strings")
     |   case intList: List[Int] => println("List of Ints")
     | }
warning: there were unchecked warnings; re-run with -unchecked for details
List of Strings

Damn, that’s not what we wanted at all. I guess we should see what that “unchecked warning” message is all about by restarting the interpreter with that option…


scala> List(27, 35, 82) match {                                                  
     |   case strList: List[String] => println("List of Strings")
     |   case intList: List[Int] => println("List of Ints")
     | }
<console>:7: warning: non variable type-argument String in type pattern List[String] is unchecked since it is eliminated by erasure
         case strList: List[String] => println("List of Strings")
                       ^
<console>:8: warning: non variable type-argument Int in type pattern List[Int] is unchecked since it is eliminated by erasure
         case intList: List[Int] => println("List of Ints")
                       ^
List of Strings

Aha! Type erasure! You see, Scala is forced to do type erasure because the Java Virtual Machine (JVM) isn’t aware of generics, which underlie typed collections such as the Lists we’re looking at here. The element type is simply gone at runtime, so any type of List will match the “List[String]” case and we never reach the expected case for Int lists.
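A quick way to convince yourself of this is to compare runtime classes, and then to match on what is still visible at runtime: the elements themselves. The sketch below does both. ErasureSketch and describe are hypothetical names, and peeking at the first element is only a partial workaround of my own choosing, since it says nothing about an empty list or about the rest of the elements.

```scala
object ErasureSketch {
  // Both lists are plain cons cells at runtime; the element type is erased.
  val sameRuntimeClass: Boolean = List(1).getClass == List("a").getClass

  // Match on the head element, whose runtime type is still available.
  def describe(xs: List[Any]): String = xs match {
    case Nil              => "empty list"
    case (_: String) :: _ => "starts with a String"
    case (_: Int) :: _    => "starts with an Int"
    case _                => "something else"
  }
}
```

Calling ErasureSketch.describe(List(27, 35, 82)) takes the Int branch, which is the behaviour the earlier List[Int] pattern could not deliver.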

This specific topic is covered in a post over on Stack Overflow, with several excellent responses there as well as links to two workarounds.

However, my immediate needs in this area were limited in scope, only affecting a few lines of code and occurring within a single source file. I essentially wanted to send a List[String] to an Actor and have it handle this appropriately in a match block. I was reluctant to rely on a method that was termed “obscure” and “experimental”, and moreover wanted something lightweight that didn’t add a lot of code. Besides, it doesn’t seem like something like this should have to tap into the reflection libraries.

What I came up with was essentially wrapping List[String] inside of a single-argument case class, which could then be passed around and matched against:

case class StringListHolder(list:List[String])

So back to the original example, rewritten:

scala> StringListHolder(List("You say goodbye","I say hello")) match {
     |   case holder: StringListHolder => holder.list foreach println
     | }
You say goodbye
I say hello

Exactly the output we want, no compiler warnings, and a class that by definition can’t hold anything other than a List of Strings. Sometimes it pays to keep it simple.
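The wrapper approach also fixes the two-case example that went wrong earlier. Here is a minimal sketch, assuming a second, hypothetical IntListHolder alongside StringListHolder; the HolderSketch object and the describe function are likewise invented for illustration, standing in for whatever the receiving Actor would do.

```scala
object HolderSketch {
  case class StringListHolder(list: List[String])
  case class IntListHolder(list: List[Int]) // hypothetical second wrapper

  // The match is on the wrapper class, which survives erasure.
  def describe(message: Any): String = message match {
    case StringListHolder(strings) => "List of Strings: " + strings.mkString(", ")
    case IntListHolder(ints)       => "List of Ints, sum = " + ints.sum
    case other                     => "unexpected message: " + other
  }
}
```

With this in place, HolderSketch.describe(HolderSketch.IntListHolder(List(27, 35, 82))) lands in the Int case, and the extractor patterns hand back lists that are already correctly typed.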

Down-casing a String using implicit conversions

The Scala RichString class provides some nice hooks for string manipulation beyond those provided by java.lang.String, including a capitalize method that returns a Java String with the first letter up-cased. For whatever reason, an “uncapitalize” equivalent is not defined to downcase the first character. Using implicit conversions, one can relatively straightforwardly add such a method that can be executed against a Java String.

First define the following implicit def, which handles null/empty String values and otherwise downcases the first letter. I put this in a StringConversions object which can be imported for use in various other areas of code.

object StringConversions {
  implicit def string2Lower(ref: String) = new {
    def uncapitalize(): String = {
      if(ref == null) null
      else if(ref.length == 0) ""
      else {
        val chars = ref.toCharArray
        chars(0) = chars(0).toLower
        new String(chars)
      }
    }
  }
}

To use this, we can simply do a wildcard import of StringConversions and very compactly downcase any String.

scala> import StringConversions._    
import StringConversions._

scala> println("Hello!".uncapitalize)
hello!

Following a similar pattern, we could easily add additional implicit def operations to StringConversions, creating a whole suite of additional functionality.
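For readers on Scala 2.10 or later, the same idea can be written as an implicit class, which is a different mechanism from the anonymous-class conversion shown above and avoids the reflective call that a structural type implies. This is only a sketch under that version assumption; StringOpsSketch and Uncapitalize are hypothetical names.

```scala
object StringOpsSketch {
  // Value class wrapper: adds uncapitalize to String without extra allocation.
  implicit class Uncapitalize(val ref: String) extends AnyVal {
    def uncapitalize: String =
      if (ref == null || ref.isEmpty) ref // null and "" pass through unchanged
      else ref.charAt(0).toLower.toString + ref.substring(1)
  }
}
```

Usage is the same as before: import StringOpsSketch._ and call "Hello!".uncapitalize.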

Getting IP address, MAC address, and subnet mask of localhost

Determining the local IP address is easy, but determining the MAC address and subnet mask in Java has always been one of those hairy problems that sends me Googling. Using some of those same Java utilities, we can add a few Scala-isms and shorten the whole thing up significantly.

To get the IP address we can just statically retrieve a java.net.InetAddress for localhost and get the address String from that.

scala> import java.net._
import java.net._

scala> val localhost = InetAddress.getLocalHost
localhost: java.net.InetAddress = mymachine.domain.local/10.35.0.51

scala> val localIpAddress = localhost.getHostAddress               
localIpAddress: java.lang.String = 10.35.0.51

scala> println(localIpAddress)                      
10.35.0.51

Looking up the java.net.NetworkInterface bound to our localhost address, we can get the MAC address, or hardware address, as an array of bytes, which we can convert to a Scala List, allowing us to format each byte as hex and concatenate with the “:” separator with relatively little effort. Note that NetworkInterface.getHardwareAddress() requires Java 1.6.

scala> val localNetworkInterface = NetworkInterface.getByInetAddress(localhost)
localNetworkInterface: java.net.NetworkInterface = 
name:en0 (en0) index: 4 addresses:
/10.35.0.51;
/fe80:0:0:0:223:32ff:fec6:8a8a%4;

scala> val localMacAddress = localNetworkInterface.getHardwareAddress.toList.map(b => String.format("%02x",b.asInstanceOf[AnyRef])).mkString(":")
localMacAddress: String = 00:23:32:c6:8a:8a
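The formatting step is easy to try on its own with a fixed byte array, which is handy on machines where the interface lookup behaves differently. The sketch below is a hedged variation on the line above: MacFormatSketch and both helper names are invented, and the Option wrapper is there because getHardwareAddress can return null (for example on a loopback interface).

```scala
object MacFormatSketch {
  // Mask with 0xff so negative bytes print as two unsigned hex digits.
  def formatMac(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xff)).mkString(":")

  // Accepts the possibly-null array returned by getHardwareAddress.
  def formatMacOption(bytes: Array[Byte]): Option[String] =
    Option(bytes).map(formatMac)
}
```

In the session above this would be used as MacFormatSketch.formatMacOption(localNetworkInterface.getHardwareAddress).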

We can use the same java.net.NetworkInterface instance to get us the subnet mask by retrieving all of the interface’s addresses (also a Java 1.6-only method) and for each of these retrieving the network prefix length, which essentially represents the subnet mask. We import scala.collection.JavaConversions._ to give us implicit Java-to-Scala collection conversions, which I believe were introduced in Scala 2.8.

scala> import scala.collection.JavaConversions._       
import scala.collection.JavaConversions._

scala> val localSubnetCidrs = localNetworkInterface.getInterfaceAddresses.map(_.getNetworkPrefixLength)
localSubnetCidrs: scala.collection.mutable.Buffer[Short] = ArrayBuffer(24, 64)

Hmmm…that’s not quite right. 24 and 64 aren’t what we expect to see for subnet masks. But worry not, we’re on our way: right now we have the CIDR prefix lengths of the subnet masks, for IPv4 and IPv6 respectively. We then just need to filter out the IPv6 entry and convert the CIDR notation to our expected format, which I do using the org.apache.commons.net.util.SubnetUtils class from Apache Commons Net.

scala> import org.apache.commons.net.util.SubnetUtils
import org.apache.commons.net.util.SubnetUtils

scala> val localSubnetAddresses = localSubnetCidrs.filter(l => !(l < 0 || l > 32)).map(c => (new SubnetUtils("0.0.0.0/"+c)).getInfo.getNetmask)
localSubnetAddresses: scala.collection.mutable.Buffer[java.lang.String] = ArrayBuffer(255.255.255.0)

scala> println(localSubnetAddresses(0))
255.255.255.0

Voilà!
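If pulling in Commons Net feels heavy for one conversion, the prefix-to-netmask arithmetic is small enough to sketch by hand. This is an alternative to SubnetUtils rather than what it does internally; NetmaskSketch and prefixToNetmask are hypothetical names, and the function only covers IPv4 prefix lengths.

```scala
object NetmaskSketch {
  // Converts an IPv4 prefix length (0 to 32) into dotted-quad form.
  def prefixToNetmask(prefix: Int): String = {
    require(prefix >= 0 && prefix <= 32, "IPv4 prefix length must be between 0 and 32")
    // Set the top `prefix` bits; a shift by 32 would wrap, so handle 0 separately.
    val mask: Int = if (prefix == 0) 0 else -1 << (32 - prefix)
    List(24, 16, 8, 0).map(shift => (mask >>> shift) & 0xff).mkString(".")
  }
}
```

Note that filtering on the prefix range alone would let through an IPv6 address whose prefix happens to be 32 or less, so checking that each InterfaceAddress wraps a java.net.Inet4Address is probably the safer filter.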

println(“Hello World!”)

Ah, the dreaded “first post”. Gotta put something.

Ideally I’d dive right in and offer up a useful chunk of info, but I’d like to get something showing on the site so here it is.

Soon you’ll see my own unique take on things I’m picking up as I work with and better learn Scala, hopefully in a way that can help others do the same.

Enjoy!