One-liner for a URL’s Last-Modified header as Date

I’m working on a project where I want to compare the last-modified dates of URLs, so I came up with a brief way to fetch a URL’s headers and parse the Last-Modified value as a java.util.Date.

As with many HTTP-related operations, the go-to project here was the awesome Dispatch project. It is backed by the trusty (and proven) Apache HttpClient but layers on all sorts of Scala-y goodness through a nifty DSL of verbs for executors, requests, and handlers that wrap a number of common HTTP operations.

Without further ado, the “one-liner” (plus a couple of imports and a bit of line wrapping to cut down on scrolling): (Note: See edit at bottom for enhanced version)

import dispatch._
import java.text.SimpleDateFormat
Http(:/("dispatch.databinder.net").HEAD >:> {h => 
  (new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz"))
                .parse(h("Last-Modified").head)
})

If this gives you what you’re looking for, you can stop here, but for educational purposes let’s break it down further.

The Http(...) bit defines a Dispatch executor. In this case, I’m using the singleton executor that blocks the current thread and can be safely shared across threads. Other types of executors can be used via direct constructors that offer options including executing on background threads or using NIO.

Within the executor, we essentially define a request and its handler. The request portion is :/("dispatch.databinder.net").HEAD, where :/ is a verb used to define the host, one of many ways to specify the URL. The HEAD verb modifies the request so that we’re only asking for the response headers rather than making a conventional GET request. Other verbs are available to, among other things, add path elements, append query strings, or POST data to the URL.

After this we have our handler, in this case the most verbose portion of the code: >:> {h => (new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz")).parse(h("Last-Modified").head)}. The first portion of this is the handler verb, >:>, which passes the response headers as a Map to the function block that follows. Within this block, we hardcode a java.text.SimpleDateFormat that we know can parse the date format used for “Last-Modified” dates. If this code were going to be run repeatedly we’d obviously define the date format object once and reuse it (see also edit at bottom for version using DateUtils). The h("Last-Modified").head actually pulls the value out of the Map of headers. The .head is required because the values of the Map are Sets of String, to accommodate headers that can appear with multiple values, e.g. “Set-Cookie”. Since we know that we will only get a single “Last-Modified” date, we can confidently take just the first (and in this case only) element of the Set.
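To make the shape of that header Map concrete without needing a network call, here is a minimal sketch using only the JDK. The headers value is a hand-built stand-in for what the handler receives, and the HeaderParsing object, httpDateFormat, and lastModified names are my own for illustration, not part of Dispatch. It also swaps .head for headOption so that a missing or unparseable header yields None rather than an exception.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}
import scala.util.Try

object HeaderParsing {
  // Hypothetical stand-in for the Map of headers handed to the handler:
  // each header name maps to a Set of values.
  val headers: Map[String, Set[String]] = Map(
    "Last-Modified" -> Set("Wed, 21 Oct 2015 07:28:00 GMT"),
    "Set-Cookie"    -> Set("a=1", "b=2")
  )

  // Build a fresh formatter per call, since SimpleDateFormat is not thread-safe.
  // Locale.US keeps the English day and month names parseable on any machine.
  def httpDateFormat(): SimpleDateFormat = {
    val fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"))
    fmt
  }

  // None if the header is absent, has no values, or does not parse.
  def lastModified(h: Map[String, Set[String]]): Option[Date] =
    h.get("Last-Modified")
      .flatMap(_.headOption)
      .flatMap(raw => Try(httpDateFormat().parse(raw)).toOption)
}
```

Calling HeaderParsing.lastModified(HeaderParsing.headers) gives back an Option[Date], which is a little more defensive than the one-liner if you can’t be sure the server sends the header at all.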

Technically the return type of this is a dispatch.Http.HttpPackage[Date], but this can be interacted with directly as if it were a java.util.Date.

This could easily be adapted to process other header values at the whim of the coder, and all with much less effort than coding directly against HttpClient.

Edit:
The HttpClient library offers a DateUtils class that is able to parse the various date formats that appear in the Last-Modified header. This is preferable to the method used above because (a) it accounts for servers that format the date string differently and (b) SimpleDateFormat isn’t thread-safe, so if you’re reusing it from multiple threads you will need to synchronize on it. Plus it’s less verbose. Given that, our one-liner can be rewritten as:

import dispatch._
import org.apache.http.impl.cookie.DateUtils
Http(:/("dispatch.databinder.net").HEAD >:> {h => 
  DateUtils.parseDate(h("Last-Modified").head)
})
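If you’d rather not depend on HttpClient’s DateUtils for the parsing step, a rough alternative on Java 8+ is the JDK’s own java.time formatter, which is immutable and therefore safe to share across threads. This is a swapped-in technique, not what Dispatch or HttpClient do internally, and the HttpDates object and its method names are just illustrative. Note the trade-off: unlike DateUtils, this sketch only understands the RFC 1123 format.

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.Date

object HttpDates {
  // DateTimeFormatter is immutable and thread-safe, so one shared instance is enough.
  private val rfc1123: DateTimeFormatter = DateTimeFormatter.RFC_1123_DATE_TIME

  // Parses an RFC 1123 date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  // Throws java.time.format.DateTimeParseException on any other format.
  def parse(raw: String): Date =
    Date.from(ZonedDateTime.parse(raw, rfc1123).toInstant)

  // The comparison this post set out to make: is `a` more recent than `b`?
  def isNewer(a: String, b: String): Boolean =
    parse(a).after(parse(b))
}
```

With that in place, HttpDates.isNewer(first, second) compares two Last-Modified strings directly.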