One-liner for a URL’s Last-Modified header as Date
I’m working on a project where I want to look at the last modified date of a URL for comparative purposes, and decided to come up with a brief method of fetching the headers and parsing the Last-Modified value as a java.util.Date.
As with many HTTP-related operations, the go-to project here was the awesome Dispatch project, which is backed by the trusty (and proven) Apache HttpClient but throws on all sorts of Scala-y goodness through a nifty DSL of verbs for executors, requests, and handlers that wrap a number of common HTTP operations.
Without further ado, the “one-liner” (plus a couple of imports and a bit of line wrapping for less scrolly): (Note: See edit at bottom for enhanced version)
import dispatch._
import java.text.SimpleDateFormat
Http(:/("dispatch.databinder.net").HEAD >:> {h =>
(new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz"))
.parse(h("Last-Modified").head)
})
If this gives you what you’re looking for, you can stop here, but for educational purposes let’s break it down further.
The Http(...) bit defines a Dispatch executor. In this case, I’m using the singleton executor that blocks the current thread and can be safely shared across threads. Other types of executors can be used via direct constructors that offer options including executing on background threads or using NIO.
Within the executor, we essentially define a request and its handler. The request portion is :/("dispatch.databinder.net").HEAD, where :/ is a verb used to define the host, one of many ways to specify the URL. The HEAD verb modifies the request so that we’re only requesting the response headers rather than issuing a conventional GET request. Other verbs are available to, among other things, add path elements, append query strings, or POST data to the URL.
After this we have our handler, in this case the most verbose portion of the code: >:> {h => (new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz")).parse(h("Last-Modified").head)}. The first portion of this is the handler verb, >:>, which passes the response headers as a Map to the function block that follows. Within this block, we hardcode a java.text.SimpleDateFormat that we know can parse the date format used for “Last-Modified” dates. If this code were going to be run repeatedly, we’d obviously define the date format object once and reuse it (see also the edit at bottom for a version using DateUtils). The h("Last-Modified").head pulls the value out of the Map of headers. The .head is required because the values of the Map are Sets of String, to accommodate headers that can appear with multiple values, e.g. “Set-Cookie”. Since we know that we will only get a single “Last-Modified” date, we can confidently take the first (and in this case only) element of the Set.
Technically the return of this is a dispatch.Http.HttpPackage[Date], but this can be interacted with directly as if it were a java.util.Date.
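To make the Map-of-Sets handling concrete without needing a live request, here is a minimal sketch that uses only the JDK and the Scala standard library. The object name LastModifiedSketch and the helper lastModified are my own inventions for illustration (they are not part of Dispatch), and the plain Map[String, Set[String]] simply stands in for the headers that the handler receives. It defines the format once, as suggested above, and returns an Option rather than throwing when the header is missing or malformed.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}
import scala.util.Try

// Hypothetical helper object, not part of Dispatch.
object LastModifiedSketch {
  // Defined once and reused. Locale.US keeps day and month names English
  // regardless of the JVM's default locale. Note that SimpleDateFormat is
  // not thread-safe, so this shared instance suits single-threaded use only.
  private val format =
    new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US)

  // Returns None if the header is absent, has no values, or cannot be parsed.
  def lastModified(headers: Map[String, Set[String]]): Option[Date] =
    for {
      values <- headers.get("Last-Modified") // the Set of values, if present
      raw    <- values.headOption            // first (normally only) value
      date   <- Try(format.parse(raw)).toOption
    } yield date
}
```

Calling LastModifiedSketch.lastModified(Map("Last-Modified" -> Set("Tue, 15 Nov 1994 12:45:26 GMT"))) should give back Some(date), while an empty Map gives None instead of the exception that h("Last-Modified").head would throw.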
This could easily be adapted to process other header values at the whim of the coder, and all with much less effort than coding directly against HttpClient.
Edit:
The HttpClient library offers a DateUtils class that is able to parse the various date formats that can appear in the Last-Modified header. This is preferable to the method used above because (a) it tries each of the date formats that HTTP permits rather than just one and (b) SimpleDateFormat isn’t thread-safe, so if you’re reusing it from multiple threads you will need to synchronize on it. Plus it’s less verbose. Given that, our one-liner can be rewritten as:
import dispatch._
import org.apache.http.impl.cookie.DateUtils
Http(:/("dispatch.databinder.net").HEAD >:> {h =>
DateUtils.parseDate(h("Last-Modified").head)
})
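If you’d rather not depend on HttpClient’s DateUtils for the parsing step, one possible alternative on Java 8 or later is the JDK’s own DateTimeFormatter.RFC_1123_DATE_TIME, which is immutable and therefore safe to share across threads. This is only a sketch under that assumption: the object name Rfc1123Sketch and the helper parseHttpDate are hypothetical, and unlike DateUtils it handles only the RFC 1123 format, not the older formats.

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.Date
import scala.util.Try

// Hypothetical helper object, not part of Dispatch or HttpClient.
object Rfc1123Sketch {
  // DateTimeFormatter instances are immutable, so sharing one is thread-safe.
  private val formatter = DateTimeFormatter.RFC_1123_DATE_TIME

  // Parses an RFC 1123 date such as "Tue, 15 Nov 1994 12:45:26 GMT".
  // Returns None for anything the formatter rejects.
  def parseHttpDate(raw: String): Option[Date] =
    Try(Date.from(ZonedDateTime.parse(raw, formatter).toInstant)).toOption
}
```

Inside the handler this could replace the DateUtils call, e.g. Rfc1123Sketch.parseHttpDate(h("Last-Modified").head), at the cost of getting an Option[Date] back rather than a bare Date.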
Is it possible to request both the headers and the content in the same HTTP request (HEAD + GET)?
The Dispatch page actually includes a great example of that, showing how to use the tupler-handling verb.
Modifying their example slightly, we can create a tuple of the last modified date and the contents:
import dispatch._
import org.apache.http.impl.cookie.DateUtils
val (lastmod, contents) =
Http(:/("dispatch.databinder.net") >+ { req =>
(req >:> {h => DateUtils.parseDate(h("Last-Modified").head)}, req as_str)
})
Sweet …
Thanks Jamie. Great article BTW
Thanks Conrad, I appreciate it.
Hi Jamie
How would I go about setting a cookie using dispatch?
Hey Conrad…I don’t believe dispatch directly exposes the cookies, but since it is Apache HttpClient under the covers there’s presumably a way to access session cookies from the underlying HttpClient.