Greasemonkey on the server

A search for the term "Greasemonkey on the Server" leads to a project that uses a servlet filter to insert scripts; it appears to be old and no longer under development. An alternative suggested in the forums, SiteMesh, seems to have met the same fate. Even if we used them today, they work only with code that you own.
The framework I want should:
  1. Proxy any webpage, letting me insert my own scripts into the page.
  2. Not require configuring the browser to use the server as a proxy. Instead, I should be able to hit a specific URL to view the page.
  3. Make images, CSS, and even AJAX work just as they do on the original page.
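A minimal TypeScript sketch of requirements 1 and 2 follows. It assumes a hypothetical entry point of the form `/proxy?url=<encoded target>` on our own domain; the `/proxy` path, the `url` parameter and the helper names are illustrative only. The sketch shows how the viewing URL could be built and parsed, and how the server could place its own script tag right after the opening `<head>` tag of the fetched HTML so that it runs before the page's own scripts.

```typescript
// Hypothetical URL scheme: a page is viewed through
//   <proxyOrigin>/proxy?url=<encoded target URL>
// so the browser needs no proxy configuration.
const PROXY_PATH = "/proxy";

// Build the URL a user hits to view `target` through our server.
function toProxyUrl(target: string, proxyOrigin: string): string {
  const u = new URL(PROXY_PATH, proxyOrigin);
  u.searchParams.set("url", target); // percent-encodes ':', '/', '?', '=', '&', ...
  return u.toString();
}

// Recover the target from a proxy URL; null if it is not one of ours.
function fromProxyUrl(proxyUrl: string): string | null {
  const u = new URL(proxyUrl);
  return u.pathname === PROXY_PATH ? u.searchParams.get("url") : null;
}

// Place our script tag right after the opening <head> tag so that it runs
// before any of the page's own scripts. A regex is a simplification here;
// a real implementation would use an HTML parser.
function injectScript(html: string, scriptSrc: string): string {
  const tag = `<script src="${scriptSrc}"></script>`;
  const head = /<head(\s[^>]*)?>/i.exec(html); // does not match <header>
  if (!head) return tag + html; // fallback: no <head> tag found, so prepend
  const end = head.index + head[0].length;
  return html.slice(0, end) + tag + html.slice(end);
}
```

For example, `toProxyUrl("https://example.com/news?id=7", "https://proxy.example")` yields `https://proxy.example/proxy?url=https%3A%2F%2Fexample.com%2Fnews%3Fid%3D7`. The server would fetch the decoded target, pass the body through `injectScript`, and return the result.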
Some applications that could leverage such a framework:
  • Dynamically marking pages for automatic scraping, like dapper.net.
  • Adding more data to a page and saving it as our own version.
  • Tracking mouse movements to study user behaviour.
  • Walking users through sites.
  • Collaborative browsing.
The list goes on. Many sites try to achieve this effect, but none with complete success. Here are some methods we can use to render near-perfect pages, even though they are fetched by our servers.

  • For resource URLs (the src attributes of img and script tags, for instance), instead of rewriting the fetched HTML, we may as well have a handler at the root of our domain that redirects to the actual source. Absolute URLs then work as they are; relative URLs hit our server and are redirected.
  • Insert a script at the top of the page that replaces XMLHttpRequest.open so that AJAX calls go to our proxyUrl.
  • Use our script to rewrite the href attributes of a (anchor) elements, so that we keep control over the target pages.
  • Use our script to rewrite the targets (action attributes) of form submissions.
  • Send cookies set in the browser to our proxy, which relays them to the actual server.
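The methods above can be sketched in TypeScript as follows, again under the hypothetical `/proxy?url=` scheme. The first function is the server-side root handler for relative URLs; it assumes the browser sends the proxied page's full URL as the Referer. The rest belongs in the injected script: wrapping `XMLHttpRequest.prototype.open`, and rewriting link `href`s and form `action`s. `__PROXIED_PAGE_URL__` and all function names are hypothetical, and cookie relaying is not shown.

```typescript
// Hypothetical scheme (same as the viewing URL): <proxyOrigin>/proxy?url=<encoded target>
const PROXY_PATH = "/proxy";

// Server side: root handler for relative URLs.
// A proxied page lives at /proxy?url=..., so a relative "/img/logo.png" in it
// arrives at our root. The Referer (assumed to be sent in full) names the
// proxied page, and the redirect target is resolved against that page.
// Caveat: a document-relative "img/a.png" was already resolved by the browser
// against /proxy, so the page's directory is lost.
function rootRedirectTarget(pathAndQuery: string, referer: string | undefined, proxyOrigin: string): string | null {
  if (!referer) return null;
  const ref = new URL(referer);
  if (ref.origin !== new URL(proxyOrigin).origin || ref.pathname !== PROXY_PATH) return null;
  const page = ref.searchParams.get("url");
  return page ? new URL(pathAndQuery, page).toString() : null;
}

// Client side: resolve `raw` against the ORIGINAL page URL (location.href
// points at our proxy) and wrap it in a proxy URL. Fragments and non-HTTP
// schemes (mailto:, javascript:, ...) are left alone.
function proxify(raw: string, pageUrl: string, proxyOrigin: string): string {
  if (raw.startsWith("#")) return raw;
  const abs = new URL(raw, pageUrl);
  if (abs.protocol !== "http:" && abs.protocol !== "https:") return raw;
  const p = new URL(PROXY_PATH, proxyOrigin);
  p.searchParams.set("url", abs.toString());
  return p.toString();
}

// Replace open() on an XMLHttpRequest-like prototype so that every request
// URL passes through `rewrite` before the original open() sees it.
function patchXhrOpen(proto: { open: (...args: any[]) => unknown }, rewrite: (url: string) => string): void {
  const originalOpen = proto.open;
  proto.open = function (this: unknown, method: string, url: string | URL, ...rest: unknown[]) {
    return originalOpen.call(this, method, rewrite(String(url)), ...rest);
  };
}

// Minimal structural types so the function also accepts a real `document`.
interface AttrElement {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<AttrElement>;
}

// Point links and POST forms back at our proxy. GET forms are skipped: the
// browser replaces the action's query string (our ?url=...) with the form
// fields, so they would need a submit handler instead (not shown).
function rewriteLinksAndForms(root: QueryRoot, pageUrl: string, proxyOrigin: string): void {
  const links = root.querySelectorAll("a[href]");
  for (let i = 0; i < links.length; i++) {
    const a = links[i];
    a.setAttribute("href", proxify(a.getAttribute("href") ?? "", pageUrl, proxyOrigin));
  }
  const forms = root.querySelectorAll("form");
  for (let i = 0; i < forms.length; i++) {
    const f = forms[i];
    if ((f.getAttribute("method") ?? "get").toLowerCase() !== "post") continue;
    f.setAttribute("action", proxify(f.getAttribute("action") ?? pageUrl, pageUrl, proxyOrigin));
  }
}

// Browser wiring; does nothing outside a page (e.g. under Node). The proxy is
// assumed to expose the original URL as a hypothetical global __PROXIED_PAGE_URL__.
const g = globalThis as any;
if (g.document && typeof g.__PROXIED_PAGE_URL__ === "string") {
  const page: string = g.__PROXIED_PAGE_URL__;
  const origin: string = g.location.origin;
  patchXhrOpen(g.XMLHttpRequest.prototype, (u) => proxify(u, page, origin));
  g.document.addEventListener("DOMContentLoaded", () => rewriteLinksAndForms(g.document, page, origin));
}
```

Because of the guard at the bottom, the same file can be exercised outside a browser. Links that the page's own scripts add after `DOMContentLoaded`, and requests made with `fetch`, would need similar hooks.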
The idea is still rough, and some of the elements that are a problem include (but are not limited to) Flash, top.location and window.location changes, and scripts in the page tampering with the injected script. A container like Caja (or Cajita) could come in handy for tinkering with elements that have to be changed on the server side.
The idea is crude, but I will post updates as I refine it.