Greasemonkey on the server

A search for the term "Greasemonkey on the Server" leads to a project that uses a servlet filter to insert scripts; it looks quite old and is no longer under development. An alternative suggested in the forums, SiteMesh, seems to have met the same fate. Even if we were to use them now, they work only with code that you own.
The framework I want should do the following:
  1. Proxy any webpage, letting me insert my own scripts into the page
  2. Not require the server to be configured as a proxy; instead, I should be able to hit a specific URL to view the page
  3. Serve images, CSS, and even AJAX so that they work as they would on the original page
Some applications that could leverage such a framework would be:
  • Dynamically marking pages for automatic scraping, like dapper.net
  • Adding more data to a page and saving it as our version
  • Tracking user mouse movements and studying user behaviour
  • Walking users through sites
  • Collaborative browsing
The list goes on. There are a lot of sites that try to achieve this effect, but without 100% success. Here are some methods we can use to serve near-perfect pages, even though they are fetched by our servers.

  • For resource URLs (img src, script src, etc.), instead of rewriting the fetched HTML, we may as well have a handler at the root of our domain that redirects to the actual source. Absolute URLs work just fine; relative URLs hit our server and are redirected.
  • Insert a script at the top of the page that replaces XMLHttpRequest.open with a version that calls our proxy URL.
  • Use our script to rewrite anchor href attributes, giving us control over the target pages.
  • Use our script to rewrite form submission targets.
  • Send cookies set in the browser to our proxy, which relays them to the actual server.
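The URL-redirection and XMLHttpRequest ideas above can be sketched as follows. This is only a sketch under my own assumptions: the proxy endpoint and the function names are invented, not an existing service.

```javascript
// Hypothetical proxy endpoint; any server that fetches the wrapped URL works.
const PROXY_ROOT = 'https://ourproxy.example.com/fetch?url=';

// Resolve a possibly-relative URL against the original page's base,
// then point it at our proxy instead of the real origin.
function proxyUrl(url, pageBase) {
  const absolute = new URL(url, pageBase).href;
  return PROXY_ROOT + encodeURIComponent(absolute);
}

// Injected into the proxied page: swap XMLHttpRequest.open so that
// AJAX calls made by the page also travel through the proxy.
function patchXhr(pageBase) {
  const realOpen = XMLHttpRequest.prototype.open;
  XMLHttpRequest.prototype.open = function (method, url, ...rest) {
    return realOpen.call(this, method, proxyUrl(url, pageBase), ...rest);
  };
}
```

The same proxyUrl helper could back the scripts that rewrite anchor href attributes and form submission targets.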
The idea is still rough, and the problem elements include (but are not limited to) Flash, top.location and window.location changes, scripts in the page tampering with the injected script, etc. A container like Caja (or Cajita) could come in handy for tinkering with elements that have to be changed on the server side.
The idea is crude, but I will post updates as I refine it.

Quick Analytics

As mentioned in an earlier post, Sneak-O-Scope is dead due to OpenSocial templates. However, the work done on Sneak-O-Scope could be reused to build a simple analytics system. Such a system could be used to:
  • Know how many of your Twitter followers actually click a link you post
  • Track down that stalker who sends me disgusting scraps on Orkut
  • Know how many times my thread has been viewed in a forum
  • Check whether a friend really clicked on a page sent to him over IM
  • Track how far an email travels, who views it, etc.
Hence, the requirements for the system are:
  • Require no registration; it should be as simple as TinyURL
  • Give a URL that redirects to the original page, serves an image, or opens a 1x1 transparent PNG
  • Show easy-to-understand analytics
  • Send emails when someone is tracked
  • Allow the analytics to be kept private
I am still working on keeping the UI as simple as possible, so if you have suggestions, please do write in. I would also appreciate name suggestions for the project.

How Tynt works - Technical details

The idea of Tynt is simple and effective: whenever content is copied from your site, track it and insert a small link-back to your website. It may not be 100% effective, but the idea works for people who don't tamper with the content they copy. This post describes how Tynt tracks copied content, along with some enhancements that I think would be useful.
Techniques to prevent text from being copied off a website have been around for ages. The onCopy event gives us a handler just before content is copied, allowing the copied text to be manipulated. The Tynt site requires us to insert a script that registers these handlers. Here is a step-by-step explanation of what happens once their script loads on our website.
  1. Register handlers for onCopy, onDrag, etc., on the window object
  2. Get a unique URL that will be used as a tracker
  3. When any of the registered events occurs:
    1. Send an event to the server
    2. On Firefox, create a new node containing the data to be shown alongside the copied content. Set the selection to span the existing node and this new node.
    3. On IE, add extra text to the current selection
    4. Cancel the propagation of the current event
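A rough reconstruction of those steps might look like this. To be clear, this is my own sketch, not Tynt's actual code; the tracker URL and the attribution format are invented.

```javascript
// Build the attribution blurb appended to copied text (format invented).
function attribution(pageUrl) {
  return '\n\nRead more at: ' + pageUrl;
}

// Browser-only: fire a tracking beacon, append a hidden node with the
// attribution, and stretch the selection over it so it gets copied too.
function handleCopy() {
  const sel = window.getSelection();
  if (!sel.rangeCount) return;
  new Image().src = 'https://tracker.example.com/event?u=' +
    encodeURIComponent(location.href);           // step 3.1: notify the server
  const extra = document.createElement('div');   // step 3.2: the extra node
  extra.style.cssText = 'position:absolute;left:-9999px';
  extra.textContent = attribution(location.href);
  document.body.appendChild(extra);
  const original = sel.getRangeAt(0).cloneRange();
  sel.extend(extra, 1);                          // selection now spans the node
  setTimeout(() => {                             // restore after the copy fires
    sel.removeAllRanges();
    sel.addRange(original);
    extra.remove();
  }, 0);
}
if (typeof document !== 'undefined') {
  document.addEventListener('copy', handleCopy);
}
```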
This works fine for most websites, as not many use the onCopy event. However, I find the text appended to the selection a little too obtrusive. Instead of adding such a huge block of content, it would be better to include a small image carrying all the links. The image can still be removed, but in my opinion a small image is far more likely to be left alone than a huge block of text. The image can also serve as a pointer to the places the copied content travels. There should also be some attribution text added when the client receiving the content does not support rich text; something simple in braces should do the job.
As for how the analysis was done, all I had to do was use Fiddler to load a formatted version of Tracer.js?user= and read the code. Checking the event handlers led me to handleTracing(). That function contains a couple of inline functions, and one of them, called H(), is responsible for replacing the text.
To summarize, the idea is nice, but it is the JavaScript implementation that stole my attention. :)

How Opensocial templates on Orkut killed Sneak-O-Scope

Hi,

A few months ago, I posted a hack on Orkut OpenSocial that allows directed phishing attacks on Orkut accounts. The Orkut team has now acted on it and has disabled profile views on Orkut. The current specification mandates the use of templates for the profile view. Templates with data pipelining are fine for displaying information, but they completely disable all forms of interaction.
This is a problem for the Sneak-O-Scope application, as it relies on making a couple of JavaScript calls to record visit time, IP address, etc. With templates this is no longer possible. Without JavaScript, there is no way to record the time when the user leaves a page. We can record the entry time using requests [os:HTTPRequest], but that still would not give us browser details like IP address, user agent, etc., since these requests are proxied through the Google URL Fetch bot. Images added to the page are likewise replaced by proxied versions.
The underlying problem was cross-site scripting attacks on the applications, but the template approach seems too limiting. A better strategy would be to use Caja to sanitize the applications more thoroughly. Since each application runs in its own global object, the "top" JavaScript object can be cajoled to disallow opening frames. This would prevent the attack I published, though the other possible attacks would still have to be examined.
So, as of now, Sneak-O-Scope is suspended. It can start working again only when JavaScript is allowed on the profile page.

YAHOO Open Mail Application : Email Header

Hi,

A few days ago, I had written an article about the Yahoo Open Mail application that performed redaction of emails. This is a post about another such application that may prove useful.
There are a lot of websites like this, this and this that require us to paste the email header into a text box and then display the information in a much more readable format. Having a tool like this inside Yahoo Open Mail would be a lot simpler.
Technically, the way to create it is simple. We attach a drag-and-drop listener that activates when an email is dropped onto the application. The handler would request "full" details, so we would also get the header in the "part" property of the passed JSON object. Once that is done, it is a matter of parsing the information and displaying it, either in a new tab or in a pop-up dialog box. As an addition, free IP-to-geolocation tools could determine the locations of the hop servers and display them on a map. A further visualization could animate the mail's path to give the user an idea of the delays involved.
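The parsing step can be sketched like this. The Received-header format itself is standard, but the helper names are my own, and the structure of the object the mail application hands over is an assumption on my part.

```javascript
// Pull the hop servers and timestamps out of a raw email header.
function parseReceivedHops(rawHeader) {
  // Unfold headers: continuation lines begin with whitespace.
  const unfolded = rawHeader.replace(/\r?\n[ \t]+/g, ' ');
  const hops = [];
  for (const line of unfolded.split(/\r?\n/)) {
    // "Received: from <server> ... ; <date>"
    const m = line.match(/^Received:\s+from\s+(\S+).*;\s*(.+)$/);
    if (m) hops.push({ from: m[1], date: new Date(m[2]) });
  }
  // Headers list the newest hop first; reverse to get travel order.
  return hops.reverse();
}

// Delay between consecutive hops, in seconds -- the raw material for
// the animation/visualization idea above.
function hopDelays(hops) {
  return hops.slice(1).map((h, i) => (h.date - hops[i].date) / 1000);
}
```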
I would love to get this out as quickly as possible; unfortunately, I seem to have lost my bouncer ID for the mail applications. If you know of one, please do let me know.

Search with your Google Custom Search Engine on the side

Hi,

I had earlier written about a custom Google search engine I created to help me look through my Delicious bookmarks. The search engine was great, except that it lacked the other features Google offers, things like definitions, books, etc. It looks like the default Google search cannot be done away with, which is why I wrote this small Greasemonkey script that shows the results from my search engine alongside the Google results.
The script itself is simple: it reads the search query and page number from the location, sets them in the Google CSE URL, and makes an AJAX call. The results are put into a div placed next to Google's results div. All decoration except the search results themselves is stripped. That done, the sponsored-results table is also blanked out to free that space for the custom search results.
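The core of such a script might look like this. The CSE endpoint and the results div id "res" are as I remember them and may have changed; CX is a placeholder for your own engine's id.

```javascript
// Placeholder: replace with the cx parameter of your own search engine.
const CX = 'YOUR-CSE-ID';

// Build the CSE results URL from the current Google search's query string.
function cseUrl(locationSearch, cx) {
  const params = new URLSearchParams(locationSearch);
  const q = params.get('q') || '';
  const start = params.get('start') || '0';
  return 'http://www.google.com/cse?cx=' + encodeURIComponent(cx) +
         '&q=' + encodeURIComponent(q) + '&start=' + start;
}

// Greasemonkey-only: fetch the CSE results and float them next to
// Google's own results div.
function showSideBySide() {
  GM_xmlhttpRequest({
    method: 'GET',
    url: cseUrl(location.search, CX),
    onload: function (resp) {
      const side = document.createElement('div');
      side.style.cssText = 'float:right;width:45%';
      side.innerHTML = resp.responseText; // the real script strips decoration first
      document.getElementById('res').appendChild(side);
    }
  });
}
```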
If you want to add your own custom search engine, simply replace the first variable in the script with your value: the cx parameter passed when you search on your custom search engine.

While I am at it, I am also planning to convert this into a Ubiquity extension.