Gmail Burglar Alarm : Statistics now Inside Gmail

Hi,

A few days ago, I came across the Twitter Gmail gadget. It has a neat interface, with all the data displayed inside Gmail; no external pages, etc. Since Gmail gadgets are also OpenSocial applications, they have a canvas view in addition to the profile view that shows up in the sidebar. The canvas view takes up the larger part of the screen.
We were also running out of space at the bottom of the gadget, which had cryptic options for setting the various parameters. The latest release has just one link at the bottom of the gadget visible in the left pane. Clicking on this takes you to a larger details view that opens in the right pane.
The right pane shows the statistics recorded in Google Calendar. It also has a tab showing the visits recorded by bit.ly. If a URL is not configured, the tab shows text explaining how URL trackers like bit.ly can be used and how a bit.ly URL can be configured inside the gadget.
As far as the code is concerned, the change was small. Previously, the two pages were opened using window.open; now they are simply shown as iframes inside the tab pages. The height of the gadget is also adjusted after user actions, for aesthetics.
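For reference, a minimal sketch of the idea; the element ID and URL here are placeholders, not the gadget's real markup, and the resizing relies on the dynamic-height feature of the gadgets API.

    // Inside the gadget's canvas view: point the tab's iframe at the stats page
    // and resize the gadget to fit. The element ID and URL are hypothetical.
    function showStatsTab() {
      var frame = document.getElementById('statsFrame');   // <iframe id="statsFrame">
      frame.src = 'http://example.appspot.com/stats';      // placeholder stats page
      frame.onload = function () {
        // Requires <Require feature="dynamic-height"/> in the gadget spec.
        gadgets.window.adjustHeight();
      };
    }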
If your gadget has not refreshed automatically, you may have to delete your existing gadget and add a new instance. This may, unfortunately, reset the total time shown, but that should be fixed in the next release. I am also planning a home page for the gadget, hosted on Google App Engine.

Chroma-Hash - Gradient implementation using Canvas

A few posts ago, I had written about an implementation of Chroma-Hash using a huge background image. The idea was to use the image for generating a unique gradient as the background of the password field.
A few days ago, some more changes were made to the idea: using canvas as a gradient generator for HTML5 browsers, and using a salt unique to every user. The project is available here. The additions in the current HEAD revision are as follows.
When "canvas" is selected as a visualization type, a canvas with opacity less than one is placed exactly on top of the password box. Since the canvas is placed above the password box, it may intercept all clicks intended for the password box. Hence, whenever the mouse moves over it or clicked on it, it is hidden. It shows up again after sometime, or if the mouse is moved out of the password box. Apart from this, the hash of the password is taken and a gradient is drawn by splitting the hash into 4 colors.
Another change is the addition of a salt to prevent hashes from being recognized. The salt is unique to each user. Similarly, an option was added to derive the salt from the domain name, as a way to protect against phishing. As too many changes to the salt make it hard to recognize the gradient, I am currently working on a way to indicate the domain color as the start of the gradient instead of folding the domain into the salt.
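As a rough illustration (the md5 helper and the option names are placeholders), the salt selection boils down to something like:

    // Use either a per-user salt or one derived from the domain name,
    // so the same password yields different colors on different sites.
    function saltedHash(password, options) {
      var salt = options.useDomainSalt ? window.location.hostname : options.userSalt;
      return md5(salt + password);   // md5() assumed to be provided by the library
    }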
The final change is the upgrade to the Greasemonkey script. The Greasemonkey simulator was meant for testing: currently, it inserts the script into the page, which activates Chroma-Hash. One side effect is that the script shows up in the edit areas of the pages. The next release will remove this and bake the Chroma-Hash logic into the Greasemonkey script itself. Watch this space for updates.

Ubiquity Command - Linkify upgraded to Parser 2

A few posts ago, I had written about the changes required in Ubiquity commands due to the change in the Ubiquity parser. This post details the changes to the linkify command.
As written earlier, the preview of the command can no longer be used for interaction. Hence, the command takes no input except the text selected when it is invoked. Treating this as the search term, the command uses the Yahoo BOSS API and displays search results like its earlier versions did. The only difference is that the search results and the rest of the UI are embedded in the page.
The user can change the search term before clicking the link that turns the selection into a hyperlink. The only trick here is that when the user changes the search term, the selection on the page changes. Hence, the selection is stored in a local variable to be used when creating the link. When the user clicks the link, the saved selection range is restored, so the browser selects the same text that was selected when the command was invoked.
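The selection handling is plain DOM Range work; a sketch of the idea (the names are mine, not the command's):

    // Save the selection when the command is invoked...
    var selection = window.getSelection();
    var savedRange = selection.getRangeAt(0).cloneRange();

    // ...and restore it just before wrapping it in a link, even if the user
    // has since clicked into the search box and changed the selection.
    function restoreSelectionAndLinkify(href) {
      var sel = window.getSelection();
      sel.removeAllRanges();
      sel.addRange(savedRange);
      var anchor = document.createElement('a');
      anchor.href = href;
      savedRange.surroundContents(anchor);   // wraps the selected text in an <a>
    }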
You can subscribe to the command by visiting the command page.

Screen Scraping with Javascript, Firebug, Greasemonkey and Google Spreadsheets

Most of the web page scrapers I have seen are written in Perl, mostly due to the power of its regular expressions. However, when it comes to parsing web pages, nothing is friendlier than JavaScript.
The pages to scrape are usually like this or this.
With regular expressions inherited from Perl and DOM parsing from jQuery and its likes, scraping data from web pages is a lot easier. This article outlines the techniques that make screen scraping with JavaScript easy.
Writing a web page scraper usually involves the following steps.
  1. Identification: identifying page elements
  2. Selection: getting data out of the selected nodes
  3. Execution: running the code in the context of the page
  4. Submission: saving the parsed data so that it can be used later
  5. Timeouts: introducing delays so that the server is not overwhelmed
Identifying nodes can get tricky, but with Firebug it's a simple point-and-click exercise. Firebug gives us the entire XPath to the node in question. Web scraping usually involves picking up data from a structured document that has elements repeated in some pattern. "Inspecting" elements reveals that pattern, usually a common class name or hierarchy.

Once identified, we can write the code to select all the elements interactively in the Firebug console. This is usually a combination of XPath expressions, getElementsByTagName, getElementsByClassName, etc. Once you have all the elements (usually as rows), you can dive deeper into each element till you extract the innerHTML or href you need. This is what goes into the code.
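For example, once Firebug shows that each row shares a class name, the selection step is usually just a few lines (the class name below is hypothetical):

    // Pick up every result row and pull out the bits we care about.
    var rows = document.getElementsByClassName('result-row');   // placeholder class
    var records = [];
    for (var i = 0; i < rows.length; i++) {
      var link = rows[i].getElementsByTagName('a')[0];
      records.push({
        title: link ? link.innerHTML : '',
        href:  link ? link.href : ''
      });
    }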

Once you have the code returning useful data (checked using the Firebug console), you need a way to run it on the page and possibly load the next page for parsing once the current one is done. The JavaScript could be injected using a bookmarklet, but that would require the user to click the bookmarklet for every page. I chose to add the small Greasemonkey header and convert these scripts to Greasemonkey scripts. This ensures that we parse the current page and move to the next automatically.
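The Greasemonkey wrapper is little more than the metadata header plus a jump to the next page once the current one is parsed; the @include pattern and the next-page selector below are placeholders.

    // ==UserScript==
    // @name        Page Scraper
    // @namespace   http://example.com/scraper
    // @include     http://example.com/listing*
    // ==/UserScript==

    // ...select and submit the data for the current page here...

    // Then move on automatically: find the "next" link and follow it.
    var next = document.querySelector('a.next');   // hypothetical selector
    if (next) {
      window.location.href = next.href;
    }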

One reason why most people don't use JavaScript for scraping is its inability to store the parsed data. This is where Google Spreadsheets comes to the rescue. Google Spreadsheets lets us create forms to which we can POST data. The script needs to create a form with its action set to a URL that resembles "http://spreadsheets.google.com/formResponse?formkey=yourkey". You also have to create input elements with names resembling entry.0.single, entry.1.single and so on. You can check the actual data submitted to Spreadsheets using Tamper Data. We now have all our data in a spreadsheet, giving us sorting and filtering capabilities.
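A sketch of the submission step, using the formResponse URL and entry.N.single field names mentioned above; the formkey and the record fields are placeholders.

    // Build a form that POSTs one parsed record to the Google Spreadsheets form.
    function submitRecord(record) {
      var form = document.createElement('form');
      form.method = 'POST';
      form.action = 'http://spreadsheets.google.com/formResponse?formkey=yourkey';
      form.target = 'scrape_sink';   // submit into a hidden iframe so the page stays put

      var fields = { 'entry.0.single': record.title, 'entry.1.single': record.href };
      for (var name in fields) {
        var input = document.createElement('input');
        input.type = 'hidden';
        input.name = name;
        input.value = fields[name];
        form.appendChild(input);
      }

      // Hidden iframe that swallows the response.
      if (!document.getElementsByName('scrape_sink').length) {
        var sink = document.createElement('iframe');
        sink.name = 'scrape_sink';
        sink.style.display = 'none';
        document.body.appendChild(sink);
      }

      document.body.appendChild(form);
      form.submit();
    }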

The last point is about avoiding tight loops and using timeouts instead. This ensures that you don't overload the server with requests. A good feedback mechanism, such as coloring the background of fields that are parsed successfully, is an added bonus.
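Building on the earlier sketches (records, rows and submitRecord are the placeholder names from above), the pacing and feedback could look like this:

    // Submit records one at a time, a couple of seconds apart.
    function submitAll(records, index) {
      if (index >= records.length) {
        return;   // done with this page; the next-page navigation can go here
      }
      submitRecord(records[index]);
      // Color the parsed row as visual feedback.
      rows[index].style.backgroundColor = '#cfc';
      setTimeout(function () { submitAll(records, index + 1); }, 2000);
    }
    submitAll(records, 0);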

To conclude, scraping data this way may require your browser to be open all the time, but some of the benefits over the command-line way (that I could think of) are:

  1. Easiest way to handle the DOM: no regex, just DOM traversal
  2. Visual indication of what data is parsed, in real time
  3. Proxy and Tor configuration out of the box if your IP is blocked :)
  4. With web workers, complex parsing can be done a lot more easily
Writing all this from scratch is hard, so maybe you could use this template. The template is littered with placeholders where you can insert your code.


The Chroma Hash effect

There has been quite a lot of discussion on the usability issues with password masking. I came across efforts like using a mask, representing the hash with a graph and, more recently, representing the hash with colored bars. The idea was interesting, and I decided to fork my own project and experiment with other visualization techniques.
The idea is to represent the 32-character hex hash in a way that the user can easily recognize. The representation should also ensure a sufficient delta between two similar passwords. With 3.402823669209385e+38 possible representations, finding the right visualization is hard. The plan was to divide this by four and display unique pictures fetched from the internet, but getting a consistent source of so many pictures is hard. Though the number seems huge, the hash values used in practice would be far fewer. Compromising on accuracy, a gradient image was used as the background of the text box. The background-position of the image is moved to display different gradients for different passwords. The image is 3000x900, so the position is the hash value modulo the image dimensions.
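In code, the position calculation is roughly the following; md5() is assumed to come from the library bundled with the project, and passwordField and salt are placeholders for the actual wiring.

    // Map a password hash onto a background-position inside a 3000x900 gradient image.
    function positionFor(password, salt) {
      var hash = md5(salt + password);             // 32 hex characters
      var x = parseInt(hash.substr(0, 8), 16) % 3000;
      var y = parseInt(hash.substr(8, 8), 16) % 900;
      return '-' + x + 'px -' + y + 'px';
    }

    passwordField.style.backgroundPosition = positionFor(passwordField.value, salt);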
The forked project is also written in a way that makes it easy to add other visualization techniques. You can also find a quick Greasemonkey script that adds this feature to all password fields here.