Parashuram's blog: July 2007

URLs, AJAX and Script Injection

Hey,

The paradigm of a URL being a Uniform Resource Locator no longer seems to hold true. Examples include the personalized search results returned by Google and, more importantly, AJAX. It gets irritating when you send someone a link, expecting them to see the same web page you did, and the result is a complete shock. Page personalization seems fair, and there are ways around it, but the way AJAX prevents information from being shared through a URL is definitely a concern. With social bookmarking and site sharing on the rise, any website that denies users the convenience of sharing URLs is sure to take a back seat.
The main problem AJAX has with using URLs to represent the correct information, or page state, is that if it directly rewrites document.location or window.location, the page refreshes, defeating the purpose of AJAX. Developers are smart, and that's where they started to exploit the underused feature of page anchors. Web browsers do not reload a page if all that changes in the URL is the part after the "#". Hence, it is now a recommended practice to save the state of the page in the anchor, so a typical URL looks like "http://page.com/?params#state=1".
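A minimal sketch of that practice, in TypeScript: the page state is serialized into the fragment while the path and query string are left untouched. The function names withState and readState and the key=value encoding are assumptions for illustration, not any particular site's scheme. In a browser, assigning only the fragment (for example window.location.hash = "state=1") updates the address bar without a reload.

```typescript
// Write page state into the fragment ("#...") of a URL, leaving path and query alone.
function withState(href: string, state: Record<string, string>): string {
  const url = new URL(href);
  // The hash setter adds the leading "#" itself.
  url.hash = new URLSearchParams(state).toString();
  return url.toString();
}

// Read page state back out of the fragment.
function readState(href: string): Record<string, string> {
  const state: Record<string, string> = {};
  // URL.hash includes the leading "#", which URLSearchParams would not strip.
  new URLSearchParams(new URL(href).hash.slice(1)).forEach((value, key) => {
    state[key] = value;
  });
  return state;
}
```

For example, withState("http://page.com/?params#old", { state: "1" }) yields "http://page.com/?params#state=1", and readState on that URL gives back { state: "1" }.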
Another advantage of this approach is that it does not upset the behavior of the "BACK" navigation, another pain point that was a result of transitioning from web pages to the web applications paradigm.
This hack also opens up a whole range of security vulnerabilities. Maybe I can make myself clearer with an example.
There was a session on mashups, mapping and the promising Yahoo India Maps at the BarCamp that I attended yesterday. Yahoo Maps is still in beta, but what impressed me was that it attracted a lot of local people wanting to mash it up. The session by Shivku, which revealed the extensive use of JavaScript, got me probing.
Yahoo India Maps is a classy example of URL rewriting using the # fragment. Hence, I can simply send something like this to indicate a location in Bangalore: http://in.maps.yahoo.com/#?lat=12.922420&lon=77.593195&z=1&addr=jayanagar%2C%20bangalore
The advantage of this over Google Maps is that I just copied it from the address bar. The security problems in this case may also be evident. People have become wary of links that say something like javascript:alert(''), but a URL like the one for Yahoo India Maps is something they are fine with. Now, if we are able to inject something through a URL like that, the fact that the URL is actually parsed on the client side makes an excellent case for script and HTML injection. I know that this is nothing new, and people have been doing injections like this for ages; I just wanted to write down my experience of finding such a hack.
Let me put down the results of my analysis. Though the Yahoo Maps code is compressed, we can quickly figure out that the global function "getParams(arg)" does the actual parsing of the URL. A second search for getParams shows all the places where it is used. Closer inspection reveals that the possible parameters (i.e. loc, lat, pg, ms, mt, lc) directly correspond to the URL parameters. These are the source of our injection.
Next, to find the place where we could inject, I searched for all the places where the parameters (i.e. DEFAULT_LAT, DEFAULT_LON, etc.) are used. Unfortunately for us, there is no place where the latitude or longitude is used. Interestingly, however, there was one place where the location was used, but it turned out to be written to a text field, and we hit another dead end. I ran a couple of tracer scripts to look for other injection targets, and realized that even the category text is a number; the real text comes from a back-end call.
After some searching, I finally stumbled upon the print link, and saw that on the print page, the 'addr' argument was unescaped and written into the page as is. Now all I had to do was check that function out and get the escaped value of "<" (%3C), so that unescaping inserts the character. A few minutes later, I cooked up the following link.
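To make the source side concrete, here is a hypothetical fragment parser in the spirit of getParams. It is not Yahoo's code (which I only saw compressed); the name parseFragmentParams and its behaviour are assumptions. What it illustrates is that every value comes straight from the address bar, and since browsers do not send the fragment to the server, no server-side filter ever sees it.

```typescript
// Hypothetical fragment parser in the spirit of the getParams() described above.
// It is NOT Yahoo's implementation; it only shows the general pattern.
function parseFragmentParams(href: string): Record<string, string> {
  const params: Record<string, string> = {};
  const hashIndex = href.indexOf("#");
  if (hashIndex < 0) {
    return params;
  }
  // Yahoo India Maps style links start the fragment with "#?", so drop a leading "?".
  const fragment = href.slice(hashIndex + 1).replace(/^\?/, "");
  for (const pair of fragment.split("&")) {
    if (pair === "") {
      continue;
    }
    const eq = pair.indexOf("=");
    const key = eq < 0 ? pair : pair.slice(0, eq);
    const value = eq < 0 ? "" : pair.slice(eq + 1);
    // Untrusted input: the value is whatever the person who built the link chose.
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return params;
}
```

Applied to the Jayanagar link above, it returns lat "12.922420" and addr "jayanagar, bangalore", exactly as typed by whoever made the link.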
http://in.maps.yahoo.com/php/ymi_map_print.php?lat=65.183030&lon=0.351563&z=15&l=1042&lc=Police%20Stations&pg=1&addr=%3Cscript%3E%20%20alert(%22hw%22)%3C/script%3E
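The pattern behind this link, reduced to a TypeScript sketch: a parameter is decoded and concatenated into markup unchanged. The print page unescaped addr; decodeURIComponent stands in for that decoding step here, and renderAddressUnsafe, renderAddressSafe and escapeHtml are hypothetical names. The fix is to HTML-escape the decoded value before writing it.

```typescript
// Vulnerable pattern: the decoded parameter goes into the HTML as is.
function renderAddressUnsafe(encodedAddr: string): string {
  return "<div class=\"addr\">" + decodeURIComponent(encodedAddr) + "</div>";
}

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Safer pattern: decode first, then escape, then write.
function renderAddressSafe(encodedAddr: string): string {
  return "<div class=\"addr\">" + escapeHtml(decodeURIComponent(encodedAddr)) + "</div>";
}
```

With the addr value from the link above, the unsafe version emits a live <script> element, while the safe version emits &lt;script&gt;, which the browser displays as text.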

Now for the question of potential exploits. Consider the case where a user is logged into a Yahoo! account. Yahoo Maps obviously has access to his cookies, and if I insert a script that includes a malicious .js file (instead of the current hello-world alert), I could cleanly steal those cookies. Technically, that would let me log in as any user who clicks the link, and propagate the attack to his friends, and friends of friends.
Though I have not really tried it, and hope that it does not work, I could use a service like Yahoo! Answers to post the cookie (like the cookie jars available on Orkut) as a reply to a question. That would keep me undetected, as I would not be posting the data to a server of my own.

To summarize, the idea of restoring the URL paradigm with # anchors is great, but I believe that JavaScript developers must be aware of the potential problems that it may bring in.

BarCamp 4 Collectives Edition

This weekend, I was at BarCamp, edition 4.

The day is finally over, and I did find some time to follow up on things I got curious about at the BarCamp I attended yesterday. There were some really good sessions, and I met a lot of techies. Though I could not make it on Saturday, owing to an all-night Diablo II session, I think the sessions on Sunday were pretty filling by themselves.
I also got a couple of ideas, so watch my blog for random thoughts carried over from yesterday.

Meebo Scripts Modified

Last week, Meebo modified its JavaScript code, breaking many of the Greasemonkey scripts that I had written. It looks like a lot of new objects have been added, and some methods have changed as well. The changes include a gAjax object that holds the session key and changed method names in gLogon. I still have not figured out where the login function is, so the Meebo auto-login is broken for now.
As I was going through this process, I was thinking about an interesting feature that Firebug could provide. In addition to the Inspect feature, maybe Firebug could also list all the JavaScript event handlers attached to a particular HTML element. Along the lines of aspect-oriented programming, I was thinking of quickly writing a script that injects a listener into all JavaScript functions so that I could trace through them. This is especially needed in the absence of "pretty printing" in Firebug. Maybe Joe Hewitt could make Firebug extensible, allowing people to plug in custom JavaScript editors. That would let developers inspect JavaScript code even if it is compressed or obfuscated. Best of all, setting breakpoints would be really easy in the uncompressed code. So let me complete that code injector, and see if I can turn it into a Firebug enhancement.
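A minimal sketch of such a code injector, in TypeScript: it wraps every function-valued property of an object (for example a global like gLogon) so that each call is logged before the original runs, which is "before advice" in aspect-oriented terms. traceAll is a hypothetical name, and it only sees an object's own enumerable properties, so methods inherited from a prototype would need a separate pass.

```typescript
type AnyFn = (...args: unknown[]) => unknown;

// Replace each function-valued property of `target` with a wrapper that
// reports the call (name and JSON-encoded arguments) and then delegates.
function traceAll(target: Record<string, unknown>, label: string, log: (msg: string) => void): void {
  for (const name of Object.keys(target)) {
    const original = target[name];
    if (typeof original !== "function") {
      continue; // leave plain data properties alone
    }
    const fn = original as AnyFn;
    target[name] = function (this: unknown, ...args: unknown[]): unknown {
      log(label + "." + name + "(" + args.map((a) => JSON.stringify(a)).join(", ") + ")");
      // Preserve `this` and the return value of the original function.
      return fn.apply(this, args);
    };
  }
}
```

In a Greasemonkey script, something like traceAll(unsafeWindow.gLogon, "gLogon", GM_log) would then show which methods fire during login, assuming gLogon exposes them as own properties.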

Laszlo - Is it the "BIG" thing ?

Laszlo is a "big" thing among Rich Internet Application development platforms. Its advantages spring from the very fact that ramp-up and development on the platform are quick. However, drawing from my experience, I have noticed that the code quickly gets out of control.
Despite good practices and informed design decisions, there are times when responsiveness goes downhill. I have seen projects where the initial performance was top-notch, but as the developers continued to chuck functionality at the code base, project managers grew edgy over the interactivity and maintainability of the code.
In addition, the SWF file produced by the final compilation tends to be too large to serve in one request. Most applications would prefer to lazily load the parts of the SWF that are accessed later. Though Laszlo provides the 'initstage' attribute, it only controls when the view (which is compiled into a MovieClip) is initialized; it does not reduce the download. The size monster persists, resulting in slower response times for the huge SWFs that Laszlo generates.
Thinking logically, the reason this is a problem with Laszlo and not HTML is that in the case of HTML (even if it is AJAX driven), the entire web application is not loaded in one go. People tend to split up the page, and then employ lazy loading if required. Technically, this is not impossible with Laszlo either. Being based on ECMAScript (ActionScript), Flash, and hence Laszlo, do support eval and loadMovie. I have noticed that two Flash runtimes with smaller SWFs loaded perform better than one Flash runtime with a big SWF file. Alternatively, maybe Laszlo could natively support dynamic loading of content, so as to keep the size of the SWF small enough.
This could be better than multiple movies, as Laszlo bundles about 21K of LFC (Laszlo Foundation Classes) with every SWF that it generates. If Laszlo could natively allow these foundation classes to be shared across multiple SWFs on the same page, the download size could be a lot smaller.
To summarize, though Laszlo gives us a great programming paradigm, with views defined in XML, controllers in JavaScript and models in datasets, it still needs to tweak its runtime to deliver smaller SWF content without losing out on scalability. I have written a precompiler for Laszlo, but I am not sure how it would scale to generalized applications. So if you are interested in trying it out, do drop me an email.
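A back-of-the-envelope comparison, assuming n SWFs on a page, each carrying its own copy of the roughly 21K LFC, with s_i the application-specific size of the i-th SWF:

```latex
\begin{align*}
S_{\text{separate}} &= \sum_{i=1}^{n} s_i + n \cdot 21\,\text{K} \\
S_{\text{shared}}   &= \sum_{i=1}^{n} s_i + 21\,\text{K} \\
S_{\text{separate}} - S_{\text{shared}} &= (n-1) \cdot 21\,\text{K}
\end{align*}
```

For example, splitting an application into 4 SWFs without sharing would download the LFC three extra times, about 63K of redundant code.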

Picture Perfect

Hey,

Even today, when I have to post my real picture as a display picture, I get a little wary. With so many pranksters littered over the internet, no wonder social networking sites are trying their best to protect profile pictures. Though people have used techniques ranging from intercepting right clicks to masking the pictures, most of them don't really work against the knowledgeable. After all, once the information is on the client, the only question is how difficult it is for a prankster to reproduce it.
There are many sites that disable the right click on pictures using Javascript, but most people know how to get around this.
Interestingly, Orkut uses a combination of <DIV> and <IMG> tags to protect profile pictures. The hierarchy in Orkut is roughly like this:

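<!-- Outer layer: the real profile picture, set only as a CSS background -->
<div style="background-image: url('profile-pic.jpg')">
    <!-- Middle layer: another background, set to the blank image -->
    <div style="background-image: url('blank.jpg')">
        <!-- Top layer: the image that "Save Image As..." actually saves -->
        <img src="blank.jpg" />
    </div>
</div>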

In this case, when the user right-clicks on the image, though he gets the option to save it, he only ends up saving blank.jpg. The workaround, however, is simply to use a tool like Firebug to find profile-pic.jpg, or to view the HTML source to get the profile image.
Sites found ways around this too. Some started embedding images into Flash files at compile time. Others started "rendering" the images into a <CANVAS> tag. The latter did not really become popular, as it is an "HTML 5 only" tag. Nonetheless, all these techniques are defeated the moment the user does a "Print Screen".
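Why the layered-DIV trick fails can be shown in a few lines: the real image URL sits in the markup, so anything that reads the source (Firebug, View Source, or a script) finds it. backgroundImageUrls is a hypothetical helper that scans markup for CSS url(...) references; a regular expression is enough for this illustration, though it is not a full CSS parser.

```typescript
// Collect every CSS url(...) reference found in a chunk of HTML/CSS text.
function backgroundImageUrls(markup: string): string[] {
  const urls: string[] = [];
  // Optional quote in group 1, URL in group 2, then the same quote again.
  const pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    urls.push(match[2]);
  }
  return urls;
}
```

Run over the hierarchy above, it lists profile-pic.jpg first, which is the picture the blank image was supposed to hide.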

This was something I had been thinking about for quite some time when I came up with what seemed like a potential solution. The profile picture can be embedded into a Flash object that stays active only as long as it keeps receiving certain random numbers from the server. The "random number" clause is added specifically to prevent users from saving the Flash object or faking the server replies. Also, to defeat Print Screen, the Flash object (which renders at some rate of x frames per second) can add noise to the image. As the noise changes at a very high frequency, the human perception of the image would still be good. However, when a Print Screen is tried, all that gets copied is the image with the noise in it. If the noise is made sufficiently random, all the prankster can get is a noisy image. Even if the prankster takes multiple snapshots, depending on the noise spread, he may have to take many screenshots before actually reconstructing the original image.
I realize that this requires a lot of bandwidth, so it may be used only where there is a fitting reason to prevent images from being saved locally. I also got comments from a couple of friends, telling me that this could be useful for web pages that display wallpapers, and even for porn sites that want people visiting them rather than getting the pictures offline !!! :)
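The noise idea, sketched in TypeScript rather than Flash: each frame adds fresh zero-mean random noise to a grayscale image, so any single capture is degraded. The sketch also shows the caveat raised above, that averaging many captures cancels the noise and recovers the picture. The pixel format (an array of 0-255 values), the uniform noise and the small linear congruential generator used for reproducible randomness are all assumptions for illustration; the server-supplied random numbers are not modelled.

```typescript
// Tiny linear congruential generator so frames are reproducible; any RNG would do.
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return (): number => {
    // Numerical Recipes constants; ">>> 0" keeps the state in [0, 2^32).
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// One rendered frame: every pixel gets zero-mean noise in [-amplitude, amplitude].
function noisyFrame(pixels: number[], amplitude: number, rng: () => number): number[] {
  return pixels.map((p) => {
    const noise = (rng() * 2 - 1) * amplitude;
    return Math.min(255, Math.max(0, Math.round(p + noise)));
  });
}

// What a patient prankster could do with many screenshots: average them.
function averageFrames(frames: number[][]): number[] {
  const sums = frames[0].map(() => 0);
  for (const frame of frames) {
    frame.forEach((v, i) => {
      sums[i] += v;
    });
  }
  return sums.map((s) => s / frames.length);
}
```

A single noisyFrame call differs visibly from the original, while the average of a few hundred frames lands close to it; the larger the amplitude, the more screenshots that takes.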

Honey, you cannot hide from me !!!

Hey,

I hate it when people ignore me, especially when they are online in invisible mode on their Yahoo! Messengers. I just stumbled across a service called www.xeeber.com that tells me whether a user is online, offline, or just invisible. I use Meebo as my instant messenger, and since it is web based, I can surely use xeeber to see whether people in my buddy list are offline or hiding from me. The process is simple: I can go through Meebo's JavaScript, check the gBuddyList object, iterate through all the buddies, and send a request for each that looks something like http://www.xeeber.com/index.php?user=userID.
The only problem I see here is that I have almost 100 buddies, and the xeeber server seems to go down quite often. So I will have to decide on the interval at which to send the requests so that I don't overwhelm that server. I will also have to decide how to trigger this discovery process, i.e. whether it will be on the click of a button or done with setInterval.
Now, about the name. Is it
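A minimal sketch of the throttling part, in TypeScript, assuming the buddy IDs have already been pulled out of gBuddyList (its structure is not shown here, so that step is left out). Each check gets a delay so that requests go out one interval apart instead of all 100 at once; scheduleChecks and its return shape are hypothetical.

```typescript
interface ScheduledCheck {
  buddyId: string;
  url: string;
  delayMs: number;
}

// Space the xeeber lookups `intervalMs` apart: the first goes out immediately,
// the last after (n - 1) * intervalMs.
function scheduleChecks(buddyIds: string[], intervalMs: number): ScheduledCheck[] {
  return buddyIds.map((id, i) => ({
    buddyId: id,
    url: "http://www.xeeber.com/index.php?user=" + encodeURIComponent(id),
    delayMs: i * intervalMs,
  }));
}
```

In a Greasemonkey script, each entry could then be fired with setTimeout and GM_xmlhttpRequest. With 100 buddies and a 5-second interval, one full sweep takes a little over 8 minutes, which also suggests a sensible minimum for a setInterval-driven refresh.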
xeeber + meebo = xeebo ?
meebo + xeeber = meeber ?
Well, let's get to that later. Let's just finish the Greasemonkey script quickly.
By the way honey, you can no longer hide from me online !! [:)]

Anti Phishing and Greasemonkey

Hey,

The moment we fail to notice that the '1' sandwiched between a 't' and a 'b' in the browser address bar is not an 'i', the phisher takes over. Creating identical websites is easy and quick; I have even heard of toolkits that replicate pages.
Apart from this, we never actually check the long URLs in the address bar that open when we click on links that our banks send us about suspended credit cards, etc. The idea of phishing is so simple, and the irony is that anti-phishing is even simpler. With some very conspicuous, domain-specific personalization on the web page, fake sites could never impersonate the genuine ones.
Talking about domain-specific personalization, I was thinking of Greasemonkey, where I specify // @include http://www.site.com/* in the metadata comment at the top of the script. In a sense, that is also domain-specific personalization of a web page. So, can Greasemonkey scripts be used to fight phishing? In this case, the man-in-the-middle problem does not really come into the picture, as all the personalization is done by a script that never travels over the network. The script would simply sit in the client browser, and once it sees the domain of a site that is to be protected, change the page into the form that the user wants. One could argue that the scheme won't work if the script is compromised, but the first premise of this idea is trust in the client browser.
The idea of a secure seal, as Yahoo! Mail shows, is good, but not all sites have implemented it. Hence, a user could protect himself from phishing by using a Greasemonkey script to personalize the page.
Taking the idea a step further, would it be better to have a generic Greasemonkey script that does this personalization, or to use something like Platypus to generate GM scripts for every page that we need to protect? I am still exploring this idea, and I would be glad to hear any ideas that you may have.
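The core check that such a script would make, sketched in TypeScript: look up a personal seal for the exact hostname the browser reports, so that a lookalike domain (with a '1' in place of an 'l') simply gets no seal. sealFor, the example hostnames and the seal text are hypothetical; in an actual userscript the map would live in the script on the client, the hostname would come from location.hostname, and the seal would be inserted somewhere conspicuous on the page.

```typescript
// Hypothetical user-chosen seals, keyed by the exact hostname to protect.
// This data never leaves the client, so a phishing page cannot copy it.
const mySeals: Record<string, string> = {
  "www.mybank.example": "Blue elephant, 1982",
};

// Return the seal only for an exact (case-insensitive) hostname match.
// Lookalike or suffixed domains get null, so their pages stay unpersonalized.
function sealFor(hostname: string, seals: Record<string, string>): string | null {
  const host = hostname.toLowerCase();
  return Object.prototype.hasOwnProperty.call(seals, host) ? seals[host] : null;
}
```

The design choice is exact matching: a prefix or substring check would let "www.mybank.example.evil.test" pass, which is precisely the kind of long URL nobody reads.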

Crawling the Rich.....

Hey,

For a web developer [specifically, someone who respects the beauty of JavaScript], it gets disheartening when search engine optimizers come along with reports of how JavaScript has harmed the site's rank on major search engines. All the interactivity and usability built with engineering excellence is simply ignored by these crawlers from the Web 1.0 era.
Though there have been numerous changes to search algorithms, crawlers have remained a very static part of the equation. Indexing has improved, more machines are thrown at improving response times, and even semantic search is now becoming mainstream. If Web 2.0 is not only about HTML pages, then search engine crawlers had better adapt. The idea of PageRank was great, but links were the focal point of the technique. Unfortunately, the Web 2.0 paradigm refuses to confine HTML documents to pages with links.
It is about interactivity and usability, and since information is no longer loaded only when the page loads, conventional crawlers may well miss all the information that becomes available through user actions. Though I am not trying to propose a theoretically correct crawler model, I am definitely suggesting a crawler that understands rich internet applications and ranks pages including the information obtained through user interaction.
A crawler of this kind could mimic a screen reader, trying to perform user actions (clicks, mouseovers for help, etc.). It could rate pertinence based on the depth of information, i.e. the number of clicks or user actions required to get to that information. Other heuristics could be developed to assess the importance that a real user would attach to such dynamic divs and to information obtained through AJAX.
The crawler could also create a data model representing the data available on the site, and assign relevance based on the user action required to fetch the data, or on the availability of the data. People may argue that the main reason JavaScript is not supported is to fight search engine spam, but a crawler that models the page with user interactivity as a dimension could automatically include logic to push spam lower in the rankings. Hence, the problem of redirects and JavaScript iFrames could be solved based on object visibility, which would be a parameter in the user interaction dimension.
This model is not perfect, and still has problems. CAPTCHAs may block the flow, but that is a problem with HTML crawlers as well. Privacy could be a concern, but crawlers could obey robots.txt, as they do now. Finally, the biggest hurdle is mapping user interactivity to ranking values. This could involve some fuzziness, but it is not an impossible problem to solve. There are some low-hanging fruits [e.g. one click is better than two clicks] that could be leveraged to start with.
I tried looking for efforts in this field, but Google did not help me much. I only found an Adobe initiative to make Flash-based rich internet applications crawlable, and crawlers that look for JavaScript code, but nothing seemed satisfying; they were still talking in terms of links. I would appreciate it if you let me know of any such initiative you are aware of.
To conclude, I strongly feel that it is time JavaScript was given its due in the search engine world. Is this a Google killer? No; it would only serve to complement the current technologies.
Search engines should stop forcing people to play cheap tricks like putting keywords in invisible <div> and <noscript> tags, and should enhance their credibility by looking at a page for information, and ranking it, just like a real human would....
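One way to make "depth of information" concrete, sketched in TypeScript: treat each page state as a node and each user action as an edge, compute the minimum number of actions needed to reach each state with a breadth-first search, and turn that depth into a weight. The graph shape, the state names and the 1/(1+depth) weighting are assumptions chosen to encode "one click is better than two clicks", not a proposed final formula. States that no action reaches get no weight at all.

```typescript
// Hypothetical model: each key is a page state, each value the states
// reachable from it with one user action (click, mouseover, ...).
type InteractionGraph = Record<string, string[]>;

// Breadth-first search: minimum number of user actions to reach each state.
function interactionDepths(graph: InteractionGraph, start: string): Record<string, number> {
  const depth: Record<string, number> = {};
  depth[start] = 0;
  const queue: string[] = [start];
  while (queue.length > 0) {
    const state = queue.shift() as string;
    for (const next of graph[state] || []) {
      if (!(next in depth)) {
        depth[next] = depth[state] + 1;
        queue.push(next);
      }
    }
  }
  return depth;
}

// One possible heuristic: content on load weighs 1, one action away 1/2, two actions 1/3.
function relevanceWeight(depth: number): number {
  return 1 / (1 + depth);
}
```

For a page whose load state leads to a help tip and a second tab, and whose second tab leads to a details pane, the details pane ends up at depth 2 with weight 1/3.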

Laszlo Ant Deployment

Hi,

The Laszlo compiler itself is a servlet that internally calls org.openlaszlo.compiler.Main to compile the application into a SWF file. You could also use lzc.bat in the bin directory to get compiled applications. Well, there sure are numerous documents on deploying Laszlo applications either as standalone applications or as proxied applications, but like most build writers, I would really want a quick Ant code snippet that I can copy and paste into my build script.

So here is a quick snippet that you can put into your Ant build file to get Laszlo compiled along with the rest of the build.
<!-- Where the compiled SWF (main.swf) is written -->
<property name="out.dir" location="${basedir}/html/swf" />
<!-- Root of the OpenLaszlo server install (the lps-x.y.z directory) -->
<property name="laszlo.home" value="C:\OpenLaszlo3.3.3\Server\lps-3.3.3"/>
<!-- Directory containing main.lzx -->
<property name="laszlo.src.dir" location="${basedir}/lzx"/>

<!-- Same meaning as the canvas debug / proxied settings -->
<property name="laszlo.debug" value="false"/>
<property name="laszlo.proxied" value="false"/>

<!-- Classpath for the Laszlo compiler: LPS classes plus the bundled jars -->
<path id="laszlo.lib">
    <pathelement location="${laszlo.home}/WEB-INF/lps/server/build" />
    <pathelement location="${laszlo.home}/WEB-INF/classes" />
    <fileset dir="${laszlo.home}/3rd-party/jars/dev" includes="**/*.jar"/>
    <fileset dir="${laszlo.home}/WEB-INF/lib" includes="**/*.jar"/>
</path>

<target name="compile.laszlo">
    <mkdir dir="${out.dir}" />
    <!-- Run the compiler in its own JVM, with the source directory as working directory -->
    <java classname="org.openlaszlo.compiler.Main" fork="true"
          newenvironment="true"
          failonerror="true"
          dir="${laszlo.src.dir}"
          classpathref="laszlo.lib">
        <jvmarg value="-DLPS_HOME=${laszlo.home}"/>
        <jvmarg value="-Dlzdebug=${laszlo.debug}"/>
        <jvmarg value="-Dlzproxied=${laszlo.proxied}"/>
        <jvmarg value="-Xms1024M"/>
        <jvmarg value="-Xmx1024M"/>
        <arg line="--dir ${out.dir} --runtime=swf7 --mcache on --onerror warn main.lzx" />
    </java>
</target>

laszlo.home : Path of the Laszlo server, up to and including the lps-*.*.* directory
main.lzx : The main entry point, where the canvas is placed
out.dir : The directory where main.swf is to be placed
laszlo.src.dir : The directory containing main.lzx
laszlo.lib : The classpath reference (path id) to all the jar files
laszlo.debug : Same as canvas.debug.
laszlo.proxied : Same as canvas.proxied; false if you want a standalone SWF on Tomcat.

Once the standalone SWF is created (i.e. compiled in SOLO, non-proxied mode), you can deploy it on Tomcat.
