Earlier, I had posted an Excel sheet containing the closing cut-off marks for Tamil Nadu engineering colleges, per branch and per college. This is the Greasemonkey script that I used to scrape the web site and gather that information.
Firstly, I think that the original site is not very usable; most queries would be of the form "give me the possible colleges (and possible branches) for my cut-off marks". On the original web site, you had to visit individual colleges and check out the details one by one.
The script that I wrote was a five-minute hack that scraped the site and collected the data. It was a two-step scraping process, and I know that I could have written it better; I just wanted to get the job done while spending as little effort on code as possible.
The first job was to get the identification numbers of the individual colleges. This meant selecting each district and picking up the college links. At line 202, the function addList added the colleges to a list that I maintained in a cookie. I did not use GM_setValue because I initially intended to distribute the script as a bookmarklet along with the datasheet.
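The cookie bookkeeping in that first pass can be sketched roughly like this. The helper name `addToList`, the comma-separated format, and the `tnColleges` cookie usage shown in the comment are my guesses from the description above, not the original code:

```javascript
// Sketch of the first pass: accumulate college IDs in a comma-separated
// list. The list-handling is a pure function so it is easy to test; the
// real script would read and write document.cookie around it.

// Append an ID to a comma-separated list, skipping duplicates.
function addToList(list, id) {
  var ids = list ? list.split(",") : [];
  if (ids.indexOf(id) === -1) {
    ids.push(id);
  }
  return ids.join(",");
}

// In the Greasemonkey script, addList (line 202) would do something like:
//   var current = readCookie("tnColleges");              // hypothetical helper
//   document.cookie = "tnColleges=" + addToList(current, collegeId);
```

Keeping the list as a plain string makes it trivial to copy the cookie value out of the browser and paste it back into the script, which is exactly what the next step needed.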
Once the cookie was created, I manually pasted its value into the "tnColleges" variable in the script, at line 13.
The second task was to visit the individual college pages and pick up the data from there. By this time, I was fairly convinced that people would be fine with just the datasheet, so I wrote the departments and cut-off marks into GM_setValue. The script also checked whether the page had finished loading, and waited until the data appeared. Every time a college was visited and its data picked up, the page was redirected to the next college. Thus, the data for every single college ended up in the Greasemonkey storage.
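The per-college pass might look something like the sketch below. The table markup in the test data is hypothetical (the real site's structure differed), GM_setValue is stubbed with an in-memory object for illustration, and the retry/redirect lines are shown as comments since they only make sense inside a browser:

```javascript
// Sketch of the second pass: parse a college page, store its departments
// and cut-offs, then move on to the next college.

// In-memory stand-in for Greasemonkey's storage, for illustration only.
var store = {};
function GM_setValue(key, value) { store[key] = value; }

// Pull "department|cutoff" pairs out of the page HTML. The real script
// walked the DOM; a regex over a hypothetical row shape keeps this short.
function parseCutoffs(html) {
  var rows = [];
  var re = /<td>([^<]+)<\/td>\s*<td>([\d.]+)<\/td>/g;
  var m;
  while ((m = re.exec(html)) !== null) {
    rows.push(m[1] + "|" + m[2]);
  }
  return rows;
}

// Per college: if the data has rendered, save it and move on; otherwise
// signal that the caller should retry after a delay.
function scrapeCollege(collegeId, html, nextUrl) {
  var rows = parseCutoffs(html);
  if (rows.length === 0) {
    // Page not ready yet; the original waited and re-checked, e.g.
    // window.setTimeout(function () { ... }, 2000);
    return false;
  }
  GM_setValue("college_" + collegeId, rows.join(";"));
  // window.location.href = nextUrl;   // redirect to the next college
  return true;
}
```

The "retry until the data shows up" check matters because the cut-off tables were apparently filled in after the initial page load, so scraping immediately would have captured an empty page.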
The final job was to publish the datasheet; this is done in lines 53 to 103. The function reads the data back with GM_getValue and writes it into a table. It also classifies the departments; any department not in the known list was logged to the Firebug console. Copying the table into MS Excel gave us the finished datasheet.
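The publish step can be sketched as below. GM_getValue is again stubbed with a plain object, the stored "dept|cutoff" format is an assumption carried over from the description of the scraping pass, and the department list is a made-up stand-in for whatever classification the original used:

```javascript
// Sketch of the publish step: read each college's saved rows back and
// emit an HTML table that can be copied into Excel.

// In-memory stand-in for Greasemonkey's storage, pre-seeded for the demo.
var store = { college_1042: "CSE|198.5;ECE|197.25" };
function GM_getValue(key, fallback) {
  return key in store ? store[key] : fallback;
}

// Hypothetical list of known departments; anything else gets logged,
// as the original did on the Firebug console.
var KNOWN_DEPTS = ["CSE", "ECE", "EEE", "MECH", "CIVIL"];

function buildTable(collegeIds) {
  var html = "<table>";
  collegeIds.forEach(function (id) {
    var saved = GM_getValue("college_" + id, "");
    if (!saved) return;                       // nothing stored for this college
    saved.split(";").forEach(function (row) {
      var parts = row.split("|");             // [department, cutoff]
      if (KNOWN_DEPTS.indexOf(parts[0]) === -1) {
        console.log("unclassified department: " + parts[0]);
      }
      html += "<tr><td>" + id + "</td><td>" + parts[0] +
              "</td><td>" + parts[1] + "</td></tr>";
    });
  });
  return html + "</table>";
}
```

Emitting a plain HTML table is a convenient trick here: pasting a rendered table from the browser into Excel preserves the rows and columns with no export code at all.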
The entire process took about 15 minutes of coding and 10 minutes of scraping. I hope it saved some time for people looking for this information. I did the scraping at night so as to have minimal impact on the servers.