Though the search algorithm has changed many times, crawlers have remained a largely static part of the equation. Indexing has improved, more machines are thrown at the problem to improve response times, and even semantic search is now becoming mainstream. If web 2.0 is not only about HTML pages, then search engine crawlers had better adapt to it. The idea of PageRank was great, but links were the focal point of the technique. Unfortunately, the web 2.0 paradigm refuses to confine HTML documents to pages with links.
It is about interactivity and usability, and because it is about loading information rather than loading pages, conventional crawlers may well miss all the information that is downloaded only in response to user actions. I am not trying to propose a theoretically correct crawler model, but in this post I am definitely suggesting a crawler that understands rich internet applications and ranks pages using the information obtained through user interaction as well.
A crawler of this genre could mimic a screen reader, trying to perform user actions (clicks, mouse-overs for help, etc.). It could rate pertinence based on the depth of information, i.e. the number of clicks or user actions required to get to that information. Other heuristics could be developed to assess the importance that a real user would attach to such dynamic divs and to information obtained through AJAX.
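To make the idea of "depth of information" a little more concrete, here is a minimal sketch in Python. It does not drive a real browser. The page is a made-up dictionary of UI states (the `PAGE` structure, the state names and the `crawl_interactions` function are all assumptions for illustration), and the crawler walks it breadth-first to find the smallest number of user actions needed to reveal each piece of text.

```python
from collections import deque

# Hypothetical model of a rich page: each UI state lists the text it reveals
# and the user actions (click, hover) that lead to other states.
PAGE = {
    "start": {"text": ["Product overview"],
              "actions": {"click:details": "details", "hover:help": "help"}},
    "details": {"text": ["Full specification"],
                "actions": {"click:reviews": "reviews"}},
    "help": {"text": ["Help tooltip"], "actions": {}},
    "reviews": {"text": ["Customer reviews"], "actions": {}},
}


def crawl_interactions(page, start="start", max_depth=3):
    """Breadth-first walk over UI states.

    Returns a dict mapping each piece of text to the fewest user actions
    needed to reveal it, ignoring anything deeper than max_depth.
    """
    depth_of = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        for text in page[state]["text"]:
            # BFS visits shallow states first, so the first depth seen is minimal.
            depth_of.setdefault(text, depth)
        if depth == max_depth:
            continue  # do not simulate actions beyond the depth budget
        for target in page[state]["actions"].values():
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return depth_of


depths = crawl_interactions(PAGE)
for text, depth in depths.items():
    print(f"{depth} action(s): {text}")
```

A real crawler would have to discover the states and actions by actually executing the page's scripts, which is the hard part; the sketch only shows how the click depth could be recorded once those actions can be simulated.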
This model is not perfect and still has problems. Captchas may block the flow, but that is a problem for HTML crawlers as well. Privacy could be a concern, but crawlers could obey robots.txt, as they do now. Finally, the biggest hurdle is mapping user interactivity to ranking values. This will involve some fuzziness, but it is not an impossible problem to solve. There is some low-hanging fruit (e.g. one click is better than two clicks) that could be leveraged to start with.
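As a rough illustration of the "one click is better than two" heuristic, the sketch below discounts a relevance score by the number of user actions needed to reach the content. The geometric decay, the `decay=0.5` value, the `interaction_score` name and the sample relevance numbers are all assumptions made up for this example, not a worked-out ranking model.

```python
def interaction_score(relevance, actions_needed, decay=0.5):
    """Discount a text-relevance score by the user actions needed to reach it.

    The geometric decay is an assumed heuristic: each extra click or hover
    multiplies the score by `decay`, so shallower content ranks higher.
    """
    if actions_needed < 0:
        raise ValueError("actions_needed must be non-negative")
    return relevance * (decay ** actions_needed)


# Hypothetical fragments: (label, relevance to the query, actions needed)
fragments = [
    ("visible on load", 0.6, 0),
    ("behind one click", 0.9, 1),
    ("behind two clicks", 0.9, 2),
]

ranked = sorted(fragments,
                key=lambda f: interaction_score(f[1], f[2]),
                reverse=True)
for label, relevance, clicks in ranked:
    print(label, interaction_score(relevance, clicks))
```

The fuzziness lives in the choice of decay: a value close to 1 treats hidden content almost like visible content, while a small value effectively ignores anything that needs more than a click or two.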
Search engines should stop forcing people to play cheap tricks such as putting keywords in invisible <div> and <noscript> tags, and should enhance their credibility by looking at a page for information, and ranking it, just as a real human would.