Normally, scraping data from a web page that you don't own requires close cooperation from a server: either your own, acting as a proxy, or the data owner's, which must open up permissions for cross-site scripting or serve a script that can "phone home" (as some Google Suggest tools do). The YQL method only requires that the pages to be scraped be accessible to robots; Yahoo's YQL then does the fetching work.
The fetched content is returned as a string, in either XML or JSON format (see the YQL examples), and is available for any kind of manipulation.
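As a rough illustration of the request involved, the sketch below builds the URL that asks YQL to fetch a page and return it as JSON or XML. The helper name `buildYqlUrl` is hypothetical, and the endpoint `https://query.yahooapis.com/v1/public/yql` with its `q` and `format` parameters is an assumption based on how the public YQL service was commonly called; it is not taken from this article.

```javascript
// Assumed public YQL endpoint (not stated in the article).
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: returns the YQL request URL for scraping `pageUrl`.
// `format` selects how the fetched content is returned: "json" or "xml".
function buildYqlUrl(pageUrl, format = "json") {
  if (format !== "json" && format !== "xml") {
    throw new RangeError('format must be "json" or "xml"');
  }
  // Percent-encode double quotes so the URL cannot break out of the
  // quoted string inside the YQL statement.
  const safeUrl = String(pageUrl).replace(/"/g, "%22");
  const query = 'select * from html where url="' + safeUrl + '"';
  // The whole statement travels as a single encoded query parameter.
  return YQL_ENDPOINT + "?q=" + encodeURIComponent(query) + "&format=" + format;
}
```

With jQuery, such a URL would typically be requested as JSONP, for example by appending `&callback=?` and passing it to `$.getJSON`, so that the browser's same-origin restriction does not apply to the call.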
- load content from developer.yahoo.com page
- this is a broken link
- this is an external broken link
- this URL takes 10 seconds to serve content and times out (to verify the timeout, paste SELECT * FROM html WHERE url="http://icant.co.uk/articles/crossdomain-ajax-with-jquery/waiting-for-godot.php" into the YQL console)