The Deep Web / The Invisible Web

  • Google and Yahoo provide access to billions of documents
  • Students don't get access to all the high-quality resources using Google and Yahoo
  • Specialized tools are needed.
  • Part of the web is invisible to general purpose search engines
  • Google database - HTML files, PDF, Word, PowerPoint, text files
    • does not contain: Flash files, queryable databases (this is the deep web), real-time information, proprietary information (password protected)
  • The deep web is much larger than the surface web
    • 20-100 billion documents
    • less than 10 billion in Google
    • 450,000 queryable databases
    • 7,500 terabytes of data
  • Google Scholar is Google's attempt to make the deep web accessible
  • Google Scholar has good and bad qualities
    • GOOD: Google Books is good - only service of its kind (on a massive scale)
    • GOOD: multiple languages
    • GOOD: library links
    • BAD: coverage gaps, unknown coverage, really bad search software
  • What do we want from a specialized search engine?
    • Comprehensive - many sources of many different kinds, covering both current and past material
    • Integrated - search on an author and get only their articles
    • Transparent - enter your query, get results, and know where and when those articles came from
  • Alternatives - Turbo10, Scirus, BNET, Librarians' Internet Index
  • Don't just fall back on Google Scholar. It isn't good enough!