Deep web

This article deals with a part of the publicly accessible web.

  • For a very small part of it, an encrypted peer-to-peer overlay network that is not accessible with standard web browsers, see Darknet.


The Deep Web (also known as Hidden Web or Invisible Web) is the part of the World Wide Web that cannot be found using normal search engines. In contrast, the web pages accessible via search engines are called the Clear Web, Visible Web, or Surface Web. The Deep Web consists largely of subject-specific databases and web pages. In short, it is content that is not freely accessible, that search engines do not index, or that is not meant to be indexed.

Types of the Deep Web

Sherman & Price (2001) distinguish five types of Invisible Web: "Opaque Web", "Private Web", "Proprietary Web", "Invisible Web" and "Truly Invisible Web".

Opaque Web

The Opaque Web consists of web pages that could be indexed but currently are not, for reasons of technical performance or cost-benefit ratio (search depth, visit frequency).

Search engines do not consider all directory levels and subpages of a website. When capturing web pages, web crawlers follow links from one page to the next. Crawlers cannot navigate on their own; in deep directory structures they can get lost, fail to capture pages, and be unable to find their way back to the home page. For this reason, search engines often consider five or six directory levels at most. Extensive, and therefore relevant, documents may sit at deeper levels of the hierarchy and cannot be found by search engines because of this limited indexing depth.
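The effect of a limited indexing depth can be illustrated with a small sketch. It is only a toy model: the site is an in-memory link graph rather than real pages, and the names `crawl`, `SITE` and `max_depth` are assumptions made for this example, not part of any actual search engine.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Breadth-first crawl of an in-memory link graph, stopping at max_depth.

    `links` maps each URL to the URLs it links to (a stand-in for fetching
    a page and extracting its hyperlinks).
    Returns a dict of URL -> depth at which the page was first reached.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


# Hypothetical site: one page per directory level, each linking one level down.
SITE = {
    "/": ["/a/"],
    "/a/": ["/a/b/"],
    "/a/b/": ["/a/b/c/"],
    "/a/b/c/": ["/a/b/c/report.html"],
    "/a/b/c/report.html": [],
}

indexed = crawl(SITE, "/", max_depth=3)
missed = sorted(set(SITE) - set(indexed))
print(missed)  # pages that exist but were never reached by the crawl
```

With a limit of three levels, the document at the fourth level is never visited, although nothing prevents it from being indexed in principle. Raising `max_depth` makes it reachable again.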

In addition, some file formats can only be partially captured. PDF files are one example: Google indexes only part of a PDF file and provides the content as HTML.

Coverage also depends on how often a website is indexed (daily, monthly), which particularly affects constantly updated data sets such as online measurement data. Websites without hyperlinks or navigation systems, unlinked websites, hermit URLs, and orphan pages also fall into this category.

Private Web

The Private Web describes web pages that could be indexed but are not indexed due to access restrictions imposed by the webmaster.

These can be intranet pages (internal web pages), password-protected data (registration, possibly with login and password), pages accessible only from certain IP addresses, and pages protected against indexing either by the Robots Exclusion Standard or by the meta tag values noindex, nofollow and noimageindex in the source code of the web page.
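The two indexing protections mentioned above can be checked with Python's standard library. The following is a minimal sketch: the robots.txt content, the bot name `ExampleBot` and the `example.org` URLs are made up for illustration, and the helper names `allowed_by_robots` and `robots_meta` are assumptions of this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of /intranet/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
"""


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the Robots Exclusion Standard rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_meta(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(allowed_by_robots(ROBOTS_TXT, "ExampleBot", "https://example.org/intranet/plan.html"))
print(allowed_by_robots(ROBOTS_TXT, "ExampleBot", "https://example.org/public/index.html"))
print(robots_meta(PAGE))
```

A well-behaved crawler would skip any URL for which `allowed_by_robots` returns `False`, and would leave a fetched page out of its index if `robots_meta` contains `noindex`. Both mechanisms rely on the crawler's cooperation; they do not technically block access.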

Proprietary Web

Proprietary Web refers to websites that could be indexed, but are only accessible after accepting a condition of use or by entering a password (free or paid).

Such websites are usually only accessible after identification (web-based specialist databases).

Invisible Web

The Invisible Web includes web pages that could be indexed from a purely technical point of view, but are not indexed for commercial or strategic reasons, such as databases with a web form.

Truly Invisible Web

Truly Invisible Web refers to web pages that cannot (yet) be indexed for technical reasons. These include database formats that originated before the WWW (some hosts), documents that cannot be displayed directly in the browser, non-standard formats (for example Flash), and file formats that cannot be captured because of their complexity (graphic formats). Also included are compressed data and web pages that can only be reached through user navigation that relies on graphics (image maps) or scripts (frames).

Databases

Dynamically created database web pages

Web crawlers almost exclusively process static database web pages and cannot reach many dynamic ones, because they can only reach deeper pages through hyperlinks. Dynamic pages, however, can often only be reached by filling out an HTML form, which crawlers generally cannot do at present.
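The difference between a hyperlink and a form can be seen from the crawler's point of view in the sketch below. The page content and the class name `LinkAndFormParser` are invented for this example; it assumes a simple crawler that only extracts `href` targets.

```python
from html.parser import HTMLParser


class LinkAndFormParser(HTMLParser):
    """Records hyperlink targets and form actions found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []  # URLs a link-following crawler can queue
        self.forms = []  # entry points that need user input first

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action") or "")


# Hypothetical search page of a database provider.
SEARCH_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" value="Search">
  </form>
</body></html>
"""


def analyse(html):
    """Return (links, forms) extracted from an HTML document."""
    parser = LinkAndFormParser()
    parser.feed(html)
    return parser.links, parser.forms


links, forms = analyse(SEARCH_PAGE)
print(links)
print(forms)
```

The crawler can queue `/about.html` directly, but the result pages behind `/search` only exist once a query value for `q` has been supplied. Without a list of meaningful queries, a link-following crawler has nothing to request, so the database records stay out of the index.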

Cooperative database providers allow search engines to access the contents of their database through mechanisms such as JDBC, as opposed to (normal) non-cooperative databases that only provide database access through a search form.

Hosts and subject databases

Hosts are commercial information providers that bundle specialized databases from different information producers within one interface. Some database providers (hosts), or the database producers themselves, operate relational databases whose data cannot be retrieved without a special access facility (retrieval language, retrieval tool). Web crawlers do not understand the structure or language needed to read information from these databases. Many hosts have been operating as online services since the 1970s, and some of them run database systems that predate the WWW.

Examples of databases: library catalogues (OPAC), stock exchange quotations, timetables, legal texts, job exchanges, news, patents, telephone directories, web shops, dictionaries.

Questions and Answers

Q: What is the Deep Web?

A: The Deep Web is the part of the World Wide Web that cannot be searched on common search websites such as Google. It is also known as the Invisible Web or Hidden Web.

Q: Who first used the term "Deep Web"?

A: Mike Bergman, a computer scientist, was the first person to use the term "Deep Web" in 2000.

Q: Are the darknet and the Dark Web the same thing as the Deep Web?

A: No, they are not. A darknet is a private or closed computer network that can be difficult to access. The Dark Web is located in darknets, and since no darknet can be found by Google or any other search website, it too falls under the Deep Web.

Q: What does IP stand for?

A: IP stands for Internet Protocol. An IP address carries important information about where a user is accessing the internet from.

Q: Why do people want privacy on the internet?

A: People want privacy on the internet for many reasons, including doing things forbidden by governments, such as piracy (sharing files protected by copyright laws).

Q: How do you access a darknet?

A: To access a darknet you may need to know a password, use specific computer programs, change your web browser configuration, or take other steps, depending on the network you are trying to access. Tor is an example of a commonly used darknet.

Q: What does 'piracy' mean?

A: Piracy means sharing files protected by copyright laws without permission from their owner or creator.

AlegsaOnline.com - 2020 / 2023 - License CC3