Deep Web: The invisible side of the internet

If you want to find something on the internet, you usually google it. But there are websites that even the popular search engine cannot find, because they sit in the so-called deep web.

The internet is best compared to our universe. Just as the universe keeps expanding, the amount of content on the internet grows every day. They have something else in common: we know only a tiny fraction of the vastness of space, and the same is true of the internet. It consists of two parts, a visible one and an invisible one, the so-called deep web. The astonishing thing: the deep web is many times larger than all the content that can be found via Google, for example.

No search engine and no human knows how big the deep web actually is, but there are estimates. A study from 2001 estimated that the deep web is about 550 times larger than the visible part of the internet, and its extent has most likely grown further over the past 20 years.

To illustrate the two parts, experts like to use the image of an iceberg: the small part above the water line stands for the visible content, while the much larger part below the surface stands for all the invisible websites.

Is the Deep Web another word for the Dark Web?

If content cannot be found, or is not supposed to be found, an obvious assumption suggests itself: criminals must be at work here, wanting to keep all their traffic secret. For the deep web as a whole, this is not true. Illegal arms deals or the exchange of prohibited files take place on the dark web (also called the darknet), which is only one small part of the deep web.

The dark web cannot simply be tracked down and entered. It is a kind of private club that you have to be invited into, and ordinary browsers such as Firefox or Google Chrome will not get you there. Access requires special software, such as the Tor browser, and actually finding your way around the dark web takes in-depth IT knowledge. The deep web as a whole, however, should not be confused with the dark web.


Why are websites invisible?

The deep web, too, cannot be found with the usual search engines. Even for Google, the largest search engine in the world, a large part of the internet remains completely invisible. How can that be? In principle, any visible website can be made invisible with very simple methods. A short digression on how a search engine discovers content in the first place helps to explain this.

The visible web works like a table of contents: search engines maintain a so-called index of the pages they know about. If a website is included in that index, it can be found through a search; if it is not, the search comes up empty.
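
To make the idea of an index more concrete, here is a minimal sketch in Python (with made-up page URLs and texts) of how such an index maps words to the pages that contain them. Real search-engine indexes are of course vastly larger and more sophisticated.

```python
# Minimal sketch of a search index: a mapping from words to the pages
# that contain them. The URLs and page texts are made-up examples.
from collections import defaultdict

pages = {
    "https://example.org/recipes": "pasta recipes with tomato sauce",
    "https://example.org/travel": "travel tips for iceland and norway",
    "https://example.org/pasta-guide": "how to cook pasta properly",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# A page that is in the index can be found by searching for its words ...
print(index["pasta"])                # both pasta pages show up

# ... while content that was never indexed simply does not appear,
# no matter how often it is searched for.
print(index.get("darknet", set()))   # empty: nothing indexed for this term
```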

Simply put, there are two reasons why web pages do not appear in the index:

  • Technical reasons: A search engine excludes pages from the index because the content is deeply nested and very extensive.
  • Deliberate or self-inflicted: The site operator has, knowingly or unknowingly, prevented indexing through the way the site is programmed.

A typical method works at the level of a website's source code, i.e. the HTML that determines how the page appears on screen. Adding the noindex directive there tells search engines to leave the page out of their index, which effectively moves it into the deep web.
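
As a rough illustration, and not the exact code of any real crawler, the sketch below shows how software might detect the noindex signal in a page's HTML via the robots meta tag. The HTML snippet is a made-up example page.

```python
# Sketch: detecting the "noindex" signal in a page's HTML.
# The HTML below is a made-up example page.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in (attrs.get("content") or "").split(",")
            )

html = """
<html>
  <head>
    <title>Hidden page</title>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>This page asks search engines not to index it.</body>
</html>
"""

parser = RobotsMetaParser()
parser.feed(html)

if "noindex" in parser.directives:
    print("Page opts out of indexing -> it stays in the deep web")
else:
    print("Page may be indexed -> it can appear in search results")
```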

The Five Categories of the Deep Web

A closer look at the deep web reveals even more reasons why websites cannot be found via the usual search engines and therefore remain invisible.

Experts divide the Deep Web into five categories:

  • Invisible Web: These websites are deliberately kept out of the index by their operators, for example with the noindex directive. One can only speculate about the reasons; a criminal background is not necessarily behind it.
  • Opaque Web: Opaque means non-transparent. These websites could in principle be indexed, but because of how deep and extensive they are, search engines cannot crawl them completely. This category also covers certain media and file types, such as very large PDF documents. Spam sites, built solely to climb the Google rankings, are thrown out of the index as well. Brand-new websites belong here too, because a search engine needs a few days to fully crawl and index them; news articles, for example, can slip through the cracks because a topic is only relevant for a very short time.
  • Private Web: This includes all websites that can only be reached via an IP address rather than a URL, password-protected pages, and the large databases of libraries, colleges and universities. Intranet pages naturally belong here as well.
  • Proprietary Web: As with the private web, search engines have no access because, for example, terms of use must be accepted or registration is required. The content behind such barriers can be valuable, but it remains invisible to the search engine. Google can do a lot, but it cannot fill out a registration form, at least not yet.
  • Truly Invisible Web: This primarily includes non-standard file formats such as Flash or software-specific formats.

Is the Deep Web really invisible?

The clear answer: no. Even the invisible content can be found, and there are thousands of specialized search engines for exactly that purpose, for example for:

  • Scientific websites – https://www.wolframalpha.com/
  • News articles that are particularly up-to-date and therefore not found by general search engines – https://paperball.news/
  • Special file formats – https://duckduckgo.com/ (simply add filetype:pdf to the search term for various document formats, or contains:mp3 for audio and video formats; see the sketch after this list)
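
As a small illustration of the filetype: trick, the following sketch (in Python) builds a DuckDuckGo search URL that restricts the results to PDF documents. The query term is just an example.

```python
# Sketch: building a search URL that uses the filetype: operator
# to look for a specific document format. The query is just an example.
from urllib.parse import urlencode

def file_search_url(query: str, filetype: str = "pdf") -> str:
    """Return a DuckDuckGo search URL restricted to one file type."""
    q = f"{query} filetype:{filetype}"
    return "https://duckduckgo.com/?" + urlencode({"q": q})

print(file_search_url("deep web study"))
# -> https://duckduckgo.com/?q=deep+web+study+filetype%3Apdf
```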

Educational institutions usually have their own search engines for browsing specialist databases. These are often organized via a so-called Online Public Access Catalog, OPAC for short: the online catalog of a library, for example at a university.
