Dark Web
January 15, 2024

Unveiling the Depths: A Comprehensive Introduction to the Dark Web and Its Intricacies

The Internet is an immense archive that contains extremely large amounts of data. One of the most sophisticated methods for collecting data from the Internet is called “web data extraction”, also known as “web scraping” or “web harvesting”. Web data extraction tools can help companies with the following:

 

1) Gain market and competitive intelligence, 

2) Keep up to date with changes to compliance and regulation terms, and 

3) Stay abreast of developments in their industry.
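As a rough illustration of what web data extraction involves, the sketch below pulls every link and its visible text out of a page using only Python's standard library. The page content, the `LinkExtractor` class, and the `extract_links` helper are made up for this example; a real scraper would first download the page and would need to respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (text, href) pairs for every <a href="..."> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (link text, link target) pairs found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real scraper would download this first.
SAMPLE_PAGE = """
<html><body>
  <h1>Industry news</h1>
  <a href="/regulation/2024-update">New compliance rules</a>
  <a href="/markets/q1-report">Q1 market report</a>
</body></html>
"""

for text, href in extract_links(SAMPLE_PAGE):
    print(text, "->", href)
```

The same idea scales from two links on one page to thousands of pages, which is how scraping tools build up market or regulatory intelligence.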

 

This level of data extraction provides access to a large repository of content that is usually hidden. The figure below shows the Internet’s primary levels:

 

 

The Internet’s three primary levels 

The Surface Web

The Surface Web, also known as the “Indexed Web”, “Visible Web”, or “Lightnet”, is anything you can find on the regular World Wide Web. It is readily available to the general public and is the primary entry point for most people. This portion of the Internet consists of websites indexed by regular search engines, e.g., Google, Bing, and Yahoo. It is commonly estimated to cover around 5 percent of what is available on the Internet. The Surface Web can be accessed with standard web browsers that require no special configuration, e.g., Internet Explorer, Google Chrome, and Opera.

 

The Deep Web

The Deep Web, which sits below the Surface Web, is commonly estimated to account for around 90% of the Internet. It contains most of the Internet’s data, and standard search engines do not index its contents. In fact, it is impossible to determine with any degree of accuracy how many pages or websites exist on the Deep Web. It includes websites that require authentication, e.g., your private Facebook page, which you reach with a username and password. Most of the Deep Web’s contents relate to academic journals, private databases, online banking, medical records, and similar material. The Deep Web also includes the portion of the Internet known as the Dark Web.

 

The Dark Web

The Dark Web consists of websites that are not indexed and are only reachable through specialized browsers. It is regarded as a portion of the Deep Web and is far smaller even than the comparatively small Surface Web. In the iceberg analogy, the Dark Web is the lowest tip of the submerged ice. To access this part of the Internet, you need software such as the TOR browser. This software connects you to an overlay network that masks your IP address, making your online activity much harder to trace. The Dark Web can be used for legitimate purposes as well as for criminal activities related to weapons, drugs, child pornography, assassinations, zoophilia, modern slavery, organ trade, and many others. The Dark Web is at the center of the debate over whether anonymity on the Internet should be maintained despite the illegal activity that it enables. Policy-makers must fully understand the Dark Web to engage effectively in that debate and enact effective Dark Web policies.

 

What the Dark Web is and how it works

Many people around the world have heard of the Dark Web, usually portrayed as a den of illicit activity. Like most stereotypes, that is a misconception with some truth behind it. To shed light on the Dark Web, readers must first understand what it is and how it differs from what most people wrongly consider to be the whole Internet. The Dark Web is often confused with the Deep Web, but the two are different entities. Put simply, the Dark Web is a distinct area of the Deep Web, roughly 0.01% of it. A website must meet a few distinguishing criteria to be considered a Dark Web site. Above all, it must be accessible only anonymously, through specialized software such as TOR, Freenet, or I2P, often used together with privacy-focused tools such as Tails, Orbot, or Whonix. Specifically, websites accessed through the TOR browser have the domain suffix “.onion”, a special top-level domain name referring to an anonymous onion service. The figure below shows what the full onion URL of the popular search engine DuckDuckGo looks like.
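The “.onion” suffix can also be recognized programmatically. The sketch below is a minimal check, assuming the current (“version 3”) onion address format of 56 base32 characters followed by “.onion”. The helper name `is_onion_url` and the sample address are invented for illustration, and the sample is not a real onion service.

```python
import re
from urllib.parse import urlsplit

# Assumption: version 3 onion addresses are 56 base32 characters
# (a-z and 2-7) followed by ".onion". This only checks the shape of
# the address; it does not verify that the service exists.
ONION_V3 = re.compile(r"^[a-z2-7]{56}\.onion$")


def is_onion_url(url):
    """Return True if the URL's host looks like a v3 onion address."""
    host = (urlsplit(url).hostname or "").lower()
    # Keep only the last two labels so "www.<address>.onion" also matches.
    candidate = ".".join(host.split(".")[-2:])
    return bool(ONION_V3.match(candidate))


# Made-up address for illustration only; it is not a real onion service.
fake_onion = "http://" + "a" * 56 + ".onion/search"

print(is_onion_url(fake_onion))
print(is_onion_url("https://duckduckgo.com/"))
```

A check like this says nothing about what a site hosts; it only distinguishes onion-service addresses from ordinary Surface Web domains.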

 

 

TOR, short for “The Onion Router”, is the most popular and widely used Dark Web browser. As of August 2022, it had approximately 2 million users worldwide. Put simply, TOR is an open-source browser; specifically, the TOR browser is an adaptation of the Firefox browser and shares several of its characteristics. In the TOR network, thousands of volunteers around the world operate relays (servers) that route traffic. The traffic is encrypted multiple times and relayed as it passes over the TOR network. Only a handful of alternative technologies can match TOR’s sophisticated features. Because all TOR users look alike on the Internet, TOR is regarded as one of the strongest anonymity technologies available.
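The idea of traffic being encrypted several times and passed from relay to relay can be sketched in a few lines. In the usual description of onion routing, the sender wraps the message in one layer per relay, and each relay removes only its own layer, learning nothing except where to send the rest. The code below is a toy model of that idea, not TOR's actual protocol: the cipher is a deliberately simple stand-in that is not secure, and the relay names, keys, and function names are all invented for this example.

```python
import hashlib


def toy_cipher(key, data):
    """Toy stream cipher: XOR the data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the input.
    NOT secure and NOT what TOR uses; for illustration only.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def build_onion(message, path):
    """Sender side: wrap the message in one layer per relay.

    `path` is a list of (relay_name, key) pairs in travel order.
    Each layer hides the name of the next hop plus everything inside it.
    """
    blob = message
    next_hop = "destination"
    for name, key in reversed(path):
        header = next_hop.encode("utf-8") + b"\n"
        blob = toy_cipher(key, header + blob)
        next_hop = name
    return blob


def peel(key, blob):
    """Relay side: remove one layer and learn only the next hop."""
    plain = toy_cipher(key, blob)
    next_hop, _, inner = plain.partition(b"\n")
    return next_hop.decode("utf-8"), inner


# Hypothetical three-relay path with made-up keys.
path = [("guard", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]
blob = build_onion(b"hello", path)

for name, key in path:
    next_hop, blob = peel(key, blob)
    print(name, "forwards to", next_hop)

print(blob)
```

In this model the first relay only learns that it should forward to “middle”, and only the last relay ever sees the original message, which is the property that makes the sender hard to trace.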

 

Another popular Dark Web tool is I2P, short for the “Invisible Internet Project”. I2P is an anonymous network like TOR, and it uses end-to-end encryption, a secure communication method that prevents third parties from reading data packets while they travel from one end system to another. Two main differences from TOR are that I2P does not rely on a centralized database of server nodes and that it uses “garlic routing” rather than TOR’s “onion routing”. Garlic routing is an extension of onion routing that, like the cloves in a bulb of garlic, bundles multiple messages and encrypts them together. This bundling can increase data speed and makes it more difficult for attackers to perform traffic analysis. I2P’s decentralized approach has two significant advantages: 1) better scalability and 2) no trusted central party.
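The bundling step that gives garlic routing its name can be pictured with a small toy model. The sketch below packs several messages (“cloves”) into one unit and encrypts that unit as a whole, so an observer sees a single blob rather than separate messages. It is only an illustration of the bundling idea under simplifying assumptions: the cipher is an insecure stand-in, the JSON layout and function names are invented here, and none of this reflects I2P's real message format.

```python
import hashlib
import json


def toy_cipher(key, data):
    """Toy XOR stream cipher (NOT secure); applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def make_garlic(cloves, key):
    """Bundle several (recipient, text) cloves and encrypt them as one unit."""
    bundle = json.dumps([{"to": to, "msg": msg} for to, msg in cloves])
    return toy_cipher(key, bundle.encode("utf-8"))


def open_garlic(blob, key):
    """Decrypt the bundle and split it back into individual cloves."""
    bundle = json.loads(toy_cipher(key, blob).decode("utf-8"))
    return [(clove["to"], clove["msg"]) for clove in bundle]


# Hypothetical recipients and messages.
cloves = [("alice", "hi"), ("bob", "status ok")]
garlic = make_garlic(cloves, b"shared-key")

print(len(garlic), "bytes in one encrypted bundle")
print(open_garlic(garlic, b"shared-key"))
```

Because the cloves travel together, someone watching the network cannot easily tell how many messages are inside or match them to individual conversations.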

 

Conclusion

In conclusion, the exploration of the Dark Web reveals a multifaceted landscape within the vast expanse of the Internet. The article underscores the significance of web data extraction for businesses, offering valuable insights into market intelligence and regulatory changes. By dissecting the Internet into its three primary levels—Surface Web, Deep Web, and the mysterious Dark Web—we gain a nuanced understanding of the diverse content and access requirements. The Dark Web, positioned as a subset of the Deep Web, emerges as a focal point for both legitimate and illicit activities, necessitating specialized tools like TOR and I2P for access. The TOR browser, with its extensive user base, stands out for its relay-based anonymity, while I2P introduces a decentralized and secure alternative. 

 

As the debate over Internet anonymity unfolds, policymakers are urged to comprehend the intricacies of the Dark Web to formulate effective regulations that balance the need for privacy with the prevention of illicit activities. The article serves as a comprehensive guide, unraveling the layers of the Dark Web and contributing to a nuanced discourse on its impact and regulation in the digital age.

Nearchos Nearchou

Nearchos Nearchou is a determined person and a 1st Class BSc (Hons) Computer Science and MSc Cyber Security graduate. He is a big tech-lover and has spent several years exploring new innovations in the IT field. Driven by his passion for learning, he is pursuing a career in the Cyber Security world. He is passionate about learning new skills and information that can be used for further personal and career development. Finally, he is the author of the book “Combating Crime On The Dark Web”.

What is web data extraction, and how does it benefit companies?

Web data extraction, also known as web scraping, is a sophisticated method for collecting data from the Internet. It benefits companies by providing market intelligence, keeping them updated on compliance changes, and enabling them to stay abreast of industry developments.

How much of the Internet does the Surface Web cover, and how is it accessed?

The Surface Web, which is publicly accessible and indexed by standard search engines, covers around 5% of the total Internet. It can be accessed using standard web browsers like Internet Explorer, Google Chrome, Opera, and others.

What distinguishes the Dark Web from the Deep Web, and how is it accessed?

The Dark Web is a subset of the Deep Web, constituting around 0.01% of it. It contains websites not indexed by search engines and requires specialized software such as TOR, Freenet, or I2P for access, often used together with privacy-focused tools such as Tails, Orbot, or Whonix.

What are the characteristics of a Dark Web site, and how is the TOR browser involved?

A Dark Web site can only be accessed anonymously through specialized browsers like TOR. TOR, or The Onion Router, is the most popular Dark Web browser, connecting users to a network of relays that route and encrypt traffic, ensuring a high level of anonymity.

Why is understanding the Dark Web crucial for policymakers?

Policymakers must comprehend the Dark Web's complexities to engage effectively in debates surrounding Internet anonymity. The Dark Web is central to discussions on whether online anonymity should be maintained despite its association with illegal activities, necessitating efficient policies to strike a balance.
