Anyone who has a website indexed by search engines will frequently encounter error messages in the indexing results. Some of these concern page content that is flagged as a "duplicate", meaning the same content was found under more than one URL.
The meta tags
Every web page has an area, separate from its visible content, that describes the page. This so-called meta section is located in the head section of each page and is not shown to visitors. It lets the author add information intended for search engines and other programs, and it can carry a wide range of details. A page without visible content, written in HTML and containing meta information, might look like this:
<html><head>
<title>Peter Mustermann's website</title>
The title element specifies the title of the website. Strictly speaking it is not a meta tag, but it belongs to the same head section.
<meta name="description" content="This is my private website">
This meta information provides a description of the website.
<meta name="keywords" content="Peter, Mustermann, private, homepage">
This meta tag lists individual keywords that describe the content.
<link href="https :// www. myhomepage . com/meininhalt/" rel="canonical"/>
The "rel" attribute (short for "relationship") marks the URL given in "href" as canonical. That URL is the address of the "original page".
</head><body></body></html>
(The spaces in the URLs are for clarity and should not exist in practice)
Within the meta information, each tag is a self-contained unit of information. The so-called canonical tag is a link element whose "rel" attribute has the value "canonical"; it identifies one URL as canonical for the page content. If a page is accessible via multiple URLs, search engines treat the specified URL as the primary or original page.
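To make the mechanics concrete, the following is a minimal sketch of how a program could read the canonical URL from a page's head section. It uses Python's standard html.parser module; the names CanonicalFinder and find_canonical are made up for this illustration, and the sketch makes no claim about how any real search engine processes pages.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # "rel" may contain several space-separated values; compare
        # them case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# The example page from above, with the URL written without spaces.
page = """<html><head>
<title>Peter Mustermann's website</title>
<link href="https://www.myhomepage.com/meininhalt/" rel="canonical"/>
</head><body></body></html>"""

print(find_canonical(page))
```

For a page without a canonical link element, find_canonical returns None, which corresponds to a page that makes no statement about its original URL.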
The problem
Especially in the age of Web Content Management Systems (WCMS), individual pages can often be accessed via different URL formats. These systems usually offer settings for search engine-friendly URLs and also keep their own addressing scheme for internal structure management. As a result, one and the same page can be reachable by search engines under several different URLs.
(The spaces in the URLs are for clarity and should not exist in practice)
- https :// www. myhomepage . com/mycontent
- https :// www. myhomepage . com/index.php/mycontent
- https :// www. myhomepage . com/?com_content&id=3
Search engines would then treat all three URLs as separate pages with identical content, so at least two of them count as duplicates. To prevent this incorrect indexing, the URL of the canonical page can be specified in the head section of each variant. The search engine can then include only the canonical URL in its index and attribute the other variants to it.
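The effect can be illustrated with a deliberately simplified model, sketched here in Python. The function group_by_canonical is hypothetical and only mimics the idea of collapsing URL variants into one index entry; real search engines weigh further signals and are not implemented this way. A page that declares no canonical URL is assumed to count as its own original.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_canonical(declared: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group crawled URLs by the canonical URL each page declares.

    `declared` maps a crawled URL to the canonical URL found in its
    head section, or to None if the page declares none. In this
    simplified model, such a page is treated as its own canonical.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, canonical in declared.items():
        groups[canonical or url].append(url)
    return dict(groups)


# The three URL variants from the list above, written without spaces.
variants = [
    "https://www.myhomepage.com/mycontent",
    "https://www.myhomepage.com/index.php/mycontent",
    "https://www.myhomepage.com/?com_content&id=3",
]

# No page declares a canonical URL: every variant is its own entry.
without_tag = group_by_canonical({url: None for url in variants})

# Every variant names the first URL as canonical: one shared entry.
with_tag = group_by_canonical({url: variants[0] for url in variants})

print(len(without_tag))  # 3
print(len(with_tag))     # 1
```

Without the declaration the model ends up with three separate entries for identical content; with it, all three variants fall under the single canonical URL.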
Generally, all major search engines can evaluate these canonical tags. If a WCMS is used for the website, it is worth checking whether the system offers an option to designate one URL of a page as canonical. Some WCMS (e.g., WordPress) also offer extensions (so-called "plugins") that add this function afterwards and allow canonical URLs to be managed.