Counsel in the heart of man is like deep water; But a man of understanding will draw it out.
Proverbs 20:5
Here is a simple solution for extracting all internal links (links to the same domain) from the current web page. This JavaScript solution is especially useful when we need to check the number of internal links to our website on large pages, for example on a news website, where a single page holds many articles, photos, videos and other multimedia. Detecting internal links visually on such a page would be difficult.
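The script should run only on a same-origin page, that is, a page with the same origin as the links it checks. Each check sends an HTTP request from the browser, and the browser's same-origin policy blocks reading responses from a different origin unless that server allows it through Cross-Origin Resource Sharing (CORS). This means the script cannot produce results for another domain's pages when it is called as a function (library) from a different domain.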
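"Same origin" means the same scheme, host and port, so a link to `http://` on a page served over `https://` has the same host name but a different origin. A minimal sketch of comparing origins with the standard `URL` API follows; the helper `isSameOrigin` and the page URL `https://simplesolutions.ml/some-article` are hypothetical examples, not part of the original script.

```javascript
// Hypothetical helper (not part of the original script): true when a link has
// the same scheme, host and port as the page it appears on.
function isSameOrigin(href, pageUrl) {
  // new URL(href, base) resolves relative links such as "/" against the page.
  return new URL(href, pageUrl).origin === new URL(pageUrl).origin;
}

// Example with an assumed page URL:
const page = "https://simplesolutions.ml/some-article";
console.log(isSameOrigin("/", page));                                   // true
console.log(isSameOrigin("https://simplesolutions.ml/contact", page)); // true
console.log(isSameOrigin("http://simplesolutions.ml/contact", page));  // false (different scheme)
console.log(isSameOrigin("https://google.com", page));                 // false
```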
<h2>Internal links extraction example</h2>
<h3>Random test web links on the page</h3>
<div>
<a href="/" target="_blank">Home page</a><span>|</span>
<a href="https://google.com" target="_blank">Google</a><span>|</span>
<a href="https://amazon.com" target="_blank">Amazon</a><span>|</span>
<a href="https://ebay.com" target="_blank">Ebay</a><span>|</span>
<a href="https://simplesolutions.ml/contact" target="_blank">Simple solutions contact</a><span>|</span>
<a href="https://simplesolutions.ml/privacy-policy" target="_blank">Simple solutions Privacy Policy</a>
</div><br>
<h3>All extracted internal links from this page</h3>
<script>
// Check whether a page exists: send a HEAD request and return the HTTP status
// of the final response (redirects are followed). Returns 0 if the request
// fails, e.g. because of a network error.
// fetch() is asynchronous, so the page is not frozen while links are checked.
async function URLCheck(url) {
  try {
    const response = await fetch(url, { method: "HEAD" });
    return response.status;
  } catch (error) {
    return 0;
  }
}
// Get all internal links from the current page that point to existing pages.
async function getInternalLinks() {
  const host = window.location.hostname;
  // Keep links whose host name equals the page's host and that contain no "#"
  // (in-page anchors); remove duplicates so each URL is requested only once.
  const candidates = getUnique(
    Array.from(document.links)
      .filter(link => link.hostname === host && !link.href.includes("#"))
      .map(link => link.href)
  );
  // Check all candidates in parallel and keep those that answer 200 OK.
  const statuses = await Promise.all(candidates.map(URLCheck));
  return candidates.filter((url, i) => statuses[i] === 200);
}
// Remove duplicate items, keeping the first occurrence of each.
function getUnique(items) {
  return [...new Set(items)];
}
// When the page has loaded, list the existing internal links in a new paragraph.
window.addEventListener("load", async function () {
  const allInternalLinks = await getInternalLinks();
  const newParagraph = document.createElement("p");
  for (const url of allInternalLinks) {
    // append() inserts the URL as plain text, not as HTML markup.
    newParagraph.append(url, document.createElement("br"));
  }
  document.body.appendChild(newParagraph);
});
</script>
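The link-filtering step (keep links to the page's host name, skip links containing "#", drop duplicates) does not need a live page. A minimal sketch, assuming the links are already available as plain href strings, expresses it as a pure function that can be tried outside a browser, for example in Node.js. The name `extractInternalLinks` and the sample URLs are illustrative only, and this sketch does not check whether the pages exist.

```javascript
// Illustrative sketch: filter href strings down to unique internal links.
// extractInternalLinks is a hypothetical name, not part of the original script.
function extractInternalLinks(hrefs, pageUrl) {
  const host = new URL(pageUrl).hostname;
  const result = new Set(); // a Set drops duplicates and keeps first-seen order
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, pageUrl); // resolve relative links such as "/"
    } catch (error) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (url.hostname === host && !url.href.includes("#")) {
      result.add(url.href);
    }
  }
  return [...result];
}

const hrefs = [
  "/",
  "https://google.com",
  "https://simplesolutions.ml/contact",
  "https://simplesolutions.ml/contact",          // duplicate, kept once
  "https://simplesolutions.ml/privacy-policy#top", // contains "#", skipped
];
const internal = extractInternalLinks(hrefs, "https://simplesolutions.ml/some-article");
// internal is ["https://simplesolutions.ml/", "https://simplesolutions.ml/contact"]
console.log(internal.length); // 2
```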
Keep in mind that the script checks whether each internal link points to an existing page, so processing a page with many links can take a while. Only links to existing pages (those answering with HTTP status 200) are returned.
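On a page with hundreds of internal links, sending every check at once can put a burst of load on the server. One possible refinement, sketched here under the assumption that `check` is an async function returning an HTTP status code (for example, one that sends a HEAD request with `fetch` and returns `response.status`, or 0 on failure), caps how many checks run at the same time. The helper `checkWithLimit` and its `limit` parameter are hypothetical, not part of the original script.

```javascript
// Hypothetical helper, not part of the original script: run `check` on every
// URL with at most `limit` calls in flight, then keep the URLs answering 200.
// `check` is assumed to handle its own errors and always return a status code.
async function checkWithLimit(urls, check, limit = 4) {
  const statuses = new Array(urls.length);
  let next = 0; // index of the next URL to check

  // Each worker takes the next unchecked URL until the list is exhausted.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      statuses[index] = await check(urls[index]);
    }
  }

  const workers = [];
  const count = Math.max(1, Math.min(limit, urls.length));
  for (let w = 0; w < count; w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  // Keep only URLs whose check returned 200, in their original order.
  return urls.filter((url, index) => statuses[index] === 200);
}

// Usage with a stand-in check (no network): pretend "/missing" does not exist.
const fakeCheck = async (url) => (url === "/missing" ? 404 : 200);
checkWithLimit(["/", "/missing", "/contact"], fakeCheck, 2).then((existing) => {
  console.log(existing.join(", ")); // /, /contact
});
```

A worker pool keeps the original order of results while never exceeding the chosen number of parallel requests; a smaller `limit` is gentler on the server but takes longer overall.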
You can test this solution in the Online HTML Editor.