How to Parse an HTML String in JavaScript or Angular 11

In many situations, an application receives HTML as a string and needs to manipulate it programmatically to extract values. There are several ways to parse an HTML string.

In this article, I discuss the different ways we can parse HTML and extract the values we need.


Parse HTML String using DOMParser

The DOMParser interface parses XML or HTML source code from a string into a DOM Document. DOMParser is built into browsers, so you do not need to install any npm module to use it. You can also check its reference documentation.

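// htmlString holds the HTML text to parse
const domParser = new DOMParser();
// parseFromString() returns a whole Document, not a single element
const htmlElement = domParser.parseFromString(htmlString, 'text/html');
// Look up an element by id inside the parsed document
const divObj = htmlElement.getElementById('top-container');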

 

You can then manipulate the parsed DOM at runtime, creating or modifying values as needed.

DOMParser is a browser API. It needs the browser window object, so it will not work where no window exists, for example in plain Node.js or during server-side rendering.
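
Because of this limitation, code that may run both in the browser and on the server can check for DOMParser before using it. The following is only a minimal sketch: parseHtmlIfPossible, HtmlParserLike and the env parameter are hypothetical names introduced for illustration, and the caller is assumed to fall back to another parser when null comes back.

```typescript
// Minimal structural type for the part of DOMParser this sketch uses.
interface HtmlParserLike {
  parseFromString(source: string, mimeType: string): unknown;
}
type HtmlParserCtor = new () => HtmlParserLike;

// Hypothetical helper: returns the parsed document, or null when the
// environment exposes no DOMParser (for example plain Node.js).
// The env parameter defaults to globalThis and exists so the lookup
// can be replaced in tests.
function parseHtmlIfPossible(
  htmlString: string,
  env: { DOMParser?: HtmlParserCtor } = globalThis as any,
): unknown {
  const ParserCtor = env.DOMParser;
  if (typeof ParserCtor !== 'function') {
    return null; // no browser DOM here; the caller should use another parser
  }
  return new ParserCtor().parseFromString(htmlString, 'text/html');
}

// In a browser this is a Document; where DOMParser is missing it is null.
const maybeDocument = parseHtmlIfPossible('<div id="top-container">Hello</div>');
```

Returning null instead of throwing keeps the decision about the fallback with the caller.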


Parse HTML String using jQuery

You can also parse and manipulate an HTML string with jQuery. You can load jQuery from a CDN or install it as an npm package and import it in your Angular component.

declare const $: any;

$ now refers to the jQuery object. Its parseHTML() method parses a string into an array of DOM nodes.

jQuery.parseHTML(data [, context ] [, keepScripts ])

Example 

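const doc = $('#container-id');
const str = 'This <b>is a sample </b> text <b>for html parser.</b>';
// parseHTML() returns an array of DOM nodes (text nodes and <b> elements here)
const html = $.parseHTML(str);
// Append the parsed nodes to the container
doc.append(html);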

 

As with DOMParser, the jQuery approach cannot be used where the browser window object is not present. You can use it on the client side only.


Parse HTML String using Cheerio

Cheerio is a great library for parsing HTML or XML strings with or without a browser. It is mostly used in Node.js, server-side rendering (SSR), and prerendering scenarios.

For example, HTML produced by a CMS can be parsed so that the page title or meta tags are updated from dynamic content while prerendering or generating static pages. To use Cheerio in an Angular application, install the following packages.

npm i cheerio --save

npm i @types/cheerio --save

npm i stream --save

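After installation, you can use Cheerio as follows.

// At the top of the file
import * as cheerio from 'cheerio';

// Inside a component or service method:
// cheerio.load() returns a query function bound to the parsed document
const parsedHtml = cheerio.load(htmlContent);
const description = this.getDescription(parsedHtml, 'p');

// Joins the text fragments collected for the given selector
getDescription(htmlDom: any, character: string): string {
  const descriptionTextArray = this.getTextContentFromHTML(htmlDom, character);
  return descriptionTextArray.join(' ');
}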
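
Because the fragments are joined with a space, text nodes that already start or end with whitespace can leave doubled spaces in the result. One possible way to tidy that up is sketched below. joinFragments is a hypothetical helper introduced here for illustration; it is not part of Cheerio.

```typescript
// Hypothetical helper: joins text fragments, collapses every run of
// whitespace into a single space, and trims both ends.
function joinFragments(fragments: string[]): string {
  return fragments.join(' ').replace(/\s+/g, ' ').trim();
}

const joined = joinFragments(['This ', 'is a sample ', ' text']);
// joined is 'This is a sample text'
```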

 

A recursive way to get the text values from the parsed HTML DOM:

// Class field that accumulates the text found during one traversal
private textArray: string[] = [];

getTextContentFromHTML(htmlDom: any, char?: string): Array<string> {
  let content: any;
  if (char) {
    // First call: htmlDom is the function returned by cheerio.load().
    // Start from the first element matching the selector and reset the result.
    content = htmlDom(char)[0];
    this.textArray = [];
  } else {
    // Recursive call: htmlDom is a node or an array of child nodes
    content = htmlDom;
  }

  const nodes = Array.isArray(content) ? content : [content];
  for (const node of nodes) {
    if (!node) {
      continue; // nothing matched, or an empty slot
    }
    if (node.type === 'text') {
      this.textArray.push(node.data);
    } else if (node.children) {
      // Descend into child nodes; siblings are handled by the loop
      this.getTextContentFromHTML(node.children);
    }
  }
  return this.textArray;
}
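
To make the traversal easier to follow without installing anything, here is a minimal standalone sketch of the same idea on a hand-built tree. The SimpleNode interface and the collectText function are assumptions made for illustration. They only mimic the type, data and children fields that the method above reads and are not Cheerio's own types.

```typescript
// Assumed node shape, modeled on the fields read by the method above.
interface SimpleNode {
  type: 'text' | 'tag';
  data?: string;           // present on text nodes
  children?: SimpleNode[]; // present on element nodes
}

// Depth-first walk that collects text nodes in document order.
function collectText(node: SimpleNode, out: string[] = []): string[] {
  if (node.type === 'text') {
    if (node.data !== undefined) {
      out.push(node.data);
    }
  } else {
    for (const child of node.children || []) {
      collectText(child, out);
    }
  }
  return out;
}

// Hand-built equivalent of: <p>This <b>is a sample </b>text</p>
const paragraph: SimpleNode = {
  type: 'tag',
  children: [
    { type: 'text', data: 'This ' },
    { type: 'tag', children: [{ type: 'text', data: 'is a sample ' }] },
    { type: 'text', data: 'text' },
  ],
};

const sampleText = collectText(paragraph).join('');
// sampleText is 'This is a sample text'
```

Passing the output array down the recursion avoids shared state on the class, which can make the function easier to test in isolation.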

 

You can use these code snippets to extract text from an HTML string. I hope you liked the article. If you have any suggestions, please comment below.

Thanks,

- LP