从文本JavaScript中删除HTML

JavaScript

达蒙小胖

2020-03-11

有没有一种简单的方法可以在JavaScript中获取html字符串并去除html?

第812篇《从文本JavaScript中删除HTML》来自Winter(https://github.com/aiyld/aiyld.github.io)的站点

11个回答
小胖泡芙 2020.03.11

For escape characters also this will work using pattern matching:

myString.replace(/((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm, '');
蛋蛋蛋蛋 2020.03.11

The accepted answer works fine mostly, however in IE if the html string is null you get the "null" (instead of ''). Fixed:

function strip(html)
{
   if (html == null) return "";
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
米亚Eva 2020.03.11

I have created a working regular expression myself:

str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, ''); 
LEY乐伽罗 2020.03.11

I just needed to strip out the <a> tags and replace them with the text of the link.

This seems to work great.

htmlContent= htmlContent.replace(/<a.*href="(.*?)">/g, '');
htmlContent= htmlContent.replace(/<\/a>/g, '');
番长小哥 2020.03.11

simple 2 line jquery to strip the html.

 var content = "<p>checking the html source&nbsp;</p><p>&nbsp;
  </p><p>with&nbsp;</p><p>all</p><p>the html&nbsp;</p><p>content</p>";

 var text = $(content).text();//It gets you the plain text
 console.log(text);//check the data in your console

 cj("#text_area_id").val(text);//set your content to text area using text_area_id
樱LEY神无 2020.03.11
function stripHTML(my_string){
    var charArr   = my_string.split(''),
        resultArr = [],
        htmlZone  = 0,
        quoteZone = 0;
    for( x=0; x < charArr.length; x++ ){
     switch( charArr[x] + htmlZone + quoteZone ){
       case "<00" : htmlZone  = 1;break;
       case ">10" : htmlZone  = 0;resultArr.push(' ');break;
       case '"10' : quoteZone = 1;break;
       case "'10" : quoteZone = 2;break;
       case '"11' : 
       case "'12" : quoteZone = 0;break;
       default    : if(!htmlZone){ resultArr.push(charArr[x]); }
     }
    }
    return resultArr.join('');
}

Accounts for > inside attributes and <img onerror="javascript"> in newly created dom elements.

usage:

clean_string = stripHTML("string with <html> in it")

demo:

https://jsfiddle.net/gaby_de_wilde/pqayphzd/

demo of top answer doing the terrible things:

https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/

小胖Pro 2020.03.11

If you want to keep the links and the structure of the content (h1, h2, etc) then you should check out TextVersionJS You can use it with any HTML, although it was created to convert an HTML email to plain text.

The usage is very simple. For example in node.js:

var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";

var textVersion = createTextVersion(yourHtml);

Or in the browser with pure js:

<script src="textversion.js"></script>
<script>
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
</script>

It also works with require.js:

define(["textversionjs"], function(createTextVersion) {
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
});
entaseven 2020.03.11

This should do the work on any Javascript environment (NodeJS included).

const text = `
<html lang="en">
  <head>
    <style type="text/css">*{color:red}</style>
  </head>
  <body><b>This is some text</b><br/><body>
</html>`;

// Rule to remove inline CSS.
text.replace(/<style[^>]*>.*<\/style>/gm, '')
// Rule to remove all opening, closing and orphan HTML tags.
    .replace(/<[^>]+>/gm, '')
// Rule to remove leading spaces and repeated CR/LF.
    .replace(/([\r\n]+ +)+/gm, '');
卡卡西Near 2020.03.11
myString.replace(/<[^>]*>?/gm, '');
猴子神乐 2020.03.11

最简单的方法:

jQuery(html).text();

这将从html字符串中检索所有文本。

null 2020.03.11

如果您在浏览器中运行,那么最简单的方法就是让浏览器为您完成...

function stripHtml(html)
{
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

注意:正如人们在评论中所指出的那样,如果您不控制HTML的源代码(例如,请勿在可能来自用户输入的任何内容上运行此代码),则最好避免这种情况。对于这些情况,您仍然可以让浏览器为您完成工作- 请参阅Saba关于使用现在广泛使用的DOMParser的答案

问题类别

JavaScript Ckeditor Python Webpack TypeScript Vue.js React.js ExpressJS KoaJS CSS Node.js HTML Django 单元测试 PHP Asp.net jQuery Bootstrap IOS Android