从文本JavaScript中删除HTML

有没有一种简单的方法可以在JavaScript中获取html字符串并去除html?

小胖泡芙2020/03/11 20:32:19

For escape characters also this will work using pattern matching:

myString.replace(/((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm, '');
蛋蛋蛋蛋2020/03/11 20:32:19

The accepted answer works fine mostly, however in IE if the html string is null you get the "null" (instead of ''). Fixed:

function strip(html)
{
   if (html == null) return "";
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}
米亚Eva2020/03/11 20:32:19

I have created a working regular expression myself:

str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, ''); 
LEY乐伽罗2020/03/11 20:32:18

I just needed to strip out the <a> tags and replace them with the text of the link.

This seems to work great.

htmlContent= htmlContent.replace(/<a.*href="(.*?)">/g, '');
htmlContent= htmlContent.replace(/<\/a>/g, '');
番长小哥2020/03/11 20:32:18

simple 2 line jquery to strip the html.

 var content = "<p>checking the html source&nbsp;</p><p>&nbsp;
  </p><p>with&nbsp;</p><p>all</p><p>the html&nbsp;</p><p>content</p>";

 var text = $(content).text();//It gets you the plain text
 console.log(text);//check the data in your console

 cj("#text_area_id").val(text);//set your content to text area using text_area_id
樱LEY神无2020/03/11 20:32:18
function stripHTML(my_string){
    var charArr   = my_string.split(''),
        resultArr = [],
        htmlZone  = 0,
        quoteZone = 0;
    for( x=0; x < charArr.length; x++ ){
     switch( charArr[x] + htmlZone + quoteZone ){
       case "<00" : htmlZone  = 1;break;
       case ">10" : htmlZone  = 0;resultArr.push(' ');break;
       case '"10' : quoteZone = 1;break;
       case "'10" : quoteZone = 2;break;
       case '"11' : 
       case "'12" : quoteZone = 0;break;
       default    : if(!htmlZone){ resultArr.push(charArr[x]); }
     }
    }
    return resultArr.join('');
}

Accounts for > inside attributes and <img onerror="javascript"> in newly created dom elements.

usage:

clean_string = stripHTML("string with <html> in it")

demo:

https://jsfiddle.net/gaby_de_wilde/pqayphzd/

demo of top answer doing the terrible things:

https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/

小胖Pro2020/03/11 20:32:18

If you want to keep the links and the structure of the content (h1, h2, etc) then you should check out TextVersionJS You can use it with any HTML, although it was created to convert an HTML email to plain text.

The usage is very simple. For example in node.js:

var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";

var textVersion = createTextVersion(yourHtml);

Or in the browser with pure js:

<script src="textversion.js"></script>
<script>
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
</script>

It also works with require.js:

define(["textversionjs"], function(createTextVersion) {
  var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
  var textVersion = createTextVersion(yourHtml);
});
entaseven2020/03/11 20:32:18

This should do the work on any Javascript environment (NodeJS included).

const text = `
<html lang="en">
  <head>
    <style type="text/css">*{color:red}</style>
  </head>
  <body><b>This is some text</b><br/><body>
</html>`;

// Rule to remove inline CSS.
text.replace(/<style[^>]*>.*<\/style>/gm, '')
// Rule to remove all opening, closing and orphan HTML tags.
    .replace(/<[^>]+>/gm, '')
// Rule to remove leading spaces and repeated CR/LF.
    .replace(/([\r\n]+ +)+/gm, '');
卡卡西Near2020/03/11 20:32:18
myString.replace(/<[^>]*>?/gm, '');
猴子神乐2020/03/11 20:32:18

最简单的方法:

jQuery(html).text();

这将从html字符串中检索所有文本。

2020/03/11 20:32:18

如果您在浏览器中运行,那么最简单的方法就是让浏览器为您完成...

function stripHtml(html)
{
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

注意:正如人们在评论中所指出的那样,如果您不控制HTML的源代码(例如,请勿在可能来自用户输入的任何内容上运行此代码),则最好避免这种情况。对于这些情况,您仍然可以让浏览器为您完成工作- 请参阅Saba关于使用现在广泛使用的DOMParser的答案