Flex: Export valid HTML from RichTextEditor and back

I’ve been looking for a parser that produces valid HTML from the RichTextEditor for a long time. I’ve seen all the regexp solutions, but they don’t work all the time, especially if you try to send an email with this RTE or to convert from HTML back to RTE.

The solution to this problem is an XML approach. Maybe it’s a little messy, but it works. I have tested it with all kinds of email clients, like Exchange (browser and Outlook), Yahoo and MSN, and also on IE6, IE7, FF and Opera. The HTML code looks good.

This code was inspired by Antoni Jakubiak’s post on some blog somewhere.

Here is some of the nice stuff you can do with this parser:

  • <TEXTFORMAT> will be completely removed,
  • <p> tag can be replaced with any tag you like, the default is <div>
  • <font> tag can be replaced with any other tag, default is <span>
  • <li> can be replaced, default is <li>
  • <li> will be inside <ul> by default but it can be replaced with anything else (e.g. <ol>, <dir> etc.)
  • a <br/> tag is added when you want a space between paragraphs
  • the ability to convert back to RTE valid XML from HTML
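The tag mapping listed above can be sketched roughly as follows. This is an illustration in Python, not the parser's ActionScript source; the function name `flex_to_html`, its parameters, and the sample markup shape are assumptions made for the example. It drops TEXTFORMAT, maps P to div and FONT to span with inline CSS, and wraps consecutive LI elements in one ul.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from xml.sax.saxutils import escape


def _font_style(font):
    """Translate FONT attributes into an inline CSS string."""
    css = []
    if font.get("FACE"):
        css.append("font-family:'%s'" % font.get("FACE"))
    if font.get("SIZE"):
        css.append("font-size:%spx" % font.get("SIZE"))
    if font.get("COLOR"):
        css.append("color:%s" % font.get("COLOR"))
    return ";".join(css)


def _inline(node, font_tag):
    """Render the text and inline children of a block element."""
    out = escape(node.text or "")
    for child in node:
        inner = _inline(child, font_tag)
        if child.tag == "FONT":
            style = _font_style(child)
            attr = ' style="%s"' % style if style else ""
            inner = "<%s%s>%s</%s>" % (font_tag, attr, inner, font_tag)
        else:
            # Other inline tags (B, I, U ...) are lowercased, attributes dropped.
            tag = child.tag.lower()
            inner = "<%s>%s</%s>" % (tag, inner, tag)
        out += inner + escape(child.tail or "")
    return out


def flex_to_html(html_text, p_tag="div", font_tag="span",
                 li_tag="li", list_tag="ul"):
    """Convert RTE-style markup to HTML using an XML tree, not regexps."""
    root = ET.fromstring("<root>%s</root>" % html_text)
    blocks = []
    for el in root:
        # TEXTFORMAT itself is dropped; only its children are kept.
        blocks.extend(list(el) if el.tag == "TEXTFORMAT" else [el])
    parts = []
    for is_li, group in groupby(blocks, key=lambda b: b.tag == "LI"):
        if is_li:
            items = "".join("<%s>%s</%s>" % (li_tag, _inline(b, font_tag), li_tag)
                            for b in group)
            parts.append("<%s>%s</%s>" % (list_tag, items, list_tag))
        else:
            for b in group:
                align = b.get("ALIGN")
                style = ' style="text-align:%s"' % align.lower() if align else ""
                parts.append("<%s%s>%s</%s>"
                             % (p_tag, style, _inline(b, font_tag), p_tag))
    return "".join(parts)
```

Calling `flex_to_html(text, list_tag="ol")` would produce an ordered list instead, which mirrors the replaceable-tag idea described above.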

You can give it a try, and please let me know how it worked. Suggestions are always welcome.

Here is a live example and source(rar) or source(zip) files.

UPDATE:

After K.A. noticed the space problem when parsing from RTE to HTML and back, I had to escape the content text using XMLDocument and then parse it to XML, which has better element management.

There is a catch when using both XML and XMLDocument. In XML, white space at the beginning and the end of a text node is ignored. In XMLDocument, special characters like ' (apostrophe) are changed to &apos;, which is a big problem when parsing to CSS: style="font-family:'Times New Roman';" becomes style="font-family:&apos;Times New Roman&apos;;".
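The apostrophe problem can be illustrated with a small Python sketch. It is not the parser's code: `escape_text` and `restore_css_quotes` are hypothetical helper names, and the escaping is done with Python's `xml.sax.saxutils` rather than Flex's XMLDocument. The idea is simply to turn &apos; back into a plain apostrophe before the value is used as CSS.

```python
from xml.sax.saxutils import escape, unescape


def escape_text(text):
    """Escape text the way an XML serializer might, including apostrophes."""
    return escape(text, {"'": "&apos;"})


def restore_css_quotes(style):
    """Undo the escaping so the value is usable inside a style attribute."""
    return unescape(style, {"&apos;": "'"})
```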

That’s why, when parsing to RTE and back, the string that is passed should be a plain string with no indentation or line breaks (e.g. \n); otherwise the indentation will show up in the RTE as spaces.
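If the HTML has been pretty-printed somewhere along the way, one possible clean-up before handing it back is to remove only the whitespace runs between tags that contain a line break. The following Python sketch shows the idea; `collapse_indentation` is a hypothetical name, and a single space between two inline tags is deliberately left alone because it may be real content.

```python
import re

# Whitespace between two tags that includes at least one line break
# is treated as indentation, not as content.
_INDENT_BETWEEN_TAGS = re.compile(r">\s*[\r\n]\s*<")


def collapse_indentation(html):
    """Return html with pretty-print line breaks and indents removed."""
    return _INDENT_BETWEEN_TAGS.sub("><", html.strip())
```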

UPDATE:

I made an update so if you want to completely remove a TAG then simply set it to null.

UPDATE:

New property added: ‘ignoreParagraphSpace’. The default value is false. If set to true, the parser will ignore any space at the beginning or at the end of a paragraph.
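What such an option might do can be sketched in Python as follows. This is only a guess at the behaviour described above, not the actual implementation; `trim_paragraph` is a hypothetical helper that strips leading space from the first text run of a paragraph and trailing space from the last one.

```python
import xml.etree.ElementTree as ET


def trim_paragraph(p):
    """Strip space at the very start and very end of a paragraph element."""
    # Leading space: walk down the first children until some text is found.
    node = p
    while not node.text and len(node):
        node = node[0]
    if node.text:
        node.text = node.text.lstrip()

    # Trailing space: walk down the last children until a tail is found.
    node = p
    while len(node) and not node[-1].tail:
        node = node[-1]
    if len(node):
        node[-1].tail = node[-1].tail.rstrip()
    elif node.text:
        node.text = node.text.rstrip()
    return p
```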

86 Responses to “Flex: Export valid HTML from RichTextEditor and back”

  1. K. A. writes:

    If you change color of any one word in between a sentence, and convert to HTML, it will trim spaces between words. For example “My Test Sentence” having “Test” in different color, will result in “MyTestSentence”. Formatting is correct, but it is losing spaces.

  2. scuty writes:

    I just updated this post! Now the space problem is solved. It was a good observation since I use this in one of my projects.

    Thanks for pointing that out! That was good one!

  3. gwb writes:

    This is what I need! How can I implement this “tohtml” in PHP? i.e., I need to edit data from a database through PHP…thanks!

  4. scuty writes:

    Simply use ParseToHTML to get the string that you want to save in the database then send it to php to be saved in the database.

    I hope this is what you asked me.

  5. sydd writes:

    Looks great!
    Although I found a bug:
    If you apply text align AND list to the same text the text align gets lost :(

  6. sydd writes:

    umm nevermind, it’s a Flex bug. Link:
    http://bugs.adobe.com/jira/browse/SDK-14486
    And the Adobe guys don’t seem to be in a hurry to fix it….

  7. Tommy writes:

    Looks great. I do have a problem if the users start entering equations that use the less-than sign (<). The htmlText has already converted this to the html name &lt; and the xml value has the html number %3C, but the ParseToHTML function returns it as <, which messes things up when you put it back into the ParseToRTE function.

  8. scuty writes:

    Thanks Tommy! I’ll look into it since I am using this in my projects.

    edit:
    Problem solved Tommy! The new version is up.
    Thanks for pointing that out!

  9. Tommy writes:

    Do you have the source as a zip file? Having an issue with the rar file.

  10. scuty writes:

    See if the zip files would work.

  11. Tommy writes:

    Sorry to keep bugging you but your example seems to work perfectly but I can’t get the source file to parse the < sign correctly. Would you double check that the source files are correct?

  12. scuty writes:

    You are right! The source files are the old files … I am not sure how that happened.

    The new files are up now.

  13. Tommy writes:

    I knew the less-than sign was a simple fix, I just couldn’t find it. Great job!

  14. scuty writes:

    Thanks! I hope you enjoy this small script.

  15. binod writes:

    I faced one issue while parsing back html to RTE.
    I was getting space prefixed to every word.
    e.g following content

    Hello Binod

    again

    is getting displayed as

    Hello binod

    Binod

    When I looked at the public function ParseToRTE(string:String)
    in rteHtmlParser.as

    I have modified the following line

    //var nxml:XMLNode = manage_space(xml_doc.firstChild);
    var nxml:XMLNode = xml_doc.firstChild;

    Do you see any issue due to that?

  16. scuty writes:

    Well … manage_space is good when you add space in front of some line. When you send it back to RTE the spaces in front of the line are gone.

    e.g.
        some text

    will become

    some text

    with no space.

    But what text did you send to the ParseToRTE because all I do with manage_space is to replace space with %20 and I don’t think that is the issue?

  17. Tushar writes:

    What’s the difference between the editor you created and Flex 3 RTE’s htmlText method?

  18. scuty writes:

    HUGE! Try to add the flex rte htmltext in a web page!

  19. Tushar writes:

    I am facing a similar problem to Binod.
    If you use manage_space, RTE appends space in front of every word when converting from html. e.g
    Hello Everyone

    will become
    Hello Everyone

    If you dont use manage_space, then the extra spaces added in rte are gone in the html. e.g.
    Hello Someone

    becomes
    Hello Someone

    Please help me fix this.

  20. Tushar writes:

    the spaces got distorted in my previous comment hope you understand this.

  21. scuty writes:

    so you don’t want the extra space in rte when you convert back from html?

    because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.

    I can make that manage_space optional and you can decide if you want it or not.

  22. Tushar writes:

    this is correct: because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.

    also if you use manage_space: if you “dont” add any space in rte and convert to html and back…it adds spaces in front which is a bug, rite?

  23. scuty writes:

    no. if you don’t add any space in rte and convert it to html and back there will be no space! the thing that I wanted with this parser is that I wanted to preserve all the format that you add in rte after you convert it back from html.

    there are some comments posted here about this problem.

  24. Tushar writes:

    sorry, that was my mistake, i was using XMLFormat which led to the weird behavior.
    I have one more problem though, not necessarily connected with your parser: If I choose any other font other than the default font, the RTE does not respond to any key stroke. Am i missing a font file? (Even the basic Arial, Times New Roman fonts dont work)

  25. scuty writes:

    works fine on my side! I think this is a RTE component problem.

    did you try to use other browsers? maybe it’s buggy on your current browser.

  26. lordB8r writes:

    This is a great idea and I wanted to use it in my project…however my needs involve adding the htmlText into an XML string that gets parsed by Flex. While the string from the rteHtmlParser was correct, Flex kept changing my string into XML with the incorrect spacing (that RTEs don’t like). I was then sending this across the wire to be stored, and upon retrieval, the XML was incorrectly formatted.

    To get around this (which causes awful spacing problems), I simply took the htmlText, Base64 encoded it, then added that to my xml to shoot across the wire (actually, all of my xml gets base64 encoded – so my htmlText actually is double encoded). Then, when I read it back in, I decode (twice) and get no spacing issues.

    So in the end, this didn’t work for me, but I can see how it would be useful in other circumstances.
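The Base64 workaround described in this comment can be sketched in Python as below. It is an illustration only (the commenter's setup was Flex and XML); `encode_html` and `decode_html` are hypothetical names. Because the payload contains no angle brackets or significant whitespace, an XML layer in between has nothing to reformat.

```python
import base64


def encode_html(html):
    """Encode the HTML so an XML layer cannot touch its whitespace."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


def decode_html(payload):
    """Reverse encode_html and return the original HTML string."""
    return base64.b64decode(payload).decode("utf-8")
```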

  27. jeeva writes:

    That’s great…
    It helps me. It works fine, but there is a small problem:
    if I change the font-weight of the text “My Example” to bold, a tab space appears when it is displayed again…

  28. scuty writes:

    If you try to insert the bold manually in the parsed text or if you do any other changes try not to leave blank space at all between the html tags.

  29. Naso a writes:

    Hi Scuty,

    Great class, great post!!! Thanks heaps for sharing!

    I have a question, if I may. The built-in mx:RichTextEditor class offers only limited formatting features, so I needed to develop my own RTE. It’s based on mx:TextArea and the formatting bit relies entirely on fl:TextFormat. However, it adds some properties to the htmlText such as leading and blockindent (and a few more), which your class doesn’t seem to parse. My question is … do you think those could be added in the future? If that’s possible, your tool would become the ultimate solution (I am referring to the many RegExp partial solutions around).

    Cheers,

    Naso.

  30. scuty writes:

    Thanks Naso for the great feedback!
    So you want to use TextArea instead of RichTextEditor and you want to parse the HTML with the TextFormat as CSS?

    Maybe you can help me by sending me a Flex project with the TextArea formatted with the TextFormat, so I can see what the HTML looks like; I am sure that the rest should be fairly easy. Also, if you send me that, include all the supported format features (those few more).

    Here is my email: … .

  31. Cosmin Dorobantu writes:

    Nice work, Scuty!

  32. KevMull writes:

    How do I completely remove all tags ?

  33. KevMull writes:

    Please ignore previous.

    How do I completely remove all SPAN tags ?

  34. scuty writes:

    I don’t think you can. But if you really want it I can make it so if you set the tag null then it will be ignored.

    But do you really need that? How are you going to convert it back to RTE?

  35. KevMull writes:

    Null will be fine.
    I want to keep the html as clean and simple as possible, I’ve removed the font dropdown options from the RTE as an external css will deal with this. My MSSQL server db complains if there are quotes so I parse them in PHP, hence why I need them removed and all that’s in them.
    I don’t need to convert it back to RTE as it reads the htm

  36. KevMull writes:

    To put it simply I need the orig RTE FONT tag removing and not converting to a SPAN tag at all

  37. scuty writes:

    The solution is not to cut the quotes, as you will always have this problem, but to figure out a way to insert quotes in the database. The easy way is to use parameters instead of inline query variables; using parameters you will be able to save HTML in the database.

    I will do that as soon as I can.
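The parameter approach mentioned above can be sketched as follows. The thread is about PHP and MSSQL; this sketch uses Python's built-in sqlite3 module purely as a stand-in, and the `posts` table and function names are invented for the example. The point is that the HTML is passed as a bound parameter, so quotes inside it never become part of the SQL text.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'posts' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def save_html(conn, html):
    """Insert the HTML as a bound parameter and return the new row id."""
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (html,))
    conn.commit()
    return cur.lastrowid


def load_html(conn, post_id):
    """Fetch the stored HTML, or None if the id does not exist."""
    row = conn.execute(
        "SELECT body FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0] if row else None
```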

  38. Export/Import of HTML from/into FLEX « Eduard Pal writes:

    [...] Export/Import of HTML from/into FLEX By eduardpal I am sure many of you have noticed that the HTML generated by the text components in FLEX is not valid. So one of the solutions would be to use a conversion tool. I found such a tool at http://blog.flashweb.org/archives/7. [...]

  39. scuty writes:

    KevMull – you can now download the new version of the RTE parser and if you want to remove a TAG then just set it to null and that tag will be completely removed.

  40. scuty writes:

    Naso – can you post that list of the CSS tags that you need to be parsed?

  41. Alex writes:

    This is some great stuff. Thanks for sharing.

  42. jmp writes:

    Thanks for a great, easy to use tool. Separate question, if you would be so kind, how do I keep paragraph indents from showing. Doesn’t seem to be anything in markup and displays correctly on any report just indents show up to user.

  43. scuty writes:

    I’ll add something in a few days.

    edit:
    I updated the post and the code files, I hope this is what you wanted jmp.

  44. Jay writes:

    Hi scuty,
    I don’t know if was just me but the output doesn’t include the style you apply to the text, am I right? For instance, if you change the color in the middle of the text, the output will just ignore the font style…

    This is a great comp, thanks for sharing!

  45. scuty writes:

    It should output everything! It might be a bug … I’ll fix it!

    OK now it’s working. SET_FONT was set to null and was ignoring all the FONT tags. The code is fine.

  46. Kwaku writes:

    Great work! keep it up

  47. Trevor writes:

    Uppercase tags give a validation error :-/
    Can you provide a recent version with lowercase tags ?
    I tried but it didn’t work for STYLE
    And also I could not change to lowercase and I couldn’t remove some useless tags in
    Regards
    Trevor

  48. Akshit writes:

    Great Job!!!!!!!!11111

    if i want to add section of tables in rich text editor and want to parse in html
    can u give me some update on this

    waiting for ur reply…..

  49. scuty writes:

    @Trevor: I’ll see what I can do. I was on vacation so …

    @Akshit: I don’t think tables are supported in flash. What exactly do you want to do?

  50. Akshit writes:

    i want to add a button in richtext editor called as table when click on it i want it will asked about rows and columns and after entering the the rows and columns it should show a tabular format containing that much row and columns in which then we can add data

  51. scuty writes:

    This has nothing to do with this class. This is more extending the RTE component in flex. You need to look for a RTE component that does that.

    I am sorry …

  52. Akshit writes:

    thanks, but if u get any update abt it then let me know

    i have modified richtexteditor by using datagrid in it.

  53. cipri writes:

    Hi guys,
    does anyone have this code translated to c# ?
    I’d appreciate it very much.
    I am interested in converting the RTE into HTML code.

    Thanks

  54. scuty writes:

    This is a Flex(Flash) code and has nothing to do with C#.

  55. Paul writes:

    Hi scuty,

    You have done a great job! Thanks!
    I have one “feature request”:
    Is it possible to generate HTML preserving spaces so if the html is rendered in a browser to get the same formatting. Now using the span tag you see in the browser all spaces gone. Maybe they should be replaced by nbsp ?
    What is your opinion?

  56. scuty writes:

    Hi Paul,

    Try the “ignoreParagraphSpace” option, that should do it I think. There were other people complaining about this and I fixed it a while back.

    EDIT:
    nvm … I know what you mean. I’ll see what I can do.

  57. Paul writes:

    Hi scuty,

    ignoreParagraphSpace is false by default and, as I understand it, is used to preserve leading spaces between RTE and HTML conversions when visualizing the content in the RichTextEditor. My problem is that when the generated HTML is rendered in Firefox/IE the leading spaces are skipped (because of the span tag). I tried to replace span with pre; then it keeps leading spaces but I got extra line breaks in my final browser rendering...

  58. Paul writes:

    I tried to replace with – then it keeps leading spaces but i got extra line breaks in my final browser rendering…

  59. Paul writes:

    I tried to replace span with pre – then it keeps leading spaces but i got extra line breaks in my final browser rendering…

  60. scuty writes:

    I know … I’ll look into it.

  61. Yann writes:

    Hi Scuty,
    this is some really nice work !
    I couldn’t find a license in the sources, is it okay to use it in a commercial project ?

  62. scuty writes:

    Yes, you can use it in your project. I will update the license as soon as I can.

  63. Murray writes:

    scuty – I love this…well done!

    We’ve been writing a custom text area using regex to actually produce code to eventually wind up in PDF. However, the faulty HTML causes problems due to the tag rearranging.

    Would it be possible to extend the TA /RTA class as in:
    override public function set htmlText(value:String):void{
    super.htmlText = value
    }
    I know this is really adobe’s flex community’s task but wouldn’t that be the real aim or goal?

    Perhaps the question really is where does one access the super.htmlText prior to the mess?

  64. scuty writes:

    @Murray

    So you want to extend the RTE with this class? And override the set; and get; htmlText?

  65. Murray writes:

    Hi there scuty
    Thanks for the reply.
    Spot on – yes to extend the RTE or TA.
    like:
    package
    {
        import mx.controls.RichTextEditor;

        public class TrueHTMLRichTextArea extends RichTextEditor
        {
            public function TrueHTMLRichTextArea()
            {
                super();
            }

            // Read-only accessor that returns the converted HTML.
            public function get trueHTMLText():String
            {
                return convertTohtmltext(this.htmlText);
            }

            public static function convertTohtmltext(str:String):String
            {
                // your RAD regex and conversion goes here!
                return str; // placeholder so the class compiles
            }
        }
    }

    then you could access the html by yourTA/RTA.trueHTMLText;

  66. scuty writes:

    ok … I’ll look into it!

    Thanks for the tip!

  67. Welder work writes:

    I am very thankful for this topic because it really gives great information

  68. Jay writes:

    Hi, I know that the topic is kinda old but I created a parser myself and I can’t solve a problem.
    After I gave up, I tried to use yours. Same problem!
    When you write a text and change the color of just 1 char in the middle of the text, it adds a space before and after the character whose color you changed.
    I figured that this happens because the text is sent to the next line. It shouldn’t happen but it does and I don’t know how to get this thing solved.
    Any help would be most appreciated. info[at]isynapps.com

    thank you!

  69. scuty writes:

    I just tested it and it works fine. I am not sure how you reproduce this issue, but I’m assuming you send your own HTML text to this parser. If you do so then yes, it is possible; this parser does not work well if you send your own HTML. The only way this works is if you let the RTE create the HTML for you, and then if you send it back it will display the right content in the RTE.

    However if you don’t do this I might need more info on how to reproduce this bug.

  70. Jay writes:

    @scuty

    thanks for the fast reply!
    well, I could also reproduce this using your live sample. what I did was:

    wrote “This is a test”
    Change the color of “e” from “test”
    Clicked “Parse HTML” button
    Copy the HTML output
    Paste into Notepad and save as .htm

    Open the *.htm file and you’ll be able to see the problem.

    Again, thanks for the help!

  71. scuty writes:

    Try to get the text using StringFormat not XMLFormat. The XMLFormat will add line breaks for each HTML element and it would break the HTML on your page. With StringFormat you have one string with no line breaks and that would work just fine.

    Give it a try!

  72. Jay writes:

    @scuty

    I’ll try it. I’ll extend your code and add this functionality. I’ll post it here later on.
    Thanks for the help, I appreciate it!

  73. scuty writes:

    My code has this functionality I just don’t use it in the live test. I use XMLFormat to be able to view the HTML code clearly.

  74. Jay writes:

    I just saw that! Thanks for the awesome piece of code!

  75. scuty writes:

    You are welcome!

  76. webalizer writes:

    Hi!
    I place images in my RTE using RTE.htmlText = “”
    As soon as i have an image in the text, i can not save the content through rtehtmlparser.
    Do you have any idea how the img tag causes problems in ParseToHtml?

    The other way from database to rte (ParseToRte) works.

  77. webalizer writes:

    @to before:
    between the quotation marks should be < img src….. etc.

  78. scuty writes:

    @webalizer

    I don’t know if the RTE supports img tag. Even if it would my parser doesn’t support it at all.

  79. webalizer writes:

    @scuty
    Thanks for your reply. The RTE does support simple img tag.
    How can i implement the support to your parser?

  80. neo writes:

    hi scuty, I tried to use your parser but when there is a color between the text, it removes the space. example “The color of text”, the word color on the sentence is blue and it will output on RTE “Thecolorof text”. do you know why is this happening? thanks

  81. scuty writes:

    works fine for me. where do you see the wrong output? on html or when you parse it back to RTE?

  82. scuty writes:

    @webalizer
    I don’t think you can because there is no file upload implemented. I saw some examples online and it’s really tough to make it happen.

  83. vincent writes:

    Wow, great job man! Really thanks for sharing…

    In your example you use the RichTextEditor, the latest spark component (textArea) should work too ? (Flex 4.5)

    Thank you again :-)

  84. scuty writes:

    @vincent

    I am not sure! But my guess is that it won’t, because they changed the HTML output of all text components in Flex 4+.

  85. Sunny writes:

    Currently the br tag is given style information but it probably shouldn’t have any at all. For example:

    Is there a way to fix this?

  86. Sunny writes:

    currently the “br” tag is given style information but it probably shouldn’t have any at all. For example: br style=”letter-spacing:0px;color:#FFFFFF;font-size:12px;font-family:’Century Gothic’;”
