Flex: Export valid HTML from RichTextEditor and back

I’ve been looking for a long time for a parser that turns RichTextEditor output into valid HTML. I’ve seen all the regexp solutions, but they don’t work all the time, especially if you try to send an email with this RTE or to convert from HTML back to RTE.

The solution to this problem is an XML approach. Maybe it’s a little messy, but it works. I have tested it with all kinds of email clients, such as Exchange (browser and Outlook), Yahoo and MSN, and also on IE6, IE7, FF and Opera. The HTML code looks good.

This code was inspired by a blog post by Antoni Jakubiak.

Here are some nice things you can do with this parser:

  • <TEXTFORMAT> is removed completely
  • the <p> tag can be replaced with any tag you like; the default is <div>
  • the <font> tag can be replaced with any other tag; the default is <span>
  • <li> can be replaced; the default is <li>
  • <li> items are placed inside <ul> by default, but that can be replaced with anything else (e.g. <ol>, <dir>)
  • a <br/> tag is added when you want a space between paragraphs
  • the result can be converted back from HTML to RTE-valid XML
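
To give a rough idea of what the tag replacement means, here is a minimal sketch in ActionScript 3 using E4X. It is not the parser's actual source: the function name `renameTags` and the shape of the tag map are made up for this example, and it only renames elements (it does not remove <TEXTFORMAT> or turn attributes into CSS).

```actionscript
// Hypothetical helper, not part of rteHtmlParser.
// Walks the tree and renames every element that has an entry in the map.
function renameTags(node:XML, map:Object):void {
    if (node.nodeKind() != "element") {
        return; // text nodes have no tag name to change
    }
    var name:String = String(node.localName()).toUpperCase();
    if (map.hasOwnProperty(name)) {
        node.setName(map[name]);
    }
    for each (var child:XML in node.children()) {
        renameTags(child, map);
    }
}

// RTE-style markup: upper-case tags, formatting kept in attributes.
var rte:XML = <P ALIGN="LEFT"><FONT FACE="Verdana" SIZE="10">Hello</FONT></P>;

// Defaults from the list above: <p> becomes <div>, <font> becomes <span>.
renameTags(rte, { P: "div", FONT: "span" });
trace(rte.toXMLString());
```

Working on the element tree instead of the raw string is what makes this approach more robust than a regexp: nesting and attributes stay intact while only the names change.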

You can give it a try, and please let me know how it worked. Suggestions are always welcome.

Here is a live example and source(rar) or source(zip) files.

UPDATE:

After K.A. noticed the space problem when parsing from RTE to HTML and back, I had to escape the content text using XMLDocument and then parse it to XML, which has better element management.

There is a catch when using both XML and XMLDocument. In XML, white space at the beginning and the end of a text node is ignored. In XMLDocument, special characters such as ' (apostrophe) are changed to &apos;, which is a big problem for inline CSS: style="font-family:'Times New Roman';" becomes style="font-family:&apos;Times New Roman&apos;;".
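
Both catches can be illustrated with a few lines of ActionScript 3. This is only a sketch, not the code shipped in the parser: `restoreApostrophes` is a name made up for the example, while `XML.ignoreWhitespace` is the standard E4X flag (true by default) that controls the trimming of text nodes.

```actionscript
// Catch 1: E4X trims text nodes while parsing unless the static flag is off.
XML.ignoreWhitespace = false;             // keep leading/trailing spaces in text nodes
var word:XML = new XML("<span> Test </span>");
trace("[" + word.toString() + "]");       // the spaces around "Test" should survive
XML.ignoreWhitespace = true;              // restore the default

// Catch 2: undo the &apos; escaping so inline CSS keeps its quoted font names.
// Hypothetical helper, not part of rteHtmlParser.
function restoreApostrophes(html:String):String {
    return html.split("&apos;").join("'");
}

var escaped:String = "<span style=\"font-family:&apos;Times New Roman&apos;;\">Hi</span>";
trace(restoreApostrophes(escaped));
// <span style="font-family:'Times New Roman';">Hi</span>
```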

That’s why, when parsing to RTE and back, the string that is passed in should be a plain string with no indentation or line breaks (e.g. \n); otherwise the indentation will show up in the RTE as spaces.
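
If your HTML arrives pretty-printed, one way to flatten it first is sketched below. `stripIndent` is a hypothetical helper, not something the parser provides. It only removes whitespace runs between two tags that contain a line break, so a single meaningful space between two inline tags is left alone.

```actionscript
// Hypothetical helper: collapse indented markup to one line by dropping
// whitespace between '>' and '<' when that whitespace includes a line break.
function stripIndent(markup:String):String {
    return markup.replace(/>\s*[\r\n]\s*</g, "><");
}

var pretty:String = "<div>\n    <span>Hello</span>\n    <span>World</span>\n</div>";
trace(stripIndent(pretty));
// <div><span>Hello</span><span>World</span></div>
```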

UPDATE:

I made an update: if you want to completely remove a tag, simply set it to null.


40 Responses to “Flex: Export valid HTML from RichTextEditor and back”

  1. K. A. writes:

    If you change color of any one word in between a sentence, and convert to HTML, it will trim spaces between words. For example “My Test Sentence” having “Test” in different color, will result in “MyTestSentence”. Formatting is correct, but it is losing spaces.

  2. scuty writes:

    I just updated this post! The space problem is now solved. It was a good observation, since I use this in one of my projects.

    Thanks for pointing that out! That was a good one!

  3. gwb writes:

    This is what I need! How can I implement this “tohtml” in PHP? i.e., I need to edit data from a database through PHP…thanks!

  4. scuty writes:

    Simply use ParseToHTML to get the string that you want to save, then send it to PHP to be stored in the database.

    I hope this is what you asked me.
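
    For the sending part, something along these lines should do. It is only a sketch: the "save.php" URL and the "html" field name are placeholders, and the function expects a string that has already been through ParseToHTML.

    ```actionscript
    import flash.events.Event;
    import flash.events.IOErrorEvent;
    import flash.net.URLLoader;
    import flash.net.URLRequest;
    import flash.net.URLRequestMethod;
    import flash.net.URLVariables;

    // Posts already-parsed HTML to a server script.
    // "save.php" and the "html" field are placeholders for your own endpoint.
    function saveHtml(html:String):void {
        var vars:URLVariables = new URLVariables();
        vars.html = html;                       // sent as a form field named "html"

        var request:URLRequest = new URLRequest("save.php");
        request.method = URLRequestMethod.POST;
        request.data = vars;

        var loader:URLLoader = new URLLoader();
        loader.addEventListener(Event.COMPLETE, function(e:Event):void {
            trace("server replied: " + loader.data);
        });
        loader.addEventListener(IOErrorEvent.IO_ERROR, function(e:IOErrorEvent):void {
            trace("save failed: " + e.text);
        });
        loader.load(request);
    }
    ```

    On the PHP side the string then arrives as an ordinary POST field.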

  5. sydd writes:

    Looks great!
    Although I found a bug:
    If you apply text align AND list to the same text the text align gets lost :(

  6. sydd writes:

    Umm, never mind, it’s a Flex bug. Link:
    http://bugs.adobe.com/jira/browse/SDK-14486
    And the Adobe guys don’t seem to be in a hurry to fix it….

  7. Tommy writes:

    Looks great. I do have a problem if users start entering equations that use the less-than sign (<). The htmlText has already converted this to the HTML name &lt; and the XML value has the HTML number %3C, but the ParseToHTML function returns it as <, which breaks things when you put it back into the ParseToRTE function.

  8. scuty writes:

    Thanks Tommy! I’ll look into it since I am using this in my projects.

    edit:
    Problem solved Tommy! The new version is up.
    Thanks for pointing that out!

  9. Tommy writes:

    Do you have the source as a zip file? Having an issue with the rar file.

  10. scuty writes:

    See if the zip files would work.

  11. Tommy writes:

    Sorry to keep bugging you but your example seems to work perfectly but I can’t get the source file to parse the < sign correctly. Would you double check that the source files are correct?

  12. scuty writes:

    You are right! The source files are the old files … I am not sure how that happened.

    The new files are up now.

  13. Tommy writes:

    I knew the less-than sign was a simple fix, I just couldn’t find it. Great job!

  14. scuty writes:

    Thanks! I hope you enjoy this small script.

  15. binod writes:

    I faced one issue while parsing back html to RTE.
    I was getting space prefixed to every word.
    e.g following content

    Hello Binod

    again

    is getting displayed as

    Hello binod

    Binod

    When I looked at the public function ParseToRTE(string:String)
    in rteHtmlParser.as

    I have modified the following line

    //var nxml:XMLNode = manage_space(xml_doc.firstChild);
    var nxml:XMLNode = xml_doc.firstChild;

    Do you see any issue due to that?

  16. scuty writes:

    Well … manage_space is good when you add space in front of some line. When you send it back to RTE the spaces in front of the line are gone.

    e.g.
        some text

    will become

    some text

    with no space.

    But what text did you send to ParseToRTE? All I do with manage_space is replace spaces with %20, and I don’t think that is the issue.
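
    To show the idea (this is not the real manage_space, just a sketch of the same trick, and `encodeSpaces` is a made-up name):

    ```actionscript
    import flash.xml.XMLDocument;
    import flash.xml.XMLNode;
    import flash.xml.XMLNodeType;

    // Encode spaces inside text nodes so a later XML parse has nothing to trim.
    function encodeSpaces(node:XMLNode):XMLNode {
        if (node.nodeType == XMLNodeType.TEXT_NODE) {
            node.nodeValue = node.nodeValue.split(" ").join("%20");
        }
        for each (var child:XMLNode in node.childNodes) {
            encodeSpaces(child);                // recurse into nested elements
        }
        return node;
    }

    var doc:XMLDocument = new XMLDocument();
    doc.parseXML("<div>    some text</div>");   // text with leading spaces
    trace(encodeSpaces(doc.firstChild).toString());
    ```

    The %20 markers have to be turned back into spaces (for example with split("%20").join(" ")) before the text is shown in the RTE.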

  17. Tushar writes:

    whats the difference between the editor you created and Flex 3 RTE’s htmltext method?

  18. scuty writes:

    HUGE! Try to add the flex rte htmltext in a web page!

  19. Tushar writes:

    I am facing a similar problem to Binod.
    If you use manage_space, RTE appends space in front of every word when converting from html. e.g
    Hello Everyone

    will become
    Hello Everyone

    If you dont use manage_space, then the extra spaces added in rte are gone in the html. e.g.
    Hello Someone

    becomes
    Hello Someone

    Please help me fix this.

  20. Tushar writes:

    the spaces got distorted in my previous comment hope you understand this.

  21. scuty writes:

    so you don’t want the extra space in rte when you convert back from html?

    because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.

    I can make that manage_space optional and you can decide if you want it or not.

  22. Tushar writes:

    this is correct: because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.

    also if you use manage_space: if you “dont” add any space in rte and convert to html and back…it adds spaces in front which is a bug, rite?

  23. scuty writes:

    no. if you don’t add any space in rte and convert it to html and back there will be no space! the thing that I wanted with this parser is that I wanted to preserve all the format that you add in rte after you convert it back from html.

    there are some comments posted here about this problem.

  24. Tushar writes:

    sorry, that was my mistake, i was using XMLFormat which led to the weird behavior.
    I have one more problem though, not necessarily connected with your parser: If I choose any other font other than the default font, the RTE does not respond to any key stroke. Am i missing a font file? (Even the basic Arial, Times New Roman fonts dont work)

  25. scuty writes:

    works fine on my side! I think this is a RTE component problem.

    did you try to use other browsers? maybe it’s buggy on your current browser.

  26. lordB8r writes:

    This is a great idea and I wanted to use it in my project…however my needs involve adding the htmlText into an XML string that gets parsed by Flex. While the string from the rteHtmlParser was correct, Flex kept changing my string into XML with the incorrect spacing (that RTEs don’t like). I was then sending this across the wire to be stored, and upon retrieval, the XML was incorrectly formatted.

    To get around this (which causes awful spacing problems), I simply took the htmlText, Base64 encoded it, then added that to my xml to shoot across the wire (actually, all of my xml gets base64 encoded – so my htmlText actually is double encoded). Then, when I read it back in, I decode (twice) and get no spacing issues.

    So in the end, this didn’t work for me, but I can see how it would be useful in other circumstances.

  27. jeeva writes:

    That’s great…
    It helps me. It works fine, but there is a small problem:
    if I change the font-weight of the text “My Example” to bold, a tab space appears when it is displayed again…

  28. scuty writes:

    If you try to insert the bold manually in the parsed text or if you do any other changes try not to leave blank space at all between the html tags.

  29. Naso a writes:

    Hi Scuty,

    Great class, great post!!! Thanks heaps for sharing!

    I have a question, if I may. The built-in mx:RichTextEditor class offers only limited formatting features, so I needed to develop my own RTE. It’s based on mx:TextArea and the formatting bit relies entirely on fl:TextFormat. However, it adds some properties to the htmlText such as leading and blockindent (and a few more), which your class doesn’t seem to parse. My question is … do you think those could be added in the future? If that’s possible, your tool would become the ultimate solution (I am referring to the many RegExp partial solutions around).

    Cheers,

    Naso.

  30. scuty writes:

    Thanks Naso for the great feedback!
    So you want to use TextArea instead of RichTextEditor and you want to parse the HTML with the TextFormat as CSS?

    Maybe you can help me by sending me a Flex project with the TextArea formatted with the TextFormat, so I can see what the HTML looks like; I am sure the rest should be fairly easy. If you send me that, please include all the supported format features (including those “few more”).

    Here is my email: … .

  31. Cosmin Dorobantu writes:

    Nice work, Scuty!

  32. KevMull writes:

    How do I completely remove all tags ?

  33. KevMull writes:

    Please ignore previous.

    How do I completely remove all SPAN tags ?

  34. scuty writes:

    I don’t think you can. But if you really want it I can make it so if you set the tag null then it will be ignored.

    But do you really need that? How are you going to convert it back to RTE?

  35. KevMull writes:

    Null will be fine.
    I want to keep the HTML as clean and simple as possible. I’ve removed the font dropdown options from the RTE, as an external CSS will deal with this. My MSSQL server db complains if there are quotes, so I parse them in PHP; that is why I need them removed, and all that’s in them.
    I don’t need to convert it back to RTE as it reads the htm

  36. KevMull writes:

    To put it simply I need the orig RTE FONT tag removing and not converting to a SPAN tag at all

  37. scuty writes:

    The solution is not to strip the quotes, as you will always have this problem, but to figure out a way to insert quotes into the database. The easy way is to use parameters instead of inline query variables; with parameters you will be able to save HTML in the database.

    I will do that as soon as I can.

  38. Export/Import of HTML from/into FLEX « Eduard Pal writes:

    [...] Export/Import of HTML from/into FLEX By eduardpal I am sure many of you have noticed that the HTML generated by the FLEX text components is not valid. So one solution would be to use a conversion tool. I found such a tool at http://blog.flashweb.org/archives/7. [...]

  39. scuty writes:

    KevMull – you can now download the new version of the RTE parser; if you want to remove a tag, just set it to null and that tag will be completely removed.

  40. scuty writes:

    Naso – can you post that list of the CSS tags that you need to be parsed?
