Flex: Export valid HTML from RichTextEditor and back
Monday, 13 October 2008
I’ve been looking for a RichTextEditor valid HTML parser for a long time. I’ve seen all the regexp solutions, but they don’t work all the time, especially if you try to send an email from this RTE or to convert from HTML back to RTE.
The solution to this problem is an XML approach. Maybe it’s a little messy, but it works. I have tested it with all kinds of email clients such as Exchange (browser and Outlook), Yahoo and MSN, and also on IE6, IE7, FF and Opera. The HTML code looks good.
This code was inspired by Antoni Jakubiak’s post on some blog somewhere.
Here are some nice things you can do with this parser:
- <TEXTFORMAT> will be completely removed,
- <p> tag can be replaced with any tag you like, the default is <div>
- <font> tag can be replaced with any other tag, default is <span>
- <li> can be replaced, default is <li>
- <li> will be inside <ul> by default but it can be replaced with anything else (e.g. <ol>, <dir> etc.)
- <br/> tag added when you want a space between paragraphs
- the ability to convert back to RTE valid XML from HTML
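The list above can be sketched in a few lines. This is only an illustration in Python, not the ActionScript parser itself: the names TAG_MAP, LIST_WRAPPER, css_from, render and to_html are hypothetical, the attribute-to-CSS mapping is a small assumed subset, and the input is assumed to be well-formed RTE markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical defaults mirroring the list above; change the values to remap tags.
TAG_MAP = {"P": "div", "FONT": "span", "LI": "li"}
LIST_WRAPPER = "ul"


def css_from(attrs):
    """Turn a small, illustrative subset of RTE attributes into inline CSS."""
    rules = []
    if "ALIGN" in attrs:
        rules.append("text-align:" + attrs["ALIGN"].lower())
    if "FACE" in attrs:
        rules.append("font-family:'" + attrs["FACE"] + "'")
    if "SIZE" in attrs:
        rules.append("font-size:" + attrs["SIZE"] + "px")
    if "COLOR" in attrs:
        rules.append("color:" + attrs["COLOR"])
    return ";".join(rules)


def render(node):
    """Serialize one node, keeping every text and tail string untouched."""
    inner = escape(node.text or "")
    for child in node:
        inner += render(child) + escape(child.tail or "")
    if node.tag == "TEXTFORMAT":
        return inner  # wrapper dropped, content kept
    tag = TAG_MAP.get(node.tag, node.tag.lower())
    style = css_from(node.attrib)
    attr = " style=" + quoteattr(style) if style else ""
    return "<" + tag + attr + ">" + inner + "</" + tag + ">"


def to_html(html_text):
    """Convert RTE markup to HTML, wrapping runs of list items in LIST_WRAPPER."""
    root = ET.fromstring("<root>" + html_text + "</root>")
    out, in_list = [], False
    for block in root:
        is_item = next(block.iter("LI"), None) is not None
        if is_item and not in_list:
            out.append("<" + LIST_WRAPPER + ">")
        elif not is_item and in_list:
            out.append("</" + LIST_WRAPPER + ">")
        in_list = is_item
        out.append(render(block))
    if in_list:
        out.append("</" + LIST_WRAPPER + ">")
    return "".join(out)
```

Because the text and tail of every node are copied as they are, the spaces around a recolored word survive the conversion.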
You can give it a try, and please let me know how it worked. Suggestions are always welcome.
Here is a live example and source(rar) or source(zip) files.
UPDATE:
After K.A. noticed the space problem when parsing from RTE to HTML and back, I had to escape the content text using XMLDocument and then parse it to XML, which has better element management.
There is a catch when using both XML and XMLDocument. In XML, white space at the beginning and the end of a text node is ignored. In XMLDocument, special characters like ' (apostrophe) are changed to &apos;, and that is a big problem when parsing to CSS: style="font-family:'Times New Roman';" becomes style="font-family:&apos;Times New Roman&apos;;".
That’s why, when parsing to RTE and back, the string that is passed should be a plain string with no indentation or line breaks (e.g. \n); otherwise the indentation will show up in the RTE as space.
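Both catches are easy to reproduce outside Flex. The following Python sketch is only an illustration under assumptions: the helper names decode_apos and text_of are hypothetical, and Python's XML library stands in for XMLDocument and XML, so the exact escaping rules of the Flex classes may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Escaping apostrophes the way an XML serializer may do it.
style = "font-family:'Times New Roman';"
escaped = escape(style, {"'": "&apos;"})


def decode_apos(value):
    """Turn &apos; back into a plain apostrophe before the value is used as CSS."""
    return unescape(value, {"&apos;": "'"})


def text_of(markup):
    """Concatenate every text node, including whitespace-only ones."""
    return "".join(ET.fromstring(markup).itertext())


compact = "<div><span>one</span><span>two</span></div>"
indented = "<div>\n    <span>one</span>\n    <span>two</span>\n</div>"

print(escaped)                 # the CSS value with escaped apostrophes
print(repr(text_of(compact)))  # no stray whitespace
print(repr(text_of(indented))) # indentation and line breaks become text
```

The indented string carries its layout into the text nodes, which is why a compact, single-line string is the safer input.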
UPDATE:
I made an update so if you want to completely remove a TAG then simply set it to null.
No. 1 — October 23rd, 2008 at 3:45 am
If you change color of any one word in between a sentence, and convert to HTML, it will trim spaces between words. For example “My Test Sentence” having “Test” in different color, will result in “MyTestSentence”. Formatting is correct, but it is losing spaces.
No. 2 — October 23rd, 2008 at 6:24 am
I just updated this post! Now the space problem is solved. It was a good observation, since I use this in one of my projects.
Thanks for pointing that out! That was a good one!
No. 3 — November 7th, 2008 at 8:04 pm
This is what I need! How can I implement this “tohtml” in PHP? i.e., I need to edit data from a database through PHP…thanks!
No. 4 — November 8th, 2008 at 8:01 am
Simply use ParseToHTML to get the string that you want to save in the database then send it to php to be saved in the database.
I hope this is what you asked me.
No. 5 — February 14th, 2009 at 12:02 pm
Looks great!
Although I found a bug:
If you apply text align AND list to the same text, the text align gets lost.
No. 6 — February 14th, 2009 at 12:37 pm
Umm, never mind, it’s a Flex bug. Link:
http://bugs.adobe.com/jira/browse/SDK-14486
And the Adobe guys don’t seem to be in a hurry to fix it….
No. 7 — March 3rd, 2009 at 4:54 pm
Looks great. I do have a problem if the users start entering equations that use the less than sign (<). The htmlText has already converted this to the HTML name &lt; and the XML value has the HTML number %3C, but the ParseToHTML function returns it as <, which messes things up when you put it back into the ParseToRTE function.
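The failure described here can be shown with a short Python sketch. It is an illustration only, not the code of ParseToHTML: the helpers to_div and reparses are hypothetical names, and Python's XML parser stands in for the Flex one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_div(markup, escape_text=True):
    """Re-emit the text of a single element inside a <div>."""
    text = ET.fromstring(markup).text or ""  # parsing decodes &lt; to <
    if escape_text:
        text = escape(text)                  # encode <, > and & again
    return "<div>" + text + "</div>"


def reparses(markup):
    """Report whether the markup can be parsed again as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


source = "<P>a &lt; b</P>"
print(to_div(source, escape_text=False))  # raw < in the output
print(to_div(source))                     # entity restored
```

Only the escaped output can be fed back into an XML-based parser, so the text has to be re-escaped every time it is written out.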
No. 8 — March 3rd, 2009 at 9:12 pm
Thanks Tommy! I’ll look into it since I am using this in my projects.
edit:
Problem solved Tommy! The new version is up.
Thanks for pointing that out!
No. 9 — March 16th, 2009 at 3:36 pm
Do you have the source as a zip file? Having an issue with the rar file.
No. 10 — March 17th, 2009 at 6:44 am
See if the zip files would work.
No. 11 — March 17th, 2009 at 2:09 pm
Sorry to keep bugging you but your example seems to work perfectly but I can’t get the source file to parse the < sign correctly. Would you double check that the source files are correct?
No. 12 — March 17th, 2009 at 4:58 pm
You are right! The source files are the old files … I am not sure how that happened.
The new files are up now.
No. 13 — March 20th, 2009 at 10:20 am
I knew the less than sign was a simple fix, I just couldn’t find it. Great job!
No. 14 — March 20th, 2009 at 10:25 am
Thanks! I hope you enjoy this small script.
No. 15 — April 8th, 2009 at 12:07 pm
I faced one issue while parsing back html to RTE.
I was getting space prefixed to every word.
e.g following content
Hello Binod
again
is getting displayed as
Hello binod
Binod
When I looked at the public function ParseToRTE(string:String)
in rteHtmlParser.as
I have modified the following line
//var nxml:XMLNode = manage_space(xml_doc.firstChild);
var nxml:XMLNode = xml_doc.firstChild;
Do you see any issue due to that?
No. 16 — April 8th, 2009 at 1:22 pm
Well … manage_space is good when you add space in front of some line. When you send it back to RTE the spaces in front of the line are gone.
e.g.
some text
will become
some text
with no space.
But what text did you send to the ParseToRTE because all I do with manage_space is to replace space with %20 and I don’t think that is the issue?
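The idea of replacing spaces with %20 can be sketched as follows. This is a Python illustration under assumptions, not the actual manage_space code: protect, restore and trimming_parser are hypothetical names, and trimming_parser only imitates a parser that drops white space at the edges of a text node.

```python
def protect(text):
    """Encode spaces so a trimming parser cannot drop them."""
    return text.replace(" ", "%20")


def restore(text):
    """Decode the placeholders after parsing."""
    return text.replace("%20", " ")


def trimming_parser(text):
    """Stand-in for a parser that ignores leading and trailing white space."""
    return text.strip()


line = "   some text"
print(repr(trimming_parser(line)))                    # leading spaces lost
print(repr(restore(trimming_parser(protect(line)))))  # leading spaces kept
```

One limit of this approach is that a literal %20 typed by the user would also be turned into a space on the way back.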
No. 17 — April 22nd, 2009 at 2:58 am
What’s the difference between the editor you created and the Flex 3 RTE’s htmlText property?
No. 18 — April 22nd, 2009 at 5:48 am
HUGE! Try to add the Flex RTE htmlText to a web page!
No. 19 — April 29th, 2009 at 5:55 am
I am facing a similar problem to Binod.
If you use manage_space, RTE adds a space in front of every word when converting from HTML, e.g.
Hello Everyone
will become
Hello Everyone
If you don’t use manage_space, then the extra spaces added in RTE are gone in the HTML, e.g.
Hello Someone
becomes
Hello Someone
Please help me fix this.
No. 20 — April 29th, 2009 at 5:56 am
the spaces got distorted in my previous comment hope you understand this.
No. 21 — April 29th, 2009 at 9:53 am
so you don’t want the extra space in rte when you convert back from html?
because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.
I can make that manage_space optional and you can decide if you want it or not.
No. 22 — April 30th, 2009 at 7:36 am
this is correct: because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.
also if you use manage_space: if you “don’t” add any space in rte and convert to html and back… it adds spaces in front, which is a bug, right?
No. 23 — April 30th, 2009 at 9:55 am
no. if you don’t add any space in rte and convert it to html and back there will be no space! the thing that I wanted with this parser is that I wanted to preserve all the format that you add in rte after you convert it back from html.
there are some comments posted here about this problem.
No. 24 — May 4th, 2009 at 8:38 am
sorry, that was my mistake, i was using XMLFormat which led to the weird behavior.
I have one more problem though, not necessarily connected with your parser: if I choose any font other than the default font, the RTE does not respond to any keystroke. Am I missing a font file? (Even the basic Arial and Times New Roman fonts don’t work.)
No. 25 — May 4th, 2009 at 9:09 am
works fine on my side! I think this is a RTE component problem.
did you try to use other browsers? maybe it’s buggy on your current browser.
No. 26 — October 28th, 2009 at 3:06 pm
This is a great idea and I wanted to use it in my project… however, my needs involve adding the htmlText into an XML string that gets parsed by Flex. While the string from the rteHtmlParser was correct, Flex kept changing my string into XML with the incorrect spacing (that RTEs don’t like). I was then sending this across the wire to be stored, and upon retrieval, the XML was incorrectly formatted.
To get around this (which causes awful spacing problems), I simply took the htmlText, Base64 encoded it, then added that to my xml to shoot across the wire (actually, all of my xml gets base64 encoded – so my htmlText actually is double encoded). Then, when I read it back in, I decode (twice) and get no spacing issues.
So in the end, this didn’t work for me, but I can see how it would be useful in other circumstances.
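The Base64 workaround described in this comment can be sketched like this. It is a Python illustration only: pack, unpack and the payload element name are hypothetical, and a single layer of encoding is shown rather than the double encoding mentioned above.

```python
import base64
import xml.etree.ElementTree as ET


def pack(html_text):
    """Wrap the markup in an XML element as Base64 so no parser can reflow it."""
    payload = ET.Element("payload")
    payload.text = base64.b64encode(html_text.encode("utf-8")).decode("ascii")
    return ET.tostring(payload, encoding="unicode")


def unpack(xml_string):
    """Read the element back and decode the original markup."""
    encoded = ET.fromstring(xml_string).text or ""
    return base64.b64decode(encoded).decode("utf-8")


markup = '<P ALIGN="LEFT">  Hello   <B>Binod</B>\n again</P>'
print(unpack(pack(markup)) == markup)
```

The encoded text contains no angle brackets or white space, so the XML layer has nothing to normalize.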
No. 27 — December 2nd, 2009 at 4:43 am
That’s great…
It helped me. It works fine, but there is a small problem:
if I change the font weight of the text “My Example” to bold, a tab space appears when it is displayed again…
No. 28 — December 2nd, 2009 at 7:24 am
If you try to insert the bold manually in the parsed text or if you do any other changes try not to leave blank space at all between the html tags.
No. 29 — December 16th, 2009 at 8:31 pm
Hi Scuty,
Great class, great post!!! Thanks heaps for sharing!
I have a question, if I may. The built-in mx:RichTextEditor class offers only limited formatting features, so I needed to develop my own RTE. It’s based on mx:TextArea and the formatting bit relies entirely on fl:TextFormat. However, it adds some properties to the htmlText such as leading and blockindent (and a few more), which your class doesn’t seem to parse. My question is … do you think those could be added in the future? If that’s possible, your tool would become the ultimate solution (I am referring to the many RegExp partial solutions around).
Cheers,
Naso.
No. 30 — December 17th, 2009 at 5:24 am
Thanks Naso for the great feedback!
So you want to use TextArea instead of RichTextEditor and you want to parse the HTML with the TextFormat as CSS?
Maybe you can help me by sending me a Flex project with the TextArea formatted with the TextFormat, so I can see what the HTML looks like; I am sure the rest should be fairly easy. If you send me that, please include all the supported format features (including those few more).
Here is my email: … .
No. 31 — January 11th, 2010 at 3:19 am
Nice work, Scuty!
No. 32 — January 12th, 2010 at 8:21 am
How do I completely remove all tags ?
No. 33 — January 12th, 2010 at 8:22 am
Please ignore previous.
How do I completely remove all SPAN tags ?
No. 34 — January 12th, 2010 at 8:27 am
I don’t think you can. But if you really want it I can make it so if you set the tag null then it will be ignored.
But do you really need that? How are you going to convert it back to RTE?
No. 35 — January 12th, 2010 at 10:04 am
Null will be fine.
I want to keep the HTML as clean and simple as possible. I’ve removed the font dropdown options from the RTE, as an external CSS will deal with this. My MSSQL server DB complains if there are quotes, so I parse them in PHP, which is why I need them removed along with everything that’s in them.
I don’t need to convert it back to RTE as it reads the htm
No. 36 — January 12th, 2010 at 10:13 am
To put it simply I need the orig RTE FONT tag removing and not converting to a SPAN tag at all
No. 37 — January 12th, 2010 at 10:34 am
The solution is not to cut the quotes, as you will always have this problem, but to figure out a way to insert quotes in the database. The easy way is to use parameters instead of inline query variables; using parameters, you will be able to save HTML in the database.
I will do that as soon as I can.
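A parameterized insert can be sketched as follows. This is a Python illustration with SQLite standing in for the MSSQL and PHP setup discussed above; the posts table and the save_html and load_html helpers are hypothetical.

```python
import sqlite3


def save_html(conn, html):
    """Store markup through a bound parameter; the quotes need no manual escaping."""
    cursor = conn.execute("INSERT INTO posts (body) VALUES (?)", (html,))
    conn.commit()
    return cursor.lastrowid


def load_html(conn, post_id):
    """Fetch the stored markup by id, or None if the row does not exist."""
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

html = "<span style=\"font-family:'Times New Roman';\">It's \"quoted\"</span>"
post_id = save_html(conn, html)
print(load_html(conn, post_id) == html)
```

The markup is passed separately from the SQL text, so single and double quotes in the HTML never become part of the statement.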
No. 38 — January 15th, 2010 at 8:26 am
[...] HTML export/import from/to FLEX By eduardpal I am sure many of you have noticed that the HTML generated by the FLEX text components is not valid. So one solution would be to use a conversion tool. I found such a tool at http://blog.flashweb.org/archives/7. [...]
No. 39 — January 16th, 2010 at 7:23 am
KevMull – you can now download the new version of the RTE parser, and if you want to remove a TAG then just set it to null and that tag will be completely removed.
No. 40 — January 16th, 2010 at 7:28 am
Naso – can you post that list of the CSS tags that you need to be parsed?