Flex: Export valid HTML from RichTextEditor and back
Monday, 13 October 2008
I’ve been looking for a long time for a parser that produces valid HTML from the RichTextEditor. I’ve seen all the regexp solutions, but they don’t work all the time, especially if you try to send an email with the RTE’s output or to convert from HTML back to the RTE.
The solution to this problem is an XML approach. It may be a little messy, but it works. I have tested it with all kinds of email clients, like Exchange (browser and Outlook), Yahoo and MSN, and also on IE6, IE7, FF and Opera. The HTML code looks good.
This code was inspired by Antoni Jakubiak’s post on some blog somewhere.
Here is some nice stuff you can do with this parser:
- <TEXTFORMAT> will be completely removed,
- <p> tag can be replaced with any tag you like, the default is <div>
- <font> tag can be replaced with any other tag, default is <span>
- <li> can be replaced, default is <li>
- <li> will be inside <ul> by default but it can be replaced with anything else (e.g. <ol>, <dir> etc.)
- a <br/> tag is added when you want a space between paragraphs
- the ability to convert back to RTE valid XML from HTML
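The tag replacements listed above can be sketched with E4X. This is only a minimal illustration, not the parser offered for download: the class name RteTagMapSketch, the TAG_MAP table and the convert method are made up for this example, and it assumes the RTE's htmlText is well-formed XML. Attributes, style conversion and the <ul> wrapper around <li> are left out to keep it short.

```actionscript
package {
    /**
     * Illustrative sketch only. RteTagMapSketch, TAG_MAP and convert()
     * are hypothetical names, not part of the downloadable parser.
     */
    public class RteTagMapSketch {
        // RTE tag -> HTML tag. A null value drops the tag but keeps its children.
        private static const TAG_MAP:Object = {
            TEXTFORMAT: null,
            P: "div",
            FONT: "span",
            LI: "li"
        };

        public static function convert(rteHtml:String):String {
            var oldIgnore:Boolean = XML.ignoreWhitespace;
            var oldPretty:Boolean = XML.prettyPrinting;
            // Keep spaces next to tags and do not re-indent text on output.
            XML.ignoreWhitespace = false;
            XML.prettyPrinting = false;
            var result:String = "";
            try {
                // Wrap in a dummy root so several top-level tags still parse.
                result = walk(new XML("<root>" + rteHtml + "</root>"));
            } finally {
                // Restore the global XML settings for the rest of the application.
                XML.ignoreWhitespace = oldIgnore;
                XML.prettyPrinting = oldPretty;
            }
            return result;
        }

        private static function walk(parent:XML):String {
            var out:String = "";
            for each (var node:XML in parent.children()) {
                if (node.nodeKind() != "element") {
                    // Text node: toXMLString() keeps characters like < escaped.
                    out += node.toXMLString();
                    continue;
                }
                var name:String = String(node.localName());
                var inner:String = walk(node);
                // Unknown tags are kept and simply lowercased.
                var tag:String = TAG_MAP.hasOwnProperty(name)
                    ? TAG_MAP[name]
                    : name.toLowerCase();
                out += (tag == null)
                    ? inner
                    : "<" + tag + ">" + inner + "</" + tag + ">";
            }
            return out;
        }
    }
}
```

Calling RteTagMapSketch.convert(rte.htmlText) would return the remapped markup as a single string. Changing a value in TAG_MAP (for example P to "p") changes the output tag, and setting it to null removes the tag while keeping its text.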
You can give it a try, and please let me know how it worked. Suggestions are always welcome.
Here is a live example and source(rar) or source(zip) files.
UPDATE:
After K.A. noticed the space problem when parsing from RTE to HTML and back, I had to escape the content text using XMLDocument and then parse it to XML, which has better element management.
There is a catch when using both XML and XMLDocument. In XML, white space at the beginning and the end of a text node is ignored. In XMLDocument, special characters like ’ (apostrophe) are changed to &apos;, which is a big problem when parsing to CSS: style=”font-family:’Times New Roman’;” becomes style=”font-family:&apos;Times New Roman&apos;;”.
That’s why, when parsing to RTE and back, the string that is passed in should be a plain string with no indentation or line breaks (e.g. \n); otherwise the indentation will show up in the RTE as spaces.
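The two workarounds above can be sketched as small string helpers. The class RteStringPrep and its two methods are hypothetical names for this illustration, not part of the parser. The sketch assumes that line breaks only occur between tags (as in pretty-printed markup) and that the apostrophe arrives as the entity &apos;.

```actionscript
package {
    /**
     * Illustrative helpers only. RteStringPrep, flatten() and
     * restoreApostrophes() are hypothetical names.
     */
    public class RteStringPrep {
        /**
         * Removes line breaks together with the indentation around them,
         * so the markup becomes one plain string before it goes to the RTE.
         * Caution: a line break inside running text is removed as well,
         * which would join the two words on either side of it.
         */
        public static function flatten(html:String):String {
            return html.replace(/[ \t]*[\r\n]+[ \t]*/g, "");
        }

        /**
         * Turns the &apos; entity back into a plain apostrophe, so CSS
         * values such as font-family:'Times New Roman' stay valid.
         */
        public static function restoreApostrophes(html:String):String {
            return html.replace(/&apos;/g, "'");
        }
    }
}
```

A possible use is to call flatten() on stored HTML before handing it back to the RTE, and restoreApostrophes() on the generated HTML before it is used in a web page or an email.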
UPDATE:
I made an update so if you want to completely remove a TAG then simply set it to null.
UPDATE:
New property added: ‘ignoreParagraphSpace’. The default value is false. If set to true, the parser will ignore any space at the beginning or at the end of a paragraph.
No. 1 — October 23rd, 2008 at 3:45 am
If you change color of any one word in between a sentence, and convert to HTML, it will trim spaces between words. For example “My Test Sentence” having “Test” in different color, will result in “MyTestSentence”. Formatting is correct, but it is losing spaces.
No. 2 — October 23rd, 2008 at 6:24 am
I just updated this post! Now the space problem is solved. It was a good observation since I use this in one of my projects.
Thanks for pointing that out! That was a good one!
No. 3 — November 7th, 2008 at 8:04 pm
This is what I need! How can I implement this “tohtml” in PHP? i.e., I need to edit data from a database through PHP…thanks!
No. 4 — November 8th, 2008 at 8:01 am
Simply use ParseToHTML to get the string that you want to save in the database then send it to php to be saved in the database.
I hope this is what you asked me.
No. 5 — February 14th, 2009 at 12:02 pm
Looks great!
Although I found a bug:
If you apply text align AND list to the same text, the text align gets lost.
No. 6 — February 14th, 2009 at 12:37 pm
Umm, never mind, it’s a Flex bug. Link:
http://bugs.adobe.com/jira/browse/SDK-14486
And the Adobe guys don’t seem to be in a hurry to fix it…
No. 7 — March 3rd, 2009 at 4:54 pm
Looks great. I do have a problem if the users start entering equations that use the less-than sign (<). The htmlText has already converted this to the HTML name &lt; and the XML value has the HTML number %3C, but the ParseToHTML function returns it as <, which messes things up when you put it back into the ParseToRTE function.
No. 8 — March 3rd, 2009 at 9:12 pm
Thanks Tommy! I’ll look into it since I am using this in my projects.
edit:
Problem solved Tommy! The new version is up.
Thanks for pointing that out!
No. 9 — March 16th, 2009 at 3:36 pm
Do you have the source as a zip file? Having an issue with the rar file.
No. 10 — March 17th, 2009 at 6:44 am
See if the zip files would work.
No. 11 — March 17th, 2009 at 2:09 pm
Sorry to keep bugging you but your example seems to work perfectly but I can’t get the source file to parse the < sign correctly. Would you double check that the source files are correct?
No. 12 — March 17th, 2009 at 4:58 pm
You are right! The source files are the old files … I am not sure how that happened.
The new files are up now.
No. 13 — March 20th, 2009 at 10:20 am
I knew the less-than sign was a simple fix, I just couldn’t find it. Great job!
No. 14 — March 20th, 2009 at 10:25 am
Thanks! I hope you enjoy this small script.
No. 15 — April 8th, 2009 at 12:07 pm
I faced one issue while parsing back html to RTE.
I was getting space prefixed to every word.
e.g following content
Hello Binod
again
is getting displayed as
Hello binod
Binod
When I looked at the public function ParseToRTE(string:String)
in rteHtmlParser.as
I have modified the following line
//var nxml:XMLNode = manage_space(xml_doc.firstChild);
var nxml:XMLNode = xml_doc.firstChild;
Do you see any issue due to that?
No. 16 — April 8th, 2009 at 1:22 pm
Well … manage_space is good when you add space in front of some line. When you send it back to RTE the spaces in front of the line are gone.
e.g.
some text
will become
some text
with no space.
But what text did you send to the ParseToRTE because all I do with manage_space is to replace space with %20 and I don’t think that is the issue?
No. 17 — April 22nd, 2009 at 2:58 am
What’s the difference between the editor you created and the Flex 3 RTE’s htmlText property?
No. 18 — April 22nd, 2009 at 5:48 am
HUGE! Try to add the flex rte htmltext in a web page!
No. 19 — April 29th, 2009 at 5:55 am
I am facing a similar problem to Binod.
If you use manage_space, the RTE adds a space in front of every word when converting from HTML, e.g.
Hello Everyone
will become
Hello Everyone
If you don’t use manage_space, then the extra spaces added in the RTE are gone in the HTML, e.g.
Hello Someone
becomes
Hello Someone
Please help me fix this.
No. 20 — April 29th, 2009 at 5:56 am
the spaces got distorted in my previous comment hope you understand this.
No. 21 — April 29th, 2009 at 9:53 am
so you don’t want the extra space in rte when you convert back from html?
because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.
I can make that manage_space optional and you can decide if you want it or not.
No. 22 — April 30th, 2009 at 7:36 am
this is correct: because if you add space in front of any word in rte then convert to html and then back to rte the space will be removed/ignored if you don’t use manage_space.
also if you use manage_space: if you “dont” add any space in rte and convert to html and back…it adds spaces in front which is a bug, rite?
No. 23 — April 30th, 2009 at 9:55 am
no. if you don’t add any space in rte and convert it to html and back there will be no space! the thing that I wanted with this parser is that I wanted to preserve all the format that you add in rte after you convert it back from html.
there are some comments posted here about this problem.
No. 24 — May 4th, 2009 at 8:38 am
sorry, that was my mistake, i was using XMLFormat which led to the weird behavior.
I have one more problem though, not necessarily connected with your parser: if I choose any font other than the default font, the RTE does not respond to any keystroke. Am I missing a font file? (Even the basic Arial and Times New Roman fonts don’t work.)
No. 25 — May 4th, 2009 at 9:09 am
Works fine on my side! I think this is an RTE component problem.
Did you try other browsers? Maybe it’s buggy in your current browser.
No. 26 — October 28th, 2009 at 3:06 pm
This is a great idea and I wanted to use it in my project… however, my needs involve adding the htmlText into an XML string that gets parsed by Flex. While the string from the rteHtmlParser was correct, Flex kept changing my string into XML with the incorrect spacing (that RTEs don’t like). I was then sending this across the wire to be stored, and upon retrieval, the XML was incorrectly formatted.
To get around this (which causes awful spacing problems), I simply took the htmlText, Base64 encoded it, then added that to my xml to shoot across the wire (actually, all of my xml gets base64 encoded – so my htmlText actually is double encoded). Then, when I read it back in, I decode (twice) and get no spacing issues.
So in the end, this didn’t work for me, but I can see how it would be useful in other circumstances.
No. 27 — December 2nd, 2009 at 4:43 am
That’s great…
It helped me. It works fine, but there is one small problem:
if I change the font-weight of the text “My Example” to bold, a tab space appears when it is displayed again…
No. 28 — December 2nd, 2009 at 7:24 am
If you try to insert the bold manually in the parsed text or if you do any other changes try not to leave blank space at all between the html tags.
No. 29 — December 16th, 2009 at 8:31 pm
Hi Scuty,
Great class, great post!!! Thanks heaps for sharing!
I have a question, if I may. The built-in mx:RichTextEditor class offers only limited formatting features, so I needed to develop my own RTE. It’s based on mx:TextArea and the formatting bit relies entirely on fl:TextFormat. However, it adds some properties to the htmlText such as leading and blockindent (and a few more), which your class doesn’t seem to parse. My question is … do you think those could be added in the future? If that’s possible, your tool would become the ultimate solution (I am referring to the many RegExp partial solutions around).
Cheers,
Naso.
No. 30 — December 17th, 2009 at 5:24 am
Thanks Naso for the great feedback!
So you want to use TextArea instead of RichTextEditor and you want to parse the HTML with the TextFormat as CSS?
Maybe you can help me by sending me a Flex project with the TextArea formatted with the TextFormat, so I can see what the HTML looks like; I am sure the rest should be fairly easy. If you send it, please also include all the supported format features (those “few more”).
Here is my email: … .
No. 31 — January 11th, 2010 at 3:19 am
Nice work, Scuty!
No. 32 — January 12th, 2010 at 8:21 am
How do I completely remove all tags ?
No. 33 — January 12th, 2010 at 8:22 am
Please ignore previous.
How do I completely remove all SPAN tags ?
No. 34 — January 12th, 2010 at 8:27 am
I don’t think you can. But if you really want it I can make it so if you set the tag null then it will be ignored.
But do you really need that? How are you going to convert it back to RTE?
No. 35 — January 12th, 2010 at 10:04 am
Null will be fine.
I want to keep the HTML as clean and simple as possible. I’ve removed the font dropdown options from the RTE, as an external CSS will deal with this. My MSSQL Server DB complains if there are quotes, so I parse them in PHP, which is why I need them removed along with everything in them.
I don’t need to convert it back to RTE as it reads the htm
No. 36 — January 12th, 2010 at 10:13 am
To put it simply I need the orig RTE FONT tag removing and not converting to a SPAN tag at all
No. 37 — January 12th, 2010 at 10:34 am
The solution is not to cut the quotes, as you will always have this problem, but to figure out a way to insert quotes into the database. The easy way is to use parameters instead of inline query variables; using parameters, you will be able to save HTML in the database.
I will do that as soon as I can.
No. 38 — January 15th, 2010 at 8:26 am
[...] Exporting/Importing HTML from/into FLEX By eduardpal I am sure many of you have noticed that the HTML generated by the text components in FLEX is not valid. So one of the solutions would be to use a conversion tool. I found such a tool at http://blog.flashweb.org/archives/7. [...]
No. 39 — January 16th, 2010 at 7:23 am
KevMull – you can now download the new version of the RTE parser, and if you want to remove a TAG then just set it to null and that tag will be completely removed.
No. 40 — January 16th, 2010 at 7:28 am
Naso – can you post that list of the CSS tags that you need to be parsed?
No. 41 — March 8th, 2010 at 10:02 am
This is some great stuff. Thanks for sharing.
No. 42 — March 22nd, 2010 at 9:42 am
Thanks for a great, easy to use tool. Separate question, if you would be so kind, how do I keep paragraph indents from showing. Doesn’t seem to be anything in markup and displays correctly on any report just indents show up to user.
No. 43 — March 22nd, 2010 at 6:22 pm
I’ll add something in a few days.
edit:
I updated the post and the code files, I hope this is what you wanted jmp.
No. 44 — March 29th, 2010 at 1:54 am
Hi scuty,
I don’t know if was just me but the output doesn’t include the style you apply to the text, am I right? For instance, if you change the color in the middle of the text, the output will just ignore the font style…
This is a great comp, thanks for sharing!
No. 45 — March 29th, 2010 at 6:18 am
It should output everything! It might be a bug … I’ll fix it!
OK now it’s working. SET_FONT was set to null and was ignoring all the FONT tags. The code is fine.
No. 46 — April 29th, 2010 at 3:56 am
Great work! keep it up
No. 47 — June 16th, 2010 at 9:59 am
Uppercase tags give a validation error :-/
Can you provide a recent version with lowercase tags?
I tried but it didn’t work for STYLE
And also I could not change to lowercase and I couldn’t remove some useless tags in
Regards
Trevor
No. 48 — June 25th, 2010 at 7:08 am
Great Job!!!!!!!!11111
if i want to add section of tables in rich text editor and want to parse in html
can u give me some update on this
waiting for ur reply…..
No. 49 — June 25th, 2010 at 7:36 am
@Trevor: I’ll see what I can do. I was on vacation so …
@Akshit: I don’t think tables are supported in flash. What exactly do you want to do?
No. 50 — June 30th, 2010 at 10:58 pm
I want to add a button called “table” to the RichTextEditor. When you click on it, it should ask for the number of rows and columns, and after you enter them it should show a table with that many rows and columns, in which we can then add data.
No. 51 — July 1st, 2010 at 10:20 am
This has nothing to do with this class. This is more extending the RTE component in flex. You need to look for a RTE component that does that.
I am sorry …
No. 52 — July 7th, 2010 at 8:48 am
Thanks, but if you get any update about it then let me know.
I have modified the RichTextEditor by using a DataGrid in it.
No. 53 — August 23rd, 2010 at 11:27 am
Hi guys,
does anyone have this code translated to c# ?
I’d appreciate it very much.
I am interested in converting the RTE into HTML code.
Thanks
No. 54 — August 25th, 2010 at 7:08 am
This is a Flex(Flash) code and has nothing to do with C#.
No. 55 — September 9th, 2010 at 9:06 am
Hi scuty,
You have done a great job! Thanks!
I have one “feature request”:
Is it possible to generate HTML preserving spaces so if the html is rendered in a browser to get the same formatting. Now using the span tag you see in the browser all spaces gone. Maybe they should be replaced by nbsp ?
What is your opinion?
No. 56 — September 9th, 2010 at 9:40 am
Hi Paul,
Try the “ignoreParagraphSpace” option, that should do it I think. There were other people complaining about this and I fixed a while back.
EDIT:
nvm … I know what you mean. I’ll see what I can do.
No. 57 — September 9th, 2010 at 9:51 am
Hi scuty,
ignoreParagraphSpace is false by default and, as I understand it, is used to preserve leading spaces between RTE and HTML conversions when visualizing the content in the RichTextEditor. My problem is that when the generated HTML is rendered in Firefox/IE the leading spaces are skipped (because of the span tag). I tried to replace span with pre
No. 58 — September 9th, 2010 at 9:53 am
I tried to replace span with pre; then it keeps leading spaces, but I got extra line breaks in my final browser rendering…
No. 59 — September 9th, 2010 at 9:53 am
I tried to replace span with pre – then it keeps leading spaces but i got extra line breaks in my final browser rendering…
No. 60 — September 9th, 2010 at 10:10 am
I know … I’ll look into it.
No. 61 — September 27th, 2010 at 11:45 am
Hi Scuty,
this is some really nice work !
I couldn’t find a license in the sources, is it okay to use it in a commercial project ?
No. 62 — September 28th, 2010 at 9:57 am
Yes, you can use it in your project. I will update the license as soon as I can.
No. 63 — December 3rd, 2010 at 11:28 am
scuty – I love this…well done!
We’ve been writing a custom text area using regex to actually produce code that eventually winds up in a PDF. However, the faulty HTML causes problems due to the tag rearranging.
Would it be possible to extend the TA /RTA class as in:
override public function set htmlText(value:String):void {
    super.htmlText = value;
}
I know this is really adobe’s flex community’s task but wouldn’t that be the real aim or goal?
Perhaps the question really is where does one access the super.htmlText prior to the mess?
No. 64 — December 4th, 2010 at 6:38 am
@Murray
So you want to extend the RTE with this class? And override the set; and get; htmlText?
No. 65 — December 9th, 2010 at 7:05 am
Hi there scuty
Thanks for the reply.
Spot on – yes to extend the RTE or TA.
like:
package
{
    import mx.controls.RichTextEditor;

    public class TrueHTMLRichTextArea extends RichTextEditor
    {
        public function TrueHTMLRichTextArea()
        {
            super();
        }

        // Read-only view of htmlText after conversion to valid HTML.
        public function get trueHTMLText():String {
            return convertTohtmltext(this.htmlText);
        }

        public static function convertTohtmltext(str:String):String {
            // your RAD regex and conversion goes here!
            return str; // placeholder so the class compiles: returns the input unchanged
        }
    }
}
then you could access the html by yourTA/RTA.trueHTMLText;
No. 66 — December 27th, 2010 at 6:19 am
OK … I’ll look into it!
Thanks for the tip!
No. 67 — January 28th, 2011 at 12:34 am
‘;` I am very thankful to this topic because it really gives great information –*
No. 68 — February 8th, 2011 at 4:32 am
Hi, I know that the topic is kinda old, but I created a parser myself and I can’t solve a problem.
After I gave up, I tried to use yours. Same problem!
When you write a text and change the color of just 1 char in the middle of the text, it adds a space before and after the character whose color you changed.
I figured that this happens because the text is sent to the next line. It shouldn’t happen but it does, and I don’t know how to get this solved.
Any help would be much appreciated. info[at]isynapps.com
thank you!
No. 69 — February 8th, 2011 at 5:45 am
I just tested it and it works fine. I am not sure how you reproduce this issue, but I’m assuming you send your own HTML text to this parser. If so, then yes, it is possible: this parser does not work well if you send it your own HTML. The only way this works is if you let the RTE create the HTML for you; if you then send it back, it will display the right content in the RTE.
However if you don’t do this I might need more info on how to reproduce this bug.
No. 70 — February 8th, 2011 at 5:55 am
@scuty
thanks for the fast reply!
well, I could also reproduce this using your live sample. what I did was:
wrote “This is a test”
Change the color of “e” from “test”
Clicked “Parse HTML” button
Copy the HTML output
Paste into Notepad and save as .htm
Open the *.htm file and you’ll be able to see the problem.
Again, thanks for the help!
No. 71 — February 8th, 2011 at 6:06 am
Try to get the text using StringFormat not XMLFormat. The XMLFormat will add line breaks for each HTML element and it would break the HTML on your page. With StringFormat you have one string with no line breaks and that would work just fine.
Give it a try!
No. 72 — February 8th, 2011 at 6:21 am
@scuty
I’ll try it. I’ll extend your code and add this functionality. I’ll post it here later on.
Thanks for the help, I appreciate it!
No. 73 — February 8th, 2011 at 6:24 am
My code has this functionality I just don’t use it in the live test. I use XMLFormat to be able to view the HTML code clearly.
No. 74 — February 8th, 2011 at 7:31 am
I just saw that! Thanks for the awesome piece of code!
No. 75 — February 8th, 2011 at 7:33 am
You are welcome!
No. 76 — March 7th, 2011 at 7:07 am
Hi!
I place images in my RTE using RTE.htmlText = “”
As soon as i have an image in the text, i can not save the content through rtehtmlparser.
Do you have any idea how the img tag causes problems in ParseToHtml?
The other way from database to rte (ParseToRte) works.
No. 77 — March 7th, 2011 at 7:08 am
@to before:
between the quotation marks should be < img src….. etc.
No. 78 — March 7th, 2011 at 9:48 am
@webalizer
I don’t know if the RTE supports img tag. Even if it would my parser doesn’t support it at all.
No. 79 — March 7th, 2011 at 9:59 am
@scuty
Thanks for your reply. The RTE does support simple img tag.
How can i implement the support to your parser?
No. 80 — April 11th, 2011 at 3:40 am
hi scuty, I tried to use your parser but when there is a color between the text, it removes the space. example “The color of text”, the word color on the sentence is blue and it will output on RTE “Thecolorof text”. do you know why is this happening? thanks
No. 81 — April 11th, 2011 at 5:24 am
works fine for me. where do you see the wrong output? on html or when you parse it back to RTE?
No. 82 — April 11th, 2011 at 5:26 am
@webalizer
I don’t think you can because there is no file upload implemented. I saw some examples online and it’s really tough to make it happen.
No. 83 — April 19th, 2011 at 10:49 am
Wow, great job man! Really thanks for sharing…
In your example you use the RichTextEditor, the latest spark component (textArea) should work too ? (Flex 4.5)
Thank you again
No. 84 — April 20th, 2011 at 8:05 am
@vincent
I am not sure! But my guess is that it won’t, because they changed the HTML output of the text boxes in Flex 4+.
No. 85 — April 26th, 2011 at 9:26 pm
Currently the br tag is given style information but it probably shouldn’t have any at all. For example:
Is there a way to fix this?
No. 86 — April 26th, 2011 at 9:28 pm
currently the “br” tag is given style information but it probably shouldn’t have any at all. For example: br style=”letter-spacing:0px;color:#FFFFFF;font-size:12px;font-family:’Century Gothic’;”