Transforming text to paragraphs

Recently, in a customer project, I received large blocks of text from the database, delivered as XML. Naturally, the customer wanted the text presented in neat paragraphs, and the only thing separating one paragraph from another in the source text was line breaks.

I then wrote a JavaScript function in the XSLT file that wraps every piece of text separated by line breaks in p tags.

My solution can be found in RobLab’s Text to paragraphs article.
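
The linked script is not reproduced here. As a rough sketch of the idea only, assuming plain text with no markup in it, the wrapping could look something like the following. The function name textToParagraphs is made up for this illustration, and the code does not escape characters such as & or <.

```javascript
// Hypothetical helper, not the script from the linked article.
// Wraps every chunk of text separated by line breaks in <p> tags.
function textToParagraphs(text) {
  // Treat any run of line breaks (\r\n, \r or \n) as one paragraph boundary.
  var chunks = text.split(/(?:\r\n|\r|\n)+/);
  var paragraphs = [];
  for (var i = 0; i < chunks.length; i++) {
    // Trim leading and trailing whitespace from each chunk.
    var chunk = chunks[i].replace(/^\s+|\s+$/g, "");
    // Skip chunks that are empty after trimming.
    if (chunk !== "") {
      paragraphs.push("<p>" + chunk + "</p>");
    }
  }
  return paragraphs.join("\n");
}
```

Calling textToParagraphs("First\n\nSecond") would give one p element per chunk, joined by a single line break.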

4 Comments

  • Tom Greuter says:

    If the text in the database is allowed to contain HTML, you may run into validation trouble when this script surrounds blockquotes or unordered lists with paragraph tags. In that case you probably want to apply more filters.
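
    One possible extra filter, as a minimal sketch and not something taken from the original script: leave a chunk unwrapped when it starts with a block-level tag. The names BLOCK_TAG and wrapUnlessBlock are hypothetical, the tag list is an assumed and incomplete selection, and the sketch assumes each block-level element sits on a single line.

    ```javascript
    // Assumed, non-exhaustive list of block-level elements that should not
    // end up inside a <p>. Matches opening and closing tags at the chunk start.
    var BLOCK_TAG = /^<\/?(?:blockquote|ul|ol|li|dl|table|pre|div|h[1-6]|p)\b/i;

    // Hypothetical helper: wraps line-break-separated chunks in <p> tags,
    // except chunks that already start with a block-level tag.
    function wrapUnlessBlock(text) {
      var chunks = text.split(/(?:\r\n|\r|\n)+/);
      var out = [];
      for (var i = 0; i < chunks.length; i++) {
        var chunk = chunks[i].replace(/^\s+|\s+$/g, "");
        if (chunk === "") {
          continue; // ignore empty chunks
        }
        out.push(BLOCK_TAG.test(chunk) ? chunk : "<p>" + chunk + "</p>");
      }
      return out.join("\n");
    }
    ```

    With this, a line such as <blockquote>Quoted</blockquote> passes through untouched while the plain lines around it still get their p tags.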

  • Robert Nyman says:

    Tom,

    Interesting that you bring that up!

    In my case, the text contained HTML as well, which they wanted removed.

    So I ran a regular expression script to remove all tags first; basically, like this:

    strNodeText = strNodeText.replace(/<\/?[^>]+(>|$)/g, "");

    However, if the text also contains tags that should be left there, this solution isn't optimal. This script is only meant for plain text.
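
    To make the behaviour of that expression concrete, here is a small sketch with the slash escaped so that it is a valid regex literal. The wrapper name stripTags is only for illustration. Note the last case: a literal < in otherwise plain text is treated as the start of a tag, so the text after it gets swallowed too.

    ```javascript
    // Hypothetical wrapper around the tag-stripping expression.
    function stripTags(text) {
      // "<", an optional "/", one or more non-">" characters,
      // then either ">" or the end of the string.
      return text.replace(/<\/?[^>]+(>|$)/g, "");
    }

    stripTags("Use <b>bold</b> text"); // "Use bold text"
    stripTags("broken <a href");       // "broken " (unterminated tag removed)
    stripTags("a < b and c > d");      // "a  d" (comparison mistaken for a tag)
    ```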

  • Devon says:

    There are two things I would love to see done in XSLT:

    1. Chat log to XML.

    2. E-mail (from, say, my Thunderbird client) to XML.

    If I had the money I'd pay someone to do it. That's how much I want these. But, I just lost my job recently.

  • Robert Nyman says:

    Devon,

    First, sorry to hear about your job. I hope things work out.

    Regarding chat logs: If you have Message History enabled in MSN Messenger, it writes an XML file per contact with your conversations.

    When it comes to the e-mailing thing, I have no concrete suggestion.
