30
May

An autop Django template filter

autop is a script that was first written for WordPress by Matt Mullenweg (the WordPress founder). All WordPress blog posts are filtered using wpautop() (unless you install an additional plug-in to disable the filter). The function was also ported to Drupal, and it's enabled by default when entering body text into Drupal nodes. As far as I'm aware, autop has never been ported to a language other than PHP. Until now.

In the process of migrating this site from Drupal to Django, I was surprised to discover that not only Django, but also Python in general, lacks any linebreak filtering function (official or otherwise) that's anywhere near as intelligent as autop. The built-in Django linebreaks filter converts all single newlines to <br /> tags, and all double newlines to <p> tags, completely irrespective of HTML block elements such as <code> and <script>. This was a fairly major problem for me, as I was migrating a lot of old content over from Drupal, and that content was all formatted in autop style. Plus, I'm used to writing content in that way, and I'd like to continue writing content in that way, whether I'm in a PHP environment or not.

Therefore, I've ported Drupal's _filter_autop() function to Python, and implemented it as a Django template filter. From the limited testing I've done, the function appears to be working just as well in Django as it does in Drupal. You can find the function below.

import re
from django import template
from django.template.defaultfilters import force_escape, stringfilter
from django.utils.encoding import force_unicode
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe


register = template.Library()


def autop_function(value):
    """
    Convert line breaks into <p> and <br> in an intelligent fashion.
    Originally based on: http://photomatt.net/scripts/autop
    
    Ported directly from the Drupal _filter_autop() function:
    http://api.drupal.org/api/function/_filter_autop
    """
    
    # All block level tags
    block = '(?:table|thead|tfoot|caption|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|form|blockquote|address|p|h[1-6]|hr)'

    # Split at <pre>, <script>, <style> and </pre>, </script>, </style> tags.
    # We don't apply any processing to the contents of these tags to avoid messing
    # up code. We look for matched pairs and allow basic nesting. For example:
    # "processed <pre> ignored <script> ignored </script> ignored </pre> processed"
    chunks = re.split('(</?(?:pre|script|style|object)[^>]*>)', value)
    ignore = False
    ignoretag = ''
    output = ''
    
    for i, chunk in zip(range(len(chunks)), chunks):
        prev_ignore = ignore
        
        if i % 2:
            # Opening or closing tag?
            is_open = chunk[1] != '/'
            tag = re.split('[ >]', chunk[2-is_open:], 2)[0]
            if not ignore:
                if is_open:
                    ignore = True
                    ignoretag = tag
            
            # Only allow a matching tag to close it.
            elif not is_open and ignoretag == tag:
                ignore = False
                ignoretag = ''
        
        elif not ignore:
            chunk = re.sub('\n*$', '', chunk) + "\n\n" # just to make things a little easier, pad the end
            chunk = re.sub('<br />\s*<br />', "\n\n", chunk)
            chunk = re.sub('(<'+ block +'[^>]*>)', r"\n\1", chunk) # Space things out a little
            chunk = re.sub('(</'+ block +'>)', r"\1\n\n", chunk) # Space things out a little
            chunk = re.sub("\n\n+", "\n\n", chunk) # take care of duplicates
            chunk = re.sub('\n?(.+?)(?:\n\s*\n|$)', r"<p>\1</p>\n", chunk) # make paragraphs, including one at the end
            chunk = re.sub("<p>(<li.+?)</p>", r"\1", chunk) # problem with nested lists
            chunk = re.sub('<p><blockquote([^>]*)>', r"<blockquote\1><p>", chunk)
            chunk = chunk.replace('</blockquote></p>', '</p></blockquote>')
            chunk = re.sub('<p>\s*</p>\n?', '', chunk) # under certain strange conditions it could create a P of entirely whitespace
            chunk = re.sub('<p>\s*(</?'+ block +'[^>]*>)', r"\1", chunk)
            chunk = re.sub('(</?'+ block +'[^>]*>)\s*</p>', r"\1", chunk)
            chunk = re.sub('(?<!<br />)\s*\n', "<br />\n", chunk) # make line breaks
            chunk = re.sub('(</?'+ block +'[^>]*>)\s*<br />', r"\1", chunk)
            chunk = re.sub('<br />(\s*</?(?:p|li|div|th|pre|td|ul|ol)>)', r'\1', chunk)
            chunk = re.sub('&([^#])(?![A-Za-z0-9]{1,8};)', r'&amp;\1', chunk)
        
        # Extra (not ported from Drupal) to escape the contents of code blocks.
        code_start = re.search('^<code>', chunk)
        code_end = re.search(r'(.*?)<\/code>$', chunk)
        if prev_ignore and ignore:
            if code_start:
                chunk = re.sub('^<code>(.+)', r'\1', chunk)
            if code_end:
                chunk = re.sub(r'(.*?)<\/code>$', r'\1', chunk)
            chunk = chunk.replace('<\\/pre>', '</pre>')
            chunk = force_escape(chunk)
            if code_start:
                chunk = '<code>' + chunk
            if code_end:
                chunk += '</code>'
        
        output += chunk
    
    return output

autop_function = allow_lazy(autop_function, unicode)

@register.filter
def autop(value, autoescape=None):
    return mark_safe(autop_function(value))
autop.is_safe = True
autop.needs_autoescape = True
autop = stringfilter(autop)

Update (31 May 2010): added the "Extra (not ported from Drupal) to escape the contents of code blocks" part of the code.

To use this filter in your Django templates, simply save the code above in a file called autop.py (or anything else you want) in a templatetags directory within one of your installed apps. Then, just declare {% load autop %} at the top of your templates, and filter your markup variables with something like {{ object.content|autop }}.

Note that this is pretty much a direct port of the Drupal / PHP function into Django / Python. As such, it's probably not as efficient nor as Pythonic as it could be. However, it seems to work quite well. Feedback and comments are welcome.

Comments are closed

Comment

03
May
2011

This is exactly what I was looking for - thanks!