An autop Django template filter
autop
is a script that was first written for WordPress by Matt Mullenweg (the WordPress founder). All WordPress blog posts are filtered using wpautop()
(unless you install an additional plug-in to disable the filter). The function was also ported to Drupal, and it's enabled by default when entering body text into Drupal nodes. As far as I'm aware, autop
has never been ported to a language other than PHP. Until now.
In the process of migrating this site from Drupal to Django, I was surprised to discover that not only Django, but also Python in general, lacks any linebreak filtering function (official or otherwise) that's anywhere near as intelligent as autop
. The built-in Django linebreaks
filter converts all single newlines to <br />
tags, and all double newlines to <p>
tags, completely irrespective of HTML block elements such as <code>
and <script>
. This was a fairly major problem for me, as I was migrating a lot of old content over from Drupal, and that content was all formatted in autop
style. Plus, I'm used to writing content in that way, and I'd like to continue writing content in that way, whether I'm in a PHP environment or not.
Therefore, I've ported Drupal's _filter_autop()
function to Python, and implemented it as a Django template filter. From the limited testing I've done, the function appears to be working just as well in Django as it does in Drupal. You can find the function below.
import re
from django import template
from django.template.defaultfilters import force_escape, stringfilter
from django.utils.encoding import force_unicode
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe
register = template.Library()
def autop_function(value):
"""
Convert line breaks into <p> and <br> in an intelligent fashion.
Originally based on: http://photomatt.net/scripts/autop
Ported directly from the Drupal _filter_autop() function:
http://api.drupal.org/api/function/_filter_autop
"""
# All block level tags
block = '(?:table|thead|tfoot|caption|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|form|blockquote|address|p|h[1-6]|hr)'
# Split at <pre>, <script>, <style> and </pre>, </script>, </style> tags.
# We don't apply any processing to the contents of these tags to avoid messing
# up code. We look for matched pairs and allow basic nesting. For example:
# "processed <pre> ignored <script> ignored </script> ignored </pre> processed"
chunks = re.split('(</?(?:pre|script|style|object)[^>]*>)', value)
ignore = False
ignoretag = ''
output = ''
for i, chunk in zip(range(len(chunks)), chunks):
prev_ignore = ignore
if i % 2:
# Opening or closing tag?
is_open = chunk[1] != '/'
tag = re.split('[ >]', chunk[2-is_open:], 2)[0]
if not ignore:
if is_open:
ignore = True
ignoretag = tag
# Only allow a matching tag to close it.
elif not is_open and ignoretag == tag:
ignore = False
ignoretag = ''
elif not ignore:
chunk = re.sub('\n*$', '', chunk) + "\n\n" # just to make things a little easier, pad the end
chunk = re.sub('<br />\s*<br />', "\n\n", chunk)
chunk = re.sub('(<'+ block +'[^>]*>)', r"\n\1", chunk) # Space things out a little
chunk = re.sub('(</'+ block +'>)', r"\1\n\n", chunk) # Space things out a little
chunk = re.sub("\n\n+", "\n\n", chunk) # take care of duplicates
chunk = re.sub('\n?(.+?)(?:\n\s*\n|$)', r"<p>\1</p>\n", chunk) # make paragraphs, including one at the end
chunk = re.sub("<p>(<li.+?)</p>", r"\1", chunk) # problem with nested lists
chunk = re.sub('<p><blockquote([^>]*)>', r"<blockquote\1><p>", chunk)
chunk = chunk.replace('</blockquote></p>', '</p></blockquote>')
chunk = re.sub('<p>\s*</p>\n?', '', chunk) # under certain strange conditions it could create a P of entirely whitespace
chunk = re.sub('<p>\s*(</?'+ block +'[^>]*>)', r"\1", chunk)
chunk = re.sub('(</?'+ block +'[^>]*>)\s*</p>', r"\1", chunk)
chunk = re.sub('(?<!<br />)\s*\n', "<br />\n", chunk) # make line breaks
chunk = re.sub('(</?'+ block +'[^>]*>)\s*<br />', r"\1", chunk)
chunk = re.sub('<br />(\s*</?(?:p|li|div|th|pre|td|ul|ol)>)', r'\1', chunk)
chunk = re.sub('&([^#])(?![A-Za-z0-9]{1,8};)', r'&\1', chunk)
# Extra (not ported from Drupal) to escape the contents of code blocks.
code_start = re.search('^<code>', chunk)
code_end = re.search(r'(.*?)<\/code>$', chunk)
if prev_ignore and ignore:
if code_start:
chunk = re.sub('^<code>(.+)', r'\1', chunk)
if code_end:
chunk = re.sub(r'(.*?)<\/code>$', r'\1', chunk)
chunk = chunk.replace('<\\/pre>', '</pre>')
chunk = force_escape(chunk)
if code_start:
chunk = '<code>' + chunk
if code_end:
chunk += '</code>'
output += chunk
return output
autop_function = allow_lazy(autop_function, unicode)
@register.filter
def autop(value, autoescape=None):
return mark_safe(autop_function(value))
autop.is_safe = True
autop.needs_autoescape = True
autop = stringfilter(autop)
Update (31 May 2010): added the "Extra (not ported from Drupal) to escape the contents of code blocks" part of the code.
To use this filter in your Django templates, simply save the code above in a file called autop.py
(or anything else you want) in a templatetags
directory within one of your installed apps. Then, just declare {% load autop %}
at the top of your templates, and filter your markup variables with something like {{ object.content|autop }}
.
Note that this is pretty much a direct port of the Drupal / PHP function into Django / Python. As such, it's probably not as efficient nor as Pythonic as it could be. However, it seems to work quite well. Feedback and comments are welcome.