Markmin markup language
About
Markmin is a new markup language designed to produce high-quality scientific papers and books and to publish them online. We provide serializers for HTML, LaTeX and PDF. It is implemented in the markmin2html function in markmin2html.py.
Example of usage:
m = "Hello **world** [[link http://web2py.com]]"
from markmin2html import markmin2html
print markmin2html(m)
from markmin2latex import markmin2latex
print markmin2latex(m)
from markmin2pdf import markmin2pdf # requires pdflatex
-print markmin2pdf(m)
Why?
We wanted a markup language with the following requirements:
less than 200 lines of functional code
easy to read
secure
support table, ul, ol, code
support html5 video and audio elements (html serialization only)
can align images and resize them
can specify class for tables and code elements
can add anchors
does not use _ for markup (since it creates odd behavior)
automatically links urls
fast
easy to extend
supports latex and pdf including references
allows to describe the markup in the markup (this document is generated from markmin syntax)
(results depend on the text, but on average markmin is 30% faster than markdown for text of ~100K, and 10x faster for text of ~10K)
The web2py book published by lulu, for example, was entirely generated with markmin2pdf from the online web2py wiki
The format is always [[title link]] or [[title [extra] link]]. Notice you can nest bold, italic, strikeout and code inside the link title.
Anchors
You can place an anchor anywhere in the text using the syntax [[name]] where name is the name of the anchor.
-You can then link the anchor with link, i.e. [[link #myanchor]] or link with an extra info, i.e.
-[[link with an extra info [extra info] #myanchor]].
Images
-This paragraph has an image aligned to the right with a width of 200px. It is placed using the code
[[alt-string for the image [the image title] http://www.web2py.com/examples/static/web2py_logo.png right 200px]].
Unordered Lists
- Dog
+print markmin2pdf(m)
====================
This is a test block with new features:
This is a blockquote with a list with tables in it:
This is a paragraph before list. You can continue paragraph on the next lines. This is an ordered list with tables:
Item 1
Item 2
aa
bb
cc
11
22
33
Item 4
T1
T2
t3
aaa
bbb
ccc
ddd
fff
ggg
123
0
5.0
This is a new paragraph with a table. Table has header and footer:
Title 1
Title 2
Title 3
data 1
data 2
2.00
data 4
data5(long)
23.00
data 8
33.50
Total:
3 items
58.50
Multilevel lists
Now lists can be multilevel:
Ordered item 1 on level 1. You can continue item text on next strings
Ordered item 1 of sublevel 2 with a paragraph (paragraph can start with point after plus or minus characters, e.g. ++. or --.)
This is another item. But with 3 paragraphs, blockquote and sublists:
This is the second paragraph in the item. You can add paragraphs to an item, using point notation, where first characters in the string are sequence of points with space between them and another string. For example, this paragraph (in sublevel 2) starts with two points:
.. This is the second paragraph...
this is a blockquote in a list
You can use blockquote with headers, paragraphs, tables and lists in it: Tables can have or have not header and footer. This table is defined without any header and footer in it:
red
fox
0
blue
dolphin
1000
green
leaf
10000
This is yet another paragraph in the item.
This is an item of unordered list (sublevel 3)
This is the second item of the unordered list (sublevel 3)
This is a single item of ordered list in sublevel 6
and this is a paragraph in sublevel 4
This is a new item with paragraph in sublevel 3.
Start ordered list in sublevel 4 with code block:
line 1
+ line 2
+ line 3
Yet another item with code block:
line 1
+line 2
+ line 3
This item finishes with this paragraph.
Item in sublevel 3 can be continued with paragraphs.
this is another
+code block
+ in the
+ sublevel 3 item
The last item in sublevel 3
This is a continuous paragraph for item 2 in sublevel 2. You can use such structure to create difficult structured documents.
item 3 in sublevel 2
item 1 in sublevel 2 (new unordered list)
item 2 in sublevel 2
item 3 in sublevel 2
item 1 in sublevel 2 (new ordered list)
item 2 in sublevel 2
item 3 in sublevel 2
item 2 in level 1
item 3 in level 1
new unordered list (item 1 in level 1)
level 2 in level 1
level 3 in level 1
level 4 in level 1
This is the last section of the test
Single paragraph with '----' in it will be turned into separator:
And this is the last paragraph in the test. Be happy!
====================
Why?
We wanted a markup language with the following requirements:
less than 300 lines of functional code
easy to read
secure
support table, ul, ol, code
support html5 video and audio elements (html serialization only)
can align images and resize them
can specify class for tables and code elements
can add anchors
does not use _ for markup (since it creates odd behavior)
automatically links urls
fast
easy to extend
supports latex and pdf including references
allows to describe the markup in the markup (this document is generated from markmin syntax)
(results depend on the text, but on average markmin is 30% faster than markdown for text of ~100K, and 10x faster for text of ~10K)
The web2py book published by lulu, for example, was entirely generated with markmin2pdf from the online web2py wiki
The format is always [[title link]] or [[title [extra] link]]. Notice you can nest bold, italic, strikeout and code inside the link title.
Anchors
You can place an anchor anywhere in the text using the syntax [[name]], where name is the name of the anchor. You can then link to the anchor, e.g. [[link #myanchor]], or link to it with extra info, e.g. [[link with an extra info [extra info] #myanchor]].
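The anchor and link forms described above can be told apart with a small pattern. The following is only an illustrative sketch: `ANCHOR_RE`, `LINK_RE` and `classify` are hypothetical names, and the patterns are a simplification, not the regular expressions used inside markmin2html.

```python
import re

# Hypothetical, simplified patterns (NOT markmin's own regexes).
# An anchor is a single whitespace-free name: [[name]]
ANCHOR_RE = re.compile(r'\[\[(?P<name>[^\s\[\]]+)\]\]')

# A link is [[title link]] or [[title [extra] link]]
LINK_RE = re.compile(
    r'\[\[(?P<t>.+?)'              # title, may contain spaces
    r'(?:\s+\[(?P<a>[^\]]+)\])?'   # optional [extra] info
    r'\s+(?P<k>[^\s\]]+)\]\]'      # link target: last whitespace-free token
)

def classify(markup):
    """Return ('anchor', name), ('link', title, extra, target) or None."""
    m = ANCHOR_RE.fullmatch(markup)
    if m:
        return ('anchor', m.group('name'))
    m = LINK_RE.fullmatch(markup)
    if m:
        return ('link', m.group('t'), m.group('a'), m.group('k'))
    return None

print(classify('[[myanchor]]'))
print(classify('[[link with an extra info [extra info] #myanchor]]'))
```

The sketch deliberately ignores the image, video and audio forms, which carry additional tokens after the link target.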
Images
This paragraph has an image aligned to the right with a width of 200px. It is placed using the code
[[alt-string for the image [the image title] http://www.web2py.com/examples/static/web2py_logo.png right 200px]].
Unordered Lists
- Dog
- Cat
-- Mouse
is rendered as
Dog
Cat
Mouse
Two new lines between items break the list in two lists.
Ordered Lists
+ Dog
+- Mouse
is rendered as
Dog
Cat
Mouse
Two new lines between items break the list in two lists.
-Four or more dashes delimit the table and | separates the columns.
-The :abc at the end sets the class for the table and it is optional.
Blockquote
A table with a single cell is rendered as a blockquote:
Hello world
Code, <code>, escaping and extra stuff
def test():
- return "this is Python code"
Optionally a ` inside a ``...`` block can be inserted escaped with !`!.
NOTE: You can escape markmin constructions ('',``,**,~~,[,{,]},$,@) with '\' character:
-so \`\` can replace !`!`! escape string
The :python after the markup is also optional. If present, by default, it is used to set the class of the <code> block.
-The behavior can be overridden by passing an argument extra to the render function. For example:
(the ``...``:custom block is rendered by the custom=lambda function passed to render).
Html5 support
Markmin also supports the <video> and <audio> html5 tags using the notation:
-
[[message link video]]
++ Mouse
is rendered as
Dog
Cat
Mouse
Multilevel Lists
+ Dogs
+ -- red
+ -- brown
+ -- black
++ Cats
+ -- fluffy
+ -- smooth
+ -- bald
++ Mice
+ -- small
+ -- big
+ -- huge
is rendered as
Dogs
red
brown
black
Cats
fluffy
smooth
bald
Mice
small
big
huge
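The nesting rule used in the list examples above (the number of repeated + or - characters gives the level, and a point after the marker starts a paragraph inside an item) can be sketched as follows. `MARKER_RE` and `parse_marker` are hypothetical names chosen for illustration; the pattern is a simplification and is not the `regex_list` used by markmin2html.

```python
import re

# Hypothetical, simplified marker pattern (NOT markmin's regex_list):
# a run of '+', '-' or '.', an optional paragraph point, then the text.
MARKER_RE = re.compile(r'^(\++|-+|\.+)(\.?)\s+(.*)$')
KINDS = {'+': 'ol', '-': 'ul', '.': 'paragraph'}

def parse_marker(line):
    """Return (kind, level, has_point, text) or None for a plain line."""
    m = MARKER_RE.match(line.strip())
    if not m:
        return None
    marker, point, text = m.groups()
    return (KINDS[marker[0]], len(marker), bool(point), text)

for line in ['+ Dogs', ' -- red', '++. Ordered item', '.. second paragraph']:
    print(parse_marker(line))
```

A line made only of dashes (a table or blockquote delimiter) has no text after the marker, so this sketch returns None for it.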
Tables (with optional header and/or footer)
Something like this
-----------------
+**A**|**B**|**C**
+=================
+ 0 | 0 | X
+ 0 | X | 0
+ X | 0 | 0
+=================
+**D**|**F**|**G**
+-----------------:abc[id]
is a table and is rendered as
A
B
C
0
0
X
0
X
0
X
0
0
D
F
G
Four or more dashes delimit the table and | separates the columns. The :abc, :id[abc_1] or :abc[abc_1] at the end sets the class and/or id for the table and it is optional.
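As a rough illustration of the delimiter rule above, the closing line of a table can be split into an optional class and an optional id. `CLOSE_RE`, `table_attrs` and `split_row` are hypothetical names and the pattern is an assumption made for this sketch; the patch itself relies on its own `regex_tq`, which is not reproduced here. The special handling of `id` mirrors the `t_cls != 'id'` check in the patch.

```python
import re

# Hypothetical pattern: four or more dashes, then optional :class and [id].
CLOSE_RE = re.compile(r'^-{4,}(?::(?P<c>[\w-]+)(?:\[(?P<p>[\w-]+)\])?)?$')

def table_attrs(line):
    """Return (class, id) for a table delimiter line, or None otherwise."""
    m = CLOSE_RE.match(line.strip())
    if m is None:
        return None
    cls = m.group('c') or ''
    if cls == 'id':          # ':id[abc_1]' sets only the id, no class
        cls = ''
    return (cls, m.group('p') or '')

def split_row(row):
    """Split a table row on '|' and trim each cell."""
    return [cell.strip() for cell in row.split('|')]

print(table_attrs('-----------------:abc[id]'))
print(table_attrs('-----:id[abc_1]'))
print(split_row(' 0 | X | 0'))
```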
Blockquote
A table with a single cell is rendered as a blockquote:
Hello world
Blockquote can contain headers, paragraphs, lists and tables:
-----
+ This is a paragraph in a blockquote
+
+ + item 1
+ + item 2
+ -- item 2.1
+ -- item 2.2
+ + item 3
+
+ ---------
+ 0 | 0 | X
+ 0 | X | 0
+ X | 0 | 0
+ ---------:tableclass1
+-----
is rendered as:
This is a paragraph in a blockquote
item 1
item 2
item 2.1
item 2.2
item 3
0
0
X
0
X
0
X
0
0
Code, <code>, escaping and extra stuff
def test():
+ return "this is Python code"
Optionally a ` inside a ``...`` block can be inserted escaped with !`!.
NOTE: You can escape markmin constructions ('',``,**,~~,[,{,]},$,@) with '\' character: so \`\` can replace !`!`! escape string
The :python after the markup is also optional. If present, by default, it is used to set the class of the <code> block. The behavior can be overridden by passing an argument extra to the render function. For example:
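The dispatch described above can be sketched roughly as follows. This is not markmin's render function: `render_code` is a hypothetical helper that only shows the idea of looking up the class name in `extra` before falling back to a plain <code> element. It uses Python 3's `html.escape`, whereas the patch uses `cgi.escape`.

```python
from html import escape  # Python 3 stand-in for cgi.escape used in the patch

def render_code(code, cls='', extra=None):
    """Render a code segment; a callable in extra[cls] overrides the default."""
    extra = extra or {}
    if cls in extra:
        # custom renderer supplied by the caller
        return extra[cls](code)
    attr = ' class="%s"' % cls if cls else ''
    return '<code%s>%s</code>' % (attr, escape(code))

print(render_code('a < b', 'python'))
print(render_code('abc', 'custom', {'custom': lambda text: text.upper()}))
```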
<ul/>, <ol/>, <code/>, <table/>, <blockquote/>, <h1/>, ..., <h6/> do not have <p>...</p> around them.
diff --git a/gluon/contrib/markmin/markmin2html.py b/gluon/contrib/markmin/markmin2html.py
index 7ae2fd4c..1d761ae9 100755
--- a/gluon/contrib/markmin/markmin2html.py
+++ b/gluon/contrib/markmin/markmin2html.py
@@ -1,9 +1,11 @@
#!/usr/bin/env python
+# -*- coding: utf-8 -*-
# created by Massimo Di Pierro
-# improved by Vladyslav Kozlovskyy
+# recreated by Vladyslav Kozlovskyy
# license MIT/BSD/GPL
import re
-import cgi
+from cgi import escape
+from string import maketrans
"""
TODO: next version should use MathJax
@@ -41,11 +43,152 @@ print markmin2latex(m)
from markmin2pdf import markmin2pdf # requires pdflatex
print markmin2pdf(m)
``
+====================
+# This is a test block with new features:
+
+This is a blockquote with
+a list with tables in it:
+-----------
+ This is a paragraph before list.
+ You can continue paragraph on the
+ next lines.
+
+ This is an ordered list with tables:
+ + Item 1
+ + Item 2
+ + --------
+ aa|bb|cc
+ 11|22|33
+ --------:tableclass1[tableid1]
+ + Item 4
+ -----------
+ T1| T2| t3
+ ===========
+ aaa|bbb|ccc
+ ddd|fff|ggg
+ 123|0 |5.0
+ -----------:tableclass1
+-----------:blockquoteclass[blockquoteid]
+
+This is a new paragraph
+with a table. Table has header and footer:
+-------------------------------
+**Title 1**|**Title 2**|**Title 3**
+==============================
+data 1 | data 2 | 2.00
+data 4 |data5(long)| 23.00
+ |data 8 | 33.50
+==============================
+Total: | 3 items | 58.50
+------------------------------:tableclass1[tableid2]
+
+## Multilevel
+ lists
+
+Now lists can be multilevel:
+
++ Ordered item 1 on level 1.
+ You can continue item text on
+ next strings
+
+++. Ordered item 1 of sublevel 2 with
+ a paragraph (paragraph can start
+ with point after plus or minus
+ characters, e.g. **++.** or **--.**)
+
+++. This is another item. But with 3 paragraphs,
+ blockquote and sublists:
+
+.. This is the second paragraph in the item. You
+ can add paragraphs to an item, using point
+ notation, where first characters in the string
+ are sequence of points with space between
+ them and another string. For example, this
+ paragraph (in sublevel 2) starts with two points:
+ ``.. This is the second paragraph...``
+
+.. ----------
+ ### this is a blockquote in a list
+
+ You can use blockquote with headers, paragraphs,
+ tables and lists in it:
+
+ Tables can have or have not header and footer.
+ This table is defined without any header
+ and footer in it:
+ ---------------------
+ red |fox | 0
+ blue |dolphin | 1000
+ green|leaf | 10000
+ ---------------------
+ ----------
+
+.. This is yet another paragraph in the item.
+
+--- This is an item of unordered list **(sublevel 3)**
+--- This is the second item of the unordered list ''(sublevel 3)''
+
+++++++ This is a single item of ordered list in sublevel 6
+.... and this is a paragraph in sublevel 4
+---. This is a new item with paragraph in sublevel 3.
+++++ Start ordered list in sublevel 4 with code block: ``
+line 1
+ line 2
+ line 3
+``
+++++. Yet another item with code block:
+``
+ line 1
+line 2
+ line 3
+``
+This item finishes with this paragraph.
+
+... Item in sublevel 3 can be continued with paragraphs.
+
+... ``
+ this is another
+code block
+ in the
+ sublevel 3 item
+``
+
++++ The last item in sublevel 3
+.. This is a continuous paragraph for item 2 in sublevel 2.
+ You can use such structure to create difficult structured
+ documents.
+
+++ item 3 in sublevel 2
+-- item 1 in sublevel 2 (new unordered list)
+-- item 2 in sublevel 2
+-- item 3 in sublevel 2
+
+++ item 1 in sublevel 2 (new ordered list)
+++ item 2 in sublevel 2
+++ item 3 in sublevel 2
+
++ item 2 in level 1
++ item 3 in level 1
+- new unordered list (item 1 in level 1)
+- level 2 in level 1
+
+- level 3 in level 1
+- level 4 in level 1
+## This is the last section of the test
+
+Single paragraph with '----' in it will be turned into separator:
+
+-----------
+
+And this is the last paragraph in
+the test. Be happy!
+
+====================
## Why?
We wanted a markup language with the following requirements:
-- less than 200 lines of functional code
+- less than 300 lines of functional code
- easy to read
- secure
- support table, ul, ol, code
@@ -78,6 +221,7 @@ markmin2html.py and markmin2latex.py are single files and have no web2py depende
------------------------------------------------------------------------------
**SOURCE** | **OUTPUT**
+==============================================================================
``# title`` | **title**
``## section`` | **section**
``### subsection`` | **subsection**
@@ -138,26 +282,64 @@ is rendered as
+ Mouse
-### Tables
+### Multilevel Lists
+
+``
++ Dogs
+ -- red
+ -- brown
+ -- black
++ Cats
+ -- fluffy
+ -- smooth
+ -- bald
++ Mice
+ -- small
+ -- big
+ -- huge
+``
+
+is rendered as
++ Dogs
+ -- red
+ -- brown
+ -- black
++ Cats
+ -- fluffy
+ -- smooth
+ -- bald
++ Mice
+ -- small
+ -- big
+ -- huge
+
+
+### Tables (with optional header and/or footer)
Something like this
``
----------
-**A** | **B** | **C**
-0 | 0 | X
-0 | X | 0
-X | 0 | 0
------:abc
+-----------------
+**A**|**B**|**C**
+=================
+ 0 | 0 | X
+ 0 | X | 0
+ X | 0 | 0
+=================
+**D**|**F**|**G**
+-----------------:abc[id]
``
is a table and is rendered as
----------
-**A** | **B** | **C**
+-----------------
+**A**|**B**|**C**
+=================
0 | 0 | X
0 | X | 0
X | 0 | 0
------:abc
+=================
+**D**|**F**|**G**
+-----------------:abc[id]
Four or more dashes delimit the table and | separates the columns.
-The ``:abc`` at the end sets the class for the table and it is optional.
+The ``:abc``, ``:id[abc_1]`` or ``:abc[abc_1]`` at the end sets the class and/or id for the table and it is optional.
### Blockquote
@@ -167,6 +349,44 @@ A table with a single cell is rendered as a blockquote:
Hello world
-----
+Blockquote can contain headers, paragraphs, lists and tables:
+
+``
+-----
+ This is a paragraph in a blockquote
+
+ + item 1
+ + item 2
+ -- item 2.1
+ -- item 2.2
+ + item 3
+
+ ---------
+ 0 | 0 | X
+ 0 | X | 0
+ X | 0 | 0
+ ---------:tableclass1
+-----
+``
+
+is rendered as:
+-----
+ This is a paragraph in a blockquote
+
+ + item 1
+ + item 2
+ -- item 2.1
+ -- item 2.2
+ + item 3
+
+ ---------
+ 0 | 0 | X
+ 0 | X | 0
+ X | 0 | 0
+ ---------:tableclass1
+-----
+
+
### Code, ``<code>``, escaping and extra stuff
``
@@ -227,7 +447,7 @@ extra={'code_cpp':lambda text: CODE(text,language='cpp').xml(),
``
or simple:
``
-extra={'code':lambda text,lang='': CODE(text,language=lang).xml()}
+extra={'code':lambda text,lang='python': CODE(text,language=lang).xml()}
``
``
markmin2html(text,extra=extra)
@@ -266,10 +486,12 @@ Here is an example of usage:
As shown in Ref.!`!`mdipierro`!`!:cite
## References
+
- [[mdipierro]] web2py Manual, 3rd Edition, lulu.com
``
### Caveats
+
``
'
"""
text = str(text or '')
+ text = regex_backslash.sub(lambda m: m.group(1).translate(ttab_in), text)
+
if environment:
def u2(match, environment=environment):
- b,a = match.group('b','a')
- return b + str(environment.get(a, match.group(0)))
- text = regex_env.sub(u2,text)
+ return str(environment.get(match.group('a'), match.group(0)))
+ text = regex_env.sub(u2, text)
+
if URL is not None:
- # this is experimental @{controller/index/args}
+ # this is experimental @{function/args}
# turns into a digitally signed URL
def u1(match,URL=URL):
- b,f,args = match.group('b','f','args')
- return b + URL(f,args=args.split('/'),scheme=True,host=True)
+ f,args = match.group('f','args')
+ return URL(f,args=args.split('/'), scheme=True, host=True)
text = regex_URL.sub(u1,text)
if latex == 'google':
text = regex_dd.sub('``\g<latex>``:latex ', text)
- text = regex_newlines.sub('\n',text)
#############################################################
- # replace all blocks marked with ``...``:class with META
+ # replace all blocks marked with ``...``:class[id] with META
# store them into segments they will be treated as code
#############################################################
segments = []
def mark_code(m):
+ g = m.group(0)
if m.group() in ( META, DISABLED_META ):
- segments.append((None, None, None, m.group(0)))
+ segments.append((None, None, None, g))
return m.group()
else:
c = m.group('c') or ''
p = m.group('p') or ''
- b = m.group('b') or ''
if 'code' in allowed and not c in allowed['code']: c = ''
code = m.group('t').replace('!`!','`')
segments.append((code, c, p, m.group(0)))
- return b + META
+ return META
text = regex_code.sub(mark_code, text)
#############################################################
# replace all blocks marked with [[...]] with LINK
- # store them into links|medias they will be treated as link
+ # store them into links they will be treated as link
#############################################################
links = []
def mark_link(m):
- if m.group() == LINK:
- links.append(None)
- b = ''
- else:
- s = m.group('s') or ''
- b = m.group('b') or ''
- links.append(s)
- return b + LINK
+ links.append( None if m.group() == LINK
+ else m.group('s') )
+ return LINK
text = regex_link.sub(mark_link, text)
-
- #############################################################
- # normalize spaces
- #############################################################
- text = '\n'.join(t.strip() for t in text.split('\n'))
- text = cgi.escape(text)
+ text = escape(text)
if auto:
text = regex_iframe.sub('',text)
@@ -579,86 +786,347 @@ def render(text,extra={},allowed={},sep='p',URL=None,environment=None,latex='goo
text = regex_auto.sub('\g', text)
#############################################################
- # do h1,h2,h3,h4,h5,h6,b,i,ol,ul
+ # normalize spaces
#############################################################
- for regex, sub in regex_maps:
- text = regex.sub(sub,text)
+ strings=[t.strip() for t in text.split('\n')]
- #############################################################
- # process tables and blockquotes
- #############################################################
- while True:
- item = regex_table.search(text)
- if not item: break
- c = item.group('c') or ''
- if 'table' in allowed and not c in allowed['table']: c = ''
- content = item.group('t')
- if ' | ' in content:
- rows = content.replace('\n','
').replace(' | ','
')
- text = text[:item.start()] + '<
'%c + rows + '
\n' + text[item.end():]
+ def parse_title(t, s): #out, lev, etags, tag, s):
+ hlevel=str(len(t))
+ out.extend(etags[::-1])
+ out.append("<h%s>%s"%(hlevel,s))
+ etags[:]=["</h%s>"%hlevel]
+ lev=0
+ ltags[:]=[]
+ tlev[:]=[]
+ return (lev, 'h')
+
+ def parse_list(t, p, s, tag, lev, mtag, lineno):
+ lent=len(t)
+ if lent<lev:
+ while ltags[-1]>lent:
+ ltags.pop()
+ out.append(etags.pop())
+ lev=lent
+ tlev[lev:]=[]
+
+ if lent>lev: # current item level > previous item level
+ if lev==0: # previous line is not a list (paragraph or title)
+ out.extend(etags[::-1])
+ ltags[:]=[]
+ tlev[:]=[]
+ etags[:]=[]
+ if pend and mtag == '.': # paragraph in a list:
+ out.append(etags.pop())
+ ltags.pop()
+ for i in xrange(lent-lev):
+ out.append('<'+tag+'>')
+ etags.append('</'+tag+'>')
+ lev+=1
+ ltags.append(lev)
+ tlev.append(tag)
+ elif lent == lev:
+ if tlev[-1] != tag:
+ # type of list is changed (ul<=>ol):
+ for i in xrange(ltags.count(lent)):
+ ltags.pop()
+ out.append(etags.pop())
+ tlev[-1]=tag
+ out.append('<'+tag+'>')
+ etags.append('</'+tag+'>')
+ ltags.append(lev)
+ else:
+ if ltags.count(lev)>1:
+ out.append(etags.pop())
+ ltags.pop()
+ mtag='l'
+ out.append('<li>')
+ etags.append('</li>')
+ ltags.append(lev)
+ if s[:1] == '-':
+ (s, mtag, lineno) = parse_table_or_blockquote(s, mtag, lineno)
+ if p and mtag=='l':
+ (lev,mtag,lineno)=parse_point(t, s, lev, '', lineno)
else:
- text = text[:item.start()] + '<
'%c + content + '
\n' + text[item.end():]
+ out.append(s)
+
+ return (lev, mtag, lineno)
+
+ def parse_point(t, s, lev, mtag, lineno):
+ """ paragraphs in lists """
+ lent=len(t)
+ if lent>lev:
+ return parse_list(t, '.', s, 'ul', lev, mtag, lineno)
+ elif lent<lev:
+ while ltags[-1]>lent:
+ ltags.pop()
+ out.append(etags.pop())
+ lev=lent
+ tlev[lev:]=[]
+ mtag=''
+ elif lent==lev:
+ if pend and mtag == '.':
+ out.append(etags.pop())
+ ltags.pop()
+ if br and mtag in ('l','.'):
+ out.append(br)
+ if s == META:
+ mtag = ''
+ else:
+ mtag = '.'
+ if s[:1] == '-':
+ (s, mtag, lineno) = parse_table_or_blockquote(s, mtag, lineno)
+ if mtag == '.':
+ out.append(pbeg)
+ if pend:
+ etags.append(pend)
+ ltags.append(lev)
+ out.append(s)
+ return (lev, mtag, lineno)
+
+ def parse_table_or_blockquote(s, mtag, lineno):
+ # check next line. If next line :
+ # - is empty -> this is an <hr /> tag
+ # - contains '|' -> table
+ # - contains other characters -> blockquote
+ if ( lineno+1 >= strings_len or
+ not (s.count('-') == len(s) and len(s)>3) ):
+ return (s, mtag, lineno)
+
+ lineno+=1
+ s = strings[lineno]
+ if s:
+ if '|' in s:
+ # table
+ tout=[]
+ thead=[]
+ tbody=[]
+ t_id = ''
+ t_cls = ''
+
+ # parse table:
+ while lineno < strings_len:
+ s = strings[lineno]
+ if s[:1] == '=':
+ if s.count('=')==len(s) and len(s)>3: # header or footer
+ if not thead: # if thead list is empty:
+ thead = tout
+ else: # if tbody list is empty:
+ tbody.extend(tout)
+ tout = []
+ lineno+=1
+ continue
+
+ m = regex_tq.match(s)
+ if m:
+ t_cls = m.group('c') or ''
+ t_id = m.group('p') or ''
+ break
+
+ tout.append('<tr>'+''.join(['<td%s>%s</td>'% \
+ (' class="num"'
+ if regex_num.match(f)
+ else '',
+ f.strip()
+ ) for f in s.split('|')])+'</tr>')
+ lineno+=1
+
+ t_cls = ' class="%s"'%t_cls if t_cls and t_cls != 'id' else ''
+ t_id = ' id="%s"'%t_id if t_id else ''
+ s = ''
+ if thead:
+ s += '<thead>'+''.join([l for l in thead])+'</thead>'
+ if not tbody: # tbody strings are in tout list
+ tbody = tout
+ tout = []
+ if tbody: # if tbody list is not empty:
+ s += '<tbody>'+''.join([l for l in tbody])+'</tbody>'
+ if tout: # tfoot is not empty:
+ s += '<tfoot>'+''.join([l for l in tout])+'</tfoot>'
+ s = '<table%s%s>%s</table>' % (t_cls, t_id, s)
+ mtag='t'
+ else:
+ # parse blockquote:
+ bq_begin=lineno
+ t_mode = False # embedded table
+ t_cls = ''
+ t_id = ''
+
+ # search blockquote closing line:
+ while lineno < strings_len:
+ s = strings[lineno]
+ if not t_mode:
+ m = regex_tq.match(s)
+ if m:
+ if lineno+1 == strings_len or '|' not in strings[lineno+1]:
+ t_cls = m.group('c') or ''
+ t_id = m.group('p') or ''
+ break
+
+ if regex_bq_headline.match(s):
+ if lineno+1 < strings_len and strings[lineno+1]:
+ t_mode = True
+ lineno+=1
+ continue
+ elif regex_tq.match(s):
+ t_mode=False
+ lineno+=1
+ continue
+
+ lineno+=1
+
+ t_cls = ' class="%s"'%t_cls if t_cls and t_cls != 'id' else ''
+ t_id = ' id="%s"'%t_id if t_id else ''
+ s = '
"
+ br = ''
+ else:
+ pbeg = pend = ''
+ br = " " if sep=='br' else ''
+
+ lev = 0 # nesting level of lists
+ c0 = '' # first character of the current line
+ out = [] # resulting list of lines
+ etags = [] # closing tags
+ ltags = [] # level number corresponding to each closing tag
+ tlev = [] # tag of the level ('ul' or 'ol')
+ mtag = '' # marked tag (~last tag) ('l','.','h','p','t'). Used to set <br/>
+ # and to avoid <p></p> around tables and blockquotes
+ lineno = 0
+ strings_len = len(strings)
+ while lineno < strings_len:
+ s = strings[lineno]
+ """ # + - . ---------------------
+ ## ++ -- .. ------- field | field | field <-title
+ ### +++ --- ... quote =====================
+ #### ++++ ---- .... ------- field | field | field <-body
+ ##### +++++ ----- ..... ---------------------:class[id]
+ """
+ pc0=c0 # first character of the previous line
+ c0=s[:1]
+ if c0: # for non empty strings
+ if c0 in "#+-.": # first character is one of: # + - .
+ (t,p,s) = regex_list.findall(s)[0] # t - tag ("###", "+++", "---", "...")
+ # p - paragraph point ('.')->for "++." or "--."
+ # s - other part of string
+ if t:
+ # headers and lists:
+ if c0 == '#': # headers
+ (lev, mtag) = parse_title(t, s)
+ elif c0 == '+': # ordered list
+ (lev, mtag, lineno)= parse_list(t, p, s, 'ol', lev, mtag, lineno)
+ elif c0 == '-': # unordered list
+ (lev, mtag, lineno) = parse_list(t, p, s, 'ul', lev, mtag, lineno)
+ else: # c0 == '.' # paragraph in lists
+ (lev, mtag, lineno) = parse_point(t, s, lev, mtag, lineno)
+ lineno+=1
+ continue
+ else:
+ if c0 == '-': # table or blockquote?
+ (s, mtag, lineno) = parse_table_or_blockquote(s, mtag, lineno)
+
+ if lev == 0 and (mtag == 'q' or s == META):
+ # new paragraph
+ pc0=''
+
+ if pc0 == '':
+ # paragraph
+ out.extend(etags[::-1])
+ etags=[]
+ ltags=[]
+ tlev=[]
+ lev=0
+ if br and mtag == 'p': out.append(br)
+ if mtag != 'q' and s != META:
+ if pend: etags=[pend]
+ out.append(pbeg)
+ mtag = 'p'
+ else:
+ mtag = ''
+ out.append(s)
+ else:
+ if lev>0 and mtag=='.' and s == META:
+ out.append(etags.pop())
+ ltags.pop()
+ out.append(s)
+ mtag = ''
+ else:
+ out.append(' '+s)
+ lineno+=1
+ out.extend(etags[::-1])
+ text = ''.join(out)
#############################################################
- # deal with paragraphs (trick <
%s'%p or '%s'%p) \
- for p in items if p.strip())
- elif sep=='br':
- text = ' '.join(items)
-
- #############################################################
- # finally get rid of <<
- #############################################################
- text=text.replace('<<','<')
+ text = regex_strong.sub('\g', text)
+ text = regex_del.sub('\g', text)
+ text = regex_em.sub('\g', text)
#############################################################
# deal with images, videos, audios and links
#############################################################
def sub_media(m):
- d=m.groupdict()
- if not d['k']:
+ t,a,k,p,w = m.group('t','a','k','p','w')
+ if not k:
return m.group(0)
- d['k'] = cgi.escape(d['k'])
- d['t'] = d['t'] or ''
- d['width'] = ' width="%s"'%d['w'] if d['w'] else ''
- d['title'] = ' title="%s"'%cgi.escape(d['a']).replace(META, DISABLED_META) if d['a'] else ''
- d['style'] = d['p_begin'] = d['p_end'] = ''
- if d['p'] == 'center':
- d['p_begin'] = '
'
- d['p_end'] = '
'
- elif d['p'] in ('left','right'):
- d['style'] = ' style="float:%s"'%d['p']
- if d['p'] in ('video','audio'):
- d['t']=render(d['t'],{},{},'',URL,environment,latex,auto)
- return '<%(p)s controls="controls"%(title)s%(width)s>%(t)s%(p)s>'%d
- d['alt'] = ' alt="%s"'%cgi.escape(d['t']).replace(META, DISABLED_META) if d['t'] else ''
- return '%(p_begin)s%(p_end)s'%d
+ k = escape(k)
+ t = t or ''
+ width = ' width="%s"' % w if w else ''
+ title = ' title="%s"' % escape(a).replace(META, DISABLED_META) if a else ''
+ style = p_begin = p_end = ''
+ if p == 'center':
+ p_begin = '
'
+ p_end = '
'
+ elif p in ('left','right'):
+ style = ' style="float:%s"' % p
+ if p in ('video','audio'):
+ t = render(t, {}, {}, 'br', URL, environment, latex, auto)
+ return '<%(p)s controls="controls"%(title)s%(width)s>%(t)s%(p)s>' \
+ % dict(p=p, title=title, width=width, k=k, t=t)
+ alt = ' alt="%s"'%escape(t).replace(META, DISABLED_META) if t else ''
+ return '%(begin)s%(end)s' \
+ % dict(begin=p_begin, k=k, alt=alt, title=title,
+ style=style, width=width, end=p_end)
def sub_link(m):
- d=m.groupdict()
- if not d['k'] and not d['t']:
+ t,a,k,p = m.group('t','a','k','p')
+ if not k and not t:
return m.group(0)
- d['t'] = d['t'] or ''
- d['a'] = cgi.escape(d['a']) if d['a'] else ''
- if d['k']:
- d['k'] = cgi.escape(d['k'])
- d['title'] = ' title="%s"' % d['a'].replace(META, DISABLED_META) if d['a'] else ''
- d['target'] = ' target="_blank"' if d['p'] == 'popup' else ''
- d['t'] = render(d['t'],{},{},'',URL,environment,latex,auto) if d['t'] else d['k']
- return '%(t)s'%d
- d['t'] = cgi.escape(d['t'])
- return '%(a)s'%d
+ t = t or ''
+ a = escape(a) if a else ''
+ if k:
+ k = escape(k)
+ title = ' title="%s"' % a.replace(META, DISABLED_META) if a else ''
+ target = ' target="_blank"' if p == 'popup' else ''
+ t = render(t, {}, {}, 'br', URL, environment, latex, auto) if t else k
+ return '%(t)s' \
+ % dict(k=k, title=title, target=target, t=t)
+ return '%s' % (escape(t),a)
parts = text.split(LINK)
text = parts[0]
for i,s in enumerate(links):
- if s==None:
+ if s == None:
html = LINK
else:
html = regex_media_level2.sub(sub_media, s)
@@ -667,7 +1135,7 @@ def render(text,extra={},allowed={},sep='p',URL=None,environment=None,latex='goo
if html == s:
# return unprocessed string as a signal of an error
html = '[[%s]]'%s
- text = text+html+parts[i+1]
+ text += html + parts[i+1]
#############################################################
# process all code text
@@ -675,7 +1143,7 @@ def render(text,extra={},allowed={},sep='p',URL=None,environment=None,latex='goo
def expand_meta(m):
code,b,p,s = segments.pop(0)
if code==None or m.group() == DISABLED_META:
- return cgi.escape(s)
+ return escape(s)
if b in extra:
if code[:1]=='\n': code=code[1:]
if code[-1:]=='\n': code=code[:-1]
@@ -686,24 +1154,27 @@ def render(text,extra={},allowed={},sep='p',URL=None,environment=None,latex='goo
elif b=='cite':
return '['+','.join('%s' \
% (d,b,d) \
- for d in cgi.escape(code).split(','))+']'
+ for d in escape(code).split(','))+']'
elif b=='latex':
return LATEX % code.replace('"','\"').replace('\n',' ')
elif b in html_colors:
return '%s' \
- % (b, render(code,{},{},'',URL,environment,latex,auto))
+ % (b, render(code,{},{},'br',URL,environment,latex,auto))
elif b in ('c', 'color') and p:
c=p.split(':')
fg='color: %s;' % c[0] if c[0] else ''
bg='background-color: %s;' % c[1] if len(c)>1 and c[1] else ''
return '%s' \
- % (fg, bg, render(code,{},{},'',URL,environment,latex,auto))
- elif code[:1]=='\n' and code[-1:]=='\n':
- return '
%s
' % (b,cgi.escape(code[1:-1]))
- return '%s' % (b,cgi.escape(code[ (code[:1]=='\n')
- : [None,-1][code[-1:]=='\n']]))
+ % (fg, bg, render(code,{},{},'br', URL, environment, latex, auto))
+ cls = ' class="%s"'%b if b and b != 'id' else ''
+ id = ' id="%s"'%escape(p) if p else ''
+ if code[:1]=='\n' and code[-1:]=='\n':
+ return '
%s
' % (cls, id, escape(code[1:-1]))
+ return '%s' \
+ % (cls, id, escape(code[ (code[:1]=='\n')
+ : [None,-1][code[-1:]=='\n']]))
text = regex_expand_meta.sub(expand_meta, text)
- text = remove_backslashes(text)
+ text = text.translate(ttab_out)
return text
def markmin2html(text, extra={}, allowed={}, sep='p', auto=True):
@@ -713,7 +1184,27 @@ if __name__ == '__main__':
import sys
import doctest
if sys.argv[1:2] == ['-h']:
- print ''+markmin2html(__doc__)+''
+ print """
+
+ """+markmin2html(__doc__)+''
+ elif sys.argv[1:2] == ['-t']:
+ from timeit import Timer
+ loops=1000
+ ts = Timer("markmin2html(__doc__)","from markmin2html import markmin2html")
+ print 'timeit "markmin2html(__doc__)":'
+ t = min([ts.timeit(loops) for i in range(3)])
+ print "%s loops, best of 3: %.3f ms per loop" % (loops, t*1000.0/loops)
elif len(sys.argv) > 1:
fargv = open(sys.argv[1],'r')
try: