[cl-typesetting-devel] HTML rendering

Klaus Weidner klaus at atsec.com
Sun Apr 18 10:04:08 UTC 2004


Hello,

after an all-day hacking session, I now have a proof of concept for XHTML
rendering via cl-typesetting working, see attachment. It's still missing
many tags, but that's mostly a matter of patience, I think the hard work
is largely done.

The following article had inspired me to do this:

  http://www-106.ibm.com/developerworks/library/x-xslfo2app/

It covers using XSL-FO stylesheet transforms on XHTML documents. There
you can also find the input document "everything.html" I used, along with
a FOP-rendered PDF version for comparison.

Instead of XSL-FO, I'm using Lisp-based tree transform from XML (read
from a file via the XMLS library) to cl-typesetting commands.

I think this is an important step towards having cl-typesetting behave
more like a normal program with input and output files rather than
exclusively as a library. And all it took was an "eval" in the right
place :-)

I've extended cl-typesetting a bit, to support cross-referencing (needed
for the table of contents and intra-document page references), left/right
page header alternation (containing chapter titles) and some other minor
stuff.

The only change to the core code was adding a :fresh-page token. It works
like :eop, but does not start a new page if the current one is already
empty. (This makes automated generation easier.) All the other extensions
were in client code.

I've attached the :fresh-page patch, and sample code to demonstrate the
new features. The latter is still a bit messy and undocumented, it's a
work in progress.

The XHTML transformer is still too unfinished for distribution, but please
let me know if you're interested in that and I'll work on cleaning it up.
But I'm surprised at how well it works already, considering that most of
my time had been spent figuring out how things work.

Some miscellaneous things I noticed:

- there are a couple of rather grating spelling errors in the source
  code, which I'd suggest fixing before the code is in more widespread
  use (later changes would be harder) - I can make a patch for those but
  wanted to check first if that's okay.

  	trimable => trimmable
	splitable => splittable
	ponctuation => punctuation

  Also, I'm not sure what a "trie" is (used by hyphenation). And I think
  that keyword arguments should not use a "-p" suffix.
  
- This would also be a good opportunity to document the style parameters
  supported by the public interface, since grepping through the source is
  suboptimal.

- I don't know how leaders (the dots joining table-of-content entries to
  the page numbers) should be implemented. Note that the dots should line
  up even if the TOC entries have different lengths, so it's not a simple
  matter of sticking periods in boxes.

- PDF hyperlinks aren't handled yet by the typesetting code - how about a
  link-box class which acts as a clickable area for the bounding box of
  the content?

- The W3 FO specification is a surprisingly good fit to the
  cl-typesetting internals, and since XML is isomorphic to s-exprs, I
  think it would make sense to stay close to that model where
  appropriate. That would make it much easier to reuse existing work,
  i.e. for DocBook or OpenOffice rendering.

Comments and feedback would be much appreciated.

-Klaus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: output.pdf
Type: application/pdf
Size: 14871 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/cl-typesetting-devel/attachments/20040418/acc18b11/attachment.pdf>
-------------- next part --------------
diff -urN -b orig/cl-typesetting/layout.lisp cl-typesetting/layout.lisp
--- orig/cl-typesetting/layout.lisp	Mon Mar 22 03:09:13 2004
+++ cl-typesetting/layout.lisp	Sun Apr 18 01:45:08 2004
@@ -221,6 +221,15 @@
 	       (when (eq *h-align* :justified)
 		 (push (make-hfill-glue) line-boxes))
 	       (next-line line-boxes))
+	      ((eq box :fresh-page)
+	       (unless (or (null text-lines)
+			   (and (null (rest text-lines))
+				(every (lambda (box) (or (style-p box)
+							 (typep box 'h-spacing)
+							 (typep box 'v-spacing)))
+				       (boxes (first text-lines)))))
+		 (next-line line-boxes)
+		 (return-lines 0)))
 	      ((eq box :eop)
 	       (next-line line-boxes)
 	       (return-lines 0))
diff -urN -b orig/cl-typesetting/typo.lisp cl-typesetting/typo.lisp
--- orig/cl-typesetting/typo.lisp	Wed Mar 10 13:12:49 2004
+++ cl-typesetting/typo.lisp	Sun Apr 18 01:36:16 2004
@@ -231,6 +231,9 @@
 (defun new-line ()
   (add-box :eol))
 
+(defun fresh-page ()
+  (add-box :fresh-page))
+
 (defun new-page ()
   (add-box :eop))
 
@@ -242,6 +245,9 @@
 
 (defmethod insert-stuff ((obj (eql :eol)))
   '(new-line))
+
+(defmethod insert-stuff ((obj (eql :fresh-page)))
+  '(fresh-page))
 
 (defmethod insert-stuff ((obj (eql :eop)))
   '(new-page))
-------------- next part --------------
(in-package typeset)

;; reference handling

(defvar *ref-table* (make-hash-table :test #'equal))
(defvar *ref-counter* 0)
(defvar *bad-reference* nil)

(defclass ref-mark ()
  ((id :accessor ref-id :initform nil :initarg :id)
   (value :accessor ref-mark-value :initform nil :initarg :value)
   (page :accessor ref-mark-page :initform nil)
   (x :accessor ref-x :initform nil)
   (y :accessor ref-y :initform nil)))

(defmethod stroke ((mark ref-mark) x y)
  (setf (ref-mark-page mark) (length (pages pdf:*document*))
	(ref-x mark) x
	(ref-y mark) y))

(defmacro ref-get (id)
  `(gethash ,id *ref-table*))

(defun make-ref-mark (id &optional value)
  (let ((mark (or (ref-get id)
		  (make-instance 'ref-mark
				 :id id))))
    (setf (ref-get id) mark)
    (setf (ref-mark-value mark) value)
    (add-box mark)))

(defun ref-page (id)
  (let* ((ref (ref-get id))
	 (page (if ref (ref-mark-page ref))))
    (cond (page page)
	  (t (push id *bad-reference*)
	     999))))

(defun put-ref-page (id)
  (put-string (format nil "~D" (ref-page id))))

(defgeneric ref-value (ref))

(defmethod ref-value ((ref ref-mark))
  (if ref (ref-mark-value ref)))

(defmethod ref-value ((id t))
  (let ((ref (ref-get id)))
    (if ref (ref-mark-value ref))))

(defun put-ref-value (id)
  (put-string (ref-value id)))

(defun this-page-number ()
  (length (pages pdf:*document*)))

(defun make-ref-page-mark (reftype value)
  (make-ref-mark (cons reftype (incf *ref-counter*)) value))

(defun get-latest-ref-to (reftype for-page)
  (let ((refs nil))
    ;; Find all references of a type, store unsorted
    ;; (ordinal page ref) lists in "refs".
    (maphash (lambda (key ref)
	       (if (and (consp key)
			(equal reftype (car key)))
		   (push (list (cdr key)
			       (or (ref-mark-page ref) most-positive-fixnum)
			       ref)
			 refs)))
	     *ref-table*)
    ;; Now walk through the reverse sorted references,
    ;; and get the last matching one on or before the
    ;; current page.
    (third (find-if (lambda (page)
		      (<= page for-page))
		    (sort refs #'> :key #'car)
		    :key #'cadr))))

(defun current-ref-value (reftype)
  (ref-value (get-latest-ref-to reftype (this-page-number))))

;; higher-level layout

(defun safe-eval (expr)
  (let ((*package* (find-package "TYPESET"))
	(*read-eval* nil))
    (eval expr)))

;; Note that the tree argument to render-document is a dead list of
;; symbols and strings. This is a prerequisite for being to handle
;; documents that are completely generated at runtime.

(defun render-document (tree &key
			(file #P"/tmp/stuff.pdf")
			(twosided t)
			(paper-size :letter))
  "Render the document specified by tree, which is a s-exp containing
recursive typesetting commands. It gets eval'ed here to typeset it."
  (do ((*ref-table* (make-hash-table :test #'equal))
       (*ref-counter* 0)
       (*bad-reference* nil)
       (pass 0 (1+ pass)))
      ((or (> pass 1)
	   (and (> pass 0)
		(not *bad-reference*)))
       *bad-reference*)
    (setq *bad-reference* nil)
    (format t "Pass ~d~%" pass)
    (with-document ()
      (let ((margins '(72 72 72 50))
	    (header (lambda (pdf:*page*)
		      (if (current-ref-value :header-enabled)
			  (let ((inside (or (current-ref-value :title) "Untitled Document"))
				(outside (current-ref-value :chapter)))
			    (if (and twosided (evenp (this-page-number)))
				(compile-text (:font "Times-Roman" :font-size 10)
				  (hbox (:align :center :adjustable-p t)
				    (put-string outside)
				    :hfill
				    (with-style (:font "Times-Italic")
				      (put-string inside))))
				(compile-text (:font "Times-Roman" :font-size 10)
				  (hbox (:align :center :adjustable-p t)
				    (with-style (:font "Times-Italic")
				      (put-string inside))
				    :hfill
				    (put-string outside))))))))
	    (footer (lambda (pdf:*page*)
		      (if (current-ref-value :footer-enabled)
			  (let ((inside (or (current-ref-value :version) ""))
				(outside (format nil "Page ~d of ~d"
						 (this-page-number)
						 (ref-page "DocumentEnd"))))
			    (if (and twosided (evenp (this-page-number)))
				(compile-text (:font "Times-Roman" :font-size 10)
				  (hbox (:align :center :adjustable-p t)
				    (put-string outside)
				    :hfill
				    (put-string inside)))
				(compile-text (:font "Times-Roman" :font-size 10)
				  (hbox (:align :center :adjustable-p t)
				    (put-string inside)
				    :hfill
				    (put-string outside)))))))))
	
	(draw-pages (safe-eval (list 'compile-text () tree))
		    :margins margins :header header :footer footer
		    :size paper-size)
	(when pdf:*page* (finalize-page pdf:*page*))
	(pdf:write-document file)))))

;; Example follows.

(defun document-test ()
  (render-document
   '(with-style ()
     (make-ref-page-mark :title "Titled Document")
     (make-ref-page-mark :version "Version 1.x")
     (make-ref-page-mark :header-enabled nil)
     (make-ref-page-mark :footer-enabled nil)

     :vfill
     (paragraph (:font "Helvetica-Bold" :font-size 24 :h-align :center :bottom-margin 20)
       "This is the Document Title")
     (paragraph (:font "Helvetica-Bold" :font-size 16 :h-align :center)
       "A. N. Author")
     :vfill
     :eop
		       
     (make-ref-page-mark :header-enabled t)
     (make-ref-page-mark :footer-enabled t)
     (make-ref-mark '(:chapter . 0) "Table of Contents")
     (with-style (:font "Helvetica" :font-size 10)
       (hbox (:align :center :adjustable-p t)
	 (put-ref-value '(:chapter . 1))
	 :hfill
	 (put-ref-page '(:chapter . 1)))
       (hbox (:align :center :adjustable-p t)
	 (put-ref-value '(:chapter . 2))
	 :hfill
	 (put-ref-page '(:chapter . 2))))
     :eop
     
     (make-ref-mark '(:chapter . 1) "Introduction")
     (paragraph (:font "Times-Roman" :font-size 10 :h-align :left :bottom-margin 7)
       "Test with "
       (with-style (:font "Times-Bold")
	 "bold")
       " and "
       (with-style (:font "Times-Italic")
	 "italic")
       " text.")
     (paragraph (:font "Times-Roman" :font-size 10 :h-align :left :bottom-margin 7)
       (make-ref-mark "link-from")
       "See also stuff on page "
       (put-ref-page "stuff")
       ".")
     :eop
     
     (make-ref-mark '(:chapter . 2) "Interesting Stuff")
     (paragraph (:font "Courier" :font-size 10 :bottom-margin 7)
       (make-ref-mark "stuff")
       "Some" :eol "more" :eol "Text." )
     (paragraph (:font "Times-Roman" :font-size 10 :h-align :left :bottom-margin 7)
       "This is linked to from page "
       (put-ref-page "link-from")
       ".")
     
     (make-ref-mark "DocumentEnd"))))


More information about the cl-typesetting-devel mailing list