[cl-pdf-devel] Some problems with pdf-parser

Piotr Chamera piotr_chamera at poczta.onet.pl
Fri Jun 10 16:48:25 UTC 2011


Hi,
I just started with cl-pdf and it works great for me :)
but I found some problems in pdf-parser and need advice
on how to fix them. I am a rather novice Lisper, so I may be wrong
in my guesses below...


1. In file cl-pdf, function find-cross-reference-start

The function searches for 'startxref' in the buffer _from the beginning_
and can find the wrong place if the end of the file (the buffer) contains
two such sections (e.g. after a small incremental change appended to the file).

Proposal: change

     (let ((position (search "startxref" buffer)))

to

     (let ((position (search "startxref" buffer :from-end t)))
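
To illustrate the difference, here is a minimal standalone sketch. The buffer contents and the helper name last-startxref-offset are made up for this example and are not part of cl-pdf; only the standard SEARCH and PARSE-INTEGER functions are used.

```lisp
;; Toy buffer with two "startxref" keywords, as after an incremental update.
;; The offsets 116 and 89502 are only illustrative values.
(defparameter *sample-buffer*
  (format nil "startxref~%116~%%%EOF~%startxref~%89502~%%%EOF~%"))

;; Hypothetical helper, not a cl-pdf function.
(defun last-startxref-offset (buffer)
  "Return the integer that follows the last \"startxref\" in BUFFER, or NIL."
  (let ((position (search "startxref" buffer :from-end t)))
    (when position
      ;; PARSE-INTEGER skips the leading newline; :junk-allowed stops it
      ;; at the first character after the digits.
      (values (parse-integer buffer
                             :start (+ position (length "startxref"))
                             :junk-allowed t)))))

;; Without :from-end the first (older) keyword is found, at index 0.
(search "startxref" *sample-buffer*)
;; With :from-end t the last (newest) keyword is found instead.
(last-startxref-offset *sample-buffer*)
```

In this toy buffer the plain search lands on the older section, while the :from-end variant picks up the offset written by the last update.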



2. In file cl-pdf, function make-indirect-object:

(defun make-indirect-object (obj-number gen-number position)
   (let ((object (or (car (gethash (cons obj-number gen-number)
                                   *indirect-objects*))
                     (make-instance 'indirect-object
                                    :obj-number obj-number
                                    :gen-number gen-number
                                    :content :unread
                                    :no-link t))))
     (setf (gethash (cons obj-number gen-number) *indirect-objects*)
           (cons object position))
     object))

I am working on a file generated by Adobe Acrobat Distiller
and then cropped in Adobe Acrobat, so at the end of the file there are
a few modified objects with duplicate numbers (and generations,
which may be a bug in Acrobat?). When indirect-object objects
are read from the file (in the order of the cross reference tables, which
are read from newest to oldest), the newer ones are overwritten
by older ones with the same number. We end up with a readable PDF,
but with some object revisions dropped.

I have added some debugging prints to the above function (and some
others), and for a sample file I got this reading order:

startxref position: 89502
xref position: 89502
making obj: 4 0 position 85386
making obj: 5 0 position 89106
making obj: 8 0 position 89309
making obj: 7 0 position 0
xref position: 116
making obj: 6 0 position 16
making obj: 7 0 position 1150
making obj: 8 0 position 1227
making obj: 9 0 position 1411
making obj: 10 0 position 1554
(..)
making obj: 37 0 position 936
xref position: 85210
making obj: 1 0 position 81250
making obj: 2 0 position 81284
making obj: 3 0 position 81308
making obj: 4 0 position 81359
making obj: 5 0 position 85007

This shows that the file contains 4 duplicated objects and that
they are overwritten by older versions (4 0, 5 0, 8 0, 7 0).


I think a solution would be to drop older objects when a newer
version with the same number and generation has already been read?
Something like this:

(defun make-indirect-object (obj-number gen-number position)
   (let ((object (gethash (cons obj-number gen-number) *indirect-objects*)))
     (if object
         (progn
           (format T "obj alredy present: ~s ~s at position ~s (dropped older one at position ~s)~%"
                   obj-number gen-number
                   (cdr object) position)
           (car object))
         (progn
           (format T "making obj: ~s ~s position ~s ~%"
                   obj-number gen-number position)
           (let ((new-object (make-instance 'indirect-object
                                            :obj-number obj-number
                                            :gen-number gen-number
                                            :content :unread
                                            :no-link t)))
             (setf (gethash (cons obj-number gen-number) *indirect-objects*)
                   (cons new-object position))
             new-object)))))

This gives, on the same example file:

startxref position: 89502
xref position: 89502
making obj: 4 0 position 85386
making obj: 5 0 position 89106
making obj: 8 0 position 89309
making obj: 7 0 position 0
xref position: 116
making obj: 6 0 position 16
obj alredy present: 7 0 at position 0 (dropped older one at position 1150)
obj alredy present: 8 0 at position 89309 (dropped older one at position 1227)
making obj: 9 0 position 1411
making obj: 10 0 position 1554
(...)
making obj: 37 0 position 936
xref position: 85210
making obj: 1 0 position 81250
making obj: 2 0 position 81284
making obj: 3 0 position 81308
obj alredy present: 4 0 at position 85386 (dropped older one at position 81359)
obj alredy present: 5 0 at position 89106 (dropped older one at position 85007)
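
The "first seen wins" idea can also be tried in isolation. The following is only a sketch: *objects* and register-object are hypothetical stand-ins for *indirect-objects* and make-indirect-object, and it stores just the position instead of an indirect-object instance. It assumes the table is created with an EQUAL test, which a table keyed on fresh conses needs for lookups to match.

```lisp
;; Hypothetical stand-in for cl-pdf's *indirect-objects*. The keys are
;; conses (obj-number . gen-number), so the table uses an EQUAL test.
(defvar *objects* (make-hash-table :test #'equal))

;; Hypothetical helper, not a cl-pdf function.
(defun register-object (obj-number gen-number position)
  "Record POSITION for the object unless an entry is already present.
Returns the position that is kept in the table."
  (let ((key (cons obj-number gen-number)))
    (multiple-value-bind (existing foundp) (gethash key *objects*)
      (if foundp
          existing                                   ; keep the newer entry
          (setf (gethash key *objects*) position)))))

;; Entries arrive newest xref section first, as in the trace above.
(register-object 4 0 85386)   ; newest revision of object 4 0
(register-object 4 0 81359)   ; older revision, ignored
(gethash '(4 . 0) *objects*)  ; the position 85386 is kept
```

Because the cross reference sections are visited from newest to oldest, keeping the first entry is the same as keeping the newest revision.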



But this reveals another problem, in read-xref-and-trailer:

(defun read-xref-and-trailer (position)
   (let (first-trailer)
     (loop
        (format T "xref position: ~s~%" position)
        (read-cross-reference-subsections position)
        (let* ((trailer (read-trailer)))
	 (unless first-trailer (setf first-trailer trailer))
	 (let ((prev-position (get-dict-value trailer "/Prev")))
	   (if prev-position
	       (setq position prev-position)
	       (return first-trailer)))))))

If I read it correctly, it reads trailers from the most recent to the
oldest and returns the oldest instead of the first one read? If so, the
document in read-pdf gets incorrect information.

Can someone review the above and tell me whether I am searching in the
right direction or am entirely wrong...
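
One way to check this guess is to run the same loop on a toy model. Everything below is hypothetical: trailers are plain property lists in a table, toy-read-trailer replaces read-trailer, and getf replaces get-dict-value. The positions and the /Prev chain are taken from the trace above.

```lisp
;; Toy trailers keyed by xref position. :order records the reading order,
;; :prev plays the role of the "/Prev" entry.
(defparameter *toy-trailers*
  '((89502 :order 1 :prev 116)
    (116   :order 2 :prev 85210)
    (85210 :order 3 :prev nil)))

;; Hypothetical replacement for read-trailer.
(defun toy-read-trailer (position)
  (cdr (assoc position *toy-trailers*)))

;; Same control flow as read-xref-and-trailer, without any file access.
(defun toy-read-xref-and-trailer (position)
  (let (first-trailer)
    (loop
      (let ((trailer (toy-read-trailer position)))
        ;; Only the first trailer seen is remembered.
        (unless first-trailer (setf first-trailer trailer))
        (let ((prev-position (getf trailer :prev)))
          (if prev-position
              (setq position prev-position)
              (return first-trailer)))))))

;; Which trailer comes back when starting at the startxref position?
(getf (toy-read-xref-and-trailer 89502) :order)
```

On this toy model the loop returns the trailer with :order 1, that is the one read first, so the control flow alone may not be the cause. It says nothing about what read-trailer itself does in cl-pdf.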


-- 
Regards,
Piotr Chamera



