[cl-pdf-devel] National language support and other proposals

Thu Feb 5 14:18:41 UTC 2004

"Dmitri Ivanov" <divanov at aha.ru> writes:
> Hello,

Hi Dmitri,

> 1. National Language Support
>
> First, it is hardly possible to transfer binary data to a file opened
> with a non-Latin-1 external format. So I vote for opening the PDF file as
> a binary stream, not a character stream. In my code, this is what happens
> when the pdf-binary feature is enabled.
>
> Second, I suggest that zlib's compress-string return a
> (vector (unsigned-byte 8)) instead of a string. My version of zlib-lw.lisp
> is an example.
>
> The benefit would be better control over char-to-byte conversion. The
> drawback is that we would always have to do it manually :-). Only two
> write-object methods would be affected if we limited ourselves to using
> national characters only in content streams.

I agree with you that this would be a better way. But these encoding and
binary stream issues are really, really touchy.
What works with LWW might not work on other implementations, even if they
have bivalent streams, or even on other OSes (LWL, for instance).
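A minimal sketch of the binary-stream approach in portable Common Lisp. Both helper names are hypothetical (they are not part of cl-pdf or zlib), and the sketch assumes that char-code values below 256 coincide with Latin-1, which holds for implementations whose characters are Unicode code points:

```lisp
;; Hypothetical helper, not part of cl-pdf: convert a string to octets,
;; treating char codes 0-255 as Latin-1 and rejecting anything else.
;; Assumes CHAR-CODE returns Unicode code points.
(defun string-to-latin1-octets (string)
  (map '(vector (unsigned-byte 8))
       (lambda (char)
         (let ((code (char-code char)))
           (if (< code 256)
               code
               (error "Character ~S cannot be encoded in Latin-1." char))))
       string))

;; Hypothetical helper: write octets to a file opened as a binary stream,
;; so the implementation applies no external-format conversion.
(defun write-octets-to-file (octets pathname)
  (with-open-file (out pathname :direction :output
                                :element-type '(unsigned-byte 8)
                                :if-exists :supersede
                                :if-does-not-exist :create)
    (write-sequence octets out))
  pathname)
```

With this split, the char-to-byte conversion happens in one explicit place, and the stream itself only ever sees bytes, which is the control Dmitri is after.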

> 2. Fonts
>
> Unfortunately, I have not managed to reference TrueType fonts without
> embedding them. So I converted them to Type1 subsets and experimented.
>
> 2.1. Embedding Fonts
>
> IMHO, one useful option would be not to embed a custom Type1 font into a
> document, provided it is already installed on the target computer. The
> following primitives would do:
>
> (defvar *embed-fonts* :default)
> (defgeneric font-descriptor (font-metrics &key embed errorp))

Ok. You must be sure that the user has them, though.
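One possible interpretation of the proposed *embed-fonts* switch, as a minimal sketch. The meaning given to :default here (embed only when the font file is actually available) and the function embed-font-p are assumptions, not part of the proposal:

```lisp
;; Hypothetical sketch: the semantics of :DEFAULT are an assumption.
;; T forces embedding (the caller must then supply the font file),
;; NIL never embeds, :DEFAULT embeds only if the font file is available.
(defvar *embed-fonts* :default)

(defun embed-font-p (font-file-available-p &optional (embed *embed-fonts*))
  "Return T if a Type1 font should be embedded, NIL otherwise."
  (ecase embed
    ((t) t)
    ((nil) nil)
    (:default (if font-file-available-p t nil))))
```

Under this reading, choosing NIL puts the burden on the document's author to make sure the reader has the font installed.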

> 2.2. Some fixes are needed in case the pfb-file is missing and/or the
> afm-file is limited, e.g. was generated from a pfm-file:
>
> (defun load-t1-font (afm-file &optional pfb-file) ...)

Ok, but the italic-correction must be a number (0 if there is no bbox).
And if there is no bbox, I don't see how you can use the font, unless it's
a fixed size font, in which case the bbox should be the one defined for the
font.
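A sketch of the "always a number" rule for the italic correction. The char-metric structure and the approximation used (how far the glyph's bbox extends past its advance width) are assumptions for illustration, not necessarily what cl-pdf computes:

```lisp
;; Hypothetical representation: BBOX is a list (llx lly urx ury) in glyph
;; units, or NIL when the AFM file does not supply one.
(defstruct char-metric
  (width 0)
  (bbox nil))

(defun italic-correction (metric)
  "Approximate the italic correction as the amount by which the glyph's ink
extends past its advance width. Always returns a number: 0 without a bbox."
  (let ((bbox (char-metric-bbox metric)))
    (if bbox
        (max 0 (- (third bbox) (char-metric-width metric)))
        0)))
```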

> 3. Encoding
>
> I suggest renaming the slot standard-encoding to standard-p or built-in-p
> (to avoid confusion with *standard-encoding*); alternatively, remove it
> and introduce a subclass named built-in-encoding.
>
> For custom needs, I propose:
>
> (defclass custom-encoding (encoding)
>   ((base-encoding :initarg :base-encoding :reader base-encoding
>                   :initform nil)))

ok
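A minimal sketch of deriving a custom encoding from a base encoding. The stub encoding class with a char-names vector, and the make-custom-encoding function, are hypothetical stand-ins for cl-pdf's internals; only the custom-encoding class mirrors the definition proposed above:

```lisp
;; Hypothetical stub for cl-pdf's ENCODING class: an encoding is a
;; 256-element vector of glyph names (NIL marks an undefined code).
(defclass encoding ()
  ((char-names :initarg :char-names :reader char-names)))

;; The class as proposed above.
(defclass custom-encoding (encoding)
  ((base-encoding :initarg :base-encoding :reader base-encoding
                  :initform nil)))

(defun make-custom-encoding (base overrides)
  "Copy BASE's glyph names, apply OVERRIDES (a list of (code . name)),
and remember BASE in the base-encoding slot."
  (let ((names (copy-seq (char-names base))))
    (loop for (code . name) in overrides
          do (setf (aref names code) name))
    (make-instance 'custom-encoding :char-names names :base-encoding base)))
```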

> The value of *win-1251-encoding*, an instance of the custom-encoding
> class, was generated this way, following Adobe's glyph list.

ok

> For flexibility, compute-encoding-differences could be redefined as
> follows:
> (defun compute-encoding-differences
>       (encoding &optional (from *standard-encoding*)) ...)

Ok. Have you verified that this does not lead to problems with the standard
fonts? IIRC there was a problem with the encoding differences; this is why I
ended up writing the whole encoding.
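The /Differences array of a PDF encoding dictionary lists a character code followed by the glyph names for that code and the codes immediately after it. A minimal sketch of computing it against an arbitrary base, which is what the proposed FROM parameter allows; representing encodings as plain vectors of glyph-name strings is an assumption, and compute-differences and differences-to-pdf are hypothetical names:

```lisp
;; Hypothetical sketch: ENCODING and BASE are vectors of glyph-name strings
;; (NIL for undefined codes) of the same length, indexed by character code.
(defun compute-differences (encoding base)
  "Return a flat list in /Differences layout: the first code of each run of
codes whose glyph name differs from BASE, followed by the names in the run."
  (let ((result '())
        (in-run nil))
    (dotimes (code (length encoding) (nreverse result))
      (let ((name (aref encoding code)))
        (cond ((and name (not (equal name (aref base code))))
               (unless in-run          ; start a new run with its code
                 (push code result)
                 (setf in-run t))
               (push name result))
              (t (setf in-run nil)))))))

(defun differences-to-pdf (differences)
  "Render the list as a PDF array string, e.g. [2 /x /y]."
  (format nil "[~{~A~^ ~}]"
          (mapcar (lambda (item)
                    (if (integerp item) item (concatenate 'string "/" item)))
                  differences)))
```

For example, comparing #("a" "b" "x" "y" "e") against #("a" "b" "c" "d" "e") yields (2 "x" "y"), rendered as [2 /x /y].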

> Although :win-ansi-encoding gives better results for installed fonts that
> are not embedded, basing a custom encoding on it via /BaseEncoding
> completely fails for embedded Type1 fonts!
>
> Hence the question: should *default-encoding* really be the default value
> of the encoding parameter of get-font? It seems that get-font should not
> provide any default for it. Instead, if it is null,
> extract-font-metrics-encoding would try to extract it from the font
> metrics. IMHO, extract-font-metrics-encoding should be enhanced to convert
> EncodingScheme "AdobeStandardEncoding" to :standard-encoding and the like.
> Adding an encoding parameter to the read-afm-file function, to be used as
> a substitute for the "FontSpecific" EncodingScheme, is also an option.

I will look at this in more detail.
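A sketch of the suggested EncodingScheme conversion, with a fallback for "FontSpecific" or unknown schemes. Only the "AdobeStandardEncoding" entry comes from the proposal; the table, its name and encoding-scheme-designator are hypothetical, and further entries would be added as needed:

```lisp
;; Hypothetical table: AFM EncodingScheme names mapped to encoding
;; designators. Only the first entry is taken from the proposal.
(defparameter *encoding-scheme-alist*
  '(("AdobeStandardEncoding" . :standard-encoding)))

(defun encoding-scheme-designator (scheme &optional fallback)
  "Return the designator for a known EncodingScheme, or FALLBACK (e.g. an
encoding passed to read-afm-file) for \"FontSpecific\" and unknown schemes."
  (let ((entry (assoc scheme *encoding-scheme-alist* :test #'string=)))
    (if entry (cdr entry) fallback)))
```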

> 4. PDF Dictionaries
>
> I suggest the following generic function to create dictionaries in a more
> regular way:
> (defgeneric make-dictionary (thing &key &allow-other-keys))

ok
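A minimal sketch of a method on the proposed make-dictionary generic function. The font-ref class and the alist representation of the dictionary are hypothetical stand-ins; a real version would presumably return cl-pdf's own dictionary object:

```lisp
;; Generic function as proposed above.
(defgeneric make-dictionary (thing &key &allow-other-keys))

;; Hypothetical stand-in for a font object.
(defclass font-ref ()
  ((base-font :initarg :base-font :reader base-font)))

;; Hypothetical method: builds the dictionary as an alist of (key . value).
(defmethod make-dictionary ((thing font-ref) &key (subtype "/Type1"))
  (list (cons "/Type" "/Font")
        (cons "/Subtype" subtype)
        (cons "/BaseFont" (concatenate 'string "/" (base-font thing)))))
```

Each PDF object type then gets its own method, and callers create dictionaries the same way regardless of the object.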

> For dictionary property names, I would recommend using symbols like
> |/Length| or keywords with the corresponding string set as a property, e.g.:
>
> (setf (get :length 'pdf:namestring) "/Length")
> (defmethod write-object ((obj keyword) &optional root-level)
>   (declare (ignorable root-level))
>   (write-string (or (get obj 'pdf:namestring) (symbol-name obj))
>              *pdf-stream*))
>
> Strings are good enough for the time being (generation), but could lead
> to excessive memory consumption and performance degradation in more
> complex tasks such as parsing and editing.

When you parse a PDF file, most of the data is in the streams, not in the
dictionaries.
So why not? But there are many more important points to improve from a
performance point of view. (I have some ideas if you want... ;-)
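A self-contained sketch of the keyword-plus-property idea: each PDF name string is stored once on the keyword's property list, so dictionary keys can be compared with EQ instead of allocating and comparing strings. The indicator pdf-namestring and both function names are hypothetical; the fallback to symbol-name follows the proposal:

```lisp
;; Hypothetical helpers: store the PDF name once on the keyword's plist.
(defun register-pdf-name (keyword name)
  (setf (get keyword 'pdf-namestring) name))

(defun write-pdf-name (keyword stream)
  "Write KEYWORD's registered PDF name, falling back to its symbol name."
  (write-string (or (get keyword 'pdf-namestring) (symbol-name keyword))
                stream))

(register-pdf-name :length "/Length")
(register-pdf-name :base-font "/BaseFont")
```

Explicit registration is needed because names such as /BaseFont cannot be derived reliably from a keyword's upper-case symbol name.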

> 5. Code Restructure
>
> 5.1. (defclass pdf-stream (dictionary)
>    (...(no-compression :accessor no-compression
> :initarg :no-compression :initform nil)))
>
> I would rename the no-compression slot to 'compression' and assign a
> decode filter designator to it, e.g. |Flate|, t (equivalent to |Flate|),
> or another filter.

OK, but there are no other compression schemes in cl-pdf for now.
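A sketch of how the proposed compression slot could be mapped to a /Filter entry. The function name and the accepted designators (NIL, T, or a symbol named Flate, compared case-insensitively) are assumptions; since Flate is the only scheme cl-pdf implements, anything else signals an error:

```lisp
;; Hypothetical mapping from a compression designator to a PDF filter name.
(defun compression-filter-name (compression)
  "Return the /Filter name for COMPRESSION, or NIL for no compression."
  (cond ((null compression) nil)
        ((or (eq compression t)
             (and (symbolp compression)
                  (string-equal (symbol-name compression) "Flate")))
         "/FlateDecode")
        (t (error "Unsupported compression designator: ~S" compression))))
```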

> 5.2. Slight misnomers
>   find-font-object
>   find-encoding-object
>   find-gstate-object
> I would prefer a find-or-make- or ensure- prefix (more CLOS-like).

Yes. I also like ensure.
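A minimal illustration of the ensure- naming convention, which the standard already uses for ensure-generic-function and ensure-directories-exist: return the existing object and create it only when it is missing. The cache, the default constructor and ensure-font-object are hypothetical:

```lisp
;; Hypothetical cache of font objects, keyed by font name.
(defvar *font-objects* (make-hash-table :test #'equal))

(defun ensure-font-object (font-name
                           &key (constructor (lambda (name) (list :font name))))
  "Return the cached object for FONT-NAME, creating and caching it if needed."
  (or (gethash font-name *font-objects*)
      (setf (gethash font-name *font-objects*)
            (funcall constructor font-name))))
```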

> 6. My Test Environment
>
> I have tested all the examples in the /examples directory on LWW 4.3
> and they seemed to work fine. My test environment included neither
> init.lisp, zlib.lisp, t3-fonts, nor the asdf installation. Instead, I
> used LW defsys augmented by zlib-lw.lisp and di-contrib.lisp.

A good test is to run the cl-typesetting example. ;-)
An even better test would be to add a Russian paragraph to the
cl-typesetting example.

> BTW, I ran into a useful resource, http://www.fpdf.org/
> These guys do the same thing in PHP, publish their source code, and
> provide some useful hints in the forum.

Interesting, I will have a look.

Thanks for all these improvements!

I will merge most of them ASAP. Some others would require testing on other
implementations and/or OS before I include them.

Marc