From divanov at aha.ru  Thu Jan 10 08:52:31 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Thu, 10 Jan 2008 11:52:31 +0300
Subject: [cl-pdf-devel] Mapping useful Unicode characters to single-byte-encoding
Message-ID: <000001c85366$25592d30$8100a8c0@digo>

Hello folks,

I guess somebody has been experiencing problems when printing characters
like the em dash with the standard fonts. The reason is the following.

These characters are represented in the Lisp system internally as Unicode
(usually UCS-2), but their counterparts in Helvetica and other fonts are
glyphs with codes in the [0..255] range.

For custom-encoding, char-external-code provides the necessary mapping
thanks to the notion of a charset. But for single-byte-encoding, we do not
have anything.

I suggest introducing an "extended charset", an alist storing
(char . code) pairs:

(defparameter *char-single-byte-codes*
  '((#.(code-char #x2014) . #x97)))     ; Em dash: 8212 -> 151

(defmethod charset ((encoding single-byte-encoding))
  *char-single-byte-codes*)

(defun char-external-code (char charset)
  (cond ((null charset) (char-code char))
        ((atom charset)
         #+lispworks
         (ef:char-external-code char charset)
         #+allegro
         (aref (excl:string-to-octets (coerce `(,char) 'string)
                                      :external-format charset)
               0)
         #-(or lispworks allegro)
         (char-code char))
        ((cdr (assoc char charset)))    ; map to single-byte if possible
        (t (char-code char))))

For single-byte-encoding, we would define methods on get-char-metrics and
write-to-page.

Actually, I have got all the code working and could commit it.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From divanov at aha.ru  Mon Jan 14 09:59:49 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Mon, 14 Jan 2008 12:59:49 +0300
Subject: [cl-pdf-devel] Mapping useful Unicode characters to single-byte-encoding
References: <000001c85366$25592d30$8100a8c0@digo>
Message-ID: <002401c85695$7ba42550$8100a8c0@digo>

Hello folks,

I have slightly refactored and committed the code related to the problem
mentioned in my previous post.
Additionally, the *default-charset* special variable has been introduced so
that strings outside content streams are output properly. The corresponding
method on write-object was altered in pdf.lisp.

Please update and test carefully :-)
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com  Sun Jan 20 22:06:40 2008
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Sun, 20 Jan 2008 23:06:40 +0100
Subject: [cl-pdf-devel] Unicode issue
In-Reply-To:
References:
Message-ID: <4793C5F0.1030503@fractalconcept.com>

Andrey Moskvitin wrote:
> >> The example uses a unicode font and simply puts a string into the
> >> output document which contains #\Space #\( #\) and a few other
> >> characters. Both parentheses and space are needed to produce the
> >> wrong pdf.
>
> > Unfortunately, the unicode integration is still in an alpha state as
> > I've never found the time to continue it :(
> > I will look at your example to see if I can make a fix for it.
>
> I encountered the same problem with unbalanced parentheses in text.
>
> Sample code (Gentoo Linux, sbcl-1.0.12):
>
> (pdf:load-ttu-font #P"/usr/local/fonts/arial.ufm"
>                    #P"/usr/local/fonts/arial.ttf")
>
> (with-open-file (out #P"/tmp/bad.pdf"
>                      :direction :output :if-exists :supersede
>                      :element-type :default :external-format :latin-1)
>   (pdf:with-document ()
>     (pdf:with-page ()
>       (pdf:in-text-mode
>         (pdf:move-text 20 20)
>         (pdf:set-font (pdf:get-font "ArialMT") 12)
>         (pdf::show-text "hello (")))
>     (pdf:write-document out)))
>
> The problem is in the function write-cid-string.
> I changed it, and that solved the problem:
>
> --- /usr/share/common-lisp/source/cl-pdf/pdf-base.lisp  2007-10-10 12:50:45.000000000 +0000
> +++ pdf-base.lisp  2007-12-29 16:38:18.000000000 +0000
> @@ -25,11 +25,14 @@
>    (write-char #\( *page-stream*)
>    (if (and *font* (typep (font-metrics *font*) 'ttu-font-metrics))
>        (loop for c across string do
> -           (let* ((code (char-code c))
> -                  (hi (ldb (byte 8 8) code))
> -                  (lo (ldb (byte 8 0) code)))
> -             (write-char (code-char hi) *page-stream*)
> -             (write-char (code-char lo) *page-stream*)))
> +           (let* ((code (char-code c))
> +                  (hi (ldb (byte 8 8) code))
> +                  (lo (ldb (byte 8 0) code))
> +                  (is-bracket (or (eql c #\() (eql c #\)))))
> +             (if is-bracket (write-char #\\ *page-stream*))
> +             (write-char (code-char hi) *page-stream*)
> +             (if is-bracket (write-char #\\ *page-stream*))
> +             (write-char (code-char lo) *page-stream*)))
>        (princ string *page-stream*))
>    (write-string ") " *page-stream*))
>

Hi Andrey,

Thanks for this.
IIRC, I corrected this in revision 152 (October 14). Is this fix a
complement to that fix or an alternative fix?

Marc

From archimag at gmail.com  Thu Jan 31 08:50:23 2008
From: archimag at gmail.com (Andrey Moskvitin)
Date: Thu, 31 Jan 2008 08:50:23 +0000
Subject: [cl-pdf-devel] Unicode issue
In-Reply-To: <4793C5F0.1030503@fractalconcept.com>
References: <4793C5F0.1030503@fractalconcept.com>
Message-ID:

Hi Marc,

I have the problem with unbalanced parentheses when using the cl-pdf
library (revision 159).
This problem is solved in the cl-typesetting library (stroke.lisp):

(when (find char "\\()" :test #'char=)
  (push #\\ string))

No, that solution is wrong. The problem is in the cl-pdf library and
should be resolved there.

From marc.battyani at fractalconcept.com  Thu Jan 31 09:24:20 2008
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Thu, 31 Jan 2008 10:24:20 +0100
Subject: [cl-pdf-devel] Unicode issue
In-Reply-To:
References: <4793C5F0.1030503@fractalconcept.com>
Message-ID: <47A193C4.2020604@fractalconcept.com>

Hi Andrey,

I'm sorry, but are you sure you have the current version?
The fixes were in pdf/pdf-base, not in cl-typesetting, and I can't find
the code related to your patch in pdf-base.lisp.

Marc
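
The escaping discussed in this thread can be sketched independently of
cl-pdf. In a PDF literal string, an unbalanced ( or ) and any backslash
must be preceded by a backslash. With a 16-bit encoding it is probably
safer to test each output byte rather than the character, because a byte
equal to #x28, #x29 or #x5C can also appear as one half of an unrelated
character code (for example, the low byte of U+0128). What follows is only
a minimal sketch under those assumptions: pdf-special-octet-p and
escaped-cid-octets are hypothetical names, not cl-pdf functions, and this
is not claimed to be what revision 152 does. It returns the byte values as
a list instead of writing to *page-stream*, and it handles only character
codes below #x10000.

```lisp
;; Hypothetical helpers for illustration only; not part of cl-pdf.

(defun pdf-special-octet-p (octet)
  "True if OCTET is the code of (, ) or \\, which a PDF literal string
requires to be preceded by a backslash."
  (member octet '(#x28 #x29 #x5C)))

(defun escaped-cid-octets (string)
  "Return a list of octets encoding STRING as big-endian 16-bit codes,
with a backslash octet (#x5C) inserted before every special octet.
Assumes every character code in STRING is below #x10000."
  (let ((octets '()))
    (loop for c across string
          for code = (char-code c)
          do (dolist (octet (list (ldb (byte 8 8) code)   ; high byte
                                  (ldb (byte 8 0) code))) ; low byte
               ;; Escape per octet, not per character.
               (when (pdf-special-octet-p octet)
                 (push #x5C octets))
               (push octet octets)))
    (nreverse octets)))

;; (escaped-cid-octets "a(")  ; => (0 97 0 92 40)
```

Always escaping parentheses, including balanced ones, is allowed in a
literal string and avoids having to track nesting while writing.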