From lohner.roland at gmail.com Tue Feb 5 18:22:11 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Tue, 5 Feb 2008 19:22:11 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
Message-ID:

Dear List,

I use cl-pdf with sbcl and latin-2 encoded fonts. A serious problem occurs with 4 Hungarian characters. Instead of these characters, some other characters are rendered in the document. These are:

#\LATIN_CAPITAL_LETTER_O_WITH_DOUBLE_ACUTE (Ő) ----> P
#\LATIN_SMALL_LETTER_O_WITH_DOUBLE_ACUTE (ő)   ----> Q
#\LATIN_CAPITAL_LETTER_U_WITH_DOUBLE_ACUTE (Ű) ----> p
#\LATIN_SMALL_LETTER_U_WITH_DOUBLE_ACUTE (ű)   ----> q

The reason is that 'write-to-page ((string string) (encoding single-byte-encoding) &optional escape) writes to a character stream without setting the character encoding to latin-2. So these 4 Hungarian characters are not mapped properly to their single-byte representation in latin-2, and only the low byte of each character code is written to the document:

0x150 -> 0x50 (P)
0x151 -> 0x51 (Q)
0x170 -> 0x70 (p)
0x171 -> 0x71 (q)

An svn diff is attached to this mail. It solves the problem via use of function 'char-external-code and definition of *latin-2-charset*. This patch solves my problem.

Please take a look at the attached patch. Unless there are objections or suggestions, Attila Lendvai will commit it eventually.

Regards,
Roland Lohner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbcl-latin2-hun.svndiff
Type: application/octet-stream
Size: 2609 bytes
Desc: not available
URL:

From divanov at aha.ru Wed Feb 6 07:41:40 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Wed, 6 Feb 2008 10:41:40 +0300
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
References:
Message-ID: <001f01c86893$b941fd50$8100a8c0@digo>

Hello Roland,

| I use cl-pdf with sbcl and latin-2 encoded fonts.
| A serious problem occurs with 4 Hungarian characters.
| ...snip...|
| An svn diff is attached to this mail. It solves the problem via use of
| function 'char-external-code and definition of *latin-2-charset*. This
| patch solves my problem. Please take a look at the attached patch.

As *latin-2-encoding* is an instance of custom-encoding, not just single-byte-encoding, your approach is not quite universal. The proposed method on charset

(defmethod charset ((encoding (eql *latin-2-encoding*)))
  *latin-2-charset*)

displaces the "standard value" :latin-2, which is stored in the charset slot and seems to work fine for the others. So your proposal could potentially defeat other Lisp implementations.

Just expanding the definition of char-external-code for SBCL would be a better solution. I feel that should be enough, but I do not know how to do that.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From lohner.roland at gmail.com Wed Feb 6 14:47:03 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Wed, 6 Feb 2008 15:47:03 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
In-Reply-To: <001f01c86893$b941fd50$8100a8c0@digo>
References: <001f01c86893$b941fd50$8100a8c0@digo>
Message-ID:

Hi Dmitriy, List,

thanks for the answer. A diff of another useful solution considering your opinion is attached. Please take a look at it.

Regards,
Roland Lohner

> As *latin-2-encoding* is an instance of custom-encoding, not just
> single-byte-encoding, your approach is not quite universal. The proposed
> method on charset
>
> (defmethod charset ((encoding (eql *latin-2-encoding*)))
>   *latin-2-charset*)
>
> displaces the "standard value" :latin-2, which is stored in the charset
> slot and seems to work fine for the others. So your proposal could
> potentially defeat other Lisp implementations.
>
> Just expanding the definition of char-external-code for SBCL would be a
> better solution. I feel that should be enough, but I do not know how to
> do that.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbcl-latin2-hun.svndiff-2
Type: application/octet-stream
Size: 2366 bytes
Desc: not available
URL:

From divanov at aha.ru Wed Feb 6 15:27:58 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Wed, 6 Feb 2008 18:27:58 +0300
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
References: <001f01c86893$b941fd50$8100a8c0@digo>
Message-ID: <002e01c868d4$df8cc800$8100a8c0@digo>

Roland Lohner wrote on Wed, 6 Feb 2008 15:47:03 +0100 17:47:

| A diff of another useful solution considering your opinion is
| attached. Please take a look at it.

This looks slightly better. Though, IMHO, char-external-code should remain an ordinary function, not a generic one. Let us introduce sbcl-char-external-code, for example:

+#+sbcl
+ (defmethod sbcl-char-external-code ((char character) (charset (eql :latin-2)))
+   (let ((code (call-next-method)))
+     (case code
+       (336 213) ; #\LATIN_CAPITAL_LETTER_O_WITH_DOUBLE_ACUTE
+       (337 245) ; #\LATIN_SMALL_LETTER_O_WITH_DOUBLE_ACUTE
+       (368 219) ; #\LATIN_CAPITAL_LETTER_U_WITH_DOUBLE_ACUTE
+       (369 251) ; #\LATIN_SMALL_LETTER_U_WITH_DOUBLE_ACUTE
+       (otherwise code))))

When SBCL gets a "real" char-external-code converter function in the future, it will be easier to rewrite just this piece of code.

The final question is whether this approach works for Latin-2 based languages other than Hungarian. Are you sure they all map these Unicode characters similarly?
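To make the numbers in this thread concrete, here is a minimal sketch in portable Common Lisp. It assumes a Lisp with Unicode characters (so that (code-char #x150) exists). The names truncated-code and latin-2-code are hypothetical and only for illustration; they are not cl-pdf functions, and the table covers only the four characters discussed here.

```lisp
;; Illustration only: hypothetical helpers, not part of cl-pdf.

(defun truncated-code (char)
  "Low byte of CHAR's code, i.e. what is written when no Latin-2 mapping is applied."
  (logand (char-code char) #xFF))

(defun latin-2-code (char)
  "Map the four double-acute letters to their ISO 8859-2 codes.
All other character codes are passed through unchanged."
  (let ((code (char-code char)))
    (case code
      (#x150 #xD5)   ; capital O with double acute -> 213
      (#x151 #xF5)   ; small o with double acute   -> 245
      (#x170 #xDB)   ; capital U with double acute -> 219
      (#x171 #xFB)   ; small u with double acute   -> 251
      (otherwise code))))

;; Print, for each of the four characters, the wrongly rendered
;; character and the Latin-2 code that should have been written.
(dolist (code '(#x150 #x151 #x170 #x171))
  (let ((char (code-char code)))
    (format t "U+~4,'0X  truncated: ~C  latin-2: ~D~%"
            code
            (code-char (truncated-code char))
            (latin-2-code char))))
```

The truncated column reproduces the P, Q, p, q seen in the broken documents; the last column gives the codes used in the proposed method above.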
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From lohner.roland at gmail.com Fri Feb 8 11:01:47 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Fri, 8 Feb 2008 12:01:47 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
In-Reply-To: <002e01c868d4$df8cc800$8100a8c0@digo>
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo>
Message-ID:

Dear Dmitriy, List,

what's the reason that char-external-code should remain an ordinary function? It should behave differently in case of some encodings. This typically can be mapped to a generic with different methods.

> The final question is whether this approach works for Latin-2 based
> languages other than Hungarian. Are you sure they all map these
> Unicode characters similarly?

The problem described is generally the problem of the latin-2 encoding with sbcl. It is independent of the language. The latin-2 character table contains for example the character #\LATIN_SMALL_LETTER_O_WITH_DOUBLE_ACUTE (ps: /ohungarumlaut, code: 245, "ő") independently of the language. Now in cl-pdf with sbcl, this character is inaccessible. So these four characters belong to the latin-2 encoding, though as far as I know only the Hungarian language uses them.

Regards,
Roland Lohner

From divanov at aha.ru Fri Feb 8 12:20:48 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 8 Feb 2008 15:20:48 +0300
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo>
Message-ID: <001701c86a4d$0fb4c3b0$8100a8c0@digo>

Roland Lohner wrote on Fri, 8 Feb 2008 12:01:47 +0100 14:01:

| what's the reason that char-external-code should remain an ordinary
| function? It should behave differently in case of some encodings. This
| typically can be mapped to a generic with different methods.

That is just a matter of coding style: generic functions dispatch over Lisp objects, ordinary ones over Lisp implementations :-)

| The problem described is generally the problem of the latin-2 encoding
| with sbcl. It is independent of the language. The latin-2 character
| table contains for example the character
| #\LATIN_SMALL_LETTER_O_WITH_DOUBLE_ACUTE (ps: /ohungarumlaut, code: 245,
| "ő") independently of the language. Now in cl-pdf with sbcl, this
| character is inaccessible. So these four characters belong to the
| latin-2 encoding, though as far as I know only the Hungarian language
| uses them.

I see. I have committed the changes. Please update and test.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From lohner.roland at gmail.com Fri Feb 8 13:02:39 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Fri, 8 Feb 2008 14:02:39 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
In-Reply-To: <001701c86a4d$0fb4c3b0$8100a8c0@digo>
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo> <001701c86a4d$0fb4c3b0$8100a8c0@digo>
Message-ID:

Hi Dmitriy,

> I see. I have committed the changes. Please update and test.

Thanks for committing. It seems to be OK, though you have forgotten to commit the changes of "pdf-base.lisp".

I attached the diff.

Regards,
Roland Lohner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdf-base.lisp.svndiff
Type: application/octet-stream
Size: 1292 bytes
Desc: not available
URL:

From divanov at aha.ru Fri Feb 8 14:13:36 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 8 Feb 2008 17:13:36 +0300
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo> <001701c86a4d$0fb4c3b0$8100a8c0@digo>
Message-ID: <002601c86a5c$d236c780$8100a8c0@digo>

Roland Lohner wrote on Fri, 8 Feb 2008 14:02:39 +0100 16:02:

| Thanks for committing.
| It seems to be OK, though you have forgotten to commit the changes of
| "pdf-base.lisp".
|
| I attached the diff.

I am sorry. It is there now.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From lohner.roland at gmail.com Fri Feb 8 15:36:21 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Fri, 8 Feb 2008 16:36:21 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
In-Reply-To: <002601c86a5c$d236c780$8100a8c0@digo>
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo> <001701c86a4d$0fb4c3b0$8100a8c0@digo> <002601c86a5c$d236c780$8100a8c0@digo>
Message-ID:

OK. Thanks for your help.

Sorry, one thing I forgot, get-char-metrics methods contain the function call char-external-code, as well. So font.lisp also needs to be patched.

I attached the diff.

Have a nice weekend!
Roland Lohner

> I am sorry. It is there now.
> --
> Sincerely,
> Dmitriy Ivanov
> lisp.ystok.ru
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: font.lisp.svndiff
Type: application/octet-stream
Size: 997 bytes
Desc: not available
URL:

From divanov at aha.ru Fri Feb 8 16:17:53 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 8 Feb 2008 19:17:53 +0300
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo> <001701c86a4d$0fb4c3b0$8100a8c0@digo> <002601c86a5c$d236c780$8100a8c0@digo>
Message-ID: <003501c86a6e$2fd600c0$8100a8c0@digo>

Roland Lohner wrote on Fri, 8 Feb 2008 16:36:21 +0100 18:36:

| Sorry, one thing I forgot, get-char-metrics methods contain the
| function call char-external-code, as well. So font.lisp also needs to
| be patched.
|
| I attached the diff.

Amended and committed now. Have fun.

| Have a nice weekend!

Thanks, you too.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From iso at wemba.edu.pl Fri Feb 8 16:25:10 2008
From: iso at wemba.edu.pl (Iso Asciinen)
Date: Fri, 08 Feb 2008 17:25:10 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
Message-ID: <87hcgj76vt.fsf@localhost>

Hi,

I've added the remaining latin-2 characters. It should be correct as I used an official unicode source:

#
# Name: ISO 8859-2 (1987) to Unicode
# Unicode version: 1.1
# Table version: 0.1
# Table format: Format A
# Date: 16 January 1995
# Authors: Tim Greenwood
#          John H. Jenkins
#
# Copyright (c) 1991-1995 Unicode, Inc. All Rights reserved.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: latin-2-charset.svndiff
Type: application/octet-stream
Size: 7486 bytes
Desc: not available
URL:

From iso at wemba.edu.pl Sat Feb 9 02:36:53 2008
From: iso at wemba.edu.pl (Iso)
Date: Sat, 09 Feb 2008 03:36:53 +0100
Subject: [cl-pdf-devel] Re: latin-2 + sbcl
Message-ID: <877iherh2y.fsf@localhost>

Actually, none of the special characters defined in *char-single-byte-codes* will work with a latin-2 encoded font. So, better omit these translations from *sbcl-latin-2-charset*.

I've attached a corrected patch below.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: latin-2-charset.svndiff
Type: application/octet-stream
Size: 7460 bytes
Desc: not available
URL:

From divanov at aha.ru Sat Feb 9 12:42:20 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Sat, 9 Feb 2008 15:42:20 +0300
Subject: [cl-pdf-devel] Re: latin-2 + sbcl
References: <877iherh2y.fsf@localhost>
Message-ID: <002501c86b19$3abd5460$8100a8c0@digo>

Iso wrote on Sat, 09 Feb 2008 03:36:53 +0100 05:36:

| Actually, none of the special characters defined in
| *char-single-byte-codes* will work with a latin-2
| encoded font. So, better omit these translations
| from *sbcl-latin-2-charset*.
|
| I've attached a corrected patch below.

Thanks. I have committed the change. Additionally, *sbcl-latin-2-charset* was moved to encodings.lisp.

IMHO, it would be nice to split the encodings.lisp into several files and put them into encodings/ directory.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From attila.lendvai at gmail.com Sat Feb 9 13:26:15 2008
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Sat, 9 Feb 2008 14:26:15 +0100
Subject: [cl-pdf-devel] Re: latin-2 + sbcl
In-Reply-To: <002501c86b19$3abd5460$8100a8c0@digo>
References: <877iherh2y.fsf@localhost> <002501c86b19$3abd5460$8100a8c0@digo>
Message-ID:

> IMHO, it would be nice to split the encodings.lisp into several files and
> put them into encodings/ directory.

if someone is going to look deeper into the encoding issue, i suggest depending on babel:

darcs get http://common-lisp.net/~loliveira/darcs/babel/

(or when the page is set up: http://common-lisp.net/project/babel/)

it's a very flexible cross platform lib for handling various string encodings/decodings.

--
attila

From marc.battyani at fractalconcept.com Sun Feb 10 21:45:38 2008
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Sun, 10 Feb 2008 22:45:38 +0100
Subject: [cl-pdf-devel] Re: latin-2 + sbcl
In-Reply-To:
References: <877iherh2y.fsf@localhost> <002501c86b19$3abd5460$8100a8c0@digo>
Message-ID: <47AF7082.2080609@fractalconcept.com>

Attila Lendvai wrote:
>> IMHO, it would be nice to split the encodings.lisp into several files and
>> put them into encodings/ directory.
>>
>
> if someone is going to look deeper into the encoding issue, i suggest
> depending on babel: darcs get
> http://common-lisp.net/~loliveira/darcs/babel/ (or when the page is
> set up: http://common-lisp.net/project/babel/)
>
> it's a very flexible cross platform lib for handling various string
> encodings/decodings.
>

Well, I'm not a big fan of too many dependencies if it can be avoided. At least if you really want to use some library, it should work on the same implementations and OS as cl-pdf/typesetting and, if it's a "work in progress" library, a stable version working with cl-pdf should be easily available or put in the cl-pdf repository to avoid problems in the future.

BTW I do that with all the libraries I use for my own work. I've set up an svn repository and I keep all this in sync.

Marc

From michaelw at foldr.org Thu Feb 14 15:41:08 2008
From: michaelw at foldr.org (Michael Weber)
Date: Thu, 14 Feb 2008 16:41:08 +0100
Subject: [cl-pdf-devel] [bug] *name-counter* unbound in with-existing-document
Message-ID: <25A4C171-A8D2-4F58-82F0-E7B719913016@foldr.org>

Hi,

attached is a small patch which binds *name-counter* in with-existing-document. Otherwise I get errors of the variable being unbound when trying to use w-e-d.

Cheers,
Michael

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-pdf-parser.patch
Type: application/octet-stream
Size: 593 bytes
Desc: not available
URL:

From michaelw at foldr.org Thu Feb 14 15:58:00 2008
From: michaelw at foldr.org (Michael Weber)
Date: Thu, 14 Feb 2008 16:58:00 +0100
Subject: [cl-pdf-devel] SBCL and :pdf-binary
Message-ID: <6D5BED31-1921-407D-98D2-CB47FB6CE68C@foldr.org>

Hi,

Some time ago, I ran into problems due to the way cl-pdf uses bivalent streams. Unfortunately, I did not keep notes at the time because I was in a hurry to get something done. However, the attached patch made them go away then, and still appears to work now, with SBCL 1.0.14.

IIRC, SBCL throws errors in with-page/with-existing-page due to their use of with-output-to-string, which cannot be convinced to be bivalent. I was using cl-pdf-parser (to graft content onto an existing pdf file) at the time. However, I think I remember that it also happens with some of the examples shipped with cl-pdf.

Did somebody else observe something like that? Otherwise, I might go back and try to reproduce the exact error when I get some spare minutes.

I am using SBCL in a utf-8 locale, if that's relevant.

Cheers,
Michael

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-pdf-config.patch
Type: application/octet-stream
Size: 481 bytes
Desc: not available
URL:

From lohner.roland at gmail.com Tue Feb 19 11:42:09 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Tue, 19 Feb 2008 12:42:09 +0100
Subject: [cl-pdf-devel] latin-2 + sbcl + hungarian characters
In-Reply-To: <003501c86a6e$2fd600c0$8100a8c0@digo>
References: <001f01c86893$b941fd50$8100a8c0@digo> <002e01c868d4$df8cc800$8100a8c0@digo> <001701c86a4d$0fb4c3b0$8100a8c0@digo> <002601c86a5c$d236c780$8100a8c0@digo> <003501c86a6e$2fd600c0$8100a8c0@digo>
Message-ID:

Hi Dmitriy, List,

I came up with a much simpler solution for the problem of sbcl and charsets.
It uses the function sb-ext:string-to-octets in pdf::char-external-code. So no extra charset table is needed in cl-pdf any more. In addition to latin-2, I filled in the sbcl charsets for the win1250 and 1251 encodings.

The svndiff is attached. Please take a look at it and commit, if it's right. Sorry for realizing this charset-specific conversion ability of sbcl too late.

Regards,
Roland

2008/2/8, Dmitriy Ivanov:
>
> Roland Lohner wrote on Fri, 8 Feb 2008 16:36:21 +0100 18:36:
>
> | Sorry, one thing I forgot, get-char-metrics methods contain the
> | function call char-external-code, as well. So font.lisp also needs to
> | be patched.
> |
> | I attached the diff.
>
> Amended and committed now. Have fun.
>
> | Have a nice weekend!
>
> Thanks, you too.
> --
> Sincerely,
> Dmitriy Ivanov
> lisp.ystok.ru
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbcl-charsets.svndiff
Type: application/octet-stream
Size: 9996 bytes
Desc: not available
URL:

From lohner.roland at gmail.com Tue Feb 19 13:36:40 2008
From: lohner.roland at gmail.com (Roland Lohner)
Date: Tue, 19 Feb 2008 14:36:40 +0100
Subject: [cl-pdf-devel] deflate stream with sbcl
Message-ID:

Dear List,

the attached patch fixes a bug which makes sbcl users unable to compress pdf streams using non-unicode encodings.

The problem is with the function call

(sb-ext:string-to-octets string :start start :end end)

in salza-deflate::string-to-octets.

In cl-pdf, salza-deflate::string-to-octets only gets strings which can be encoded with a single-byte encoding (earlier functions map the exceptional characters of the encoding in use as if it was a single-byte encoding). So it should call sb-ext:string-to-octets with the arguments :external-format :iso-8859-1 to avoid re-encoding with a non-single-byte or "non-identity" encoding (default external format is: utf-8).

This fix works with unicode and non-unicode encodings, as well.
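
The difference between the two external formats can be checked at the REPL. What follows is only a sketch, assuming SBCL with Unicode support; identity-octets is a hypothetical name used for illustration, and this is not the code of the attached patch.

```lisp
;; Sketch only; sb-ext:string-to-octets is SBCL-specific, hence the #+sbcl guards.

#+sbcl
(defun identity-octets (string)
  "Encode STRING so that each character code below 256 becomes exactly one octet."
  (sb-ext:string-to-octets string :external-format :iso-8859-1))

#+sbcl
(let ((s (string (code-char 245))))  ; a character already holding the Latin-2 byte 245
  ;; :iso-8859-1 keeps the code as a single octet: 245
  (print (identity-octets s))
  ;; :utf-8 re-encodes the same character as two octets: 195 181
  (print (sb-ext:string-to-octets s :external-format :utf-8))
  ;; For comparison, SBCL can do the Latin-2 mapping itself:
  ;; U+0151 becomes the single octet 245.
  (print (sb-ext:string-to-octets (string (code-char #x151))
                                  :external-format :iso-8859-2)))
```

The second result shows why compressing already-mapped strings with a utf-8 default breaks the stream: the byte 245 is no longer written as one byte.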

Please take a look at the attached patch. Unless there are objections or suggestions, Attila Lendvai will commit it eventually.

Regards,
Roland Lohner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbcl-deflate-stream.svndiff
Type: application/octet-stream
Size: 669 bytes
Desc: not available
URL:

From peter at gigamonkeys.com Sun Feb 24 20:57:26 2008
From: peter at gigamonkeys.com (Peter Seibel)
Date: Sun, 24 Feb 2008 12:57:26 -0800
Subject: [cl-pdf-devel] Character encoding?
Message-ID: <47C1DA36.7020101@gigamonkeys.com>

So let's say I'm using a Lisp that uses Unicode strings. I have some strings that contain characters such as u+2018 and u+2019 (i.e. curly quotes). If I'm using a non-Unicode font there are no font metrics for those code points and I get an array index error if I try to render those strings. But if I translate those to the corresponding cp-1252 code points, it works. On the other hand if I *am* using a Unicode font, I want to leave the strings alone. Would it make sense for cl-pdf, knowing what font is being used at the moment, to do this translation for me or not, as needed? If so, where's the best place for that to happen? In PUT-STRING?

-Peter

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/

From peter at gigamonkeys.com Sun Feb 24 21:00:07 2008
From: peter at gigamonkeys.com (Peter Seibel)
Date: Sun, 24 Feb 2008 13:00:07 -0800
Subject: [cl-pdf-devel] Character encoding?
In-Reply-To: <47C1DA36.7020101@gigamonkeys.com>
References: <47C1DA36.7020101@gigamonkeys.com>
Message-ID: <47C1DAD7.9050606@gigamonkeys.com>

Bah, this is probably really a cl-typesetting question.

Peter Seibel wrote:
> So let's say I'm using a Lisp that uses Unicode strings.
> I have some
> strings that contain characters such as u+2018 and u+2019 (i.e. curly
> quotes). If I'm using a non-Unicode font there are no font metrics for
> those code points and I get an array index error if I try to render
> those strings. But if I translate those to the corresponding cp-1252
> code points, it works. On the other hand if I *am* using a Unicode font,
> I want to leave the strings alone. Would it make sense for cl-pdf,
> knowing what font is being used at the moment, to do this translation
> for me or not, as needed? If so, where's the best place for that to
> happen? In PUT-STRING?

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/
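
The manual translation described in the question above could look roughly like the following. This is a minimal sketch in portable Common Lisp, assuming a Lisp with Unicode characters; *cp1252-quotes* and to-cp1252-codes are hypothetical names and not part of cl-pdf or cl-typesetting, and only the four curly quotes are covered. Where such a step would be hooked in (PUT-STRING or elsewhere) is exactly the open question.

```lisp
;; Illustration only: hypothetical names, not part of cl-pdf.

(defparameter *cp1252-quotes*
  '((#x2018 . #x91)    ; left single quotation mark
    (#x2019 . #x92)    ; right single quotation mark
    (#x201C . #x93)    ; left double quotation mark
    (#x201D . #x94))   ; right double quotation mark
  "Unicode code points and the cp-1252 byte values they correspond to.")

(defun to-cp1252-codes (string)
  "Return a fresh string in which the curly quotes listed in *CP1252-QUOTES*
are replaced by characters whose codes are the cp-1252 byte values.
All other characters are copied unchanged."
  (map 'string
       (lambda (char)
         (let ((entry (assoc (char-code char) *cp1252-quotes*)))
           (if entry
               (code-char (cdr entry))
               char)))
       string))
```

With a Unicode font the string would simply be passed through untranslated, so the caller, knowing the current font, would decide whether to apply to-cp1252-codes at all.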