From peter at gigamonkeys.com Sun Feb 24 21:00:07 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Sun, 24 Feb 2008 13:00:07 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <47C1DA36.7020101@gigamonkeys.com> References: <47C1DA36.7020101@gigamonkeys.com> Message-ID: <47C1DAD7.9050606@gigamonkeys.com> Bah, this is probably really a cl-typesetting question. Peter Seibel wrote: > So let's say I'm using a Lisp that uses Unicode strings. I have some > strings that contain characters such as u+2018 and u+2019 (i.e. curly > quotes). If I'm using a non-Unicode font there are no font metrics for > those code points and I get an array index error if I try to render > those strings. But if I translate those to the corresponding cp-1252 > code points. On the other hand if I *am* using a Unicode font, I want to > leave the strings alone. Would it make sense for cl-pdf, knowing what > font is being used at the moment, to do this translation for me or not, > as needed? If so, where's the best place for that to happen? In PUT-STRING? -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ From peter at gigamonkeys.com Mon Feb 25 06:30:43 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Sun, 24 Feb 2008 22:30:43 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <47C1DAD7.9050606@gigamonkeys.com> References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com> Message-ID: <47C26093.7070102@gigamonkeys.com> Peter Seibel wrote: > > Bah, this is probably really a cl-typesetting question. > > > Peter Seibel wrote: >> So let's say I'm using a Lisp that uses Unicode strings. I have some >> strings that contain characters such as u+2018 and u+2019 (i.e. curly >> quotes). 
If I'm using a non-Unicode font there are no font metrics for >> those code points and I get an array index error if I try to render >> those strings. But if I translate those to the corresponding cp-1252 >> code points. On the other hand if I *am* using a Unicode font, I want >> to leave the strings alone. Would it make sense for cl-pdf, knowing >> what font is being used at the moment, to do this translation for me >> or not, as needed? If so, where's the best place for that to happen? >> In PUT-STRING? So here's a patch that fixes my problem. I don't think it's really quite right--for one thing it just assumes that the Lisp is using Unicode strings which may not always be true. And it probably needs to be filled out with methods for other encodings. Plus I'm not at all sure that there isn't a much better place already in the code base to do this--I looked some at the encodings.lisp but couldn't quite figure out where those were used. But my basic point is that cl-typesetting and/or cl-pdf should know what encoding the Lisp is using (i.e. how should one interpret the values returned by CHAR-CODE) and should know how to map those to the numeric values used as indices into fonts, at least for the case where CHAR-CODE returns Unicode code points and the fonts are something well understood like Unicode and CP1252. (Are the others? MacRoman?) -Peter -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ From peter at gigamonkeys.com Mon Feb 25 06:31:33 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Sun, 24 Feb 2008 22:31:33 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? 
In-Reply-To: <47C1DAD7.9050606@gigamonkeys.com>
References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com>
Message-ID: <47C260C5.5080202@gigamonkeys.com>

Okay, here's the patch mentioned in my previous message:

diff -r 81aa27926362 cl-typesetting/typo.lisp
--- a/cl-typesetting/typo.lisp Sat Feb 23 21:17:20 2008 -0800
+++ b/cl-typesetting/typo.lisp Sun Feb 24 22:25:44 2008 -0800
@@ -151,6 +151,7 @@
 (defclass text-line (hbox)
   ())
 
+
 (defun make-char-box (char)
   (if *use-exact-char-boxes*
       (multiple-value-bind (width ascender descender) (pdf:get-char-size char *font* *font-size*)
@@ -225,12 +226,58 @@
 (defun white-char-p (char)
   (find char *white-chars*))
 
+(defgeneric convert-code-point (code-point font-encoding lisp-encoding))
+
+(defmethod convert-code-point (code-point font-encoding lisp-encoding)
+  (unless (eql font-encoding lisp-encoding)
+    (warn "Don't know how to convert ~a code points to ~a. Using identity." lisp-encoding font-encoding))
+  code-point)
+
+(defmethod convert-code-point (code-point (font-encoding (eql :win-ansi-encoding)) (lisp-encoding (eql :unicode-encoding)))
+  (case code-point
+    (338 140) ; #\LATIN_CAPITAL_LIGATURE_OE
+    (339 156) ; #\LATIN_SMALL_LIGATURE_OE
+    (352 138) ; #\LATIN_CAPITAL_LETTER_S_WITH_CARON
+    (353 154) ; #\LATIN_SMALL_LETTER_S_WITH_CARON
+    (376 159) ; #\LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS
+    (381 142) ; #\LATIN_CAPITAL_LETTER_Z_WITH_CARON
+    (382 158) ; #\LATIN_SMALL_LETTER_Z_WITH_CARON
+    (402 131) ; #\LATIN_SMALL_LETTER_F_WITH_HOOK
+    (710 136) ; #\MODIFIER_LETTER_CIRCUMFLEX_ACCENT
+    (732 152) ; #\SMALL_TILDE
+    (8211 150) ; #\EN_DASH
+    (8212 151) ; #\EM_DASH
+    (8216 145) ; #\LEFT_SINGLE_QUOTATION_MARK
+    (8217 146) ; #\RIGHT_SINGLE_QUOTATION_MARK
+    (8218 130) ; #\SINGLE_LOW-9_QUOTATION_MARK
+    (8220 147) ; #\LEFT_DOUBLE_QUOTATION_MARK
+    (8221 148) ; #\RIGHT_DOUBLE_QUOTATION_MARK
+    (8222 132) ; #\DOUBLE_LOW-9_QUOTATION_MARK
+    (8224 134) ; #\DAGGER
+    (8225 135) ; #\DOUBLE_DAGGER
+    (8226 149) ; #\BULLET
+    (8230 133) ; #\HORIZONTAL_ELLIPSIS
+    (8240 137) ; #\PER_MILLE_SIGN
+    (8249 139) ; #\SINGLE_LEFT-POINTING_ANGLE_QUOTATION_MARK
+    (8250 155) ; #\SINGLE_RIGHT-POINTING_ANGLE_QUOTATION_MARK
+    (8364 128) ; #\EURO_SIGN
+    (8482 153) ; #\TRADE_MARK_SIGN
+    (t code-point)))
+
+(defun convert-char-encoding (char)
+  (code-char
+   (convert-code-point
+    (char-code char)
+    (pdf::keyword-name (pdf::encoding *font*))
+    :unicode-encoding)))
+
 (defun put-string (string)
   (when (stringp string)
     (let ((hyphen-points (hyphenate-string string)))
       (loop with hyphen-point = (pop hyphen-points)
             for prev-char = #\I then char
-            for char across string
+            for actual-char across string
+            for char = (convert-char-encoding actual-char)
             for i from 0
             for kerning = (* (pdf:get-kerning prev-char char *font* *font-size*) *text-x-scale*)
             do
@@ -251,7 +298,8 @@
 (defun verbatim (string)
   "put a string in a 'verbatim' way: no kerning, no hyphenation, significant whitespaces, significant newlines"
   (when (stringp string)
-    (loop for char across string
+    (loop for actual-char across string
+          for char = (convert-char-encoding actual-char)
           for i from 0
           do
           (cond

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/

From attila.lendvai at gmail.com Mon Feb 25 11:03:15 2008
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Mon, 25 Feb 2008 12:03:15 +0100
Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding?
In-Reply-To: <47C26093.7070102@gigamonkeys.com>
References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com> <47C26093.7070102@gigamonkeys.com>
Message-ID:

> But my basic point is that cl-typesetting and/or cl-pdf should know what
> encoding the Lisp is using (i.e.
how should one interpret the values > returned by CHAR-CODE) and should know how to map those to the numeric the proper fix would be to refactor cl-pdf to write into binary streams and do the character encoding itself (i'd use babel, but Marc would prefer no external dependency). i've done that once (the branch is still laying around on my harddrive), but after a day of work i gave up. it produced a pdf that almost worked (the toc could display unicode text) but i made a mistake somewhere in the process and it produced corrupt files. as i don't know a bit about the pdf file format, i gave up instead of debugging it. -- attila From marc.battyani at fractalconcept.com Mon Feb 25 19:33:31 2008 From: marc.battyani at fractalconcept.com (Marc Battyani) Date: Mon, 25 Feb 2008 20:33:31 +0100 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com> <47C26093.7070102@gigamonkeys.com> Message-ID: <47C3180B.9050403@fractalconcept.com> Attila Lendvai wrote: >> But my basic point is that cl-typesetting and/or cl-pdf should know what >> encoding the Lisp is using (i.e. how should one interpret the values >> returned by CHAR-CODE) and should know how to map those to the numeric >> > the proper fix would be to refactor cl-pdf to write into binary > streams and do the character encoding itself (i'd use babel, but Marc > would prefer no external dependency). i've done that once (the branch > is still laying around on my harddrive), but after a day of work i > gave up. it produced a pdf that almost worked (the toc could display > unicode text) but i made a mistake somewhere in the process and it > produced corrupt files. as i don't know a bit about the pdf file > format, i gave up instead of debugging it. > I think Peter is right here. 
It's a cl-typesetting issue and not a cl-pdf one because he wants to substitute another character that will result in the same glyph in the current selected font. So it's not an encoding problem and in fact other substitutions, such as ligatures for instance, would be useful. Marc From peter at gigamonkeys.com Mon Feb 25 23:11:52 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Mon, 25 Feb 2008 15:11:52 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <47C3180B.9050403@fractalconcept.com> References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com> <47C26093.7070102@gigamonkeys.com> <47C3180B.9050403@fractalconcept.com> Message-ID: <47C34B38.1080704@gigamonkeys.com> Marc Battyani wrote: > Attila Lendvai wrote: >>> But my basic point is that cl-typesetting and/or cl-pdf should know what >>> encoding the Lisp is using (i.e. how should one interpret the values >>> returned by CHAR-CODE) and should know how to map those to the numeric >>> >> the proper fix would be to refactor cl-pdf to write into binary >> streams and do the character encoding itself (i'd use babel, but Marc >> would prefer no external dependency). i've done that once (the branch >> is still laying around on my harddrive), but after a day of work i >> gave up. it produced a pdf that almost worked (the toc could display >> unicode text) but i made a mistake somewhere in the process and it >> produced corrupt files. as i don't know a bit about the pdf file >> format, i gave up instead of debugging it. >> > I think Peter is right here. It's a cl-typesetting issue and not a > cl-pdf one because he wants to substitute another character that will > result in the same glyph in the current selected font. So it's not an > encoding problem and in fact other substitutions, such as ligatures for > instance, would be useful. So my two questions then are: 1. Is there some machinery in cl-typesetting that can/should be adapted to do this. 2. 
If not, is something like the patch I sent, about the right place to do it. -Peter P.S. Regarding ligatures it's a bit hairier because--as I'm sure you know--you want to map multiple characters in the string to one character in the output. You could imagine PUT-STRING and VERBATIM doing a bit of buffering between calls in order to detect sequences of characters that should be turned into a ligature in the output. Or you could semi-punt and say you're only going to render sequences of characters passed together in one call to those functions with ligatures. -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ From peter at gigamonkeys.com Tue Feb 26 05:48:25 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Mon, 25 Feb 2008 21:48:25 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <47C3180B.9050403@fractalconcept.com> References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com> <47C26093.7070102@gigamonkeys.com> <47C3180B.9050403@fractalconcept.com> Message-ID: <47C3A829.3030709@gigamonkeys.com> Marc Battyani wrote: > Attila Lendvai wrote: >>> But my basic point is that cl-typesetting and/or cl-pdf should know what >>> encoding the Lisp is using (i.e. how should one interpret the values >>> returned by CHAR-CODE) and should know how to map those to the numeric >>> >> the proper fix would be to refactor cl-pdf to write into binary >> streams and do the character encoding itself (i'd use babel, but Marc >> would prefer no external dependency). i've done that once (the branch >> is still laying around on my harddrive), but after a day of work i >> gave up. it produced a pdf that almost worked (the toc could display >> unicode text) but i made a mistake somewhere in the process and it >> produced corrupt files. 
as i don't know a bit about the pdf file
>> format, i gave up instead of debugging it.
>>
> I think Peter is right here. It's a cl-typesetting issue and not a
> cl-pdf one because he wants to substitute another character that will
> result in the same glyph in the current selected font. So it's not an
> encoding problem and in fact other substitutions, such as ligatures for
> instance, would be useful.

Actually, now I'm thinking it *is* a cl-pdf issue. Assume for the moment
that I'm using a Unicode Lisp (i.e. one whose CHAR-CODE returns Unicode
code points). I should be able to use cl-pdf directly and have it render
properly. The fact that PDF, under the covers, encodes characters using
octets that are really indices into an array that's part of the font
should not be an issue I have to deal with.

And looking a bit at the cl-pdf code I see something that looks like it's
sort of trying to do this--CHAR-EXTERNAL-CODE. But it also seems that it
isn't always called. (For instance, never on SBCL in GET-CHAR-METRICS.)
Maybe it should be.

Then if cl-pdf assumes that all characters and strings it gets are, or are
made up of, Unicode characters, it seems there are just a few places where
it can convert the Unicode code point to a code point that can be used
with the current font: get-char-metrics, show-char, and show-text may be
it, but I haven't done a careful check. It's a bit hinky that under the
covers cl-pdf just converts them to different Lisp characters and then
counts on using an 8-bit clean character encoding when writing the file,
but that's just an implementation detail at a level below the one I'm
talking about.

Obviously, if we made this change to cl-pdf, then cl-typesetting wouldn't
have to worry about it at all.

Finally, to generalize a bit, for Lisps that don't use Unicode code
points, cl-pdf should likewise know how to map from whatever character
encoding they do use to the encoding used within a PDF file.
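
A minimal sketch of the conversion step described above, assuming a Lisp whose CHAR-CODE returns Unicode code points. The names *UNICODE-TO-CP1252* and FONT-CODE are hypothetical and are not part of cl-pdf; the table is only a partial excerpt of the Unicode -> CP-1252 pairs from the patch earlier in this thread.

```lisp
;; Hypothetical names; nothing here is cl-pdf API.
(defparameter *unicode-to-cp1252*
  '((#x20AC . #x80)   ; EURO SIGN
    (#x2013 . #x96)   ; EN DASH
    (#x2014 . #x97)   ; EM DASH
    (#x2018 . #x91)   ; LEFT SINGLE QUOTATION MARK
    (#x2019 . #x92)   ; RIGHT SINGLE QUOTATION MARK
    (#x201C . #x93)   ; LEFT DOUBLE QUOTATION MARK
    (#x201D . #x94))  ; RIGHT DOUBLE QUOTATION MARK
  "Partial alist of Unicode code points whose CP-1252 code differs.")

(defun font-code (char single-byte-font-p)
  "Return the integer to use as the index into the font for CHAR.
With a Unicode font the code point is used unchanged; with a single-byte
CP-1252 font it is translated, and an error is signalled when no
translation is known."
  (let ((code (char-code char)))
    (cond ((not single-byte-font-p) code)
          ;; A clause with only a test returns the test's value.
          ((cdr (assoc code *unicode-to-cp1252*)))
          ;; CP-1252 agrees with Unicode below #x80 and in #xA0..#xFF.
          ((or (< code #x80) (<= #xA0 code #xFF)) code)
          (t (error "No CP-1252 code for U+~4,'0X" code)))))
```

Something along these lines would be called at each point where a character is turned into a font index, so callers never see the font's own numbering.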
-Peter -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ From divanov at aha.ru Tue Feb 26 08:30:11 2008 From: divanov at aha.ru (Dmitriy Ivanov) Date: Tue, 26 Feb 2008 11:30:11 +0300 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com><47C26093.7070102@gigamonkeys.com><47C3180B.9050403@fractalconcept.com> <47C3A829.3030709@gigamonkeys.com> Message-ID: <000001c87851$d493f6d0$8100a8c0@digo> Peter Seibel wrote on Mon, 25 Feb 2008 21:48:25 -0800 08:48: | Actually I now I'm thinking it *is* a cl-pdf issue. Assume for the | moment that I'm using a Unicode lisp. (I.e. one whose CHAR-CODE returns | Unicode code points.) I should be able to use cl-pdf directly and have | it render properly. That PDF under the covers encodes characters using | octets that are really indices into an array that's part of the font | should not be an issue I have to deal with. Relying on an encoding only does not suffice. An additional notion of _charset_ is for this. | And looking a bit at the cl-pdf code I see something that looks like | it's sort of trying to do this--CHAR-EXTERNAL-CODE. But it also seems | that that isn't always called. (For instance, never on SBCL in | GET-CHAR-METRICS). Maybe it should be. | | Then if cl-pdf assumes that all characters and strings it gets are or | are made up of Unicode characters, then it seems there are a just a few | places where it can convert the Unicode code-point to a code-point that | can be used with the current font: get-char-metrics, show-char, and | show-text may be it but I haven't done a careful check. 
It's a bit | hinky that under the covers cl-pdf just converts them to different Lisp | characters and then counts on using an 8-bit clean character encoding | when writing the file but that's just an implementation detail at a | level below the one I'm talking about. | | Obviously we made this change to cl-pdf then cl-typesetting wouldn't | have to worry about it at all. In the latest revision, get-char-metrics and write-to-page do call char-external-code on SBCL. As far as I can guess, SBCL itself lacks an internal machinery for implementing char-external-code. | Finally, to generalize a bit, for Lisps that don't use Unicode code | points, cl-pdf should likewise know how to map from whatever character | encoding they do use to the encoding used within a PDF file. I hope that an internal counterpart of char-external-code should do this on a specific non-Unicode Lisp implementation. The guys who are using such a Lisp should verify the hypothesis and contribute source code. -- Sincerely, Dmitriy Ivanov lisp.ystok.ru From peter at gigamonkeys.com Tue Feb 26 17:58:03 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Tue, 26 Feb 2008 09:58:03 -0800 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <000001c87851$d493f6d0$8100a8c0@digo> References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com><47C26093.7070102@gigamonkeys.com><47C3180B.9050403@fractalconcept.com> <47C3A829.3030709@gigamonkeys.com> <000001c87851$d493f6d0$8100a8c0@digo> Message-ID: <47C4532B.7030700@gigamonkeys.com> Dmitriy Ivanov wrote: > Peter Seibel wrote on Mon, 25 Feb 2008 21:48:25 -0800 08:48: > > | Actually I now I'm thinking it *is* a cl-pdf issue. Assume for the > | moment that I'm using a Unicode lisp. (I.e. one whose CHAR-CODE returns > | Unicode code points.) I should be able to use cl-pdf directly and have > | it render properly. 
That PDF under the covers encodes characters using > | octets that are really indices into an array that's part of the font > | should not be an issue I have to deal with. > > Relying on an encoding only does not suffice. An additional notion of > _charset_ is for this. Okay. What's a charset then? Can you perhaps lay out a quick map of the concepts and related terms as used in the cl-pdf/cl-typesetting source? > | And looking a bit at the cl-pdf code I see something that looks like > | it's sort of trying to do this--CHAR-EXTERNAL-CODE. But it also seems > | that that isn't always called. (For instance, never on SBCL in > | GET-CHAR-METRICS). Maybe it should be. > | > | Then if cl-pdf assumes that all characters and strings it gets are or > | are made up of Unicode characters, then it seems there are a just a few > | places where it can convert the Unicode code-point to a code-point that > | can be used with the current font: get-char-metrics, show-char, and > | show-text may be it but I haven't done a careful check. It's a bit > | hinky that under the covers cl-pdf just converts them to different Lisp > | characters and then counts on using an 8-bit clean character encoding > | when writing the file but that's just an implementation detail at a > | level below the one I'm talking about. > | > | Obviously we made this change to cl-pdf then cl-typesetting wouldn't > | have to worry about it at all. > > In the latest revision, get-char-metrics and write-to-page do call > char-external-code on SBCL. As far as I can guess, SBCL itself lacks an > internal machinery for implementing char-external-code. Okay, so it seems that my immediate problem would be fixed fairly simply by applying the attached patch which augments *char-single-byte-codes* to include mappings for all the characters that exist in cp-1252 but with different numeric values than the corresponding Unicode code points. I'm not sure that that variable, or rather the way it is used, is actually 100% right. 
For instance a single-byte font that uses an encoding (or whatever you want to call it) other than cp-1252 probably needs a different mapping. Practically speaking, such fonts may simply not exist. The Unicode folks provide a set of mappings here: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/ which is where I got the information about Unicode -> CP-1252. -Peter -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: patch URL: From peter at gigamonkeys.com Tue Feb 26 21:50:08 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Tue, 26 Feb 2008 13:50:08 -0800 Subject: [cl-typesetting-devel] patch for spacing after punctuation Message-ID: <47C48990.6040507@gigamonkeys.com> I can't find any good typographic references on the web but this patch produces output that looks better to my eye. Without it the spacing after #\? is way too wide, ISTM. -Peter -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: space-after-punctuation.patch Type: text/x-patch Size: 5631 bytes Desc: not available URL: From peter at gigamonkeys.com Tue Feb 26 22:04:45 2008 From: peter at gigamonkeys.com (Peter Seibel) Date: Tue, 26 Feb 2008 14:04:45 -0800 Subject: [cl-typesetting-devel] patch for spacing after punctuation In-Reply-To: <47C48990.6040507@gigamonkeys.com> References: <47C48990.6040507@gigamonkeys.com> Message-ID: <47C48CFD.9040006@gigamonkeys.com> Bah. That patch had way too much stuff in it. Try this one. 
-Peter Peter Seibel wrote: > I can't find any good typographic references on the web but this patch > produces output that looks better to my eye. Without it the spacing > after #\? is way too wide, ISTM. > > -Peter > > > ------------------------------------------------------------------------ > > _______________________________________________ > cl-typesetting-devel site list > cl-typesetting-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/cl-typesetting-devel -- Peter Seibel : peter at gigamonkeys.com A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/ Practical Common Lisp : http://www.gigamonkeys.com/book/ Coders at Work : http://www.codersatwork.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: fixed-space-after-punctuation.patch Type: text/x-patch Size: 631 bytes Desc: not available URL: From divanov at aha.ru Wed Feb 27 11:30:21 2008 From: divanov at aha.ru (Dmitriy Ivanov) Date: Wed, 27 Feb 2008 14:30:21 +0300 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com><47C26093.7070102@gigamonkeys.com><47C3180B.9050403@fractalconcept.com> <47C3A829.3030709@gigamonkeys.com> <000001c87851$d493f6d0$8100a8c0@digo> <47C4532B.7030700@gigamonkeys.com> Message-ID: <001e01c87934$2a7a0840$8100a8c0@digo> Peter Seibel wrote on Tue, 26 Feb 2008 09:58:03 -0800 20:58: |> Relying on an encoding only does not suffice. An additional notion of |> _charset_ is for this. | | Okay. What's a charset then? Can you perhaps lay out a quick map of the | concepts and related terms as used in the cl-pdf/cl-typesetting source? I have provided some explanation in my post "Mapping useful Unicode characters to single-byte-encoding" recently. In brief, charset is either an atom passed to some implementation-dependent converter a la CHAR-EXTERNAL-CODE or an alist used to retrieve corresponding codes via assoc. 
The charset "value" is returned by the charset generics applied to an
encoding object.

|> In the latest revision, get-char-metrics and write-to-page do call
|> char-external-code on SBCL. As far as I can guess, SBCL itself lacks
|> an internal machinery for implementing char-external-code.
|
| Okay, so it seems that my immediate problem would be fixed fairly
| simply by applying the attached patch which augments
| *char-single-byte-codes* to include mappings for all the characters
| that exist in cp-1252 but with different numeric values than the
| corresponding Unicode code points.

Yes, it should work for you in its simplest. But see below...

| I'm not sure that that variable, or rather the way it is used, is
| actually 100% right. For instance a single-byte font that uses an
| encoding (or whatever you want to call it) other than cp-1252 probably
| needs a different mapping. Practically speaking, such fonts may simply
| not exist. The Unicode folks provide a set of mappings here:
|
| ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/
|
| which is where I got the information about Unicode -> CP-1252.

The better solution would be introducing a custom encoding, named for
example, "Win1252Encoding", and specifying a charset for it as follows.

#+sbcl
(defparameter *sbcl-win-1252-charset*
  (append '((#.(code-char #x0152) . #x8C) ; LATIN_CAPITAL_LIGATURE_OE
            (#.(code-char #x0153) . #x9C) ; LATIN_SMALL_LIGATURE_OE
            ...)
          *char-single-byte-codes*))

(defparameter *win-1252-encoding*
  (make-instance 'pdf::custom-encoding
                 :name "Win1252Encoding"
                 :keyword-name :win-1252-encoding
                 :base-encoding :standard-encoding
                 :charset #-sbcl :1252 #+sbcl *sbcl-win-1252-charset*
                 ...)

Then, (setf *default-encoding* *win-1252-encoding*) or specify it
explicitly in get-font calls and so on.
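
To illustrate the two shapes of charset described above (an atom handed to an implementation-dependent converter, or an alist searched with assoc), here is a small sketch. LOOKUP-EXTERNAL-CODE and its NATIVE-CONVERTER argument are hypothetical names, not cl-pdf functions, and falling back to CHAR-CODE for characters missing from the alist is an assumption rather than documented behaviour.

```lisp
;; Hypothetical helper; not part of cl-pdf.
(defun lookup-external-code (char charset
                             &optional (native-converter
                                        (lambda (char charset)
                                          (declare (ignore charset))
                                          (char-code char))))
  "Return the single-byte code for CHAR under CHARSET.
CHARSET is either an alist of (character . code) pairs or an atom that
names an implementation-defined charset. NATIVE-CONVERTER stands in for
an implementation-specific converter in the style of CHAR-EXTERNAL-CODE."
  (if (consp charset)
      ;; Alist charset: look the character up, else assume identity.
      (let ((entry (assoc char charset)))
        (if entry (cdr entry) (char-code char)))
      ;; Atom charset: defer to the implementation's converter.
      (funcall native-converter char charset)))
```

For example, (lookup-external-code (code-char #x0152) (list (cons (code-char #x0152) #x8C))) looks the character up in the alist, while passing :1252 as the charset would route the call to the converter.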
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From peter at gigamonkeys.com Thu Feb 28 23:39:35 2008
From: peter at gigamonkeys.com (Peter Seibel)
Date: Thu, 28 Feb 2008 15:39:35 -0800
Subject: [cl-typesetting-devel] Is there an easy way to find all the ref-points in a page?
Message-ID: <47C74637.8040200@gigamonkeys.com>

Suppose I want to write a page finalization function (for use with
tt:draw-pages) that adds some content based on references (made with
mark-ref-point) that occur in the content of the page. Is there any easy
way to find them?

-Peter

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/

From peter at gigamonkeys.com Fri Feb 29 04:24:03 2008
From: peter at gigamonkeys.com (Peter Seibel)
Date: Thu, 28 Feb 2008 20:24:03 -0800
Subject: [cl-typesetting-devel] Layout glitch
Message-ID: <47C788E3.4010307@gigamonkeys.com>

When setting justified text, cl-typesetting tries to justify the last
line of a paragraph. That seems contrary to normal typographic style as
it results in extremely loose lines.

This code produces the attached output which shows the problem:

(in-package :tt)

(defun foo (&optional (output "/tmp/foo.pdf"))
  (with-document ()
    (draw-pages
     (compile-text ()
       (paragraph (:h-align :justify :top-margin 12)
         "Short paragraph")
       (paragraph (:h-align :justify :top-margin 12)
         "Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Praesent euismod malesuada enim. Pellentesque elementum dui eget leo.
Duis iaculis. Fusce malesuada, lorem quis congue pulvinar, purus mauris
commodo risus, a rhoncus ante ipsum bibendum velit. Quisque nisi quam,
mollis non, convallis nec, ornare sit amet, justo. Integer tincidunt,
dolor vitae lacinia pellentesque, diam magna volutpat dui, et rutrum nisl
tellus quis sem. Praesent suscipit tincidunt lacus. Mauris ut odio.
Pellentesque in neque a urna lobortis iaculis.
Aliquam id metus et ligula placerat cursus. Donec ut libero. Aliquam
tempor ornare felis. Aenean convallis. Ut non lacus id urna fermentum
bibendum. Aenean adipiscing bibendum pede. Nullam laoreet erat eu elit.
Fusce interdum cursus dolor. Fusce nisl. Suspendisse diam libero."))
     :break :after
     :size :Letter
     :margins '(72 72 72 72)
     :header-top 36
     :footer-bottom 36)
    (pdf:write-document output)))

-Peter

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.pdf
Type: application/pdf
Size: 4118 bytes
Desc: not available
URL:

From peter at gigamonkeys.com Fri Feb 29 06:22:13 2008
From: peter at gigamonkeys.com (Peter Seibel)
Date: Thu, 28 Feb 2008 22:22:13 -0800
Subject: [cl-typesetting-devel] Line breaking algorithm?
Message-ID: <47C7A495.80501@gigamonkeys.com>

What algorithm does cl-typesetting use for computing line breaks? I
looked at fit-lines and it was a bit daunting. Is it TeX's algorithm?
Something better? Something worse? Or something different and believed
to be just as good?

-Peter

--
Peter Seibel : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp : http://www.gigamonkeys.com/book/
Coders at Work : http://www.codersatwork.com/

From divanov at aha.ru Fri Feb 29 06:51:23 2008
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 29 Feb 2008 09:51:23 +0300
Subject: [cl-typesetting-devel] Layout glitch
References: <47C788E3.4010307@gigamonkeys.com>
Message-ID: <001e01c87a9f$8553ceb0$8100a8c0@digo>

Peter Seibel wrote on Thu, 28 Feb 2008 20:24:03 -0800 07:24:

| When setting justified text, cl-typesetting tries to justify the last
| line of a paragraph. That seems contrary to normal typographic style as
| it results in extremely loose lines.
|...snip...|

My old reply to the "Paragraph Justification" post suggested the
following (about a year ago):

(paragraph (:h-align :justify)
  ...
  :hfill)
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com Fri Feb 29 22:28:41 2008
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Fri, 29 Feb 2008 23:28:41 +0100
Subject: [cl-typesetting-devel] Layout glitch
In-Reply-To: <47C788E3.4010307@gigamonkeys.com>
References: <47C788E3.4010307@gigamonkeys.com>
Message-ID: <47C88719.7020304@fractalconcept.com>

Peter Seibel wrote:
> When setting justified text, cl-typesetting tries to justify the last
> line of a paragraph. That seems contrary to normal typographic style
> as it results in extremely loose lines.
>
> This code produces the attached output which shows the problem:

Normally, this case was handled properly when I wrote it but maybe it
needs some fix/update now. I will look at it.

Marc

From marc.battyani at fractalconcept.com Fri Feb 29 22:33:50 2008
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Fri, 29 Feb 2008 23:33:50 +0100
Subject: [cl-typesetting-devel] Line breaking algorithm?
In-Reply-To: <47C7A495.80501@gigamonkeys.com>
References: <47C7A495.80501@gigamonkeys.com>
Message-ID: <47C8884E.4050200@fractalconcept.com>

Peter Seibel wrote:
> What algorithm does cl-typesetting use for computing line breaks? I
> looked at fit-lines and it was a bit daunting. Is it TeX's algorithm?
> Something better? Something worse? Or something different and believed
> to be just as good?

The hyphenation algorithm is TeX's algorithm. The positioning on the line
is an improved one. The multi-line breaking is not implemented. I wanted
to use Screamer to make a good one but could not find the time to do it.
Marc From marc.battyani at fractalconcept.com Fri Feb 29 23:14:42 2008 From: marc.battyani at fractalconcept.com (Marc Battyani) Date: Sat, 01 Mar 2008 00:14:42 +0100 Subject: [cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding? In-Reply-To: <000001c87851$d493f6d0$8100a8c0@digo> References: <47C1DA36.7020101@gigamonkeys.com> <47C1DAD7.9050606@gigamonkeys.com><47C26093.7070102@gigamonkeys.com><47C3180B.9050403@fractalconcept.com> <47C3A829.3030709@gigamonkeys.com> <000001c87851$d493f6d0$8100a8c0@digo> Message-ID: <47C891E2.2090106@fractalconcept.com> Dmitriy Ivanov wrote: > Peter Seibel wrote on Mon, 25 Feb 2008 21:48:25 -0800 08:48: > > | Actually I now I'm thinking it *is* a cl-pdf issue. Assume for the > | moment that I'm using a Unicode lisp. (I.e. one whose CHAR-CODE returns > | Unicode code points.) I should be able to use cl-pdf directly and have > | it render properly. That PDF under the covers encodes characters using > | octets that are really indices into an array that's part of the font > | should not be an issue I have to deal with. > > Relying on an encoding only does not suffice. An additional notion of > _charset_ is for this Right but I still think that glyphs substitution is a cl-typesetting issue. Sure, the simple unicode to non-unicode conversion could be just an encoding issue but even that can be more complex. For instance you could gather the unicode glyphs used in a document and dynamically generate a custom non-unicode encoding that maps them to a font glyphs if you have less than 256 glyphs and they are all in the font. Doing more than just the simplest encoding conversions can only be done at the cl-typesetting level. For instance if a font lacks the euro glyph, cl-typesetting could substitute a custom cl-pdf box that draws the euro sign. There are also the ligatures, the ellipsis (... ?) etc. Marc
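
The idea of gathering the characters a document actually uses and assigning them to a custom single-byte encoding can be sketched as follows. BUILD-DYNAMIC-ENCODING is a hypothetical name, not something in cl-pdf or cl-typesetting. As a simplification, the sketch hands out codes from 0 upward and does not check that each glyph exists in the font, which the scheme described above would also require.

```lisp
;; Hypothetical helper; not part of cl-pdf or cl-typesetting.
(defun build-dynamic-encoding (strings)
  "Return a hash table mapping each distinct character in STRINGS to a
code in 0..255, assigned in order of first appearance, or NIL when the
strings use more than 256 distinct characters."
  (let ((table (make-hash-table))
        (next 0))
    (dolist (string strings table)
      (loop for char across string
            ;; Second value of GETHASH tells whether CHAR was seen before.
            unless (nth-value 1 (gethash char table))
              do (when (> next 255)
                   ;; Too many distinct characters for one byte.
                   (return-from build-dynamic-encoding nil))
                 (setf (gethash char table) next)
                 (incf next)))))
```

A caller could then write each character as the byte stored in the table and emit a matching encoding for the font, falling back to some other strategy when the function returns NIL.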