From marc.battyani at fractalconcept.com Tue Mar 8 08:10:34 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Tue, 8 Mar 2005 09:10:34 +0100
Subject: [cl-pdf-devel] Re: cl-pdf with sbcl on debian
References: <87zmxe91zp.fsf@www.codersbase.com>
Message-ID: <092901c523b6$4f789610$0a02a8c0@marcxp>

"Jason Dagit" wrote:

> I've been trying to get cl-pdf working on my computer under sbcl.
> After I load cl-pdf (using asdf), I try to run example1 that came with
> cl-pdf, and I get the following error:
>
> * (example1)
>
> debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread 32052:
>   encoding error on stream # {9F2A229}>
>   (:EXTERNAL-FORMAT :ASCII):
>   the character with code 11377784 cannot be encoded.

It is right, a character with code 11377784 is really not ASCII.

> This lead me to config.lisp, where I changed +external-format+ to the
> following:
>
> (defconstant +external-format+
>   #-(or sbcl lispworks clisp allegro) :default
>   #+(and allegro mswindows) (excl:crlf-base-ef :1252) ;;:1252-base
>   #+(and allegro unix) :default
>   #+lispworks '(:latin-1 :eol-style :lf)
>   #+sbcl :iso-8859-1
>   #+clisp :unix)
>
> Then I reloaded cl-pdf and tried again. Now I get the same message,
> except instead of :ASCII it says :LATIN-1

Same problem, Latin-1 is an 8 bit encoding.

There is probably an unicode string somewhere and cl-pdf does not support it yet.

Maybe some SBCL user can confirm this.

Marc

From divanov at aha.ru Tue Mar 8 12:57:36 2005
From: divanov at aha.ru (Dmitri Ivanov)
Date: Tue, 8 Mar 2005 15:57:36 +0300
Subject: [cl-pdf-devel] Re: cl-pdf with sbcl on debian
References: <87zmxe91zp.fsf@www.codersbase.com> <092901c523b6$4f789610$0a02a8c0@marcxp>
Message-ID: <000501c523de$8cebc260$764e02c3@digo>

Hello Jason,

| "Jason Dagit" wrote:
|...snip...|
|> debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread 32052:
|> encoding error on stream # {9F2A229}>
|> (:EXTERNAL-FORMAT :ASCII):
|> the character with code 11377784 cannot be encoded.
|
| It is right, a character with code 11377784 is really not ASCII.
|...snip...|
| There is probably an unicode string somewhere and cl-pdf does not
| support it yet.
|
| Maybe some SBCL user can confirm this.

Though I am an LW user, not SBCL user, but I can confirm this. To deal with
Unicode characters in CL-PDF now, you have to convert them to 8-bit encoding
manually and accompany by a corresponding Type1 font.

For an example how to combat Unicode in LW, please take a look at
http://lisp.ystok.ru/cl-pdf.html .

-- 
Sincerely,
Dmitri Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com Tue Mar 8 14:20:49 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Tue, 8 Mar 2005 15:20:49 +0100
Subject: [cl-pdf-devel] Re: cl-pdf with sbcl on debian
References: <87zmxe91zp.fsf@www.codersbase.com><092901c523b6$4f789610$0a02a8c0@marcxp> <000501c523de$8cebc260$764e02c3@digo>
Message-ID: <0a9801c523ea$08346ca0$0a02a8c0@marcxp>

"Dmitri Ivanov" wrote:

Hi Dmitri,

> Though I am an LW user, not SBCL user, but I can confirm this. To deal with
> Unicode characters in CL-PDF now, you have to convert them to 8-bit encoding
> manually and accompany by a corresponding Type1 font.
>
> For an example how to combat Unicode in LW, please take a look at http://lisp.ystok.ru/cl-pdf.html

I've just have a look at this and the PNG image support and IMO most of the modifications in di-contrib are already in the main cl-pdf.
;-) (except the modifs for unicode strings)

Marc

From divanov at aha.ru Wed Mar 9 06:27:47 2005
From: divanov at aha.ru (Dmitri Ivanov)
Date: Wed, 9 Mar 2005 09:27:47 +0300
Subject: [cl-pdf-devel] Re: cl-pdf with sbcl on debian
References: <87zmxe91zp.fsf@www.codersbase.com><092901c523b6$4f789610$0a02a8c0@marcxp> <000501c523de$8cebc260$764e02c3@digo> <0a9801c523ea$08346ca0$0a02a8c0@marcxp>
Message-ID: <001b01c5247f$8cf96bd0$465802c3@digo>

Hello Marc,

| "Dmitri Ivanov" wrote:
|
|> For an example how to combat Unicode in LW, please take
|> a look at http://lisp.ystok.ru/cl-pdf.html
|
| I've just have a look at this and the PNG image support and IMO most of
| the modifications in di-contrib are already in the main cl-pdf. ;-)
| (except the modifs for unicode strings)

Sorry, the page is a bit obsolete.

A few words about flaws in my current Unicode code. After converting text to
Windows-1251 encoding, I am not able
(1) to search PDF documents for a national text string using AcroReader's
    Find dialog,
(2) to copy and paste text from AcroReader to Windows applications.

I have made an attempt to embed a CMAP stuff (is it the right approach?),
not too hard to tell the truth, but failed.

-- 
Sincerely,
Dmitri Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com Thu Mar 10 09:05:01 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Thu, 10 Mar 2005 10:05:01 +0100
Subject: [cl-pdf-devel] Re: cl-pdf with sbcl on debian
References: <87zmxe91zp.fsf@www.codersbase.com><092901c523b6$4f789610$0a02a8c0@marcxp> <000501c523de$8cebc260$764e02c3@digo> <0a9801c523ea$08346ca0$0a02a8c0@marcxp> <001b01c5247f$8cf96bd0$465802c3@digo>
Message-ID: <035001c52550$3e7bcf80$0b02a8c0@marcxp>

"Dmitri Ivanov" wrote:

> A few words about flaws in my current Unicode code.
> After converting text to
> Windows-1251 encoding, I am not able
> (1) to search PDF documents for a national text string using AcroReader's
> Find dialog,
> (2) to copy and paste text from AcroReader to Windows applications.
>
> I have made an attempt to embed a CMAP stuff (is it the right approach?),
> not too hard to tell the truth, but failed.

As I understand it, it's the right way to do it. But the lack of text selection is not only related to unicode, it's the same on latin-1 strings.

Marc

From marc.battyani at fractalconcept.com Wed Mar 16 11:29:49 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Wed, 16 Mar 2005 12:29:49 +0100
Subject: [cl-pdf-devel] cl-pdf now uses Zach beane's zlib implementation in Lisp (salza)
References: <074c01c503f3$13b432c0$0a02a8c0@marcxp> <08b301c5198a$52a9fb20$0a02a8c0@marcxp>
Message-ID: <0f3c01c52a1b$77cdd970$0a02a8c0@marcxp>

I've added Zach Beane's zlib implementation in Lisp (salza)
It works only in LW (and maybe ACL) for now.

You now have to choose the zlib compression used in cl-pdf.asd

The code is here:
http://www.fractalconcept.com:8000/public/open-source/cl-pdf/
http://www.fractalconcept.com/download/cl-pdf-current.tgz

The compression is not optimized yet for LW and is 5.4 times slower than the C one for now.
Modifications for other Lisp implementations are welcome.

Marc

From marc.battyani at fractalconcept.com Thu Mar 17 08:30:16 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Thu, 17 Mar 2005 09:30:16 +0100
Subject: [cl-pdf-devel] switching to binary format
Message-ID: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp>

Hello,

I'm considering to switch cl-pdf to a binary format to avoid all those character encoding problems. Dmitri Ivanov has already started in that direction but it's for LW only. Has anybody else done similar modifications for other Lisp implementations ?

The binary format can be a problem for all the text parts written by #'format and #'write-string.
Also I don't want to have a performance hit by manually converting strings to byte arrays.

Any opinions on this ?

Marc

From divanov at aha.ru Thu Mar 17 10:29:53 2005
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Thu, 17 Mar 2005 13:29:53 +0300
Subject: [cl-pdf-devel] switching to binary format
References: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp>
Message-ID: <000901c52add$379ea510$1b5802c3@digo>

Hello Marc,

| I'm considering to switch cl-pdf to a binary format to avoid all those
| character encoding problems. Dmitri Ivanov has already started in that
| direction but it's for LW only. Has anybody else done similar
| modifications for other Lisp implementations ?
|
| The binary format can be a problem for all the text parts written by
| #'format and #'write-string. Also I don't want to have a performance
| hit by manually converting strings to byte arrays.
|
| Any opinions on this ?

Good.

Without Unicode support in CL-PDF, we have to convert to a code page
external format anyhow. If you feel like using fli:convert-to-dynamic-foreign-string
in LW, that could be not optimal. According to my experience with YSQL, its
counterpart fli:convert-from-foreign-string is implemented rather
inefficiently, and "manually converting byte arrays to strings" works
faster.

BTW, I have just updated my contribution di-pdf.lisp at lisp.ystok.ru.

-- 
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com Thu Mar 17 11:21:26 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Thu, 17 Mar 2005 12:21:26 +0100
Subject: [cl-pdf-devel] switching to binary format
References: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp> <000901c52add$379ea510$1b5802c3@digo>
Message-ID: <01a501c52ae3$78e7d810$0a02a8c0@marcxp>

"Dmitriy Ivanov" wrote:

> | I'm considering to switch cl-pdf to a binary format to avoid all those
> | character encoding problems. Dmitri Ivanov has already started in that
> | direction but it's for LW only.
> | Has anybody else done similar
> | modifications for other Lisp implementations ?
> |
> | The binary format can be a problem for all the text parts written by
> | #'format and #'write-string. Also I don't want to have a performance
> | hit by manually converting strings to byte arrays.
> |
> | Any opinions on this ?
>
> Good.
>
> Without Unicode support in CL-PDF, we have to convert to a code page
> external format anyhow. If you feel like using fli:convert-to-dynamic-foreign-string
> in LW, that could be not optimal. According to my experience with YSQL, its
> counterpart fli:convert-from-foreign-string is implemented rather
> inefficiently, and "manually converting byte arrays to strings" works
> faster.

All those format conversion are tedious. Let's go back to asm! ;-)

> BTW, I have just updated my contribution di-pdf.lisp at lisp.ystok.ru.

OK from a first look you are still using write-string and format on the binary stream. I don't think this will work on other implementations.

Marc

From divanov at aha.ru Fri Mar 18 15:37:32 2005
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 18 Mar 2005 18:37:32 +0300
Subject: [cl-pdf-devel] switching to binary format
References: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp> <000901c52add$379ea510$1b5802c3@digo> <01a501c52ae3$78e7d810$0a02a8c0@marcxp>
Message-ID: <00ec01c52bd0$919f3810$8e5802c3@digo>

Hello Marc,

|> BTW, I have just updated my contribution di-pdf.lisp at lisp.ystok.ru.
|
| OK from a first look you are still using write-string and format on the
| binary stream. I don't think this will work on other implementations.

No wonder - the code is LispWorks biased. I always
- open files with :element-type '(unsigned-byte 8);
- for non-base characters, invoke
    (write-byte (ef:char-external-code char *pdf-code-page*)
                *pdf-stream*));

- for base characters, invoke write-char or write-sequence, which are
  accepted by LispWorks bivalent streams.

I suggest following these guidelines in pursuance of compatibility.

1.
   (deftype octet () '(unsigned-byte 8))

2. In CL-PDF code, use only write-byte or write-sequence that is always
   given an array of type (vector octet) as an argument.

3. To convert to (vector octet),
   - either introduce a kind of write-pdf-string and format-pdf functions
   - or use acl-compat.excl:string-to-octets explicitly.

-- 
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru

From marc.battyani at fractalconcept.com Fri Mar 18 17:13:31 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Fri, 18 Mar 2005 18:13:31 +0100
Subject: [cl-pdf-devel] switching to binary format
References: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp><000901c52add$379ea510$1b5802c3@digo><01a501c52ae3$78e7d810$0a02a8c0@marcxp> <00ec01c52bd0$919f3810$8e5802c3@digo>
Message-ID: <075c01c52bdd$d09f8850$0a02a8c0@marcxp>

"Dmitriy Ivanov" wrote:

> |> BTW, I have just updated my contribution di-pdf.lisp at lisp.ystok.ru.
> |
> | OK from a first look you are still using write-string and format on the
> | binary stream. I don't think this will work on other implementations.
>
> No wonder - the code is LispWorks biased. I always
> - open files with :element-type '(unsigned-byte 8);
> - for non-base characters, invoke
> (write-byte (ef:char-external-code char *pdf-code-page*)
> *pdf-stream*));
>
> - for base characters, invoke write-char or write-sequence, which are
> accepted by LispWorks bivalent streams.
>
> I suggest following these guidelines in pursuance of compatibility.
>
> 1. (deftype octet () '(unsigned-byte 8))
>
> 2. In CL-PDF code, use only write-byte or write-sequence that is always
> given an array of type (vector octet) as an argument.
>
> 3. To convert to (vector octet),
> - either introduce a kind of write-pdf-string and format-pdf functions
> - or use acl-compat.excl:string-to-octets explicitly.

Hello Dmitri,

I also use LW so it's easy enough for me. But I would just prefer to avoid to break cl-pdf on every other implementation.
If there already exist a portable function to convert string to octets then maybe the performance hit will not be a problem.

I looked at acl-compat.excl:string-to-octets but for LW the conversion function just makes the conversion "manually" it's not an optimized function:

(loop for from-index from start below end
      for to-index upfrom 0
      do (progn
           (setf (aref mb-vector to-index)
                 (char-code (aref string from-index)))))

I don't know why there is a progn.

Maybe I should just try and see how it works...

Marc

From divanov at aha.ru Fri Mar 18 19:19:37 2005
From: divanov at aha.ru (Dmitriy Ivanov)
Date: Fri, 18 Mar 2005 22:19:37 +0300
Subject: [cl-pdf-devel] switching to binary format
References: <00ab01c52acb$8d7f8740$0a02a8c0@marcxp><000901c52add$379ea510$1b5802c3@digo><01a501c52ae3$78e7d810$0a02a8c0@marcxp> <00ec01c52bd0$919f3810$8e5802c3@digo> <075c01c52bdd$d09f8850$0a02a8c0@marcxp>
Message-ID: <00f401c52bef$c13521b0$8e5802c3@digo>

Hello Marc,

| I looked at acl-compat.excl:string-to-octets but for LW the conversion
| function just makes the conversion "manually" it's not an optimized
| function:
|
| (loop for from-index from start below end
|       for to-index upfrom 0
|       do (progn
|            (setf (aref mb-vector to-index)
|                  (char-code (aref string from-index)))))
|
| I don't know why there is a progn.

Sorry, I have just forgotten that paserve does not support external formats!

I am offering my version of a piece of acl-compat as a starting point.

-- 
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru
-------------- next part --------------
A non-text attachment was scrubbed...
Name: acl-compat-lw.zip
Type: application/x-zip-compressed
Size: 3641 bytes
Desc: not available
URL:
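
Editorial sketch (not part of the original thread): the three guidelines proposed above (an octet type, writing only octet vectors to the stream, and write-pdf-string / format-pdf helpers) might look roughly like the following in portable Common Lisp. The names string-to-latin1-octets, write-pdf-string and format-pdf are hypothetical and are not the actual cl-pdf or acl-compat API. The sketch also assumes that char-code agrees with Latin-1 for codes below 256, which holds on common implementations but is not guaranteed by the ANSI standard.

```lisp
;; Sketch only. All function names here are hypothetical illustrations of
;; the guidelines in the thread, not code from cl-pdf or acl-compat.

(deftype octet () '(unsigned-byte 8))

(defun string-to-latin1-octets (string &key (start 0) (end (length string)))
  "Return a fresh (vector octet) holding the char-codes of STRING.
Signals an error for any character whose code does not fit in 8 bits.
Assumes char-code matches Latin-1 below 256 (implementation dependent)."
  (let ((octets (make-array (- end start) :element-type 'octet)))
    (loop for from-index from start below end
          for to-index upfrom 0
          do (let ((code (char-code (char string from-index))))
               (when (> code 255)
                 (error "Character with code ~D cannot be written as one octet."
                        code))
               (setf (aref octets to-index) code)))
    octets))

(defun write-pdf-string (string stream)
  "Write STRING to the binary STREAM as octets, using only write-sequence."
  (write-sequence (string-to-latin1-octets string) stream)
  string)

(defun format-pdf (stream control-string &rest args)
  "Like FORMAT, but the formatted text is sent to a binary STREAM as octets."
  (write-pdf-string (apply #'format nil control-string args) stream))
```

With helpers of this kind, a file opened with :element-type 'octet could receive text through (format-pdf out "~D 0 obj~%" 1) instead of calling format on the stream directly. Formatting into an intermediate string and then copying it costs an extra allocation per call, which is the performance concern raised earlier in the thread; a code-page lookup such as the ef:char-external-code call mentioned above would replace the plain char-code step for non-Latin-1 text.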