[cl-pdf-devel] Embedding several PDF files in one document

Marc Battyani marc.battyani at fractalconcept.com
Fri Apr 23 16:03:22 UTC 2004


"Arthur Lemmens" <alemmens at xs4all.nl> wrote:

> Marc Battyani wrote:
>
> > But in the PDF case the grammar is very simple and a parser
> > generator is overkill :)
>
> Yes, I'm beginning to see that now. The lexical structure seems
> quite simple, too.
>
> But I'm getting the impression that you can't just parse a PDF file
> from start to end; in general, you have to read the cross-reference
> table (at the end of the file) first and use random-access to parse
> arbitrary objects in the file.

Yes, my parser does all that. I read and parse a PDF file, and then I can
write to it, add pages, etc.
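
To illustrate the "read the end of the file first" step, here is a minimal Python sketch (not the cl-pdf parser itself). It assumes the whole file is already in memory as bytes and that the "startxref" keyword sits within the last 1024 bytes; the function name find_startxref is made up for this example.

```python
import re

def find_startxref(data: bytes) -> int:
    """Return the byte offset of the cross-reference section.

    A PDF file ends with:  startxref <offset> %%EOF
    so only the tail of the file needs to be inspected.
    """
    tail = data[-1024:]  # assumption: the keyword is in the last 1024 bytes
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker found near end of file")
    # Take the last match in case an earlier one appears in the tail.
    return int(matches[-1])
```

From that offset a parser can read the cross-reference table and then seek directly to any object it needs.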

> I think this is necessary because of the way they specified streams.
> A 'stream' starts with a dictionary which specifies the length of
> the stream. After the dictionary comes the "stream" keyword, followed
> by the contents of the stream, followed by the "endstream" keyword.

Streams are not a problem: once you have their size, reading one is just a
read-sequence call.
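
As a rough Python analogue of that read-sequence call, the sketch below reads exactly `length` bytes instead of scanning for "endstream". It assumes the file object is positioned just after the "stream" keyword and that the length is already known as an integer; read_stream_body is a hypothetical name.

```python
import io

def read_stream_body(f, length: int) -> bytes:
    """Read a stream's contents by byte count, never by searching for a keyword."""
    # The "stream" keyword is followed by LF or CR LF before the data starts.
    eol = f.read(1)
    if eol == b"\r":
        eol += f.read(1)
    if eol not in (b"\n", b"\r\n"):
        raise ValueError("expected end-of-line after 'stream' keyword")
    body = f.read(length)
    if len(body) != length:
        raise ValueError("file ended before the stream was complete")
    return body

# Usage: the payload contains a line reading "endstream", which is harmless here.
payload = b"abc\nendstream\nxyz"
f = io.BytesIO(b"stream\n" + payload + b"\nendstream\n")
f.seek(len(b"stream"))
assert read_stream_body(f, len(payload)) == payload
```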

> Now, I think you can't just parse the stream by reading lines
> until you see the "endstream" keyword. After all, a line starting
> with "endstream" could be just a part of the stream contents. So
> you have to use the length that's specified in the dictionary
> that's in front of the stream. But the length can be specified
> by an indirect object reference; to resolve that reference you
> may need an object that's located after the stream contents.
>
> Do you agree with this, or am I making things more complicated
> than necessary?

Yes, you need all that, but it's not too difficult. (My parser is fewer than
300 lines, IIRC.)
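
The indirect-length case can be sketched in Python as follows. This is only an illustration under assumptions, not the actual parser: `xref` is a hypothetical dict mapping object numbers to byte offsets (as read from the cross-reference table), and `length_token` is the raw /Length value taken from the stream dictionary.

```python
import re

# An indirect reference looks like "7 0 R"; an integer object like "7 0 obj 17 endobj".
REF = re.compile(rb"(\d+)\s+(\d+)\s+R")
INT_OBJ = re.compile(rb"\d+\s+\d+\s+obj\s+(\d+)\s+endobj")

def resolve_length(data: bytes, xref: dict, length_token: bytes) -> int:
    """Return the stream length, following an indirect reference if needed."""
    token = length_token.strip()
    ref = REF.fullmatch(token)
    if ref is None:
        return int(token)            # direct value, e.g. b"42"
    offset = xref[int(ref.group(1))]  # random access via the xref table
    m = INT_OBJ.match(data, offset)
    if m is None:
        raise ValueError("referenced object is not a plain integer")
    return int(m.group(1))
```

Because the lookup goes through the offset table, it does not matter that the length object is stored after the stream it describes.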

Marc




