I. PDF Overview

PDF (Portable Document Format) is a structured document format. Adobe, the well-known American typesetting and image-processing software company, first released it in 1993 (version 1.0) and launched the supporting Adobe Acrobat 1.0 product line in the same year. Adobe then revised the format: version 1.1 was released in 1994, together with Adobe Acrobat 2.0 and 2.1, and PDF 1.2 followed on November 27, 1996, with Adobe Acrobat upgraded to version 3.0. By the end of 1997, the International Organization for Standardization had begun considering PDF for adoption as an international standard.

1. Comparison of PDF and PS

The PS language (PostScript, a page description language) is likewise an Adobe-owned de facto standard of the printing industry. It can describe elaborate page layouts and holds a dominant position in printing today. PDF was developed from PS, and the two have almost the same capabilities and similar methods for describing pages. PDF uses the same imaging model (Imaging Model) as PS to express text and graphics. As in PS, PDF page description instructions draw a page by painting selected regions. A painted region may be a character outline, a region bounded by lines and curves, or a bitmap. The paint may be any color, and any graphic on the page may be clipped to another shape. The page starts out completely blank; successive instructions paint graphics onto it, and each new graphic is opaque, covering whatever lies beneath it.

2. Features of PDF

The characteristics of PDF can be summarized as follows:

① Transferability. A PDF file may be encoded either in 7-bit ASCII or in binary, so it can be transmitted correctly across a variety of network environments.
② Interactivity. PDF contains interactive objects such as interactive forms and hyperlinks.
③ Support for sound and animation.
④ Random access to page content, which speeds up operations on pages.
⑤ Incremental updates. Modifications can be appended to the file, which makes small changes efficient.
⑥ Multiple compression encodings, which make the file more compact.
⑦ Font independence. A PDF file can carry its own font description information, so the document displays correctly even when the user's system lacks the required fonts.
⑧ Platform independence. A PDF file is independent of both software and hardware platforms, which makes it well suited to exchanging information over a network and avoids garbled characters.
⑨ Security control. PDF files support several levels of security control, which is important for protecting the copyright of electronic publications. The level can be set according to the security requirements of each publication.

II. PDF Principles and Structure

1. PDF file structure

The file structure (i.e., the physical structure) of a PDF consists of four parts: the file header, the file body, the cross-reference table, and the file trailer; see Figure 1. The file body consists of a series of PDF indirect objects (Indirect Object). The cross-reference table is an address index of the indirect objects, built so that they can be accessed at random. The file trailer declares the address of the cross-reference table, identifies the root object (Catalog) of the file body, and also holds security information such as encryption.

2. PDF document structure

The PDF document structure is the logical organization of a PDF file's content; it reflects the hierarchical relationships among the indirect objects in the file body.
The document structure of a PDF is a tree, as shown in Figure 2. The root node of the tree is the root object of the PDF file. Below the root are four subtrees: the page tree (Pages Tree), the bookmark tree (Outline Tree), the thread tree (Article Threads), and the name tree (Named Destinations).

In the page tree, all page objects are leaf nodes, and they inherit attribute values from their parent nodes as the defaults for the corresponding attributes. The bookmark tree organizes bookmarks (Bookmark) hierarchically. A bookmark associates a bookmark title with a specific page location, so that users can reach content in the document by bookmark title. The thread tree organizes and manages article threads and the article beads under each thread. The name tree establishes a correspondence between a string (name) and a page region. Each leaf node stores strings and their corresponding page regions, while non-leaf nodes serve only as an index so that applications can reach the leaf nodes quickly. The name tree lets other objects in the PDF file refer to a page region by a string name.

3. Resources in PDF

The page content of a PDF (text, graphics, images, and so on) is stored in the stream object referenced by the Contents key of the page object (hereinafter the content stream). The content stream uses many basic objects, such as numbers and strings, which are written as direct objects. Other objects, such as fonts, are themselves dictionary (Dictionary) or stream (Stream) objects. They cannot be written as direct objects, and indirect objects may not appear in the content stream (otherwise they could not be distinguished from the content data itself). Such objects are therefore given names, and the content stream refers to them by those names. Objects referred to by name are called named resources. The page object has a resources key (Resources) that lists all the resources used in the content stream and establishes a mapping from resource names to resource objects.

The named resources in PDF are: procedure sets (ProcSet), fonts (Font), color spaces (ColorSpace), external objects (XObject, including Image, Form, and PS fragments), extended graphics states (Extended Graphics State), patterns (Pattern), and property lists for user-extended marks (Property List).

4. PDF page description instructions

PDF has 60 page description instructions. They describe a series of graphic objects on the page, which fall roughly into four categories: path objects (Path Object), text objects (Text Object), image objects (Image Object), and external objects. These are the basic elements that make up every page.

III. PDF File Generation

There are currently two ways to generate PDF files:

1. Generate PDF by printing. A virtual PDF printer converts the application's text and graphics output (GDI commands under Windows, QuickDraw commands on the Mac, and so on) into PDF instructions and saves them in a PDF file, as shown in Figure 3. With Adobe Acrobat PDFWriter installed, any application that can print should in theory be able to store its printed output in a PDF file. In practice, however, generating PDF files from Chinese-language documents still has many problems.
For all their similarities, PDF is very different from PS, mainly in the following respects:

① PDF files can contain interactive objects, such as hyperlinks and interactive forms; PS files cannot.
② PDF is a file format, whereas PS is a programming language, so PDF can be processed more efficiently than PS.
③ The strict structural definition of PDF lets an application access individual objects at random, whereas a PS file can only be read sequentially as a whole. For example, to reach page 100 of a PS file, the first 99 pages must be interpreted in order before page 100 can be found, while every page of a PDF can be reached equally quickly.
④ PDF also contains font description information, such as font metrics, so that when a font is missing it can be simulated (not merely substituted) to keep the document's appearance consistent.
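Point ③ can be made concrete with a small sketch. The Python below is an illustration, not a real PDF parser: the helper names `build_file` and `read_object` and the placeholder payloads are assumptions made for this example. It records the byte offset of each object as the file is written, in the spirit of a cross-reference table, and then reads object 100 by jumping straight to its offset instead of interpreting the 99 objects before it.

```python
def build_file(payloads):
    """Write numbered objects back to back and record each one's byte offset."""
    out = bytearray(b"%PDF-1.2\n")
    offsets = {}
    for num, payload in enumerate(payloads, start=1):
        offsets[num] = len(out)  # where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + payload + b"\nendobj\n"
    return bytes(out), offsets


def read_object(data, offsets, num):
    """Random access: jump to the recorded offset instead of scanning the file."""
    start = offsets[num]
    marker = b"%d 0 obj\n" % num
    if not data.startswith(marker, start):
        raise ValueError("offset table does not point at object %d" % num)
    end = data.index(b"\nendobj", start)
    return data[start + len(marker):end]


# 100 placeholder "pages" (hypothetical payloads; real pages are dictionaries).
pages = [b"(page %d)" % n for n in range(1, 101)]
data, offsets = build_file(pages)
print(read_object(data, offsets, 100))
```

The cost of `read_object` does not depend on how many objects precede the requested one, which is the property the comparison with PS relies on.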
The file header indicates the version number of the PDF specification that the file complies with, and it appears on the first line of the PDF file.
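To show where the header sits relative to the other three parts of the file, the following Python sketch assembles a tiny one-page file and reads the version back from its first line. The function names (`build_minimal_pdf`, `pdf_version`, `xref_offset`) are hypothetical, the objects are the bare minimum chosen for illustration, and a real viewer may expect more than is written here.

```python
import re


def build_minimal_pdf():
    """Assemble header, body, cross-reference table, and trailer in order."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.2\n")                 # 1. file header
    offsets = []
    for num, body in enumerate(objects, start=1):  # 2. file body (indirect objects)
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)                            # 3. cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # 20-byte entry per object
    # 4. trailer: names the root object and the address of the table
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def pdf_version(data):
    """Return the version string declared on the first line of the file."""
    first_line = data.split(b"\n", 1)[0]
    match = re.fullmatch(rb"%PDF-(\d+\.\d+)\s*", first_line)
    if match is None:
        raise ValueError("missing %PDF-x.y header")
    return match.group(1).decode("ascii")


def xref_offset(data):
    """Read the cross-reference table address declared after 'startxref'."""
    tail = data[data.rindex(b"startxref") + len(b"startxref"):]
    return int(tail.split()[0])


pdf = build_minimal_pdf()
print(pdf_version(pdf), xref_offset(pdf))
```

A reader following this layout starts at the end of the file, finds the table address, and only then looks up individual objects.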
The unnamed resources are: Encoding, Font Descriptor, Halftone, Function, and CMap. Because unnamed resources are used implicitly, they do not need names.
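Named resources, by contrast, are resolved through the resource dictionary of the page. The following Python sketch is an illustration under simplifying assumptions: the dictionary contents, the object references, and the helper names are hypothetical, and the whitespace tokenizer only works because the sample stream has no spaces inside its string. It shows how the names used with the `Tf` (set font) and `Do` (paint external object) operators map to resource objects.

```python
# Hypothetical resource dictionary of one page:
# category -> {resource name -> object reference}.
RESOURCES = {
    "Font": {"F1": "5 0 R"},
    "XObject": {"Im1": "7 0 R"},
}

# Content stream that refers to the font and the image only by name.
CONTENT_STREAM = (
    "BT /F1 12 Tf 72 720 Td (Hello) Tj ET "
    "q 100 0 0 100 72 600 cm /Im1 Do Q"
)

# Operators whose name operand is looked up in a resource category.
NAME_OPERATORS = {"Tf": "Font", "Do": "XObject", "gs": "ExtGState"}


def is_operand(token):
    """Names, strings, and numbers are operands; anything else is an operator."""
    if token[0] in "/(":
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def resolve_named_resources(stream, resources):
    """Return (operator, name, object reference) for each named resource used."""
    resolved = []
    operands = []
    for token in stream.split():
        if is_operand(token):
            operands.append(token)
            continue
        category = NAME_OPERATORS.get(token)
        if category is not None:
            # The name operand is the one that starts with "/".
            name = next(op[1:] for op in operands if op.startswith("/"))
            resolved.append((token, name, resources[category][name]))
        operands = []  # operands belong to the operator that follows them
    return resolved


for operator, name, ref in resolve_named_resources(CONTENT_STREAM, RESOURCES):
    print(operator, name, ref)
```

Keeping only names in the content stream means the stream never has to contain an object reference, which matches the restriction on indirect objects described for resources.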
2. Convert from PS to PDF. In this second method, the application first outputs the content to be printed as a PS file, and Adobe Acrobat Distiller then converts the PS file into a PDF file; see Figure 4.
Each of the two methods has its advantages and disadvantages. Generating PDF by printing integrates closely with the application: from the user's point of view, the PDF comes directly out of the application. Its disadvantage is that the limitations of the GDI and QuickDraw instruction sets make it difficult to generate high-precision PDF. Converting from PS to PDF involves one more step, but because PS itself can describe pages with high precision, the resulting PDF can reach print-level quality and precision.

Once a PDF file has been generated, users can read and print it with Acrobat Reader, and can use Acrobat Exchange to add interactive features such as page thumbnails, hyperlinks, bookmarks (or a table of contents), and comments. The tools provided by Adobe for generating PDF currently have problems with Chinese support: for example, downloading Chinese fonts is not supported, and the display of Chinese text depends on the operating system.