Much of the complexity of text and fonts in the PDF format has historical roots. Both digital typography and the PDF format have evolved over a long period, and some technical solutions were temporary, later abandoned in favour of newer, more flexible ones.
The PDF format was created to display printed content without altering its appearance. Its first version focused mainly on the integrity of the presentation. Each later revision has given increasing importance to preserving the integrity of the content and, where possible, its structure. This is particularly true of text.
Texts are composed of characters (the basic signs that make up a script). The characters of a script are abstract concepts, and glyphs give them a particular visible form. Each character can have more than one glyph, and usually does.
Glyphs are grouped into programs that cannot function on their own. These programs tell other programs how to render the characters of a text. Such programs, or sets of glyphs, are digital typefaces (fonts).
In the PDF format, a font is a resource described by a special dictionary (the font dictionary). This dictionary identifies the font and provides the data needed to use it in the document.
In a PDF document, glyphs are displayed on each page by selecting a font dictionary and showing a string of character codes. Those codes are interpreted as specific glyphs in that font, through an encoding that establishes the relationship between the character codes and the glyphs of that particular font.
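As a rough illustration of the mechanism just described, the following sketch builds the text-showing part of a page content stream. The `Tf` operator selects a font dictionary by the name under which it is listed in the page's font resources, and sets a size. The `Tj` operator then shows a string whose bytes are character codes, which the font's encoding maps to glyphs. The helper names and the resource name `F1` are hypothetical. The sketch assumes a simple font with single-byte codes and plain ASCII text.

```python
def pdf_literal(text: str) -> str:
    """Escape text as a PDF literal string (backslash and parentheses)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def show_text(font_resource: str, size: float, x: float, y: float, text: str) -> str:
    """Return content-stream operators that draw `text` with a font resource.

    `font_resource` is the key of the font dictionary in the page's /Font
    resources (a hypothetical name such as "F1").
    """
    return "\n".join([
        "BT",                              # begin a text object
        f"/{font_resource} {size:g} Tf",   # select font dictionary and size
        f"{x:g} {y:g} Td",                 # move to the start position
        f"{pdf_literal(text)} Tj",         # show codes; the encoding maps them to glyphs
        "ET",                              # end the text object
    ])


print(show_text("F1", 12, 72, 720, "Hello (PDF)"))
```

Note that the string carries codes, not glyphs. The same bytes would produce different shapes with a font dictionary that uses a different encoding.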
Embedding fonts in a PDF
When the entire character set of a font is embedded in the PDF, the font is said to be fully embedded. When only the characters actually used are included, the font is said to be embedded as a subset. When no font data is embedded and the font is only referenced by name, the font is said to be unembedded; in that case the font file must be installed on the system.
Depending on how the PDF was created, the same font may be embedded several times as different subsets (which may have the same encoding or several different ones).
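The three embedding states can be told apart fairly mechanically. The following minimal sketch rests on several assumptions:

- Font dictionaries are modelled as plain Python dicts with indirect references already resolved. A real tool would obtain them from a PDF library.
- The sample font names are hypothetical.
- An embedded font program is detected through the font descriptor keys `FontFile`, `FontFile2` or `FontFile3`.
- Subsets are recognised by the naming convention of a six-uppercase-letter tag followed by "+". This is only a convention, so the "full" result is a heuristic.

The grouping function shows how the same font can appear several times as different subsets.

```python
import re
from collections import defaultdict

# A subset tag is six uppercase letters followed by "+", e.g. "ABCDEF+Arial".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")
# Font descriptor keys that point to an embedded font program.
FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")


def descriptor_of(font: dict) -> dict:
    """Return the font descriptor; for Type 0 fonts it lives in the descendant CIDFont."""
    if font.get("Subtype") == "Type0":
        font = font["DescendantFonts"][0]
    return font.get("FontDescriptor", {})


def embedding_status(font: dict) -> str:
    """Classify a font as 'type3', 'unembedded', 'subset' or 'full' (heuristic)."""
    if font.get("Subtype") == "Type3":
        return "type3"  # glyphs are content streams inside the PDF itself
    if not any(key in descriptor_of(font) for key in FONT_FILE_KEYS):
        return "unembedded"
    # Only the naming convention distinguishes a subset from a full embedding.
    return "subset" if SUBSET_TAG.match(font.get("BaseFont", "")) else "full"


def group_subsets(fonts: list) -> dict:
    """Group BaseFont names by the name left after removing any subset tag."""
    groups = defaultdict(list)
    for font in fonts:
        name = font.get("BaseFont", "")
        groups[SUBSET_TAG.sub("", name)].append(name)
    return dict(groups)


fonts = [
    {"Subtype": "Type1", "BaseFont": "Helvetica"},
    {"Subtype": "TrueType", "BaseFont": "ABCDEF+Arial",
     "FontDescriptor": {"FontFile2": "<stream>"}},
    {"Subtype": "TrueType", "BaseFont": "GHIJKL+Arial",
     "FontDescriptor": {"FontFile2": "<stream>"}},
]
for font in fonts:
    print(font["BaseFont"], embedding_status(font))
print(group_subsets(fonts))
```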
If a font is not embedded and cannot be found on the system, the PDF format provides fallback mechanisms (font substitution) so that the text can still be displayed reasonably.
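One input that substitution can rely on is the `/Flags` entry of the font descriptor. The bit positions used below are the ones defined in the PDF specification: FixedPitch is bit 1, Serif bit 2, Italic bit 7 and ForceBold bit 19. The decision logic itself is a deliberately simplified, hypothetical heuristic that picks one of the standard fonts; it is not how any particular viewer works. Real viewers also use glyph widths and other metrics.

```python
# Bit positions in the font descriptor's /Flags entry (bit 1 = least significant).
FIXED_PITCH, SERIF, ITALIC, FORCE_BOLD = 1, 2, 7, 19


def has_flag(flags: int, bit: int) -> bool:
    """Test a 1-based bit position in a /Flags value."""
    return bool(flags & (1 << (bit - 1)))


def pick_substitute(descriptor: dict) -> str:
    """Choose a standard font as a stand-in for a missing, unembedded font."""
    flags = descriptor.get("Flags", 0)
    bold = has_flag(flags, FORCE_BOLD) or descriptor.get("FontWeight", 400) >= 600
    italic = has_flag(flags, ITALIC) or descriptor.get("ItalicAngle", 0) != 0
    if has_flag(flags, FIXED_PITCH):
        family, styles = "Courier", ("", "-Bold", "-Oblique", "-BoldOblique")
    elif has_flag(flags, SERIF):
        family, styles = "Times", ("-Roman", "-Bold", "-Italic", "-BoldItalic")
    else:
        family, styles = "Helvetica", ("", "-Bold", "-Oblique", "-BoldOblique")
    # Index 0 = regular, 1 = bold, 2 = italic/oblique, 3 = bold italic/oblique.
    return family + styles[2 * italic + bold]
```

For example, a descriptor whose flags mark a serif font would be replaced by Times-Roman. A fixed-pitch italic one would be replaced by Courier-Oblique.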
The base or standard 14 fonts
All versions of the PDF format define 14 font variants, belonging to 5 typefaces, known as the 14 base or standard fonts. They comprise:

- a monospaced font (Courier),
- a sans serif font (Helvetica),
- a serif font (Times),
- a font for mathematical expressions (Symbol),
- a font of ornaments and symbols (Zapf Dingbats).

The 14 variants are:
- Courier (Courier, Courier Bold, Courier Oblique, Courier BoldOblique).
- Helvetica (Helvetica, Helvetica Bold, Helvetica Oblique, Helvetica BoldOblique).
- Times Roman (Times Roman, Times Bold, Times Italic, Times BoldItalic).
- Symbol.
- Zapf Dingbats.
When Adobe launched the PDF format in 1993, the idea was to define a basic set of fonts and include them in all reader programs (which at the time were only Adobe's own). This was expected to reduce document size and transmission overhead. The 14 fonts were a basic set, "guaranteed to be present when Acrobat Exchange and Acrobat Reader are installed" (according to Adobe itself at the time). That is why, in those days, these fonts were not embedded in PDF documents.
The legacy of this practice is that there are many PDFs in which these fonts are not embedded but merely referenced by name. As the presence of these fonts on every system is no longer certain, it has become advisable to embed them, either fully or as a subset.
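A preflight-style check for the situation just described might look like the following sketch. It lists standard fonts that are referenced by name only, so that they can be flagged for embedding. The 14 names are the PostScript names that appear in `/BaseFont` entries. The dict model of font dictionaries and the function names are assumptions for illustration. Some producers also use other aliases for these fonts, which this sketch does not try to recognise.

```python
import re

# PostScript names of the 14 standard fonts as used in /BaseFont entries.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")  # e.g. "ABCDEF+Times-Roman"


def is_standard_14(base_font: str) -> bool:
    """True if the name, ignoring any subset tag, is one of the 14 standard fonts."""
    return SUBSET_TAG.sub("", base_font) in STANDARD_14


def unembedded_standard_fonts(fonts: list) -> list:
    """Return the standard fonts that have no embedded font program."""
    keys = ("FontFile", "FontFile2", "FontFile3")
    return [
        f["BaseFont"] for f in fonts
        if is_standard_14(f.get("BaseFont", ""))
        and not any(k in f.get("FontDescriptor", {}) for k in keys)
    ]
```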
Missing or undefined glyphs (.notdef)
In the PDF format (as a legacy of PostScript), a text encoding may point to a glyph that is not defined in a font. In that case the position must not be left empty. It must be occupied by a special glyph called .notdef (from "[glyph] not defined") that signals the absence.
Fonts must contain this .notdef glyph; in TrueType and OpenType fonts it occupies glyph index 0. Its shape is not mandatory, but by convention it is a simple box, such as an empty or crossed-out rectangle.
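The fallback rule can be pictured as a lookup that never fails. Every character without an entry in the font's character-to-glyph map resolves to glyph index 0, the .notdef glyph. The tiny map and its glyph numbers below are hypothetical and stand in for a real font's `cmap` table.

```python
NOTDEF_GID = 0  # in TrueType/OpenType fonts, glyph index 0 is .notdef


def glyph_ids(text: str, cmap: dict) -> list:
    """Map each character to a glyph index, falling back to .notdef (0)."""
    return [cmap.get(ch, NOTDEF_GID) for ch in text]


# A hypothetical, tiny character map covering only three letters.
cmap = {"P": 51, "D": 39, "F": 41}
print(glyph_ids("PDF€", cmap))  # [51, 39, 41, 0]: "€" has no glyph, so .notdef is drawn
```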
Main types of fonts
Nowadays, the PDF format supports only the following font formats: TrueType, Type 1, Type 3, composite fonts, Multiple Master and OpenType (from PDF 1.6 onwards). Composite fonts are Type 0 fonts, which come in two kinds: CIDFontType0, based on Type 1, and CIDFontType2, based on TrueType. When a PDF is created, other types of fonts are usually converted into one of these supported formats, for example Type 3.
In fact, when a PDF is created, the fonts in the source document are not always passed straight into the resulting PDF; they usually go through some kind of processing and conversion. Many programs do not keep the font's original name. Instead, they assign it a new name with a unique identifier or, as Acrobat Distiller does, use its original PostScript name, if available.
These are the main types of fonts we may find (although, as we have said, several of them are not passed into the PDF as such):
Bitmap fonts
In this font format, each glyph is a rectangle of pixels whose values draw its shape. Because of this, each glyph reproduces well only at the resolution for which it was designed. These fonts are quite old and their use is very limited, which is also why their encoding may pose problems.
PDF-creating software usually incorporates them into a PDF by converting them to Type 3 fonts. Lower-quality programs rasterize them directly instead; such fonts fall outside this discussion, because they are no longer text but images.
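Why a bitmap glyph has a single good size can be seen with a toy example. Enlarging it only repeats existing pixels and never adds detail, so edges become visibly stepped. The 5×5 glyph below is hypothetical, and nearest-neighbour scaling is just the simplest possible way to enlarge it.

```python
# A hypothetical 5x5 bitmap glyph for "T": each string is one row, "#" = inked pixel.
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def scale_bitmap(rows: list, factor: int) -> list:
    """Enlarge a bitmap glyph by repeating each pixel (nearest-neighbour scaling)."""
    scaled = []
    for row in rows:
        wide = "".join(pixel * factor for pixel in row)  # repeat pixels horizontally
        scaled.extend([wide] * factor)                   # repeat rows vertically
    return scaled


for line in scale_bitmap(GLYPH_T, 2):
    print(line)
```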
Type 1 fonts
This is the first vector font format; it accompanied the desktop publishing revolution in 1984 and has been part of the PDF format since its inception. Although they are PostScript fonts, Type 1 fonts can use only a subset of the features of the PostScript language. For commercial reasons, their use was initially restricted to professional companies, so they tended to be of high quality and were synonymous with "trouble-free printing".
Each variant of a family consists of two files, and the fonts are not cross-platform (that is, a Macintosh font cannot be used on Windows, nor vice versa). They can appear directly in a PDF document, embedded in full, embedded as a subset, or unembedded.
As with TrueType fonts, Type 1 fonts are being phased out in favour of OpenType. Adobe announced that its software would stop supporting this format completely from the beginning of 2023. However, PDFs with embedded Type 1 fonts continue to display and print without problems.
Type 3 fonts
Until recently, Type 3 fonts were unusual in PDFs and were found mostly in very old documents. Originally, the PostScript Type 3 font format was used mainly for decorative fonts. It supported more PostScript language features than Type 1 and allowed bitmaps to be embedded. For historical commercial reasons, some ordinary text fonts were also created as Type 3.
However, with the relatively recent emergence of formats such as OpenType SVG fonts, documents with Type 3 fonts are no longer so rare. The reason is that the PDF format does not currently support embedding SVG fonts, so PDF-creating software embeds them by converting them to Type 3 fonts.
Note: PDF-creating programs may encounter fonts in formats that PDF does not support and that contain drawing commands. They treat such fonts as Type 3 fonts, which makes this format a kind of catch-all.
Fonts of this type may include bitmaps (pixel images), masks, gradients, patterns, colouring commands and the like. Because of these peculiarities, they may cause technical or quality problems when processed for printing.
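To make the "catch-all" nature of Type 3 concrete, the sketch below generates a one-glyph Type 3 font. Each glyph is an ordinary content stream listed under `/CharProcs`. The first operator of that stream declares the glyph's metrics:

- `d1` (advance width plus bounding box) makes the glyph a shape painted in the current fill colour.
- `d0` would instead let the procedure set its own colours, images or patterns, which is where the printing problems mentioned above can come from.

The glyph name `square`, the character code and the object number `10 0 R` are hypothetical placeholders. The output is PDF syntax as text, not a complete file.

```python
def type3_square_font(code: int = 65, size: int = 600) -> tuple:
    """Return (font dictionary, glyph procedure) for a one-glyph Type 3 font."""
    glyph_proc = "\n".join([
        f"{size} 0 0 0 {size} {size} d1",  # wx wy llx lly urx ury: metrics, no own colour
        f"0 0 {size} {size} re",           # a square path in glyph space
        "f",                               # fill it with the current colour
    ])
    font_dict = "\n".join([
        "<< /Type /Font /Subtype /Type3",
        f"   /FontBBox [0 0 {size} {size}]",
        "   /FontMatrix [0.001 0 0 0.001 0 0]",  # 1000 glyph units = 1 text unit
        "   /CharProcs << /square 10 0 R >>",      # 10 0 R: stream holding glyph_proc
        f"   /Encoding << /Type /Encoding /Differences [{code} /square] >>",
        f"   /FirstChar {code} /LastChar {code} /Widths [{size}]",
        ">>",
    ])
    return font_dict, glyph_proc


font_dict, glyph_proc = type3_square_font()
print(font_dict)
print(glyph_proc)
```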
TrueType fonts
This is a vector font format launched by Apple in 1991 and backed by Microsoft to compete with PostScript Type 1 fonts. Each variant of a font consists of a single cross-platform file that can be used on either Windows or Macintosh.
For reasons that are irrelevant here, TrueType has been the most common format on personal computers for many years. Most free or copycat fonts made by amateurs, as well as low-quality pirated versions, use it. This, and not any lower technical quality of the format itself (which is not the case), is why many problems with low-quality fonts occur in this format. It is also why many professionals have come to view the format with suspicion.
As with PostScript Type 1 fonts, TrueType is being phased out in favour of OpenType (TrueType-based OpenType fonts retain the ".ttf" file extension).
Multiple Master fonts
This is a type of vector font created by Adobe in 1992 as an extension of PostScript Type 1. Its main feature was interpolation between master designs along one or more axes, which allowed users to create and use fonts of variable weight and width on the fly. Its development was abandoned with the appearance of OpenType. Its recent successors are OpenType variable fonts.
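The interpolation idea can be sketched for the simplest case: one axis and two masters, for example a light and a bold design. Each outline point of an intermediate weight lies on the straight line between the corresponding points of the two masters. This only works because the masters are compatible, with the same points in the same order. Real Multiple Master fonts may have several axes and more masters, so this one-axis version is a simplification, and the coordinates are hypothetical.

```python
def interpolate_point(light: tuple, bold: tuple, t: float) -> tuple:
    """Linearly interpolate one outline point between two compatible masters.

    t = 0 gives the light master, t = 1 the bold one; values in between give
    intermediate weights.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return tuple(a + (b - a) * t for a, b in zip(light, bold))


# A stem edge at x=100 in the light master and x=140 in the bold master.
print(interpolate_point((100, 0), (140, 0), 0.5))  # (120.0, 0.0)
```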
Composite fonts (Type 0)
Composite fonts were a way to overcome a limitation of the original PostScript fonts, which supported only 256 characters or glyphs.
In order to meet the needs of scripts and languages such as Chinese, Japanese or Korean, which have a much larger number of characters, Adobe developed the idea of a font that groups and contains several fonts (hence the name "composite" fonts).
Composite fonts appeared in Level 2 of the PostScript language, but their behaviour within the PDF format has some slight differences. In the PDF format, a composite font is contained in a dictionary in which the fonts that are included are organised hierarchically, as if it were a tree. The parent font in the hierarchy is identified as Type 0. Non-composite fonts are considered as "simple fonts".
Child fonts within a composite font may be simple fonts or other composite fonts. The first fonts following this concept were referred to as "Original Composite Fonts" (OCF).
With PostScript Level 3, a new type of composite font called "CID fonts" or "CID-keyed fonts" appeared. These fonts were primarily intended for Asian scripts with a large number of characters, such as Chinese.
These fonts are based on a collection of characters related by Character Identifier (CID) numbers to a set of glyphs. In this system, glyphs have no names; they are identified only by this number.
The main point to understand here concerns how PitStop actions name these fonts:

- What they call "T1 composite fonts" are fonts of type CIDFontType0, which apply the concept of CID composite fonts to PostScript Type 1 fonts.
- What they call "TT composite fonts" are fonts of type CIDFontType2, which do the same with TrueType fonts.

Nothing prevents OpenType fonts from using this method. In that case, OpenType fonts with CFF outlines become CIDFontType0 fonts whose glyphs are described in the Compact Font Format (CFF). CFF stores glyphs in the "Type 2" charstring format.
Due to various implementation decisions, some versions of some PDF-creating programs embed fonts that were originally simple (non-composite) as composite fonts.
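The naming described above can be summarised as a lookup on the `/Subtype` of the descendant CIDFont of a Type 0 font. This is only a restatement of the correspondence given in the text, written as a hypothetical helper. It does not use PitStop itself or any PitStop API. As in the earlier sketches, the font dictionaries are modelled as plain Python dicts.

```python
# Labels keyed by the /Subtype of the descendant CIDFont of a Type 0 font.
COMPOSITE_LABELS = {
    "CIDFontType0": "T1 composite font",  # CID-keyed, Type 1/CFF glyph descriptions
    "CIDFontType2": "TT composite font",  # CID-keyed, TrueType glyph descriptions
}


def composite_label(font: dict) -> str:
    """Return the label described in the text for a font dictionary."""
    if font.get("Subtype") != "Type0":
        return "simple font"
    descendant = font["DescendantFonts"][0]  # a Type 0 font has one descendant
    return COMPOSITE_LABELS.get(descendant.get("Subtype"), "unknown composite font")
```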
OpenType fonts
This vector font format evolved from the TrueType and Type 1 formats. It is a joint development by Microsoft and Adobe, published in 1996 to overcome the limitations of those formats.
The format has been continuously developed and, to date, it is the most recommended format for digital fonts, as it aims to incorporate all the improvements of digital typography.
OpenType fonts use Unicode, a standard designed for very large multilingual and multi-script coverage. Because they can include many glyphs, the files can be comparatively large and complex. How Unicode is actually implemented in each font depends on the manufacturer. It also depends on the versions of both standards (OpenType and Unicode) at the time the font was created.
Without going into technical detail, an OpenType font may have an internal structure similar to TrueType fonts or to Type 1 fonts. In the latter case the outlines are stored in the Compact Font Format (CFF); these are not Type 1 fonts. Font files of the first kind (TrueType) keep the ".ttf" extension, while CFF-based files inherited from Type 1 have the ".otf" file extension.
Native embedding of OpenType fonts as such is only possible from PDF 1.6 onwards, and it is currently not supported by the major programs. The common alternative is to pass an OpenType font into the PDF dictionaries as TrueType or Type 1, depending on its origin (this is what Illustrator or InDesign do, for example). Therefore, when examining a PDF in which OpenType fonts have been used, it is very unusual to find them as such; they usually appear as Type 1 or TrueType.
PDF-creating software usually takes the glyph data stored in CFF format within an OpenType font and includes it in the file without the rest of the font data, which is not needed. This stripping, combined with the fact that the data are CFF, is why PDF reading programs identify the resulting font resource as Type 1 or Type 0 (depending on its original structure).
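The difference between a stripped and a natively embedded OpenType font is visible in the font descriptor. Stripped CFF data is normally stored in a `FontFile3` stream whose own `/Subtype` is `Type1C`, or `CIDFontType0C` for CID-keyed data. A whole OpenType font embedded under PDF 1.6 or later uses the subtype `OpenType`. The key names follow the PDF specification. The dict model of the descriptor and the function name are assumptions for illustration.

```python
# Embedded font program kinds, keyed by font descriptor entry.
FONT_FILE_KINDS = {
    "FontFile": "Type 1",
    "FontFile2": "TrueType",
}
# FontFile3 streams carry their own /Subtype.
FONT_FILE3_KINDS = {
    "Type1C": "Type 1 (CFF)",          # bare CFF data, e.g. stripped from an .otf font
    "CIDFontType0C": "CID-keyed CFF",  # CFF data for a CIDFontType0 descendant
    "OpenType": "OpenType",            # a whole OpenType font (PDF 1.6 and later)
}


def embedded_program(descriptor: dict) -> str:
    """Describe the embedded font program referenced by a font descriptor."""
    for key, kind in FONT_FILE_KINDS.items():
        if key in descriptor:
            return kind
    if "FontFile3" in descriptor:
        subtype = descriptor["FontFile3"].get("Subtype")
        return FONT_FILE3_KINDS.get(subtype, f"unknown FontFile3 ({subtype})")
    return "not embedded"


# A CFF-based OpenType font after the usual stripping, and one embedded whole.
stripped = {"FontFile3": {"Subtype": "Type1C"}}
native = {"FontFile3": {"Subtype": "OpenType"}}
print(embedded_program(stripped))  # Type 1 (CFF)
print(embedded_program(native))    # OpenType
```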
OpenType SVG fonts
This format was developed by Adobe and Mozilla and consolidated around 2016. There are currently several formats, all of which consist of a single file for each typographic variant. To draw the glyphs, it uses SVG, a vector format that also supports embedded pixel images, gradients, patterns and colours (hence these fonts are also called Color Fonts). These fonts conform to the OpenType specification and use Unicode encoding in its UTF-8 form.
Although the possibility is under consideration, the PDF format does not support them directly. When a document is created, PDF-creating software converts them to Type 3 fonts.
Warning: Fonts in this format may cause problems in print processing because of the complex characteristics some of them have. Moreover, it is not always possible to solve this automatically by converting them to paths and/or rasterising them.
OpenType variable fonts
This format is a development of OpenType. It takes over, with improvements, the ability of Multiple Master fonts to modify the weight, slant, width and other parameters of a font to create intermediate variants.
Currently, they are transferred to a PDF as Type 1 or TrueType fonts, depending on their origin.
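Where Multiple Master fonts interpolated between masters, variable fonts store a default design plus deltas. The first step is to map the user's axis value to a normalized range of −1 to 1, with the default at 0. The `normalize` function below follows the default normalization described in the OpenType specification, before any `avar` adjustments. The `apply_delta` function is a deliberately simplified, hypothetical case. It assumes one axis and one delta region with start 0, peak 1 and end 1, whose scalar is then simply the normalized value between 0 and 1. The axis range and the coordinates in the example are also hypothetical.

```python
def normalize(value: float, minimum: float, default: float, maximum: float) -> float:
    """Map a user axis value to the normalized range -1..1 (default maps to 0)."""
    value = min(max(value, minimum), maximum)  # clamp to the axis range
    if value < default:
        return -(default - value) / (default - minimum)
    if value > default:
        return (value - default) / (maximum - default)
    return 0.0


def apply_delta(default_coord: float, delta: float, n: float) -> float:
    """Move one coordinate by a delta whose single region peaks at n = 1.

    Region start 0, peak 1, end 1: the scalar is n for 0 <= n <= 1, else 0.
    """
    scalar = n if 0.0 <= n <= 1.0 else 0.0
    return default_coord + delta * scalar


# Weight axis 100..900 with default 400; a stem edge at x=100 moves by +40 at wght 900.
n = normalize(650, 100, 400, 900)  # 0.5
print(apply_delta(100, 40, n))     # 120.0
```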
[© Gustavo Sánchez Muñoz, 2024] Gustavo Sánchez Muñoz (also identified as Gusgsm) is the author of the content of this page. Its graphic and written content can be shared, copied and redistributed in whole or in part without the express permission of its author with the only condition that it cannot be used for directly commercial purposes (that is: It cannot be resold, but it can form part as reasonable quotations in commercial works) and the legal terms of any derivative works must be the same as those expressed in this statement. The citation of the source with reference to this site and its author is not mandatory, although it is always appreciated.