Unicode is the standard for defining computer characters. It removes the limitations and conflicts of traditional encodings. With 137,929 characters, it has enough capacity to completely cover the world’s current and historic languages. It also contains symbols and special characters like emojis.
UTF-8 is a variable length encoding format where the first 128 characters (1st octet) are the original ASCII character set – bare bones text, numbers and simple punctuation without an support for foreign language or special characters. All characters in the global Unicode character set are encoded using one to four 8-bit bytes (octet). UTF-8 is the dominant character encoding used on the Web, in email and with XML/HTML.
Unicode characters are at the heart of everything you read. Visual effects like typeface, font size and color embellish the characters. Line and paragraph spacing, tables and graphics make reading easier. Without these added features, text looks like simple typewriter characters.
Databases and spreadsheets output text. This data passes through systems where it’s formatted as reports and statements. In addition, text is extracted form databases to generate keywords, abstracts and excerpts. The Unicode format ensures that there are no conflicts in these operations. For example, Unicode ensures that content management systems are free of conflicts between overlapping language character sets.
More on Unicode
Visit the Unicode Organization
Fun Fact – Emojis
Everyone uses emojis. The Unicode Consortium approves and manages these popular images. They represent things like faces, weather, emotions, animals and languages. They also express love, thanks and congratulations. In fact, more emojis are added all the time.
ISO-8859 encoding is an extension to to the basic ASCII character set to accommodate non-English languages. It dates back to 1983 and was last updated in 1998. ASCII used the first 7 bits of the 8-bit octet (128 characters). The remaining bit was used to add extended, language specific characters. Usage of the extra bit has been recorded in 16 different variations such as Latin-1 for Western Europe, Latin-2 for Eastern Europe, Latin-Cyrillic and so on. Because the definitions were regional, each variation was unable to completely handle all the additional characters needed for a specific language so users were forced to adjust in how they used their language on computers. The Dutch, for example, have a letter which is a combination of I and J (“IJ”). Latin 1 did not have space to add this character so, in Holland, users adapted to writing “IJ” as two separate characters.
Background on the ISO-8859 Encoding
ISO-8859 encoding was a great solution if you operated in one region and could cover everything you needed to in 256 characters. With increasing globalization, and companies sending emails across borders and languages, a more comprehensive solution was needed.
The bottom half of the table contains the Latin-1 character set. There are 16 tables like this, each containing a different set of extended characters. The Arabic version, for example, would include all the special Arabic characters. ISO 8859 definitely helped with international character support, but as more and more document and emails were being exchanged, text would look corrupted if the reader had a different character set on their PC.
In 1990, the first version of Unicode was defined using ISO-8859 encoding (Latin 1) as the first 256 Unicode code points. Now it was possible to contain all of the world’s characters in one table!
Extracting Text from PDF Documents
Scientists use illustrations and diagrams to communicate ideas and findings. Today, papers, posters and scientific journals appear in print and on-line. They contain charts, equations, line-art, diagrams and drawings. Use SVG for all types of scientific illustrations. It’s is the ideal format for the scientific arena.
Ensure top quality by using SVG for Scientific Drawings
SVG is the standard vector format for the Web. It’s a feature in most of today’s illustration programs. Vector graphics are ideal for sharing scientific information. Bitmap formats like PNG and JPEg are also used, but, they have limitations compared to vector graphics. They are not editable and don’t zoom without losing quality. Read more about the difference between vector and raster images.
Creating SVG for Scientific Illustrations
For original SVG art, you can use a program like Inkscape. It’s free, feature-rich and produces SVG as its native format.
For files created in other programs, convert them using pdf2picture. This program converts PDF, EPS and Adobe Illustrator vector files to SVG. It’s simple to use and works in vector or image mode. It creates SVG as well as EPS, WMF, EMF, JPEG, PNG, GIF, BMP, and TIFF. These are basically all the formats you need for publishing today.
SVG Example – Scientific Drawing
To zoom in and out on Mac, use command + and command –, On Windows, use ctrl + and ctrl –
If you can not see the SVG image below, please upgrade your browser. All of the latest browsers support SVG.
Enhanced Metafile (EMF) is one of two metafile formats supported in Microsoft Windows. Metafiles are the native vector format for Microsoft Office and other Windows applications. They are also the core graphics display format of the Windows operating system.
WMF is a 16-bit version found in 16-bit and 32-bit Windows platforms. EMF is an enhanced 32-bit format update that supports the newer operating systems. When you copy a drawing to the Windows clipboard, it’s a metafile. Developers use metafiles to render output to the screen or send it to the printer. Most Windows applications read and write both WMF and EMF.
Although standard on Windows, metafiles are not supported in XML, UNIX, CAD, Web and PDF print.
WMF is a safe choice and supported in all graphical Windows applications. EMF offers more. It’s customizable, extendable and has any more features. But, these benefits also make it a riskier choice. The flexibility can introduce incompatibilities between EMF files and applications that support EMF.
Benefits of EMF include scaling, Bezier curves, cropping, color support and device independence. EMF also includes support for metadata which saves text descriptions with the file. Because developers can add functionality to the EMF specification, they risk creating format mismatches. This modification feature is what leads to incompatibilities between different types of EMF pictures.
In Windows, right-click on a graphic and select “Save As”. Choose EMF.
Use pdf2picture to convert any PDF, EPS or Adobe Illustrator file to EMF. This can be done on a single file or in batch mode.
EMF Support Notes:
- EMF is the enhanced 32-bit metafile format while WMF is the well-established 16-bit format.
- WMF has excellent application support at all levels within the 16-bit and 32-bit Microsoft Windows platforms
- EMF adds support for curves, cropping, and extremely large drawings
- WMF and EMF both support vector graphics, raster images, text and fonts
- Visual Integrity software overcomes the limitations of the metafile format by emulating missing features.
- We offer options to convert from metafiles to other formats as well as from PDF to metafiles.
- If you have a choice and do not need the advanced graphical features that EMF supports, using WMF as your input format is a safer choice.
Windows Metafiles (WMF) and Enhanced Metafiles (EMF) are the core display formats of the Windows operating system. They are also the native graphics formats for Microsoft Office applications. WMF is still in wide-use today across 16-bit and 32-bit Windows platforms. EMF is an enhanced 32-bit format update.
When you copy a drawing. Windows stores it as a metafile, the native Windows vector format. Metafiles render output on screen and send it to print. Most Windows applications read and write WMF and EMF. Metafiles are not used in XML, UNIX, CAD, Web and PostScript/PDF print environments.
What’s a Metafile?
Metafiles are formats that can store different types of data like vectors, bitmaps and text in one file. Examples of common metafile formats today are PDF, SVG, WMF and EMF. Because they contain all the information needed to display and print, they are a foundational component of operating systems. Windows uses WMF and EMF while the Mac uses PDF. They store this mix of information in memory or in files and then push it to the output devices.
In Windows, you can right-click on a vector graphic and Select “Save As”. Three of the formats offered are vector formats – WMF, EMF and SVG.
Use pdf2picture to convert any PDF, EPS or Adobe Illustrator file to either WMF or EMF. This can be done on a single file or in batch mode.
WMF Support Notes
- WMF is the 16-bit format. It has excellent application support at all levels within the 16-bit and 32-bit Microsoft Windows platforms
- EMF is the enhanced 32-bit metafile format. It adds support for curves, cropping, and larger file sizes.
- WMF and EMF both support vector graphics, raster images, text and fonts in one format.
- Visual Integrity compensates for inherent Metafile limitations. For example, WMF does not support curves or cropping while PDF does. Our engine emulates these PDF features in the metafile output.
- META FLY can convert both 16-bit WMF and 32-bit EMF to vector and image formats.
- Of the two, WMF is the most reliable format if you don’t need EMF’s enhanced features.
- Learn More from Wikipedia
TIFF traces its origins to the 1980 where it was proposed as a standard for the emerging desktop scanner market. Originally, it was a simple format and could only understand black & white. Now, it is a robust format which handles high color, greyscale and large files like a breeze. Today, TIFF is a popular format for high color-depth images, along with JPEG and PNG. It is also a flexible, adaptable file format for handling images and data within a single file, by including header tags indicating size, compression, etc. It is regularly used in FAX and archive applications.
TIFF Format Overview
TIFF (Tagged Image File Format) is a widely supported high-quality raster image format, and a standard in areas like faxing, imaging and archiving. TIFF files can be viewed on just about every computer using a variety of applications. They can be single-page or multi-page, black & white or full color, high resolution for printing or low resolution for screen display.
TIFF file formats are used for storing high quality images. They can be very large. It is the favored image format in many graphic applications. These include FAX, archival and scanning applications, image manipulation programs, desktop publishing and 3-D imaging applications. A TIFF version called GeoTIFF is used to store geo-referenced raster images.
- Supports CCITT G3 and G4 encoding, LZW and MacIntosh RLE (packbits) encoding
- Supports 1-bit B&W, 8-bit colormap and 24-bit True Color
- Output to any resolution (dpi)
- Supports output to single-page, multiple single-page, single multi-page
- Option to output to standard or custom page size or as a cropped portion
- Source files may include vector graphics, raster images, text and fonts
The BMP format is the native image format for the Microsoft Windows operating system. Historically it was an important format in productivity and publishing applications but has been displaced by more compact, efficient and portable formats such as JPEG, PNG and TIFF. BMP is still used in some applications. BMP files are high quality but large in comparison to the alternatives. Unlike other image-file formats like GIF and JPEG, the BMP file format was not designed to be portable. It was really designed to easily work with the Windows API using the same structures that Windows applications use to manipulate in-memory bitmaps. As Windows has changed, so has the BMP file format. Windows now has several documented versions of the BMP format. Visual Integrity can support any of these variations.
JPEG format, or JPG, is a widely-used raster graphics format in print and on-line. It is also the most common format for still digital camera output. JPEG stands for Joint Photographic Experts Group, Today, the JPEG format is an open standard format that is jointly maintained by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC),
Lossy or Lossless Compression
The terms “lossy” and “lossless” refer to what happens to an image’s integrity during the compression process. When an image stays exactly the same when compressed, it’s called a “loss”less compression. The image is exactly the same after being resized. In lossy compression, the colors of pixels are changed to fit a smaller size. This is “loss”y compression. The image will look good because of the intelligence applied to color it properly when scaled. However, it will never be exactly the same again.
Creating JPEG – Convert PDF to JPEG
Most of today’s paint and graphics programs save files in JPEG (also known as the JPG format). Screen capture utilities and Web design tools also do. If you need to convert PDF to JPEG, use pdf2image and pdf2picture. JPEGs can also be generated programmatically using the PDF Conversion Server via command-line or the PDF Conversion SDK API.
JPEG compresses both full-color and grayscale images. Its target is digital still photographs and realistic images. For line art, lettering, cartoons and other graphics with few colors and sharp lines, PNG is best suited.
JPEG is very good at compressing rich color images without any perceptible change in the file. This is lossy compression. Even though colors drop to achieve compression, it’s balanced so that it fools the eye. If you decompress the image, it doesn’t regain the same quality as the original. GIF and PNG are lossless formats. Compressing them doesn’t change the image.
JPEG Format – Example
The JPEG format serves two primary purposes: it makes image files smaller and it stores 24-bit per pixel color data (full color) which is better in many cases that 8-bit per pixel data. Making image files smaller is essential for storing and transmitting files. Being able to compress a 2MB full-color file down to, for example, 100KB makes a big difference in disk space and transmission time. JPEG can quickly provide 20:1 compression of full-color data. With GIF images, the size ratio is usually more like 4:1.
- Visual Integrity solutions output to 24-bit color with JPEG encoding by default
- Output to the JPG format in any dpi resolution or pixel size (preserving aspect ratio)
- Source files for conversion to the JPEG format may include vector graphics, raster images, and font text strings
- Advanced anti-aliasing applied during the conversion process
- pdf2image and pdf2picture are desktop programs that convert PDF and EPS files to the JPEG format (JPG Format)
- Convert PDF to JPEG programmatically using the PDF Conversion Server via command-line or the PDF Conversion SDK API.
JPEG Format Alternatives to Watch
JPEG first appeared in 1986. The joint standard has evolved over the years to support the innovations in storage and computing capabilities. Today, options for high-end imaging are proliferating. All of the new formats have significantly improved compression and image quality.
- JPEG 2000 is from the originators of JPEG – Joint Photographic Experts Group and is wavelet-based. It was designed as the successor to jPEG. What makes it stand out, is that it can be either lossy or lossless. It promises high quality and low latency.
- JPEG XR is a compressed image format used primarily for HD Photo files. It supports monochrome, CMYK and bit-depths of 16 or greater.
- WebP is an image format developed by Google. It supports both lossy and lossless compression in Chrome and Opera. It’s derived from the VP8 video format.
- HEIC is a format supported by Apple as a replacement for the JPEG format in iOS.
The Graphics Interchange Format (GIF) is a raster output format. Although still widely used for animations, it’s been replaced by PNG for static images. It’s a good format for graphics with sharp borders and distinct color transitions It’s used for logos, line-art and other images with few colors. Compact with fast load times GIF also supports transparency and animation. It is well supported by all browsers. GIF is rarely used in desktop publishing.
When to Use GIF
GIF is not well suited for photographs or other high-color images. JPEG is usually the format of choice for these. In fact, GIF is an 8-bit format which means that an image may not contain more than 256 colors. GIF is still the best choice if you want to animate a series of images. It has lost ground to PNG in recent years. PNG was developed as an alternative to GIF when it’s patent holders, Unisys and Compuserve, threatened to charge royalties for use of the format on the Web. These patents expired in 2006 but the damage had been done. PNG was adopted as a good alternative and technically had more to offer.
GIF is a good format for any simple image with few colors. It is also the only image format that supports animation of a series of files. It is supported well on all browsers, including legacy versions of Internet Explorer. When creating new content online today, most people choose PNG because of it’s better transparency, alpha-channel support, indexed color and smaller file size. GIF is still an absolutely fine, safe choice. if that’s what your editing software outputs and 256 colors is enough, go ahead and use GIF.
- Supports 1-bit B&W with CCITT G3/4 encoding
- Supports 8-bit colormap with LZW compression
- Output to any dpi resolution or pixel dimensions preserving the aspect ratio
- Source files may include vector graphics, raster images, text strings and fonts
- Advanced anti-aliasing applied during production
Example of GIF Animation
PDF is both a Visual Integrity input and an output format. The Portable Document Format (PDF) is the defacto, worldwide standard for capturing, reviewing and exchanging information from almost any application or computer system and sharing it with others. Although sometimes sharing is enough, often people need to extract useful data from their PDF files in order to feed content managements systems, publish to the web or edit drawings and illustrations. Visual Integrity has been offering reliable and accurate solutions to convert, create, merge and modify PDF files for more than 17 years.
PDF Format Overview
PDF is the Portable Document Format originally developed by Adobe® for platform and application independent exchange, viewing and printing of digital documents. Now it’s an open international ISO Standard 32000-2. PDF files can be created from virtually any source application or database. They can can handle long, composite documents as well as raster images and complex vector graphics. It has become ubiquitous and a defacto standard across every industry.
Often, recipients of PDF files need to reuse the information contained in these files. Visual Integrity products are designed to digest the PDF and produce output files of the highest standard which can be opened and edited in most applications including office software, publishing systems and engineering applications.
Visual Integrity offers several different products for converting PDF files. Choose the one that’s right for you and then learn more:
- pdf2image – converts any PDF file into your choice of image formats for the web or your office documents – GIF, PNG and JPEG.
- pdf2picture – produces scalable picture files (WMF) which can be ungrouped and edited in Microsoft Office and other Windows applications which support drawing features. Also produces image
- pdf2cad – generated editable CAD files from PDF pages. Once these files are opened in your engineering software, they can be modified or annotated.
- PDF FLY – if you need to convert to several output formats or need to convert more than PDF, PDF FLY is for you, Choose multiple files or directories and then convert to your choice of 13 different standard vector and bitmap output formats. PDF FLY is a complete suite of input and output formats.
- PDF Conversion Server – Automate your PDF conversions using the simple command-line interface. Use the fine-tuning controls to obtain a high degree of control over the final output.
- PDF Conversion SDK – Use the API or Windows DLL to integrate support for viewing, importing and exporting vector and bitmap formats into your applications.
- Supports conversion of single and multipage PDF files up to version 1.7 (Acrobat 9).
- Files may contain vector graphics, raster images and text with fonts (PostScript, Type 1 or TrueType fonts).
- We do not yet support conversion/extraction of sound, video, animation, 3D or metadata. If you require this, please inquire about the status.
5 Ways to Create PDF Files
PDF is ideal for presenting just about anything that can be printed. Once in the PDF format, light editing is permitted using products such as Acrobat Pro. Using Visual Integrity software you can then turn these PDF files into a format easily digested by software you already own which saves both time and cost. If you need to create PDF files for your documentation, publishing and data management needs, try the following ideas:
– Use Adobe Acrobat or Acrobat Professional
– Print to Acrobat Distiller
– Use a reliable 3rd-party PDF writer or printer drivers such as CutePDF (free)
– Output PDF directly from your source application or database if it is supported. The latest versions of Microsoft Office can all create PDF.
– Create PDF from data by using FOP to transform XML to PDF