Overview

DjVu (pronounced like the French phrase déjà vu) is a file format and associated software for storing scanned documents and image-heavy pages. It was created in the late 1990s to address the need for compact, searchable reproductions of books, magazines and technical papers. The format is particularly suited to pages that include a mixture of continuous-tone imagery (photographs), sharp line art and textual content.

How DjVu works

DjVu reduces file size by separating a page into layers and compressing each one with a technique suited to its content. A typical page is split into a smooth color background that holds photographs and paper texture, a bitonal mask that records the sharp outlines of text and line art, and a foreground layer that supplies the color of the masked shapes. The bitonal mask is compressed with a specialized codec called JB2, which can recognize repeating shapes such as letters; the continuous-tone layers use a wavelet-based codec (IW44). The combination often yields much smaller files than a single full-color scan compressed with JPEG.
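The layer idea can be illustrated with a deliberately simplified sketch. It assumes a grayscale page held as a list of rows and uses a fixed brightness threshold to decide what counts as ink. The function names `split_layers` and `recombine` are hypothetical, and real DjVu encoders use far more sophisticated segmentation than a single threshold.

```python
from statistics import mean


def split_layers(page, threshold=128):
    """Split a grayscale page (rows of 0-255 ints) into three toy layers.

    Returns (mask, foreground, background):
      mask       - rows of 0/1, 1 where a pixel is darker than `threshold`
      foreground - one average intensity for all masked (ink) pixels
      background - the page with ink pixels replaced by the average
                   intensity of the non-ink pixels
    This is an illustrative assumption, not DjVu's actual segmentation.
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in page]
    ink = [px for row, mrow in zip(page, mask)
           for px, m in zip(row, mrow) if m]
    paper = [px for row, mrow in zip(page, mask)
             for px, m in zip(row, mrow) if not m]
    foreground = round(mean(ink)) if ink else None
    fill = round(mean(paper)) if paper else 255
    background = [[fill if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, foreground, background


def recombine(mask, foreground, background):
    """Paint the foreground intensity through the mask onto the background."""
    return [[foreground if m else px for px, m in zip(brow, mrow)]
            for brow, mrow in zip(background, mask)]


# A tiny 4x4 "page": light paper with a dark 2x2 mark in the middle.
page = [
    [250, 250, 250, 250],
    [250,  10,  20, 250],
    [250,  10,  20, 250],
    [240, 240, 240, 240],
]
mask, foreground, background = split_layers(page)
restored = recombine(mask, foreground, background)
```

Once the ink is removed, the background varies smoothly and suits a lossy continuous-tone codec, while the mask keeps the exact edge positions that make text legible.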

Characteristics and features

  • Layered representation (background, foreground, mask) to preserve both photographic detail and crisp text.
  • Efficient compression for bi-level text and graphics, and wavelet compression for color regions.
  • Optional text layer produced by OCR so documents can be searched and text copied, similar to the searchable layer in a PDF.
  • Support for progressive rendering and tiled access so viewers can display pages quickly without downloading entire files.
  • Openly documented formats and free implementations (for example DjVuLibre) alongside commercial tools.
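The efficiency on bi-level text comes largely from storing each repeated shape once. The following is a minimal sketch of that dictionary idea, assuming glyphs have already been cut out of the page as small bitmaps. The names `encode_shapes` and `decode_shapes` are hypothetical, and the sketch matches shapes only when they are identical; it does not model the entropy coding or approximate matching that an actual JB2 encoder may apply.

```python
def encode_shapes(glyphs):
    """Store each distinct bitmap once and record where it occurs.

    glyphs: list of (x, y, bitmap), where bitmap is a tuple of row strings.
    Returns (dictionary, placements):
      dictionary - list of unique bitmaps in order of first appearance
      placements - list of (shape_index, x, y), one per glyph occurrence
    """
    dictionary = []
    index_of = {}  # bitmap -> its position in `dictionary`
    placements = []
    for x, y, bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements


def decode_shapes(dictionary, placements):
    """Rebuild the original glyph list from the dictionary and placements."""
    return [(x, y, dictionary[i]) for i, x, y in placements]


# Two toy 3x5 letter shapes, repeated along one line of "text".
E = ("###", "#..", "##.", "#..", "###")
L = ("#..", "#..", "#..", "#..", "###")
glyphs = [(0, 0, E), (4, 0, L), (8, 0, E), (12, 0, L), (16, 0, E)]

dictionary, placements = encode_shapes(glyphs)
roundtrip = decode_shapes(dictionary, placements)
```

Here five glyph occurrences need only two stored bitmaps plus five small placement records, which is why pages of printed text, where the same letters recur constantly, compress so well under this kind of scheme.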

History and development

DjVu emerged in the late 1990s as researchers sought better ways to publish scanned content on the web and to distribute digital archives. Compared with embedding a photographic scan directly into a PDF, DjVu's approach often produced much smaller files for the same visual quality. The format's developers have reported typical compressed sizes for scanned materials such as color magazine pages, black-and-white technical papers and older manuscripts, and cite them to illustrate the format's focus on compact storage and efficient transmission.

Uses and examples

Because of its efficiency with mixed-content pages, DjVu has been used by libraries, archives and personal digitization projects that publish large collections of scanned pages. It is also used as an alternative to general-purpose image formats such as JPEG when the goal is to retain legible text and fine line detail while minimizing file size. Some institutions publish searchable DjVu editions alongside or instead of PDF versions.

Tools, compatibility and notable facts

Several free and commercial viewers and converters exist for DjVu, and there are software libraries that enable conversion to and from other formats. Browser plugin support was more common in earlier web browsers; modern workflows frequently use server-side conversion or dedicated viewers. The format's technical specification and supporting code are available from multiple sources, and converters and readers often appear in digital archive projects. Some project pages also demonstrate DjVu's compression on sample magazine and book pages.
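As an illustration of a server-side workflow, the sketch below wraps two DjVuLibre command-line tools: `ddjvu` for conversion and `djvutxt` for extracting the text layer. It assumes DjVuLibre is installed and on the PATH, and that the local `ddjvu` build includes PDF output. The helper names and file names are hypothetical, and available options can differ between versions.

```python
import shutil
import subprocess


def ddjvu_pdf_command(src, dst):
    """Build the argument list for converting a DjVu file to PDF with ddjvu."""
    return ["ddjvu", "-format=pdf", src, dst]


def djvutxt_command(src):
    """Build the argument list for printing a DjVu file's text layer."""
    return ["djvutxt", src]


def run_tool(command):
    """Run a DjVuLibre command and return its standard output as text.

    Raises RuntimeError if the tool is not installed, and
    subprocess.CalledProcessError if the tool exits with an error.
    """
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} not found; is DjVuLibre installed?")
    result = subprocess.run(command, check=True, capture_output=True,
                            text=True, encoding="utf-8")
    return result.stdout


def convert_to_pdf(src, dst):
    """Convert src (DjVu) to dst (PDF)."""
    run_tool(ddjvu_pdf_command(src, dst))


def extract_text(src):
    """Return the hidden text layer of src, if the file has one."""
    return run_tool(djvutxt_command(src))
```

A call such as `extract_text("scan.djvu")` would return the OCR text for a file that carries a text layer; a file without one yields little or no output.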

Overall, DjVu remains a useful option when the objective is to distribute large quantities of scanned pages with good legibility and much smaller download sizes than simple photographic scans.