The Basics#

Supported File Types#

MuPDF.NET can open files other than just PDF.

The following file types are supported:

Document Formats#

PDF
XPS
EPUB
MOBI
FB2
CBZ
SVG
TXT
MD

Image Formats#

Input formats: JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD

Output formats: JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS

Opening a File#

To open a file, do the following:

using MuPDF.NET;

Document doc = new Document("a.pdf"); // open a document

Extract text from a PDF#

To extract all the text from a PDF file, do the following:

using System;
using System.IO;
using System.Text;
using MuPDF.NET;

Document doc = new Document("example.pdf");
FileStream wstream = File.OpenWrite("1.txt");

for (int i = 0; i < doc.PageCount; i++)
{
    string text = doc[i].GetText("text"); // plain text of page i
    Console.WriteLine(text);
    if (!string.IsNullOrEmpty(text))
        wstream.Write(Encoding.UTF8.GetBytes(text)); // append the page text as UTF-8
}

wstream.Close();

Text extraction is not limited to PDF: every supported document format, such as MOBI, EPUB or TXT, can have its text extracted.

Note

Taking it further

If your document contains image-based text content, use OCR on the page for subsequent text extraction:

TextPage tp = page.GetTextPageOcr();
string text = page.GetText(textpage: tp);

API reference

Page.GetText()

Extract images from a PDF#

To extract all the images from a PDF file, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document

for (int i = 0; i < doc.PageCount; i++)
{
    Page page = doc[i];
    List<Entry> images = page.GetImages();

    for (int j = 0; j < images.Count; j++)
    {
        int xref = images[j].Xref; // get Xref
        Pixmap pix = new Pixmap(doc, xref);

        if (pix.N - pix.Alpha > 3)
            pix = new Pixmap(Utils.csRGB, pix);

        pix.Save($"page_{i}-image_{j}.png");
        pix = null;
    }
}

Extract vector graphics#

To extract all the vector graphics from a document page, do the following:

Document doc = new Document("some.file");
Page page = doc[0];
List<PathInfo> paths = page.GetDrawings();

This will return a list of paths for any vector drawings found on the page.

Note

API reference

Page.GetDrawings()

Merging PDF files#

To merge PDF files, do the following:

using MuPDF.NET;

Document doc_a = new Document("a.pdf"); // open the 1st document
Document doc_b = new Document("b.pdf"); // open the 2nd document

doc_a.InsertPdf(doc_b); // merge the docs
doc_a.Save("a+b.pdf"); // save the merged document with a new filename

Merging PDF files with other types of file#

Document.InsertFile() merges other supported file types into a PDF. For example:

using MuPDF.NET;

Document doc_a = new Document("a.pdf"); // open the 1st document
Document doc_b = new Document("b.svg"); // open the 2nd document

doc_a.InsertFile(doc_b); // merge the docs
doc_a.Save("a+b.pdf"); // save the merged document with a new filename

Note

Taking it further

It is easy to join PDFs with Document.InsertPdf() & Document.InsertFile(). Given open PDF documents, you can copy page ranges from one to the other. You can select the point where the copied pages should be placed, and you can also reverse the page sequence and change page rotation.

Adding a watermark to a PDF#

To add a watermark to a PDF file, do the following:

using MuPDF.NET;

Document doc = new Document("document.pdf"); // open a document

for (int page_index = 0; page_index < doc.PageCount; page_index++) // iterate over pdf pages
{
    Page page = doc[page_index]; // get the page

    // insert an image watermark from a file name to fit the page bounds
    page.InsertImage(page.Rect, filename: "watermark.png", overlay: false);
}
doc.Save("watermarked-document.pdf"); // save the document with a new filename

Note

Taking it further

Adding watermarks is essentially as simple as adding an image at the base of each PDF page. You should ensure that the image has the required opacity and aspect ratio to make it look the way you need it to.

In the example above, the image is created anew from the file reference on every page. To be more performant (by saving memory and file size), this image data should be referenced only once.

Adding text to a PDF#

Using the TextWriter.WriteText() method from the TextWriter class is the simplest way to add text to a PDF. Using this method you can define many options for how your text looks and where it is positioned.

Document doc = new Document();
Page page = doc.NewPage();
string text = "MuPDF.NET: the C# bindings for MuPDF"; // define some text!
MuPDF.NET.Font font = new MuPDF.NET.Font("helv"); // define a font to use, in this case Helvetica
MuPDF.NET.TextWriter tw = new MuPDF.NET.TextWriter(page.Rect); // define the rectangle for the text
tw.Append(new(50, 100), text, font); // define the point where you want to add the text
tw.WriteText(page); // use the TextWriter to write the text to the page
doc.Save("test.pdf"); // save the result!

Note

See Inbuilt Fonts for the available options which can be used for Font without having to define an external font file on your system.

Adding an image to a PDF#

To add an image to a PDF file, for example a logo, do the following:

using MuPDF.NET;

Document doc = new Document("document.pdf"); // open a document

for (int page_index = 0; page_index < doc.PageCount; page_index++) // iterate over pdf pages
{
    Page page = doc[page_index]; // get the page

    // insert an image logo from a file name at the top left of the document
    page.InsertImage(new Rect(0,0,50,50),filename: "my-logo.png");
}

doc.Save("logo-document.pdf"); // save the document with a new filename

Note

Taking it further

As with the watermark example, it is more performant to reference the image only once if possible - see the code example and explanation on Page.InsertImage().

Converting a PDF page to an image#

Pages can be converted to image data with the Page.GetPixmap() method.

It is then possible to save these images as files, for example this would save all the document pages as individual image files:

using MuPDF.NET;

Document doc = new Document("document.pdf"); // open a document

for (int page_index = 0; page_index < doc.PageCount; page_index++) // iterate over pdf pages
{
    Page page = doc[page_index]; // get the page
    Pixmap pixmap = page.GetPixmap(); // get as Pixmap
    string filename = $"file-{page_index}.png";
    pixmap.Save(filename);
}

doc.Close();

Converting from an image to a PDF#

This would involve creating a blank PDF, with a blank page, then adding the image to the page and setting the page dimensions to match the image.

For example:

using MuPDF.NET;

Pixmap pxmp = new Pixmap("image.png");
Document doc = new Document();
Page page = doc.NewPage(width:pxmp.W, height:pxmp.H); // create a blank document page with the same dimensions as the image

page.InsertImage(page.Rect, pixmap: pxmp); // insert the image on the page
doc.Save("output.pdf", pretty: 1);
doc.Close();

Markdown to PDF#

As Markdown files are supported input files they can be easily converted to PDF by opening the Markdown file and calling the Document.Save() method.

In the simplest case you can just open the Markdown file and call the method to get a PDF representation of the content.

Defining paper size#

The default paper size is a 400 x 600 Rect, but you can specify a custom paper size by passing the required Rect, for example:

MuPDF.NET.Document md_doc = new MuPDF.NET.Document("example.md", Utils.PaperRect("A4")); // A4 size

Defining CSS#

By default, the Markdown content will be converted to PDF using a default CSS stylesheet. However, you can specify your own CSS stylesheet to customize the appearance of the resulting PDF. To do this, define your css and apply it.

For example, to make all h1 headers red (The single # symbol in Markdown), you could do the following:

MuPDF.NET.Document md_doc = new MuPDF.NET.Document("example.md", Utils.PaperRect("A4"));

string css = "h1 {color:red;}";
md_doc.ApplyCss(css);

md_doc.Save("red-colored-header.pdf");

Note

The support for CSS is currently limited.

Using CSS Custom Element Selectors#

Markdown input supports custom element selectors in the CSS, so you can define your own tags in the Markdown and then refer to them in the CSS.

In this way you can specify custom styles for specific elements in the Markdown content. For example, you could define a custom tag called mytag in the Markdown and then refer to it in the CSS to make it red:

string css = """
mytag {
    color: red;
}
""";

And the corresponding Markdown:

# This is a header

This is some text.

<mytag>This text will be red.</mytag>

This is particularly useful for defining image sizes. For example, you could define a custom class called my_image_class in the CSS and then refer to it in the Markdown to style images:

string css = """
my_image_class img {
    width: 100px;
    height: 100px;
}
""";

With the corresponding Markdown:

# This is a header

This is some text.

<my_image_class><img src="pie-chart.png" /></my_image_class>

Defining Fonts#

Fonts can be defined by using the Archive parameter to provide a custom Archive containing the font files.

The fonts must exist in an archive which is provided to the Archive parameter when opening the Markdown file. The CSS can then refer to these fonts by their names as defined in the archive.

For example, assuming you have access to the font files for “Comic Sans” and want to use it for all text, you could do the following:

// Global CSS instructions to use the "Comic Sans" font for all text. The font files must be provided in the archive.
string css = """
@font-face {font-family: sans-serif; src: url(comic.ttf);}
@font-face {font-family: sans-serif; src: url(comicbd.ttf); font-weight: bold;}
@font-face {font-family: sans-serif; src: url(comicz.ttf); font-weight: bold; font-style: italic;}
@font-face {font-family: sans-serif; src: url(comici.ttf); font-style: italic;}
""";

Archive archive = new Archive("C:/Windows/Fonts");  // the fonts are here
archive.Add(".");  // we've stored the archive image in this script's folder

string md_file = "sample.md";
Document md_doc = new MuPDF.NET.Document(  // open the Markdown document
    md_file,
    archive: archive,  // where to look for resources (fonts, images)
    rect: Utils.PaperRect("A4")  // page dimension ISO A4
);

md_doc.ApplyCss(css);

Extracting & Drawing vector graphics#

The following example shows how to extract drawings from a page in a PDF and then recreate them in a new PDF.

Essentially the process is:

Load the PDF with the drawing information you want to extract.
Extract drawing info from the page as a list of path information by using Page.GetDrawings().
Create a blank document for the output.
Add a Shape to the page to hold the drawing info.
Iterate the path information and look for lines, bezier, rectangle or quad objects.
Draw the required paths onto the new Shape.
Decorate the shape with the required styling.
Commit the shape.
Save the output document.

using MuPDF.NET;

Document doc = new Document("pdf-with-vector-drawings-on-page-one.pdf");
Page page = doc[0];
List<PathInfo> paths = page.GetDrawings();

var outpdf = new Document();
var outpage = outpdf.NewPage(width: page.Rect.Width, height: page.Rect.Height);
var shape = outpage.NewShape();

foreach(PathInfo path in paths)
{
    foreach (Item item in path.Items)
    {
        if (item.Type == "l")
            shape.DrawLine(item.P1, item.LastPoint);
        else if (item.Type == "c")
            shape.DrawBezier(item.P1, item.P2, item.P3, item.LastPoint);
        else if (item.Type == "re")
            shape.DrawRect(item.Rect, item.Orientation);
        else if (item.Type == "qu")
            shape.DrawQuad(item.Quad);
        else
            throw new Exception("unhandled drawing");
    }
    shape.Finish(
        fill: path.Fill,
        color: path.Color,
        dashes: path.Dashes,
        evenOdd: path.EvenOdd,
        closePath: path.ClosePath,
        lineJoin: (int)path.LineJoin,
        lineCap: ((int)path.LineCap.ElementAt(0)),
        width: path.Width,
        strokeOpacity: path.StrokeOpacity,
        fillOpacity: path.FillOpacity
        );
}

shape.Commit();
outpdf.Save("graphics-redrawn.pdf");

Rotating a PDF#

To add a rotation to a page, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open document
Page page = doc[0]; // get the 1st page of the document
page.SetRotation(90); // rotate the page
doc.Save("rotated-page-1.pdf");

Note

API reference

Page.SetRotation()

Cropping a PDF#

To crop a page to a defined Rect, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open document
Page page = doc[0]; // get the 1st page of the document
page.SetCropBox(new Rect(100, 100, 400, 400)); // set a cropbox for the page
doc.Save("cropped-page-1.pdf");

Note

API reference

Page.SetCropBox()

Attaching Files#

To attach another file to a page, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open main document
Document attachment = new Document("my-attachment.pdf"); // open document you want to attach

Page page = doc[0]; // get the 1st page of the document
Point point = new Point(100, 100); // create the point where you want to add the attachment
byte[] attachment_data = attachment.Write(); // get the document byte data as a buffer

// add the file annotation with the point, data and the file name
Annot file_annotation = page.AddFileAnnot(point, attachment_data, "attachment.pdf");

doc.Save("document-with-attachment.pdf"); // save the document

Note

Taking it further

When adding the file with Page.AddFileAnnot(), note that the third parameter (the filename) should include the actual file extension. Without it, viewers may not recognize the attachment as something that can be opened. For example, if the filename is just “attachment”, then viewing the resulting PDF and attempting to open the attachment may well produce an error. However, with “attachment.pdf” this can be recognized and opened by PDF viewers as a valid file type.

The default icon for the attachment is a “push pin”; you can change this by setting the icon parameter.

Embedding Files#

To embed a file in a document, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open main document
Document embedded_doc = new Document("my-embed.pdf"); // open document you want to embed

byte[] embedded_data = embedded_doc.Write(); // get the document byte data as a buffer

// embed with the file name and the data
doc.AddEmbfile("my-embedded_file.pdf", embedded_data);

doc.Save("document-with-embed.pdf"); // save the document

Note

Taking it further

As with attaching files, when adding the file with Document.AddEmbfile() note that the first parameter for the filename should include the actual file extension.

Deleting Pages#

To delete a page from a document, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document
doc.DeletePage(0); // delete the 1st page of the document
doc.Save("test-deleted-page-one.pdf"); // save the document

To delete multiple pages from a document, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document
doc.DeletePages(from: 9, to: 14); // delete a page range from the document
doc.Save("test-deleted-pages.pdf"); // save the document

What happens if I delete a page referred to by bookmarks or hyperlinks?#

A bookmark (entry in the Table of Contents) will become inactive and will no longer navigate to any page.
A hyperlink will be removed from the page that contains it. The visible content on that page will not otherwise be changed in any way.

Note

Taking it further

The page index is zero-based, so to delete page 10 of a document you would call doc.DeletePage(9).

Similarly, doc.DeletePages(from: 9, to: 14) will delete pages 10 - 15 inclusive.
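
The off-by-one conversion described above can be sketched as a small helper. This is only an illustration in plain C#: `ToZeroBased` is a hypothetical name, not part of MuPDF.NET, and it assumes the caller thinks in 1-based, inclusive page numbers.

```csharp
using System;

// Converts a 1-based, inclusive page range (as a reader would say it)
// into the zero-based from/to pair used for page indexes.
(int From, int To) ToZeroBased(int firstPage, int lastPage)
{
    if (firstPage < 1 || lastPage < firstPage)
        throw new ArgumentOutOfRangeException(nameof(firstPage), "expected 1 <= firstPage <= lastPage");
    return (firstPage - 1, lastPage - 1);
}

Console.WriteLine(ToZeroBased(10, 15)); // (9, 14)
```

The resulting pair corresponds to the `from` and `to` values passed to doc.DeletePages() in the example above.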

Re-Arranging Pages#

To change the sequence of pages, i.e. re-arrange pages, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document
doc.MovePage(1, 0); // move the 2nd page of the document to the start of the document
doc.Save("test-page-moved.pdf"); // save the document

Note

API reference

Document.MovePage()

Copying Pages#

To copy pages, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document
doc.CopyPage(0); // copy the 1st page and put it at the end of the document
doc.Save("test-page-copied.pdf"); // save the document

Note

API reference

Document.CopyPage()

Selecting Pages#

To select pages, do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document
doc.Select(new List<int> { 0, 1 }); // select the 1st & 2nd page of the document
doc.Save("just-page-one-and-two.pdf"); // save the document

Note

Taking it further

With MuPDF.NET you have all options to copy, move, delete or re-arrange the pages of a PDF. Intuitive methods exist that allow you to do this on a page-by-page level, like the Document.CopyPage() method.

Alternatively, you can prepare a completely new page layout as a list containing the page numbers you want, in the sequence you want, and as many times as you want each page. The following illustrates what can be done with Document.Select():

doc.Select(new List<int> { 1, 1, 1, 5, 4, 9, 9, 9, 0, 2, 2, 2 });

Now let’s prepare a PDF for double-sided printing (on a printer not directly supporting this):

The number of pages is given by doc.PageCount.

This snippet creates the respective sub documents which can then be used to print the document:

doc.Select(p_even); // only the even pages left over
doc.Save("even.pdf"); // save the "even" PDF
doc.Close(); // recycle the file
doc = new Document(doc.Name); // re-open
doc.Select(p_odd); // and do the same with the odd pages
doc.Save("odd.pdf");
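
The snippet above uses `p_even` and `p_odd` without showing how they are built. A minimal sketch, assuming they are simply the zero-based page numbers that are even and odd respectively (so the first page of the document lands in the "even" list); `EvenPages`, `OddPages` and `totalPages` are hypothetical names used only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Zero-based page numbers 0, 2, 4, ... below pageCount.
List<int> EvenPages(int pageCount) =>
    Enumerable.Range(0, pageCount).Where(p => p % 2 == 0).ToList();

// Zero-based page numbers 1, 3, 5, ... below pageCount.
List<int> OddPages(int pageCount) =>
    Enumerable.Range(0, pageCount).Where(p => p % 2 == 1).ToList();

int totalPages = 5; // stand-in for doc.PageCount
List<int> p_even = EvenPages(totalPages);
List<int> p_odd = OddPages(totalPages);

Console.WriteLine(string.Join(", ", p_even)); // 0, 2, 4
Console.WriteLine(string.Join(", ", p_odd));  // 1, 3
```

Both lists have the shape that Document.Select() is shown accepting earlier in this section.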

For more information also have a look at this Wiki article.

The following example will reverse the order of all pages (extremely fast: sub-second time for the 756 pages of the Adobe PDF References):

int lastPage = doc.PageCount - 1;
for(int i = 0; i < lastPage; i ++)
    doc.MovePage(lastPage, i); // move current last page to the front
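
To see why this loop reverses the document, it can help to replay it on a plain list of page numbers. The sketch below does not use MuPDF.NET at all: the local `MovePage` function is a hypothetical stand-in that assumes the semantics "take the page at index `from` and place it in front of the page at index `to`", which matches how MovePage(1, 0) is described earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Models "move page `from` in front of page `to`" on a plain list of page labels.
void MovePage(List<int> pages, int from, int to)
{
    int moved = pages[from];
    pages.RemoveAt(from);
    pages.Insert(to, moved);
}

// Replays the reversal loop on labels 0..pageCount-1 and returns the new order.
List<int> ReverseByMoving(int pageCount)
{
    List<int> pages = Enumerable.Range(0, pageCount).ToList();
    int lastPage = pageCount - 1;
    for (int i = 0; i < lastPage; i++)
        MovePage(pages, lastPage, i); // the current last page goes to position i
    return pages;
}

Console.WriteLine(string.Join(", ", ReverseByMoving(4))); // 3, 2, 1, 0
```

Each iteration takes whatever page is currently last and parks it at position i, so after the loop the original last page is first and the original first page is last.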

This snippet duplicates the PDF with itself so that it will contain the pages 0, 1, …, n, 0, 1, …, n (extremely fast and without noticeably increasing the file size!):

int pageCount = doc.PageCount;
for(int i = 0; i < pageCount; i ++)
    doc.CopyPage(i); // copy this page to after last page

API reference

Document.Select()

Adding Blank Pages#

To add a blank page, do the following:

using MuPDF.NET;

Document doc = new Document(...); // some new or existing PDF document
Page page = doc.NewPage(-1, // insertion point: end of document
                    width: 595, // page dimension: A4 portrait
                    height: 842);
doc.Save("doc-with-new-blank-page.pdf"); // save the document

Note

Taking it further

Use this to create the page with another pre-defined paper format:

(int w, int h) = Utils.PaperSize("letter-l"); // 'Letter' landscape
Page page = doc.NewPage(width: w, height: h);

The convenience function PaperSize() knows over 40 industry standard paper formats to choose from. To see them, inspect dictionary paperSizes. Pass the desired dictionary key to PaperSize() to retrieve the paper dimensions. Upper and lower case are supported. If you append “-L” to the format name, the landscape version is returned.
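
The A4 dimensions 595 x 842 used above are the ISO size of 210 mm x 297 mm expressed in PDF points (72 points per inch). The following sketch shows that arithmetic in plain C#; `MmToPoints` and `Landscape` are hypothetical helpers for illustration, and it is an assumption that PaperSize() rounds in the same way.

```csharp
using System;

// PDF page units are points: 72 points per inch, 25.4 mm per inch.
int MmToPoints(double mm) => (int)Math.Round(mm / 25.4 * 72.0);

// Swap width and height to get the landscape orientation.
(int Width, int Height) Landscape((int Width, int Height) size) => (size.Height, size.Width);

var a4 = (Width: MmToPoints(210), Height: MmToPoints(297));
Console.WriteLine(a4);            // (595, 842)
Console.WriteLine(Landscape(a4)); // (842, 595)
```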

Here is a 3-liner that creates a PDF with one empty page. Its file size is 460 bytes:

Document doc = new Document();
doc.NewPage();
doc.Save("A4.pdf");

API reference

Document.NewPage()
PaperSize

Inserting Pages with Text Content#

The Document.InsertPage() method also inserts a new page and accepts the same width and height parameters. In addition, it lets you insert arbitrary text into the new page and returns the number of inserted lines.

using MuPDF.NET;

Document doc = new Document(...);  // some new or existing PDF document
int n = doc.InsertPage(-1, // default insertion point
                    text: "The quick brown fox jumped over the lazy dog",
                    fontsize: 11,
                    width: 595,
                    height: 842,
                    fontname: "Helvetica", // default font
                    fontfile: null, // any font file name
                    color: new float[]{0, 0, 0}); // text color (RGB)

Note

Taking it further

The text parameter can be a (sequence of) string (assuming UTF-8 encoding). Insertion will start at Point (50, 72), which is one inch below top of page and 50 points from the left. The number of inserted text lines is returned.

API reference

Document.InsertPage()

Splitting Single Pages#

Splitting means creating new PDF documents from an existing input file. To split we need to find the pages we are interested in and insert them into a new PDF document.

Example #1 - Split document by each page#

using MuPDF.NET;

Document src = new Document("test.pdf"); // open a document

for (int i = 0; i < src.PageCount; i ++) // for each page in input
{
    Page page = src[i];
    Document splitDocument = new Document();
    splitDocument.InsertPdf(src, fromPage: i, toPage: i);
    splitDocument.Save("test-"+i+ ".pdf");
}

Example #2 - Split document pages by bookmark#

using MuPDF.NET;

Document src = new Document("test.pdf"); // open a document
var toc = src.GetToc();

foreach (var item in toc)
{
    Console.WriteLine("title=" + item.Title);
    Console.WriteLine("item=" + item.Page);
    Document splitDocument = new Document();
    splitDocument.InsertPdf(src, fromPage: item.Page, toPage: item.Page);
    splitDocument.Save("test-"+item.Title+ ".pdf");
}
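
Example #2 builds the output file name directly from item.Title. Bookmark titles can contain characters that are not valid in file names (a path separator, for instance), so it may be worth cleaning the title first. A minimal sketch using only the .NET base library; `ToSafeFileName` is a hypothetical helper, and the set of rejected characters depends on the operating system:

```csharp
using System;
using System.IO;
using System.Linq;

// Replaces characters that are not allowed in file names with '_'.
// Falls back to "untitled" when nothing usable is left.
string ToSafeFileName(string title)
{
    char[] invalid = Path.GetInvalidFileNameChars();
    string cleaned = new string(
        title.Select(c => Array.IndexOf(invalid, c) >= 0 ? '_' : c).ToArray());
    return string.IsNullOrWhiteSpace(cleaned) ? "untitled" : cleaned.Trim();
}

// The exact result varies by platform, because the invalid set differs.
Console.WriteLine(ToSafeFileName("Chapter 1/2: Intro"));
```

The cleaned title could then be used in place of item.Title when composing the file name passed to Save().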

Note

API reference

Document.InsertPdf()

Combining Single Pages#

This deals with joining PDF pages to form a new PDF with pages each combining two or four original ones (also called “2-up”, “4-up”, etc.). This could be used to create booklets or thumbnail-like overviews.

using MuPDF.NET;

Document src = new Document("example.pdf");
Document doc = new Document();  // empty output PDF
(int width, int height) = Utils.PaperSize("a4");
Rect r = new Rect(0, 0, width, height);

// define the 4 rectangles per page
Rect r1 = new Rect(0, 0, r.Width / 2, r.Height / 2);  // top left rect
Rect r2 = new Rect(r.Width / 2, 0, r.Width, r.Height / 2);  // top right
Rect r3 = new Rect(0, r.Height / 2, r.Width / 2, r.Height);  // bottom left
Rect r4 = new Rect(r.Width / 2, r.Height / 2, r.Width, r.Height);  // bottom right

// put them in a list
Rect[] r_tab = new Rect[] { r1, r2, r3, r4 };
Page? page = null;

// now copy input pages to output
for (int i = 0; i < src.PageCount; i++)
{
    if (i % 4 == 0)  // create new output page
    {
        page = doc.NewPage(-1,
                      width: width,
                      height: height);
    }

    // insert input page into the correct rectangle
    if (page != null)
    {
        page.ShowPdfPage(r_tab[i % 4],  // select output rect
                         src,  // input document
                         i); // input page number
    }
}

// by all means, save new file using garbage collection and compression
doc.Save("4up.pdf", garbage: 3, deflate: 1);
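
The placement logic in the loop above comes down to integer division and remainder: input page i goes onto output page i / 4, into quadrant i % 4. The sketch below reproduces that mapping and the four rectangles without MuPDF.NET; `Place` and `Quadrant` are hypothetical helpers, and plain tuples stand in for Rect.

```csharp
using System;

// For input page i, return the output page index and the quadrant (0..3).
(int OutputPage, int Slot) Place(int i) => (i / 4, i % 4);

// Quadrant rectangle as (x0, y0, x1, y1) on a page of the given size.
// Slot order matches r1..r4 above: top left, top right, bottom left, bottom right.
(float X0, float Y0, float X1, float Y1) Quadrant(int slot, float width, float height)
{
    float halfW = width / 2, halfH = height / 2;
    float x0 = (slot % 2) * halfW;   // slots 1 and 3 are the right column
    float y0 = (slot / 2) * halfH;   // slots 2 and 3 are the bottom row
    return (x0, y0, x0 + halfW, y0 + halfH);
}

for (int i = 0; i < 6; i++)
{
    var (outPage, q) = Place(i);
    Console.WriteLine($"input {i} -> output page {outPage}, rect {Quadrant(q, 595, 842)}");
}
```

Changing the divisor and the row/column arithmetic in the same way would give a 2-up or other n-up layout.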

Note

API reference

Page.CropboxPosition()
Page.ShowPdfPage()

PDF Encryption & Decryption#

Starting with version 1.16.0, PDF decryption and encryption (using passwords) are fully supported. You can do the following:

Check whether a document is password protected / (still) encrypted (Document.NeedsPass, Document.IsEncrypted).
Gain access authorization to a document (Document.Authenticate()).
Set encryption details for PDF files using Document.Save() or Document.Write() and
- decrypt or encrypt the content
- set password(s)
- set the encryption method
- set permission details

Note

A PDF document may have two different passwords:

The owner password provides full access rights, including changing passwords, encryption method, or permission detail.
The user password provides access to document content according to the established permission details. If present, opening the PDF in a viewer will require providing it.

Method Document.Authenticate() will automatically establish access rights according to the password used.

The following snippet creates a new PDF and encrypts it with separate user and owner passwords. Permissions are granted to print, copy and annotate, but no changes are allowed to someone authenticating with the user password.

using MuPDF.NET;

string text = "some secret information"; // keep this data secret
int perm = (int)(
    PdfAccess.PDF_PERM_ACCESSIBILITY // always use this
    | PdfAccess.PDF_PERM_PRINT // permit printing
    | PdfAccess.PDF_PERM_COPY // permit copying
    | PdfAccess.PDF_PERM_ANNOTATE // permit annotations
);
string owner_pass = "owner"; // owner password
string user_pass = "user"; // user password
int encrypt_meth = (int)PdfCrypt.PDF_ENCRYPT_AES_256; // strongest algorithm
Document doc = new Document(); // empty pdf
Page page = doc.NewPage(); // empty page
page.InsertText(new Point(50, 72), text); // insert the data
doc.Save(
    "secret.pdf",
    encryption: encrypt_meth, // set the encryption method
    owner_pw: owner_pass, // set the owner password
    user_pw: user_pass, // set the user password
    permissions: perm // set permissions
);
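
The permissions value is a bit field: each permission occupies one bit, and the flags are combined with bitwise OR. The sketch below illustrates this with local constants only. The bit positions are those defined for the permission flags in the PDF specification; that the PdfAccess members carry exactly these values is an assumption, and `Allows` is a hypothetical helper.

```csharp
using System;

// Bit values of the standard PDF permission flags (PDF specification);
// the MuPDF.NET enum members are assumed to carry these same values.
const int Print = 1 << 2;          // 4
const int Modify = 1 << 3;         // 8
const int Copy = 1 << 4;           // 16
const int Annotate = 1 << 5;       // 32
const int Accessibility = 1 << 9;  // 512

// Same combination as in the example above.
int perm = Accessibility | Print | Copy | Annotate;

// A permission is granted when its bit is set in the combined value.
bool Allows(int permissions, int flag) => (permissions & flag) != 0;

Console.WriteLine(perm);                  // 564
Console.WriteLine(Allows(perm, Print));   // True
Console.WriteLine(Allows(perm, Modify));  // False
```

Because the modify bit is not set, a user authenticating with the user password would not be allowed to change the document, which is the behaviour described above.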

Note

Taking it further

Opening this document with some viewer (Nitro Reader 5) reflects these settings:

Decrypting will automatically happen on save as before when no encryption parameters are provided.

To keep the encryption method of a PDF save it using encryption=PdfCrypt.PDF_ENCRYPT_KEEP. If doc.CanSaveIncrementally() == true, an incremental save is also possible.

To change the encryption method specify the full range of options above (encryption, owner_pw, user_pw, permissions). An incremental save is not possible in this case.

API reference

Document.Save()

Getting Page Links#

Links can be extracted from a Page to return Link objects.

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document

for (int i = 0; i < doc.PageCount; i++) // iterate the document pages
{
    Page page = doc[i];
    Link link = page.FirstLink; // a `Link` object or null

    while (link != null) // iterate over the links on page
    {
        // do something with the link, then:
        link = link.Next; // get next link, last one has null in its `Next`
    }
}

Note

API reference

Page.FirstLink()

Getting All Annotations from a Document#

Annotations (Annot) on pages can be retrieved with the Page.GetAnnots() method.

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document

for (int i = 0; i < doc.PageCount; i++)
{
    Page page = doc[i];

    List<Entry> annots = page.GetAnnots();

    for (int j = 0; j < annots.Count; j++)
    {
        Console.WriteLine("Annotation on page:"+annots[j]);
    }

}

Note

API reference

Page.GetAnnots()

Redacting content from a PDF#

Redactions are special annotations that mark an area of a page for secure removal. Marking an area with a rectangle only flags it for redaction; the content is securely removed once the redaction is applied.

For example if we wanted to redact all instances of the name “Jane Doe” from a document we could do the following:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document

// Iterate over each page of the document
for (int i = 0; i < doc.PageCount; i ++)
{
    Page page = doc[i]; // the current page

    // Find all instances of "Jane Doe" on the current page
    List<Entry> instances = page.SearchFor("Jane Doe");

    // Redact each instance of "Jane Doe" on the current page
    for (int j = 0; j < instances.Count; j ++) {
        page.AddRedactAnnot(instances[j]);
    }

    // Apply the redactions to the current page
    page.ApplyRedactions();
}

doc.Save("redacted_document.pdf");

doc.Close();

Another example is redacting an area of a page while keeping any line art (i.e. vector graphics) within that area, by setting a parameter flag as follows:

using MuPDF.NET;

Document doc = new Document("test.pdf"); // open a document

// Get the first page
Page page = doc[0];

// Add an area to redact
Rect rect = new Rect(0,0,200,200);

// Add a redaction annotation which will have a red fill color
page.AddRedactAnnot(rect, fill: new float[]{1,0,0});

// Apply the redactions to the current page, but ignore vector graphics
page.ApplyRedactions(graphics: 0);

// Save the modified document
doc.Save("redacted_document.pdf");

// Close the document
doc.Close();

Warning

Once a redacted version of a document is saved then the redacted content in the PDF is irretrievable. Thus, a redacted area in a document removes text and graphics completely from that area.

Note

Taking it further

There are a few options for creating and applying redactions to a page. For full details of the parameters that control these options, refer to the API reference.

Working with Barcodes#

MuPDF.NET supports reading and writing of barcodes.

Reading Barcodes#

To read barcodes from a page, define the area you wish to check and use the ReadBarcodes method as follows:

using MuPDF.NET;

Document doc = new Document("document_with_barcodes.pdf"); // open a document

// Get the first page
Page page = doc[0];

// Define an area where you want to check for barcodes
Rect rect = new Rect(290, 590, 420, 660);

// Search the area for barcode data
List<Barcode> barcodes = page.ReadBarcodes(rect);

// List any barcode data
foreach (Barcode barcode in barcodes)
{
    BarcodePoint[] points = barcode.ResultPoints;
    // the barcodes were read from doc[0], so the page index is 0
    Console.WriteLine($"Page 0 - Type: {barcode.BarcodeFormat} - Value: {barcode.Text} - Rect: [{points[0]},{points[1]}]");
}

Note

If the Rect parameter is omitted then the whole page will be searched for barcodes; however, this is a slower operation.

Writing Barcodes#

To write barcodes you can either add them directly to pages with Page.WriteBarcode() or create images directly with Utils.WriteBarcode().

Example #1: adding a barcode to a page

using MuPDF.NET;

Document doc = new Document("document_with_barcodes.pdf"); // open a document

// Get the first page
Page page = doc[0];

Rect rect = new Rect(100, 0, 190, 90);
page.WriteBarcode(rect, "Hello World!", BarcodeFormat.QR_CODE);

Example #2: create a barcode as an image file

using MuPDF.NET;

Utils.WriteBarcode("QR_CODE.png", "Hello World!", BarcodeFormat.QR_CODE, width: 300, height: 300);

Example #3: create a barcode as a `Pixmap`

using MuPDF.NET;

Pixmap pixmap = Utils.GetBarcodePixmap("Hello World!", BarcodeFormat.QR_CODE);

Note

See BarcodeFormat for available barcodes.

See Page.ReadBarcodes.

See Page.WriteBarcode.

See Utils.ReadBarcodes.

See Utils.WriteBarcodes.

Using Image Filters#

MuPDF.NET supports applying image filters to images when pre-processing OCR images for better recognition.

The following example shows how to apply a gamma correction and scaling filter to the images used for OCR when extracting text from a page.

using MuPDF.NET;

string testFilePath = Path.GetFullPath(@"E:\Ocr.pdf");
Document doc = new Document(testFilePath);
Page page = doc[0];

// build the pipeline
var pipeline = new ImageFilterPipeline();
pipeline.Clear();
pipeline.AddGamma(gamma: 1.5f);
pipeline.AddScale(scaleFactor: 3f, quality: SKFilterQuality.High);

TextPage tp = page.GetTextPageOcr((int)TextFlags.TEXT_PRESERVE_SPANS, full: true, imageFilters: pipeline);
string txt = tp.ExtractText();
Console.WriteLine(txt);

doc.Close();

Note

See ImageFilterPipeline for available image filter pipeline methods.