1 change: 1 addition & 0 deletions .Rbuildignore
@@ -26,3 +26,4 @@
^pkgdown$
^doc$
^Meta$
^CRAN-SUBMISSION$
7 changes: 7 additions & 0 deletions .gitattributes
@@ -14,3 +14,10 @@
# Upstream patches must stay byte-identical to what `git format-patch`
# / `git apply` expect — no autocrlf.
*.patch binary

# Shell scripts that R CMD INSTALL executes must keep LF endings even
# on Windows checkouts, or `sh` rejects them with `\r: command not
# found`. This includes the package's configure scripts.
configure text eol=lf
configure.win text eol=lf
*.sh text eol=lf
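
To illustrate the failure mode the comment above describes, here is a minimal sketch for spotting carriage returns in a script before `sh` sees it. `has_crlf` and the two throwaway files are hypothetical names introduced for illustration; nothing here is taken from the package's own tooling.

```shell
# Hypothetical helper: exit 0 if file $1 contains a carriage return.
has_crlf() {
  # $(printf '\r') expands to a literal CR; grep -q stops at the first match.
  grep -q "$(printf '\r')" "$1"
}

# Demo on two throwaway files, one with LF endings and one with CRLF.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho ok\n' > "$tmpdir/lf.sh"
printf '#!/bin/sh\r\necho ok\r\n' > "$tmpdir/crlf.sh"

for f in "$tmpdir/lf.sh" "$tmpdir/crlf.sh"; do
  if has_crlf "$f"; then
    echo "$(basename "$f"): CRLF"
  else
    echo "$(basename "$f"): LF"
  fi
done
```

On a real checkout, `git ls-files --eol configure` reports the index and working-tree line endings together with the attribute in effect, which is a quicker way to confirm that `eol=lf` took hold.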
3 changes: 3 additions & 0 deletions CRAN-SUBMISSION
@@ -0,0 +1,3 @@
Version: 0.1.0
Date: 2026-05-31 21:42:49 UTC
SHA: e121bc96335cd429524d5f53d1bf584856a9ef1b
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,5 +1,5 @@
Package: pdfium
Title: Idiomatic R Bindings to the PDFium PDF Engine
Title: Idiomatic R Bindings to the 'PDFium' PDF Engine
Version: 0.1.0
Authors@R: c(
person("Bill", "Denney", , "wdenney@humanpredictions.com",
@@ -9,7 +9,7 @@ Authors@R: c(
comment = "Authors of bundled PDFium binaries (BSD-3-Clause)")
)
Description: Read PDF documents at the level of pages, page objects, and path
geometry using Google's PDFium engine. Surfaces path segments, stroke and
geometry using Google's 'PDFium' engine. Surfaces path segments, stroke and
fill style, transformation matrices, text positions and content, font
metadata, image metadata, and page rendering. Complements 'pdftools' and
'qpdf' by exposing vector-path information no other R package surfaces.
2 changes: 1 addition & 1 deletion R/api_completion.R
@@ -105,7 +105,7 @@ pdf_page_has_transparency <- function(page) {
cpp_page_has_transparency(page$ptr)
}

#' Page bounding box (cropbox ∩ mediabox)
#' Page bounding box (cropbox intersect mediabox)
#'
#' Wraps `FPDF_GetPageBoundingBox` — returns the rectangle that
#' encloses the visible portion of `page` after intersecting the
2 changes: 1 addition & 1 deletion R/form_fields.R
@@ -80,7 +80,7 @@ form_field_flag_decode <- function(flags, bit) {
#' `TRUE` / `FALSE` for `checkbox` / `radiobutton` fields,
#' `NA` for every other field type.
#' * `control_count` integer - total number of widgets in this
#' field's control group (≥ 1; `> 1` for radio button groups
#' field's control group (`>= 1`; `> 1` for radio button groups
#' with multiple physical widgets). `NA` if PDFium reports
#' failure.
#' * `control_index` integer - 0-based position of this row's
20 changes: 16 additions & 4 deletions R/pdfium-package.R
@@ -13,10 +13,22 @@
#'
#' @section Binary distribution:
#'
#' The underlying `libpdfium` shared library is downloaded from
#' [bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries)
#' the first time the package is installed. The pinned version lives in
#' `tools/pdfium-version.txt`.
#' At install time, the `configure` script picks a `libpdfium` to
#' build against, in this order:
#'
#' 1. The `PDFIUM_HOME` environment variable, if it points at a
#' directory containing `include/fpdfview.h` and a
#' `libpdfium` shared library (`lib/libpdfium.{so,dylib}` on
#' POSIX, or `lib/libpdfium.dll.a` + `bin/libpdfium.dll` on
#' Windows).
#' 2. `pkg-config --exists libpdfium` (POSIX only).
#' 3. Standard system prefixes: `/usr/local`, `/usr`,
#' `/opt/homebrew`, `/opt/local` (POSIX only).
#' 4. Download from
#' [bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries).
#' The pinned release lives in `tools/pdfium-version.txt`.
#' Set `PDFIUM_OFFLINE=1` and stage the tarball under
#' `inst/pdfium-binaries/` for offline installs.
#'
#' @keywords internal
#' @name pdfium-package
52 changes: 43 additions & 9 deletions README.Rmd
@@ -70,13 +70,6 @@ under `dev/decisions/`.

## Installation

`pdfium` downloads its `libpdfium` binary from
[bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries)
at install time. The pinned version lives in
`tools/pdfium-version.txt`. If your install runs without internet
access, set `PDFIUM_OFFLINE=1` and place the matching tarball under
`inst/pdfium-binaries/` before installing.

```r
# Release version (once on CRAN):
install.packages("pdfium")
@@ -85,6 +78,46 @@ install.packages("pdfium")
remotes::install_github("humanpred/rpdfium")
```

### Where the `libpdfium` binary comes from

At install time, the `configure` script picks a `libpdfium` to
build against, in this order:

1. **`PDFIUM_HOME`** — if this environment variable is set and
points at an existing install, that install is used. The
directory must contain headers and the shared library in the
conventional layout:

| Platform | Required files under `$PDFIUM_HOME` |
|------------|----------------------------------------------------------------------|
| Linux | `include/fpdfview.h` and `lib/libpdfium.so` (or `lib64/`) |
| macOS | `include/fpdfview.h` and `lib/libpdfium.dylib` |
| Windows | `include/fpdfview.h`, `lib/libpdfium.dll.a`, `bin/libpdfium.dll` |

Useful when you have a hand-built PDFium, a vendored copy,
or a CI artefact you want to pin against.

2. **`pkg-config --exists libpdfium`** *(POSIX only)* — if a
`libpdfium.pc` is on the `pkg-config` search path, the
reported `includedir` / `libdir` are used.

3. **Standard system prefixes** *(POSIX only)* — `/usr/local`,
`/usr`, `/opt/homebrew`, `/opt/local`. The first one
containing both `include/fpdfview.h` and a `libpdfium`
shared library wins.

4. **Download from
[bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries)**
— the pinned release lives in `tools/pdfium-version.txt`. If
your install runs without internet access, set
`PDFIUM_OFFLINE=1` and place the matching tarball under
`inst/pdfium-binaries/` before installing.

When a system install is found, no download happens and no
`libpdfium` is bundled into the installed package — your
existing copy resolves at load time via the platform's normal
shared-library search path.
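
The lookup order above can be sketched as a small POSIX shell function. This is an illustration under stated assumptions, not the package's actual `configure` script: `has_pdfium` and `pdfium_source` are hypothetical names, the Windows branch is omitted, and the demo fakes an install with empty placeholder files.

```shell
# Hypothetical helper: does prefix $1 hold the header plus a shared library?
has_pdfium() {
  [ -f "$1/include/fpdfview.h" ] || return 1
  for lib in "$1/lib/libpdfium.so" "$1/lib64/libpdfium.so" "$1/lib/libpdfium.dylib"; do
    [ -f "$lib" ] && return 0
  done
  return 1
}

# Print where libpdfium would come from, following the documented order.
pdfium_source() {
  # 1. PDFIUM_HOME, if set and laid out as expected
  if [ -n "${PDFIUM_HOME:-}" ] && has_pdfium "$PDFIUM_HOME"; then
    echo "PDFIUM_HOME:$PDFIUM_HOME"
    return 0
  fi
  # 2. pkg-config, if installed and aware of libpdfium
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libpdfium; then
    echo "pkg-config:$(pkg-config --variable=libdir libpdfium)"
    return 0
  fi
  # 3. Standard system prefixes, first match wins
  for prefix in /usr/local /usr /opt/homebrew /opt/local; do
    if has_pdfium "$prefix"; then
      echo "system:$prefix"
      return 0
    fi
  done
  # 4. Nothing found locally: fall back to the download
  echo "download"
}

# Demo: fake an install with empty placeholder files and point PDFIUM_HOME at it.
demo=$(mktemp -d)
mkdir -p "$demo/include" "$demo/lib"
: > "$demo/include/fpdfview.h"
: > "$demo/lib/libpdfium.so"
export PDFIUM_HOME="$demo"
pdfium_source
```

Because `PDFIUM_HOME` is checked first, the demo reports the fake prefix regardless of what is installed on the machine; unsetting the variable would let the later steps decide.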

## Example

```{r example, eval = FALSE}
@@ -105,5 +138,6 @@ package = "pdfium")`, etc.) and on the

`pdfium` is MIT-licensed. The bundled `libpdfium` binary is BSD-3-Clause
and is *not* distributed in the source tarball — see
[`LICENSE.md`](LICENSE.md) and
[`dev/decisions/ADR-003-binary-distribution.md`](dev/decisions/ADR-003-binary-distribution.md).
[`LICENSE.md`](https://github.com/humanpred/rpdfium/blob/main/LICENSE.md)
and
[`dev/decisions/ADR-003-binary-distribution.md`](https://github.com/humanpred/rpdfium/blob/main/dev/decisions/ADR-003-binary-distribution.md).
133 changes: 84 additions & 49 deletions README.md
@@ -1,54 +1,54 @@
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file. -->



# pdfium

<!-- badges: start -->

[![R-CMD-check](https://github.com/humanpred/rpdfium/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/humanpred/rpdfium/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/humanpred/rpdfium/branch/main/graph/badge.svg)](https://app.codecov.io/gh/humanpred/rpdfium)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN
status](https://www.r-pkg.org/badges/version/pdfium)](https://CRAN.R-project.org/package=pdfium)
[![Codecov test
coverage](https://codecov.io/gh/humanpred/rpdfium/graph/badge.svg)](https://app.codecov.io/gh/humanpred/rpdfium)
[![Codecov test coverage](https://codecov.io/gh/humanpred/rpdfium/branch/main/graph/badge.svg)](https://app.codecov.io/gh/humanpred/rpdfium)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/pdfium)](https://CRAN.R-project.org/package=pdfium)
[![Codecov test coverage](https://codecov.io/gh/humanpred/rpdfium/graph/badge.svg)](https://app.codecov.io/gh/humanpred/rpdfium)
<!-- badges: end -->

`pdfium` provides idiomatic R bindings to [Google’s PDFium
engine](https://pdfium.googlesource.com/pdfium/) — the same library that
powers Chrome’s PDF viewer. It has two halves:
`pdfium` provides idiomatic R bindings to
[Google's PDFium engine](https://pdfium.googlesource.com/pdfium/) — the
same library that powers Chrome's PDF viewer. It has two halves:

- a **read surface** that exposes vector-path geometry — stroke / fill /
Bezier control points / transformation matrices — alongside text,
fonts, images, annotations, form fields, attachments, signatures,
structure tree, and rendering. The path geometry, in particular, no
other CRAN package surfaces today.
- a **mutation surface** (opt-in via `readwrite = TRUE`) that lets you
rotate / reorder / merge pages, draw fresh page objects, create and
edit annotations, fill form fields, and add file attachments — then
save the result.
* a **read surface** that exposes vector-path geometry —
stroke / fill / Bezier control points / transformation matrices —
alongside text, fonts, images, annotations, form fields,
attachments, signatures, structure tree, and rendering. The path
geometry, in particular, no other CRAN package surfaces today.
* a **mutation surface** (opt-in via `readwrite = TRUE`) that lets
you rotate / reorder / merge pages, draw fresh page objects,
create and edit annotations, fill form fields, and add file
attachments — then save the result.

## What it is for

- **Auditing** PDF figures (which lines, which colors, which fonts).
- **Extracting** curves from regulatory filings and scientific
* **Auditing** PDF figures (which lines, which colors, which fonts).
* **Extracting** curves from regulatory filings and scientific
publications.
- **Building** PDF normalization pipelines that need geometry, not just
text.
- **Filling** AcroForm fields programmatically and flattening the result
for downstream tooling.
- **Authoring** programmatic PDFs from vector graphics, JPEG images,
text in the 14 standard fonts or any TrueType / Type1 typeface, and
annotations (think: figure callouts, table reports, annotated source
documents). `/Info`-dict writes and on-save encryption are the
remaining v0.1.0 gaps — both need upstream PDFium changes that we’ve
proposed but Google hasn’t shipped yet.
- Anything you’d otherwise drop into Python with `pypdfium2`.

See
[`vignette("mutating-pdfs")`](https://humanpred.github.io/rpdfium/articles/mutating-pdfs.html)
* **Building** PDF normalization pipelines that need geometry, not
just text.
* **Filling** AcroForm fields programmatically and flattening the
result for downstream tooling.
* **Authoring** programmatic PDFs from vector graphics, JPEG
images, text in the 14 standard fonts or any TrueType / Type1
typeface, and annotations (think: figure callouts, table
reports, annotated source documents). `/Info`-dict writes and
on-save encryption are the remaining v0.1.0 gaps — both need
upstream PDFium changes that we've proposed but Google hasn't
shipped yet.
* Anything you'd otherwise drop into Python with `pypdfium2`.

See [`vignette("mutating-pdfs")`](https://humanpred.github.io/rpdfium/articles/mutating-pdfs.html)
for a walkthrough of the writer surface, and
[`vignette("comparison")`](https://humanpred.github.io/rpdfium/articles/comparison.html)
for how `pdfium` lines up against `pdftools`, `qpdf`, `magick`,
@@ -63,23 +63,57 @@ under `dev/decisions/`.

## Installation

`pdfium` downloads its `libpdfium` binary from
[bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries)
at install time. The pinned version lives in `tools/pdfium-version.txt`.
If your install runs without internet access, set `PDFIUM_OFFLINE=1` and
place the matching tarball under `inst/pdfium-binaries/` before
installing.

``` r
```r
# Release version (once on CRAN):
install.packages("pdfium")

# Development version:
remotes::install_github("humanpred/rpdfium")
```

### Where the `libpdfium` binary comes from

At install time, the `configure` script picks a `libpdfium` to
build against, in this order:

1. **`PDFIUM_HOME`** — if this environment variable is set and
points at an existing install, that install is used. The
directory must contain headers and the shared library in the
conventional layout:

| Platform | Required files under `$PDFIUM_HOME` |
|------------|----------------------------------------------------------------------|
| Linux | `include/fpdfview.h` and `lib/libpdfium.so` (or `lib64/`) |
| macOS | `include/fpdfview.h` and `lib/libpdfium.dylib` |
| Windows | `include/fpdfview.h`, `lib/libpdfium.dll.a`, `bin/libpdfium.dll` |

Useful when you have a hand-built PDFium, a vendored copy,
or a CI artefact you want to pin against.

2. **`pkg-config --exists libpdfium`** *(POSIX only)* — if a
`libpdfium.pc` is on the `pkg-config` search path, the
reported `includedir` / `libdir` are used.

3. **Standard system prefixes** *(POSIX only)* — `/usr/local`,
`/usr`, `/opt/homebrew`, `/opt/local`. The first one
containing both `include/fpdfview.h` and a `libpdfium`
shared library wins.

4. **Download from
[bblanchon/pdfium-binaries](https://github.com/bblanchon/pdfium-binaries)**
— the pinned release lives in `tools/pdfium-version.txt`. If
your install runs without internet access, set
`PDFIUM_OFFLINE=1` and place the matching tarball under
`inst/pdfium-binaries/` before installing.

When a system install is found, no download happens and no
`libpdfium` is bundled into the installed package — your
existing copy resolves at load time via the platform's normal
shared-library search path.
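
For the offline path in step 4, a preflight check along these lines may help confirm that a tarball is staged before installing. It is a sketch only: `offline_ready` is a hypothetical helper, the `.tgz` / `.tar.gz` extensions are an assumption, and `example.tgz` is a placeholder rather than a real release asset name.

```shell
# Hypothetical preflight: exit 0 if directory $1 holds at least one tarball.
# The extensions checked here are an assumption, not taken from the package.
offline_ready() {
  for f in "$1"/*.tgz "$1"/*.tar.gz; do
    # An unmatched glob stays literal, so -f filters it out.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Demo against a throwaway source tree.
src=$(mktemp -d)
mkdir -p "$src/inst/pdfium-binaries"
offline_ready "$src/inst/pdfium-binaries" || echo "no tarball staged yet"

: > "$src/inst/pdfium-binaries/example.tgz"   # placeholder file for the demo
if offline_ready "$src/inst/pdfium-binaries"; then
  echo "ready: install with PDFIUM_OFFLINE=1"
fi
```

Once the check passes in a real source tree, the install itself would be something like `PDFIUM_OFFLINE=1 R CMD INSTALL .` run from the package root.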

## Example


``` r
library(pdfium)

@@ -90,13 +124,14 @@ pdf_page_count(doc)
pdf_doc_close(doc)
```

More examples ship in the vignettes
(`vignette("getting-started", package = "pdfium")`, etc.) and on the
More examples ship in the vignettes (`vignette("getting-started",
package = "pdfium")`, etc.) and on the
[pkgdown site](https://humanpred.github.io/rpdfium/).

## License

`pdfium` is MIT-licensed. The bundled `libpdfium` binary is BSD-3-Clause
and is *not* distributed in the source tarball — see
[`LICENSE.md`](LICENSE.md) and
[`dev/decisions/ADR-003-binary-distribution.md`](dev/decisions/ADR-003-binary-distribution.md).
[`LICENSE.md`](https://github.com/humanpred/rpdfium/blob/main/LICENSE.md)
and
[`dev/decisions/ADR-003-binary-distribution.md`](https://github.com/humanpred/rpdfium/blob/main/dev/decisions/ADR-003-binary-distribution.md).