
Content Encryption

Intro

In Q3 2023, we started adding a layer of encryption to the files served to the reader, on top of the security measures we already had. We continued this effort through 2024, and this documentation offers a detailed end-to-end walkthrough of how it works.

Because of the nature of our product, this new functionality has to be coordinated across several moving parts, specifically the farfalla, volpe, micelio, and fenice projects.

The main point of coordination is the property called pla_publication_manifest_version ("publica.la publication manifest version"), which ensures all moving parts agree on how to transfer files.

The pla_publication_manifest_version property is transferred in two ways:

  1. Explicitly, and in plain text, in the JSON payloads that farfalla generates for volpe or fenice.
    1. More on this in farfalla's section.
  2. Explicitly, and encrypted, in the micelio URL signatures that farfalla generates for volpe and fenice.
    1. More on this in micelio's section.

Context

*It may be helpful to first read our Product Architecture and Engineering documentation for a broader perspective on our product and key infrastructure.

Glossary

  • Session: Refers to "session", "reader session", "reading session", "listening session" and "offline session" interchangeably. It denotes the period during which a user interacts with the content on the platform, either online or offline.
  • pla_publication_manifest_version: A property used to ensure consistency and coordination across various components, dictating how files are transferred and encrypted on the platform.
  • CDN (Content Delivery Network): An infrastructure designed to serve content as fast and cost-effectively as possible. We utilize Cloudflare as our CDN.
  • POP (Point of Presence): A data center or physical location from which a CDN serves content. A CDN's performance is often influenced by the proximity of the POP to the user: the closer, the better.
  • EPUB: A format for digital books, essentially a ZIP archive with a specific internal structure (only its mimetype entry must be stored uncompressed), containing a collection of files such as XML, HTML, CSS, JS, images, and multimedia. It is considered a packaged webpage.
  • PDF: A format for digital documents. On our platform, PDFs are converted into JPEG and HTML files for serving.
  • Audiobook: Digital audio content, typically served in MP3 format, with each chapter being a separate MP3 file.
  • Encryption: The process of converting data into a secure format that cannot be easily understood by unauthorized parties. We use AES-256-GCM encryption to secure content on the platform.
  • Decryption: The process of converting encrypted data back into its original format. Decryption on our platform is handled by the volpe component.
  • AES-256-GCM: An authenticated encryption algorithm (AES with a 256-bit key in Galois/Counter Mode) that offers good performance while ensuring both the confidentiality and the integrity of the data.
  • Tarball: A tarball is an archive file format used primarily on Unix-based systems. It consolidates multiple files and directories into a single file with a .tar extension, which is often compressed further, typically with gzip, resulting in a .tar.gz or .tgz file.
  • ZIP: A ZIP file is a widely-used archive format that compresses one or more files or directories into a single file with a .zip extension. It is commonly used across different operating systems for reducing file size and organizing files for easier distribution. On our platform, ZIP files are not used directly for content encryption, but understanding ZIP is helpful since EPUB files (a key file type on our platform) are essentially ZIP archives containing multiple content files like HTML, CSS, and images.

Broad technology context

  • We offer a web-based reading and listening experience.
    • To provide that experience, we work with EPUBs, PDFs, and Audiobooks.
  • Serving files on the web is inherently challenging because the web was designed for the free flow of information.
    • Even with technologies such as HTTPS, once the content reaches the end user's device, it's no longer fully under our control.
    • The end user's device might be running a debugging proxy, might override our JavaScript code with custom code, etc.
  • There's no standards-based DRM solution for text and audio on the web, as there is for video. Therefore, we designed our own.
    • We based it on Readium LCP but did not implement LCP directly, mainly for the following reasons:
      • It only supported EPUBs; we also need to support PDFs and Audiobooks.
      • Its server-side implementation is written in Go, while our expertise is in PHP and JavaScript. We also know how to host JavaScript very efficiently and cost-effectively, which matters given the required bandwidth.

What are the actual file types being transferred?

We pre-process the original content files before serving:

  • EPUB

    • It's essentially a ZIP archive with a specific internal structure (only its mimetype entry must be stored uncompressed).
    • Inside, it contains XML, HTML, CSS, JS, JPEG, PNG, MP3, MP4, and other files.
      • We serve those files, not the EPUB itself, and some metadata as JSON.
    • It can be considered a packaged webpage.
  • PDF

    • As of 2024, we convert PDFs into JPEG and HTML files.
      • We serve those files, not the PDF itself, and some metadata as JSON.
  • Audiobook

    • Each chapter is its own MP3 file.
    • As of 2024, we don't apply any processing to those MP3 files.
      • We serve those files and some metadata as JSON.

Where and when is content encrypted?

  • In micelio, synchronously (on the fly) during online sessions. Each time volpe requests a file, micelio encrypts it and sends it to volpe.
  • In farfalla, asynchronously (in a queued job) when generating the tarballs downloaded for offline sessions.

Where and when is content decrypted?

  • In volpe, synchronously (on the fly) while the user consumes the content. Each time volpe requests a file, it has to decrypt it before using it.

How is content encrypted?

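  • EPUB
    • XML, HTML, CSS, and JS files are AES-256-GCM encrypted.
  • PDF
    • JPEG files are AES-256-GCM encrypted.
  • Audiobook
    • MP3 files are AES-256-GCM encrypted.

Which files are actually encrypted depends on the pla_publication_manifest_version; see the Versions section for details.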

What's farfalla's role?

  • To determine which pla_publication_manifest_version to use.
  • To encrypt files and generate tarballs for fenice.
  • More on this in farfalla's section.

What's micelio's role?

  • To serve volpe with files, encrypted or not.
  • To serve fenice with the tarballs generated by farfalla.
  • To assert if each request is from a real user and for legitimate use.
  • More on this in micelio's section.

What's volpe's role?

  • It's the reading and listening UI for the users.
  • To request publication details from farfalla or fenice.
  • To request publication files from micelio or fenice.
  • To decrypt publications files.

What's fenice's role?

  • fenice's only role is to be the middleman between farfalla+micelio and volpe.
  • fenice won't encrypt or decrypt files; it will download files from micelio and prepare them to be used by volpe.
  • From fenice's point of view, the entire process is opaque, and it doesn't need to know if the files are encrypted or not.
  • More on this in fenice's section.

publica.la publication manifest

First things first: it's called "publica.la publication manifest" because it's inspired by https://github.com/readium/webpub-manifest, and it's meant to be the place where we centralize and version publication and volpe configurations.

For example:

  • Whether a user can copy text from the publication, how much and under which circumstances.
  • Which features of volpe are available in each publication, such as TTS or AI Tools.

That said, at the time of writing we only use pla_publication_manifest_version to coordinate the security aspect.

Versions

We currently support the following versions of pla_publication_manifest_version:

1.0.0

  • This is the legacy and standard version we set when first implementing pla_publication_manifest_version
  • It's meant for backwards compatibility with versions of fenice+volpe that do not even support pla_publication_manifest_version
  • Does not use encryption
  • Uses micelio/felini v1

2.0.0

  • Intermediary version that was not released to production
  • This is the first actual version we set when first implementing pla_publication_manifest_version
  • Uses encryption
  • Uses micelio/felini v2
  • Uses a single shared IV for all content

2.1.0

  • Intermediary version that was not released to production
  • Uses encryption
  • Uses micelio/felini v2
  • IV derived from tenant_id, issue_id, and user_id

2.2.0

  • First version released to production
  • This version uses multiple tarballs for any publication download
    • These tarballs are reusable for any tenant+issue combination
    • Micelio serves these via the routes:
      • /v2/epub-download/*
      • /v2/pdf-download/*
  • Includes the total download size in the download responses
  • Uses micelio/felini v2
  • Uses an IV specific to each tenant+issue combination. We use the store tenant_id, not the owner tenant_id.
  • In EPUBs:
    • 'html', 'htm', 'css' and 'xml' files are encrypted
    • 'jpg' or 'jpeg' files are untouched
  • In PDFs:
    • 'html' files from the text and annotation layers are encrypted
    • Large 'jpg' files are byte flipped
    • Thumbnail 'jpg' files are untouched
  • In Audiobooks: Encryption is disabled, for all online and offline cases.
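As a reading aid, the 2.2.0 rules above can be expressed as a small lookup. This TypeScript is a minimal sketch, not the actual farfalla or micelio code: the names PublicationType, FileTreatment, treatmentFor220, and the isThumbnail flag are hypothetical, and it assumes any file type not listed above is served untouched. It only classifies files; the byte-flipping scheme itself is not described in this document, so it is not implemented here.

```typescript
// Hypothetical sketch of the pla_publication_manifest_version 2.2.0 file rules.
// Assumption: any file type not explicitly listed in the rules is served untouched.

type PublicationType = "epub" | "pdf" | "audiobook";
type FileTreatment = "encrypted" | "byte-flipped" | "untouched";

// Returns the lowercased extension of a path, or "" if it has none.
function extensionOf(path: string): string {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
}

function treatmentFor220(
  type: PublicationType,
  path: string,
  isThumbnail = false, // only meaningful for PDF page images
): FileTreatment {
  const ext = extensionOf(path);
  switch (type) {
    case "epub":
      // 'html', 'htm', 'css' and 'xml' are encrypted; 'jpg'/'jpeg' are untouched.
      return ["html", "htm", "css", "xml"].includes(ext) ? "encrypted" : "untouched";
    case "pdf":
      if (ext === "html") return "encrypted"; // text and annotation HTML
      if (ext === "jpg") return isThumbnail ? "untouched" : "byte-flipped";
      return "untouched";
    case "audiobook":
      return "untouched"; // encryption is disabled for audiobooks in 2.2.0
  }
}
```

For example, treatmentFor220("pdf", "pages/12.jpg") classifies a large page image as byte-flipped, while passing isThumbnail = true for the same extension leaves it untouched.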

2.3.0 (future version)

  • This version uses multiple tarballs for any publication download
    • These tarballs are reusable for any tenant+issue combination. We use the store tenant_id, not the owner tenant_id
    • Micelio serves these via the routes:
      • /v2/epub-download/*
      • /v2/pdf-download/*
  • Includes the total download size in the download responses
  • Uses stream encryption
  • Uses micelio/felini v2
  • In EPUBs:
    • 'html', 'htm', 'css' and 'xml' files are encrypted
    • 'jpg' or 'jpeg' files are encrypted
  • In PDFs:
    • 'html' files from the text and annotation layers are encrypted
    • Large 'jpg' files are encrypted
    • Thumbnail 'jpg' files are untouched
  • In Audiobooks: Encryption is disabled, for all online and offline cases.

2.4.0 (future version)

  • Work in progress: same as 2.3.0, with added support for Audiobooks

How is the correct version determined?

In all scenarios, farfalla has the responsibility of deciding which pla_publication_manifest_version to use.

It will do so as follows:

Online web: farfalla uses the latest available version. Because volpe on the web updates automatically and does not depend on fenice's version, the response can always use the most recent encryption features.

Online or offline fenice: farfalla determines the version based on the specific fenice version making the request, considering the following scenarios:

For versions <= 1.22 with the encryption feature disabled, the base version 1.0.0 is used. This applies to both online and downloadable content.

For versions <= 1.22 with the encryption feature enabled, the base version 1.0.0 is still used, due to limitations in recognizing encryption.

For versions <= 1.22 with encryption enabled, a manifest version <= 2.2.0 applies, which allows earlier versions to be decrypted but restricts audio because it is not encrypted.

For versions >= 1.23 with encryption enabled, a manifest version >= 2.3.0 applies, which remains compatible with previously encrypted content.

Encryption And Manifest Version Matrix

| Fenice version | Encryption feature | Mode | Resulting manifest version | Comments |
| --- | --- | --- | --- | --- |
| <= 1.22 | False | Online | 1.0.0 | Base version, unencrypted. |
| <= 1.22 | False | Download | 1.0.0 | Base version, unencrypted. |
| <= 1.22 | True | Online | 1.0.0 | Base version, unencrypted but encryption is enabled in settings. |
| <= 1.22 | True | Download | 1.0.0 | Base version, unencrypted but encryption is enabled in settings. |
| <= 1.22 | False | Online | 1.0.0 | Base version, unencrypted. |
| <= 1.22 | False | Download | 1.0.0 (not for desktop) | Desktop download not allowed for unencrypted versions. |
| <= 1.22 | True | Online | <= 2.2.0 | Encrypted version with decryption up to 2.2.0. |
| <= 1.22 | True | Download | <= 2.2.0 | Encrypted version with decryption up to 2.2.0; audio not allowed as it's unencrypted. |
| >= 1.23 | True | Online | >= 2.3.0 | Encrypted version 2.3.0; compatible with previous content. |
| >= 1.23 | True | Download | >= 2.3.0 | Encrypted version 2.3.0; compatible with previous content; audio not allowed as it's unencrypted. |

farfalla will state which pla_publication_manifest_version to use in two places:

  1. Explicitly, and in plain text, in the JSON payloads that farfalla generates for volpe or fenice.
    1. More on this in farfalla's section.
  2. Explicitly, and encrypted, in the micelio URL signatures that farfalla generates for volpe and fenice.
    1. micelio will adhere to this version set by farfalla.
    2. More on this in micelio's section.

Release Procedure for web

In order to ensure farfalla can be the decision maker, we need to carefully follow this release procedure:

  1. Release micelio update.
  2. Release volpe update.
  3. Release farfalla update.

This way, we can rely on the fact that every time a new reader session is started, it will have the latest version of volpe, micelio will already be ready, and farfalla will be able to default to the latest pla_publication_manifest_version.

Release Procedure for fenice

In this scenario, farfalla always uses the latest version supported by the specific fenice version that is making the requests.

For this to work correctly, we rely on fenice reporting its own version to farfalla and micelio in the X-FeniceBase-Version header.

fenice sets this header on every request it makes.

fenice also injects this header into the WebView that ultimately loads volpe. This is how we can differentiate between volpe requests made from the web and inside fenice's WebView. Both farfalla and micelio may use this to determine the correct pla_publication_manifest_version to use, apply security checks, etc.
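The following TypeScript sketches how the version rules above could use this header. It is hypothetical: selectManifestVersion, parseVersion, and compareVersions are illustrative names, not farfalla's actual code, and several details are assumptions not stated in this document. A missing X-FeniceBase-Version header is treated as a web session, an unparsable header or a disabled encryption feature falls back to 1.0.0, and fenice <= 1.22 with encryption enabled gets 2.2.0 (the rules and matrix above list both 1.0.0 and <= 2.2.0 for that case).

```typescript
// Hypothetical sketch of farfalla's pla_publication_manifest_version selection.
// Assumptions: no header => web session; unparsable header or encryption
// disabled => 1.0.0; fenice <= 1.22 with encryption enabled => 2.2.0.

type Version = [number, number, number];

// Parses "1.22" or "1.22.1" into a [major, minor, patch] tuple.
function parseVersion(raw: string): Version | null {
  const match = /^(\d+)\.(\d+)(?:\.(\d+))?$/.exec(raw.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] ?? "0")];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function selectManifestVersion(
  headers: Headers,
  encryptionEnabled: boolean,
  latestWebVersion: string,
): string {
  // Headers.get is case-insensitive, so "x-fenicebase-version" also matches.
  const raw = headers.get("X-FeniceBase-Version");
  if (raw === null) return latestWebVersion; // web: volpe updates automatically
  const fenice = parseVersion(raw);
  if (fenice === null || !encryptionEnabled) return "1.0.0";
  return compareVersions(fenice, [1, 23, 0]) >= 0 ? "2.3.0" : "2.2.0";
}
```

Keeping the decision in one pure function like this makes it easy to unit-test each row of the matrix, and a request from a WebView that carries the injected header is classified exactly like fenice's own requests.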

How to upgrade

In order to upgrade any part of this security system, we need to pay special attention to backwards compatibility and rely on the pla_publication_manifest_version value.

For example, if we are running v2.1.0 and need to release v2.2.0, we need to:

  • Release a version of volpe that can use it.
  • Release a version of micelio that can use it.
  • Release a version of farfalla that can use it.
  • Ensure farfalla can also temporarily follow the v2.1.0 protocol until usage is low enough to be deprecated.
    • volpe on the web will update almost instantly, but the version embedded in fenice apps could take months or more.
  • Ensure micelio can also temporarily follow the v2.1.0 protocol until usage is low enough to be deprecated.
    • volpe on the web will update almost instantly, but the version embedded in fenice apps could take months or more.

In general, we need to keep backwards compatibility on the server side because the client is partially out of our control.

How we keep track of these and narrow down the backwards compatibility scenarios is crucial for keeping the platform as a whole secure. It doesn't matter if we can quickly patch a vulnerability in a new pla_publication_manifest_version if we need to keep the previous one available forever for all clients/users/devices.

Ultimately, we will have to forcefully deprecate versions and prompt the users to upgrade. This has to be a core mechanism of fenice.

Backwards Compatibility Example

The following example illustrates a future upgrade from pla_publication_manifest_version v2.1.0 to v2.2.0:

  1. An app based on fenice 1.22.1 and volpe 2.1 downloads the publication issue_id=14625 from tenant_id=62 for user_id=30828:
    • farfalla inspects fenice's request and sees that it's from fenice 1.22.1; farfalla knows that it has to use pla_publication_manifest_version v2.1.0.
    • This tarball is generated by farfalla according to the spec of pla_publication_manifest_version v2.1.0.
    • farfalla's response also explicitly states the usage of pla_publication_manifest_version v2.1.0.
    • fenice stores the publication JSON payload and the unpacked files from the tarball.
  2. Then, when said app is updated to fenice 1.25.0 and has volpe 2.5:
    • volpe 2.5 is backwards compatible with pla_publication_manifest_version v2.1.0, but also has support for the newer pla_publication_manifest_version v2.2.0.
    • When opening the downloaded publication, fenice constructs the publication payload honoring the original pla_publication_manifest_version v2.1.0 that farfalla previously stated.
    • volpe 2.5 will be able to correctly handle the publication.
  3. On subsequent downloads, when the app starts a download:
    • farfalla will inspect fenice's request and see that it's from fenice 1.25.0; farfalla knows that it has to use pla_publication_manifest_version v2.2.0.
    • This tarball is generated by farfalla according to the spec of pla_publication_manifest_version v2.2.0.
    • farfalla's response also explicitly states the usage of pla_publication_manifest_version v2.2.0.
    • fenice stores the publication JSON payload and the unpacked files from the tarball.

Content Serving Scenarios

Now let's review the different flows, starting with the legacy ones:

Legacy

1. Online web session without encryption

Projects at work: farfalla, volpe, and micelio

  1. A user opens the reader.
  2. farfalla initiates the reading session and sets pla_publication_manifest_version as '1.0.0'.
  3. farfalla boots volpe in an iframe.
  4. volpe initializes and hits /api/v1/sessions to get the publication details and file URLs.
  5. volpe hits those URLs.
  6. micelio handles each request individually, validates it, and returns the file as the response.
    • If micelio detects something wrong with the request, it may respond with a 404.

2. Offline web PWA session without encryption

This PWA has not received ongoing maintenance for some time and will soon be completely deprecated once Publica Reader Desktop is available.

Projects at work: farfalla, volpe, and micelio

  • A user triggers the download of a publication.
  • farfalla PWA JavaScript hits /api/v1/pwa/issue/ and persists the payload in IndexedDB.
  • farfalla initiates the reading session and doesn't set a pla_publication_manifest_version property.
  • farfalla PWA JavaScript hits /api/v1/sessions to get the publication details and file URLs, and persists the payload in IndexedDB.
  • farfalla PWA hits those URLs and persists the files in IndexedDB as an ArrayBuffer.
  • micelio handles each request individually, validates it, and returns the file as the response.
    • If micelio detects something wrong with the request, it may respond with a 404.
  • At some point in the future, farfalla boots volpe in an iframe and sets the offline_issue property with all the information from IndexedDB. In this scenario, volpe doesn't need to hit /api/v1/sessions and can continue working offline.

3. Online fenice session without encryption

Projects at work: farfalla, volpe, micelio, and fenice

4. Offline fenice session without encryption

Projects at work: farfalla, volpe, micelio, and fenice

  • A user triggers the download of a publication.
  • fenice hits:
  • fenice hits those URLs and persists the files in the local filesystem.
  • micelio handles each request individually, validates it, and returns the file as the response.
    • If micelio detects something wrong with the request, it may respond with a 404.
  • At some point in the future, fenice boots volpe and sets the offline_issue property with all the information required. In this scenario, volpe doesn't need to hit /api/v1/sessions and can continue working offline.
    • In fenice, volpe is handled inside a WebView. That WebView loads a simple HTML file that then loads volpe inside an iframe.

5. Online and offline tartaruga sessions without encryption

Tartaruga was the project name of the previous generation of mobile apps, which was fully deprecated in August 2024 in favor of fenice.

New

1. Online web session with encryption

Projects at work: farfalla, volpe, and micelio

2. Online fenice session with encryption

Projects at work: farfalla, volpe, micelio, and fenice

3. Offline fenice session with encryption

Projects at work: farfalla, volpe, micelio, and fenice

  • A user triggers the download of a publication.
  • fenice hits:
  • fenice hits those URLs and persists the files in the local filesystem.
    • fenice will unpack the tarballs and assemble a final directory structure with the content.
    • More on this in fenice's section.
  • micelio handles each request individually, validates it, and returns the file as the response.
    • If micelio detects something wrong with the request, it may respond with a 404.
  • At some point in the future, fenice boots volpe and sets the offline_issue property with all the information required. In this scenario, volpe doesn't need to hit /api/v1/sessions and can continue working offline.
    • In fenice, volpe is handled inside a WebView. That WebView loads a simple HTML file that then loads volpe inside an iframe.

Encryption strategy

Encryption Algorithm

We use AES-256-GCM because:

  • It's fast enough to encrypt
  • It's fast enough to decrypt
  • It supports streaming encryption and decryption
    • So we don't need to keep the entire file in memory
    • This is key for efficient and cheaper processing in Cloudflare Workers
  • The resulting payload is only slightly larger than the original: the ciphertext is the same length as the plaintext, plus a 16-byte authentication tag

How AES-256-GCM Works

Overview

AES-256-GCM (Advanced Encryption Standard with Galois/Counter Mode) is a symmetric encryption algorithm widely used for securing data due to its balance of performance and security. It provides both encryption and integrity checking, making it particularly suitable for high-performance environments like ours.

Key Characteristics:

  • Symmetric Encryption: The same key is used for both encryption and decryption.
  • 256-bit Key Length: Provides a very high level of security, resistant to brute-force attacks.
  • AES (Advanced Encryption Standard): A widely used encryption algorithm that securely converts plaintext into ciphertext, ensuring that data cannot be easily accessed by unauthorized parties.
  • Galois/Counter Mode (GCM): An encryption mode for AES that combines encryption with authentication. It ensures data integrity by producing a tag that verifies the data hasn't been tampered with.
  • Authenticated Encryption: GCM produces an authentication tag alongside the ciphertext, verifying that the data hasn't been tampered with.
Encryption Process

When we use AES-256-GCM to encrypt content, the following steps occur:

  1. Key Generation: A 256-bit key, derived from CONTENT_ENCRYPTION_PASSPHRASE, is used. This key is required for both encrypting and decrypting the content.

    1. This is also commonly known as a "passphrase"; it's essentially a password.
  2. IV (Initialization Vector): An IV, also known as a nonce, is a random or pseudo-random value used together with the key. In our implementation, the IV is 12 bytes long, is derived from CONTENT_ENCRYPTION_IV, and must be unique for each encryption operation to ensure security.

    1. In practice, this works as an additional password. This is where we have focused most of our efforts, at least as of pla_publication_manifest_version 2.1.0.
  3. Encryption: The content is encrypted with AES in counter mode: each plaintext block is XORed with the AES encryption of a counter block (derived from the IV and incremented for every block) to produce the ciphertext.

    1. XOR (exclusive or) combines each bit of the plaintext with the corresponding bit of this AES-generated keystream. Applying the same XOR again recovers the plaintext, which is why decryption mirrors encryption.
  4. Authentication Tag: As the encryption process proceeds, GCM generates an authentication tag. This tag is used to verify the integrity of both the ciphertext and additional authenticated data (if any), ensuring that the content has not been altered.

    1. The Web Cryptography API appends the tag to the end of the ciphertext automatically.
    2. PHP's openssl_encrypt returns the tag separately, through its $tag by-reference parameter, so it has to be appended manually.
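To make the encryption steps concrete, here is a minimal TypeScript sketch using the Web Cryptography API, which is available in browsers, Cloudflare Workers, and recent Node.js versions. encryptContent and splitCiphertextAndTag are hypothetical names, not micelio's or farfalla's actual functions. It shows that the Web Crypto output is the ciphertext followed by a 16-byte tag (the default 128-bit tag length). In PHP, the equivalent would be openssl_encrypt with the 'aes-256-gcm' cipher and OPENSSL_RAW_DATA, followed by appending the $tag it returns by reference.

```typescript
// Minimal AES-256-GCM encryption sketch (Web Cryptography API).
// Hypothetical helper names: encryptContent, splitCiphertextAndTag.

async function encryptContent(
  rawKey: Uint8Array, // 32 bytes for AES-256
  iv: Uint8Array, // 12 bytes (96 bits)
  plaintext: Uint8Array,
) {
  const key = await crypto.subtle.importKey("raw", rawKey, { name: "AES-GCM" }, false, ["encrypt"]);
  // Web Crypto returns ciphertext || tag; the default tag length is 128 bits (16 bytes).
  const sealed = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  return new Uint8Array(sealed);
}

// Splits Web Crypto output into the two parts that PHP's openssl_encrypt returns separately.
function splitCiphertextAndTag(sealed: Uint8Array, tagLength = 16) {
  return {
    ciphertext: sealed.slice(0, sealed.length - tagLength),
    tag: sealed.slice(sealed.length - tagLength),
  };
}
```

Note that with a fixed key and IV the output is deterministic: encrypting the same plaintext twice yields identical bytes.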
Decryption Process

The decryption process essentially reverses the encryption steps:

  1. Retrieve Key and IV: The same CONTENT_ENCRYPTION_PASSPHRASE and CONTENT_ENCRYPTION_IV are used to generate the key and IV required for decryption.

  2. Decrypt Content: Using the key and IV, the encrypted content is decrypted by reversing the counter mode operation. The original plaintext is reconstructed from the ciphertext.

  3. Verify Integrity: The authentication tag generated during encryption is checked during decryption to ensure that the data has not been tampered with. If the tags do not match, decryption fails, indicating potential data corruption or tampering.
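A matching decryption sketch, again assuming the Web Cryptography API and the ciphertext-plus-tag layout described above. decryptContent and tryDecryptContent are hypothetical names; how volpe actually reacts to a failed verification is not specified here, so the null return is only one possible choice.

```typescript
// Minimal AES-256-GCM decryption sketch (Web Cryptography API).
// Hypothetical helper names: decryptContent, tryDecryptContent.

async function decryptContent(rawKey: Uint8Array, iv: Uint8Array, sealed: Uint8Array) {
  const key = await crypto.subtle.importKey("raw", rawKey, { name: "AES-GCM" }, false, ["decrypt"]);
  // `sealed` is the ciphertext followed by the 16-byte tag. Web Crypto verifies the
  // tag and rejects the promise if the key, the IV, or the data do not match.
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, sealed);
  return new Uint8Array(plaintext);
}

// Returns null instead of throwing when verification fails (one possible policy).
async function tryDecryptContent(rawKey: Uint8Array, iv: Uint8Array, sealed: Uint8Array) {
  try {
    return await decryptContent(rawKey, iv, sealed);
  } catch {
    return null;
  }
}
```

Flipping even a single bit of the sealed payload makes tryDecryptContent return null, which is the integrity check from step 3 in action.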

Usage in Our System

In our system, AES-256-GCM is implemented to secure content served to users. Here's how the CONTENT_ENCRYPTION_PASSPHRASE and CONTENT_ENCRYPTION_IV are used:

  • CONTENT_ENCRYPTION_PASSPHRASE: This passphrase is shared between components (e.g., micelio, farfalla, and volpe) and is used to generate the 256-bit encryption key. Consistent passphrase usage ensures that all parts of the system can correctly encrypt and decrypt content.

    • It's a 32-byte (256-bit) key encoded as a hexadecimal string. For example: 1013940600cea415ba6ca03d1dbeba53899163fdbde412a03e43d99c2a1c1eba

    • It's set in environment variables and hardcoded at build time.

  • CONTENT_ENCRYPTION_IV: This 12-byte shared secret is the seed from which the IV is generated. Because the generation is deterministic, every component that knows the seed and the same context (such as tenant_id and issue_id) computes the same IV, which keeps encryption and decryption in sync across components.

    • It's a 12-byte (96-bit) value encoded as a hexadecimal string. For example: cbbcb30546ee18e0ec48be65
    • It's set in environment variables and hardcoded at build time.
    • Except in version 2.0.0, this is not the actual IV that will be used; it is only part of the data we use to generate the final IV. More on this below.
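A minimal sketch of how a component could load these hex-encoded values, assuming CONTENT_ENCRYPTION_PASSPHRASE is hex-decoded directly into the raw 32-byte AES key (the text says the key is derived from it but does not describe any additional derivation step). hexToBytes and importContentKey are hypothetical helpers; the length checks catch a truncated or mistyped environment variable early.

```typescript
// Hypothetical helpers for loading the hex-encoded secrets.
// Assumption: the passphrase hex decodes directly to the raw AES-256 key.

function hexToBytes(hex: string, expectedBytes: number) {
  if (!/^[0-9a-fA-F]*$/.test(hex) || hex.length !== expectedBytes * 2) {
    throw new Error(`Expected ${expectedBytes} bytes as ${expectedBytes * 2} hex characters`);
  }
  const bytes = new Uint8Array(expectedBytes);
  for (let i = 0; i < expectedBytes; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Imports CONTENT_ENCRYPTION_PASSPHRASE as a non-extractable AES-GCM key.
async function importContentKey(passphraseHex: string) {
  const raw = hexToBytes(passphraseHex, 32); // 256-bit AES key
  return crypto.subtle.importKey("raw", raw, { name: "AES-GCM" }, false, ["encrypt", "decrypt"]);
}
```

The IV seed would be checked the same way with hexToBytes(seedHex, 12).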
Key Points
  • Security: AES-256-GCM is highly secure, with the combination of a strong key and IV ensuring that content cannot be easily decrypted by unauthorized parties.
  • Integrity: The GCM mode provides built-in integrity checks, preventing tampering with encrypted content.
  • Performance: GCM mode is designed for efficiency, allowing us to handle large volumes of encrypted content with minimal overhead, crucial for our use in Cloudflare Workers and other performance-sensitive environments.

Files Being Encrypted

The files that are currently encrypted are specific to each pla_publication_manifest_version, so check the publica.la publication manifest section for details.

Key and IV

The key and IV are crucial details that need to be kept in sync between micelio, farfalla, and volpe. Failing to do this will result in volpe being unable to decrypt files.

The specific key and IV generation mechanism is tied to the pla_publication_manifest_version number.

As of pla_publication_manifest_version 2.1.0, we share a common key (CONTENT_ENCRYPTION_PASSPHRASE) between projects but vary how the IV is generated.

IV Generation Process

Signature Version Check:

The IV generation process differs based on the signature.version. This version determines how the IV is computed, ensuring compatibility across different system versions.

pla_publication_manifest_version 2.2.0 IV Generation

This is the first version we intend to ship to production and release generally to all users.

In version 2.2.0 we do not use the user_id in the IV generation process. This approach allows for encrypted data to be reused across different users, which can be especially useful during the early stages of release when resource consumption and costs are uncertain.

  1. Input Data: The IV is generated using a combination of:
    1. tenant_id: the unique ID of the tenant using the content. This is the store's ID, not the content owner ID.
    2. issue_id: the unique ID of each issue.
    3. iv_seed: the hardcoded CONTENT_ENCRYPTION_IV from environment variables.
  2. Message Construction: A JSON object is created that includes the tenant_id, issue_id, and iv_seed.
  3. Hashing: This JSON object is converted to a UTF-8 encoded string and then hashed using the SHA-256 algorithm.
  4. IV Extraction: The first 12 bytes of the SHA-256 hash are extracted and used as the IV. This truncated hash ensures a consistent 96-bit (12 bytes) IV length, which is required for AES-GCM.

TypeScript Implementation (Version 2.2.0):

// IV generation without user_id (the function name is illustrative)
async function generateIv(
  signature: { tenant: { id: number }; issue: { id: number } },
  env: { CONTENT_ENCRYPTION_IV: string },
) {
  const message = JSON.stringify({
    tenant_id: signature.tenant.id,
    issue_id: signature.issue.id,
    iv_seed: env.CONTENT_ENCRYPTION_IV,
  });

  const encoder = new TextEncoder();
  const data = encoder.encode(message);
  const hash = await crypto.subtle.digest('SHA-256', data);

  return new Uint8Array(hash).slice(0, 12); // Extract the first 12 bytes, as a Uint8Array
}
// The IV bytes could be converted to a hexadecimal string with:
// Array.from(iv).map(b => b.toString(16).padStart(2, '0')).join('')

PHP Implementation (Version 2.2.0):

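// IV generation without user_id (the function name is illustrative)
function generateIv(object $tenant, object $issue): string
{
    $message = json_encode([
        'tenant_id' => $tenant->id, // The store's tenant ID, not the owner's
        'issue_id' => $issue->id,   // The issue ID
        'iv_seed' => config('reader.micelio.encryption_iv'),
    ]);
    // json_encode() output is already UTF-8 (ASCII-only by default), so utf8_encode(),
    // deprecated as of PHP 8.2, is not needed. json_encode() escapes "/" while
    // JSON.stringify() does not; this is harmless here because the values are integers and a hex seed.
    $hash = hash('sha256', $message, true); // Raw binary SHA-256 digest

    return substr($hash, 0, 12); // Extract the first 12 bytes, as a binary string
}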

pla_publication_manifest_version 2.1.0 IV Generation
  1. Input Data: The IV is generated using a combination of the tenant ID, issue ID, user ID, and a predefined seed (the hardcoded CONTENT_ENCRYPTION_IV).
  2. Message Construction: A JSON object is created that includes the tenant_id, issue_id, user_id, and the iv_seed.
  3. Hashing: This JSON object is converted to a UTF-8 encoded string and then hashed using the SHA-256 algorithm.
  4. IV Extraction: The first 12 bytes of the SHA-256 hash are extracted and used as the IV. This truncated hash ensures a consistent 96-bit (12 bytes) IV length, which is required for AES-GCM.

Example implementations:

TypeScript Implementation (Version 2.1.0):

// This implementation is from micelio, where the source of data is the encrypted URL signature
// (the function name is illustrative)
async function generateIv(
  signature: { tenant: { id: number }; issue: { id: number }; user: { id: number } },
  env: { CONTENT_ENCRYPTION_IV: string },
) {
  const message = JSON.stringify({
    tenant_id: signature.tenant.id,
    issue_id: signature.issue.id,
    user_id: signature.user.id,
    iv_seed: env.CONTENT_ENCRYPTION_IV,
  });
  const data = new TextEncoder().encode(message);
  const hash = await crypto.subtle.digest('SHA-256', data);

  return new Uint8Array(hash).slice(0, 12); // Extract the first 12 bytes, as a Uint8Array
}
// The IV bytes could be converted to a hexadecimal string with:
// Array.from(iv).map(b => b.toString(16).padStart(2, '0')).join('')

PHP Implementation (Version 2.1.0):

// This implementation is from PHP, where the source of data is most probably a regular model
$ivData = [
'tenant_id' => $tenant->id, // The tenant ID
'issue_id' => $issue->id, // The issue ID
'user_id' => $user->id // The user ID
];

$message = json_encode([
...$ivData,
...['iv_seed' => config('reader.micelio.encryption_iv')],
]);
$data = utf8_encode($message);
$hash = hash('sha256', $data, true);

return substr($hash, 0, 12); // Extract the first 12 bytes, as a string

Example

For this data:

$ivData = [
'tenant_id' => 12345,
'issue_id' => 98765,
'user_id' => 54321
];

$iv_seed = 'cbbcb30546ee18e0ec48be65';

We'll get this final IV: 6504304e7d6aaae66ab4cca0
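The following TypeScript sketch can be used to reproduce this example with the 2.1.0 derivation shown above. buildIvMessageV210 and deriveIvHexV210 are hypothetical helper names. The derivation depends on JSON key order and formatting, since PHP's json_encode and JavaScript's JSON.stringify must produce byte-identical messages for the hashes to match; for this data, both produce {"tenant_id":12345,"issue_id":98765,"user_id":54321,"iv_seed":"cbbcb30546ee18e0ec48be65"}.

```typescript
// Hypothetical helpers reproducing the 2.1.0 IV derivation for a worked example.

// Key order must match the PHP implementation exactly, or the hash (and IV) will differ.
function buildIvMessageV210(tenantId: number, issueId: number, userId: number, ivSeed: string): string {
  return JSON.stringify({ tenant_id: tenantId, issue_id: issueId, user_id: userId, iv_seed: ivSeed });
}

// SHA-256 of the UTF-8 message, truncated to 12 bytes, returned as lowercase hex.
async function deriveIvHexV210(tenantId: number, issueId: number, userId: number, ivSeed: string) {
  const data = new TextEncoder().encode(buildIvMessageV210(tenantId, issueId, userId, ivSeed));
  const hash = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(hash).slice(0, 12))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Calling deriveIvHexV210(12345, 98765, 54321, "cbbcb30546ee18e0ec48be65") returns a 24-character hex string that can be compared against the IV stated above.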

pla_publication_manifest_version 2.0.0 IV Generation

For version 2.0.0, the IV is directly derived from the CONTENT_ENCRYPTION_IV environment variable without any hashing or additional processing. The IV is converted from its hexadecimal string representation to a binary format.

TypeScript Implementation:

return hexStringToArrayBuffer(env.CONTENT_ENCRYPTION_IV);

*This was an intermediate implementation that never reached production, so there is no PHP implementation.
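The `hexStringToArrayBuffer` helper is not shown in this document. A plausible reconstruction, assuming CONTENT_ENCRYPTION_IV holds an even-length hex string such as the 24-character seed in the 2.1.0 example (which decodes to exactly 12 bytes):

```typescript
// Hypothetical reconstruction of the helper used by the 2.0.0 implementation
function hexStringToArrayBuffer(hex: string): ArrayBuffer {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("Expected an even-length hexadecimal string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    // Each pair of hex digits becomes one byte
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes.buffer;
}
```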

micelio

micelio is the successor of a previous project called "felini" (more here). Consider "felini" and "micelio" interchangeable: micelio replaced felini and, while most URLs still use felini's domain, every request is handled by micelio.

In the context of content delivery, micelio's main purposes are:

  • Be Secure
    • Securely transfer content via HTTPS and a private encryption scheme
  • Provide Granular Control
    • Using the information provided in the encrypted signatures, micelio has deep knowledge of each file request's context
  • Be Fast and Low Overhead
    • It has to add as little overhead as possible, as each request is in the critical path of a user's reading experience and in many cases, multiple requests are required to start a reading session
  • Be Cheap
    • An average reading session requires dozens of files, each a unique request to micelio. It's crucial that micelio remains cheap to operate at scale

micelio's speed, near-zero overhead, and cost scalability come mostly from the fact that it runs on Cloudflare Workers, a massively distributed hosting platform that acts as a CDN and executes JavaScript code at each POP on every request.

URLs Signatures

In order to provide granular control, micelio relies on information provided by farfalla.

How micelio accesses that information is crucial from a performance perspective, so micelio is designed to be effectively stateless: everything it needs to handle a request travels inside the request's URL.

When farfalla is generating the list of micelio URLs that fenice or volpe are going to use, it embeds information inside those URLs in what we call an encrypted signature.

All URLs from a session share the same signature, meaning it's session-specific.

We currently support two signature formats:

v1 signature

This version is soon to be deprecated.

This version uses AES-256-CBC encryption, with a key that is shared between farfalla and micelio.

For regular reader sessions, these are the URLs used:

  • /pdf/{volpe_host}/{encrypted v1 signature}/
  • /epub/{volpe_host}/{encrypted v1 signature}/
  • /audio/{volpe_host}/{encrypted v1 signature}/

Where volpe_host can be any one of:

  • farfalla
  • farfalla_pwa
  • fenice
  • fenice_download
  • fenice_download_legacy
  • tartaruga (deprecated)

And the encrypted signature contains a JSON object like this:

{
// This one is for PDFs
"public_path": "https://storage-minio-local.zoo.localhost/path/to/processed/pdf/files/",
// These two are for EPUBs and Audiobooks
"file_url": "https://storage-minio-local.zoo.localhost/path/to/processed/epub-or-audiobook/files/",
"is_preview": false,
// Common fields
"tenant": {
"id": 123,
"name": "Example Tenant",
"url": "https://tenant.example.com"
},
"user": {
"id": 456,
"email": "user@example.com"
},
"issue": {
"id": 789,
"name": "Issue Name"
}
}

Example URLs:

  • PDF page image: https://felini.localhost/pdf/MTZjczRvbkhNT3hJMXFBYzBzT2p5Y1pDVTVOc2djOXFIMDRnZ2F6MzV1ekh0bmN6UUpqTVdqUTYyYS9KY2hwT1krQ2M3WkhjVFE3ZW1SeEZRdWFGQVVBNFZWRytIRlBNaFRUT1Nwdmdtd0pPbmgvVXFiNDhIS0gzV0ZGbi9ncDZ6TUxiVTRZWUtLVExxVlcvNFdmL0VubmdubUxqREpFU0l1NkQvWWZJVUpZb29IeEp6YzRtcXZWb3dwQlVKby9qbzFRRjVPVGxMbmREdDNabTl2Q3phYW8waVVkUUl3ZkkvTzd3RGVMZ3EvaE12UlpyVkV5Z0Q0Qy9ZNTlFZmR2VzFHUE93MUZVeFJxdjFPTzlXZ1d1dHZ4U2Q5Vy9NdlErT2FVWk1scjBveWVINnNERUJUc1prQjFNZmRmKzdBNVRHTHhYMCtTejc3bnp4UnVuVzhIcWx0WVBpY0JlWndqV1VLT05yRE02MWJodTQrS0VDUDJEZTZLQU5LS0Rxd1FDaEtndmptaldqZWFsN0w5U2ExaGJ5aWYxQzdjRjFUcnRQSnJmY0ZMS0Jva3VwRkt0c1JCK2ZudXVaMy85MnZEVTAvUXNweVFnSFFISFVtTmFOMUVpamNhMVU4cktrVjBnRXdGUXZiY1g4Ykt6UGpKZGtydU9adDEwZ1lIeWVLYUpOMnRrQ25XWHFqRkZ3cEJ4d2FmdVJZZjBKR201bkszZWhSaGREQ3kycXlaaVdDazRPdzlkWkU4L0I2cjBBVnRT/large/2_v1_2083_2947_dfdfdf.jpg
  • Same PDF page image but from another reading session: https://felini.localhost/pdf/MTZjczRvbkhNT3hJMXFBYzBzT2p5Y1pDVTVOc2djOXFIMDRnZ2F6MzV1ekh0bmN6UUpqTVdqUTYyYS9KY2hwT1krQ2M3WkhjVFE3ZW1SeEZRdWFGQVVBNFZWRytIRlBNaFRUT1Nwdmdtd0tVYjNEa0NEbGUxYkdISU1CS3lZMk9rQ1ZpYUUvWWFMKzhRU3pManJieE1ZN2VkcFVMYXV4ZzJnczVSMEphSFltSENMemVvVUlWcjBrNVkrcHE0dVpPRU9CVHhzU1lTd1MzSi9qS1BBdEZOZWQ5UFNyZDdHUmEwUndKSFpvWEZPaHJUM1o3WjVORWlPQmpTdW5CaVBNcW8wNDNpTDZJRTlwNDZWVFp6ZTROZnVXd0xJU2JhMWhKN1BTRjE1TFh4bUlLbnVWb1BUdVBhVDZmMlJ1SmZRM2g0czRMbFM4aU1WUk0zbTJwb3kxK1pWZG50YXNnSXk0bXo2V0ZUeGpPczRDOCtITDNQT0RPbmhpcGZ6QXVIcURtaEJLZjdPU29oY1B0VUJoWXN2eGd5WnhCbmFsTkJuKy9pbklWS0Q4c0hBaTh0TUNZR2ZkNmROYjdBVzhNMFZuVTVGZjNReTRnUHk4TWRXZzZDaEJzOHUvdWpRRXFyQkErS1hYY0M1dkF5ZkZDNU9RMmdJMFY5SFQvcGdrT1BCUW0rTmJNTW1zUDRoZm5RVmVNUzhkVTh3a05kZ2Vjd3ppQ1FJYUJUOGNhcEVDcUxVeFRTK1pmQ2xRNm94b3NUQ3QvUTZKQ3ZlMlE5K3VWNlJmSloxM1g0dz09/large/2_v1_2083_2947_dfdfdf.jpg
  • PDF text layer: https://felini.localhost/pdf/MTZjczRvbkhNT3hJMXFBYzBzT2p5Y1pDVTVOc2djOXFIMDRnZ2F6MzV1ekh0bmN6UUpqTVdqUTYyYS9KY2hwT1krQ2M3WkhjVFE3ZW1SeEZRdWFGQVVBNFZWRytIRlBNaFRUT1Nwdmdtd0tmMjZXNkR1MkwycDRyWmNLbjVmMm9ocGFUQVRtWHFoZEUxd3haNjhUeWJwZTB2YlZTQW5oSjlEV040VkNiL0N4UmlLQ3ZDdXFnbCt3TlpRdEhXMEh0Tkp4VHJYWThnS3Q2NXJmUWpoQTJsN2hpS2NSdk5GRGV2ZVQvQ0lmMU9uY0I1VFJ3N0tkSUg2Sm9ZQUxDOExXK0hCeDErL1dVVHpXQkRrR1VveUZqRzlPQXAyQkdha2NoWHZyM3c4YUpmQlBDTDY4Q0NEakNkMC8rTHI2RUVHYXV1L1IyVVZybzB2UjhEYktrdkxoTkZFazFSa3ZGaDRxTDBrNnFkUno3bHFFTCtmN1ErM0ZBOGROUXpPbXlneUVkRWNMYzEwOUt4YTg0RXhDTnIwOWkvQ0dyK21MMnBoYTkxYlBpeDI1SngraFBROHV5dnpUeENEdmRmZGRiUTAvSTRaQTIxWEtpZkpydWp6VjJjQUpuNXVZV1Q1Si9VY3gxS2ovQXIxTTZycDBIRU8zcFErNUxyc0lDQTdTNGNCdWtiY3NKVG9Nc0dQMDdqYm1NWWhyTW5ZbVhmWEcya2tMalFuZE5XUU1tYVYrUUl1cmtGN3lrRjJaU092d0JmbkVicHJNQVZvVXlOZmtTN3NHU1h6Vk9xZz09/text-layer/1.html
  • EPUB root: https://felini.localhost/epub/ZFlDSlpIQ2xnMisvLzl2VEpOS3ZISnhpNHRZVXprMGxRWm85amhQZ0xmejZJN1dwckg1WTJzTkRRMVdJdHVaTFUzMUsyT0hnUnorc0x1SlMyalFHcjBVUjgyUGhXdGNEK01wMVptRVM0SmlPUkNUQVF4MHFKdUx6a1Nnd1BsV3JrZUhjWnB4K3MwMXhGSGRCUGRJdm1Nc3g2cCtVRDNwZm9oZmxwMTIxTFpFQ2hNUUNobTJCbUV2bVBEQUhaL1h1d1N3L0NhY2FraG9jMlFGU3p0dXowVmExclp6YkFSZDRZQkZCNmF3anRqTG1uaHcvbVJ6ZUQ3eG1pciszdFFqZlRIMDZOUlZJVExZZ0UwRzB3QlFLd3MwUFlOUzE1WklTU3ByQXlXT0I2Vk8xbVNucE5uQi9HTDRSRmYvYStGU2dUdXY0YzR5KzdtZ0FLbGpNeHBCZklvMjBwb0wyN2t1S0pjcy9Kd3NJMHpmREluMG9KY1BNeU5pRy9RQzZOVDlZVG10cW1HWjYrc3JlSW54UHRTNGticGZBYkNqaWdHM2ZtMFJ1eDN5NkVEL1RBamFpZzJFVzZaV2RZeGNRbWljZHlJeGJlNW93S1VxUUdiWmswQ2xFUW5pUVh5eXdpMVJBNnVYRms5azMyb3l5TmNuaDEydkEzZG1xOFVtSlo4OVZ6Q3R3cUtzcDhHd3U4WVdvRlZqRm5jMHdNMHh5cWZoUXkxK3FGRE9ZWDl1c2Y4S0V2QytLQitFbHlSRFc5L09BNko2OFhOV0JxbUhpVkFoeHBZZ1AxRGFVbitieUVRVHNkcVpDOFFDbDBHdUJMRlJuU1BWaEtYNmV2OTF2TjhkOQ==/
  • EPUB root with signed bucket URL: https://felini.localhost/epub/ZFlDSlpIQ2xnMisvLzl2VEpOS3ZIS3lhUzB6RjJGa0kvRXlLSHo1Z0RhdFJ3RzhFekwwc0JpeHJDNnM1NEJtTTNwYllHcmVlQ2F5RFRidjJCcTd6K293cmN0VU4rRWZsMmNZVHd1TktNeFRBM0FYVE83Rkk3ME1MZ0NONHk1NWtyZVowcnNNOU1JY2M2Z09pZGNZWjJRK1psQ21GelhVMWRLbnJkczhTR25ha1pHQ254TVArS1BXZGJpUGo1SC9YQ1Y3UXhjU3A0Rmd6bjlXeU1sWjZKV1lKdWxDZmVac09ORUVMSHRLTzFmVU1Fa0c5QXpleVBkS1Y2SWQyc29rd05sTFAzekgyQjZTNXM1VU9Ea0lBNW9zeDg1YUhGVVlXQXpjZlFkK2t1ZXQzRVFESldLWmQ0RVR4MkhHQ3ZKQUZqSThpRlN5dU9qbUpLOGV2Z3E4N3BpNWt1czdGWjBOdHFlaThiVG1nNWlCMStsNHdLb0NvMGhTSWhNNWlYVlpWVnJGUkJMOFZpQnRqMjFZN2JoVElQa1Y1ZDh5ZHJvUG1RYnhUN2VkMnpzY1IybDVJWmtCUmZYMGpvWlgxTzNURkxIN2p3bThOYk9yNVRHMmRlMFlUa1VVVVJXcHpneXpzeUQvTVhqcWRhY1kzNWFuUGRiV2tUMGpNSW5KNzJmT1VNOVFOR0tJd0ZVV3ZVeVNsaXlyNkZLRXhjOFcwMnFGTUVZcjFXSzVoOGJwNWFvdklNeVNDb29zNjRHMTAwbCtDZlR1REpMUk5QZWVvVzh4QTFxYlY2RVd4OEIzOUZwMVNmaDZwN3ZlUGpJL2xkeHFkODJuWlo2UEpLT2xrQVR4Z3JwVDVuS2lxZGQ2SzVpbDc2bG1nOGdJVlJSM2g2TnZXOUFxdC9xVmZueDBtOFYvWFJ4b3hmWTNCdE9Nc2ZKOGszbjlIeXdZMUVTK05saTRROTB1NmRiZXZRTGd5ZlVFeUZONko1aGdmRmJ6dEg5WnZrM1F2ejlwdWMvbFZFWnpQRjgyVmV0T1MyYVMybDJ2L0s2T2dQM3hGYTNrY2dsVlFOUzhBYUlhTnFhdkh3SC9iUXgxRFd4aWRFYUFlNGpsY3owMTBNemI1dHhSMzg3ZHBQT3FnWml6R1FXYW1MdGpsU3BBS2JXKytSYnQ1R1ZaeTF6QVJEUnEybTdaZ0hOUWJsTTRUY1IxTEdDTVRMN1kwQTJuWnNnYnRxdnNZNklFTW1QQ2ZDeURiNDhOVy9vM2lTT1ZCa0tOOEsycEw1ZXN5MTcxUjQ0Qml3Sk1EUjJnS0UzVVZEN0xmNUUvaE4vekc4SlBucWJjQVpPejZEb2RVa0RDenZrSzQvQ1RhM2doUjZiRXFiR0g2L2hUVVJHOUI4RHR1eTVibGovQkJ5SXRBZnNPblpGQ0JQc24xcGxDbUpaTEsvaHFLenczQ2pSOTNJa0J2c2lqb1dWQW9zUTRVcnNiOUV5eWxCSG5wUlhvK21LMVFIYVBsUlVRZ2RLVHRjOFR1dVpwSWpLbDRONzV2TWtxWEZvVStDeGZ5QkE2aVBML0hDTVFPQm10UXRFODcvSVprZEg2VklmYjZZcVFDMmt5K3JLSzhKVW5DUXJaajV1UHhVMGltOWpSaGhOckc5OHNwd2xFVWRqM3Zjc096d2FScnZRNW14TG41UXRCcEhPYVhJQVJRMUI5d29QN0w=
    • It's much longer than the previous one because the file URL inside the signature is a signed bucket URL, which carries its own AWS signature
  • Audiobook: https://felini.localhost/audio/ZFlDSlpIQ2xnMisvLzl2VEpOS3ZISnhpNHRZVXprMGxRWm85amhQZ0xmejZJN1dwckg1WTJzTkRRMVdJdHVaTFUzMUsyT0hnUnorc0x1SlMyalFHcjgxUndQaEtMUVVqUXgvb01vYUJqSGNTNlpmSGw5RVdJNHhnZCthZDFEejV2MzkwQUdaTHV0RU1kZjExVEdoengzdHJ1ZHI4eHpEVUZtVXhNdHpKYzJpM3BYcEU4TE9mVlFwT0FGN1pSeU9ITmVQQWFpUWNlY01PL0VwS1FXVXIrdzBZQU5jbXd3Zi9OcTFlc0FOcFVaWEtja2pQL3ZObnErQTFDcXQyOFJFQzRJR1hzTmsrZkNsVGx4VFQrc2ZQWVB2a1RZWFhMUU5VYmFrQVg3a3lEK2FsRjY5QnFlMmlBK3FWMFIwbUFrdHBoTzFHQlVVZFdSZVF5MHdyWmh6M2h6L0ZPWUZtRC9TMzVlQ0FVU0ozVXRJMjNnVnZDeUFJQ3VLWExJZk9GMW05b0h0Y1VmdFNVTTNlbHFzVTVwUXpYdXUzTlhpclUxYWRzeFBxMURrbnl6N013ei9icENKbnBIY3lHS1RXUEg5ZHltR2N5cDdYcVVJZUxXQlRCSlFxVmhMZDdJSXphaTJwUW9NY2xsMGlyWWJhUU4xVmFzdmhYNHV6Q2FpUVVIRTNoM2RkejlRUXBwMGhMM2tUK3ZOdHNBQ0NjUzQ4VUZmVFZLb3BOZHpaUzc0a3ZqK1lqSFI1UFZlQ0J3L3Z3MUs5TnVkbDMrZTdka00vcExEbjJ3TEFCeDM5RWRIallNTEsrS0RtL2lvNkhXd0t4QkxEdzcvaWVJTHR4b2ozOFluejlXQ3c1TmVrUVh1VElRaTBnM3Bmams1dEVGWFpNZDlTWHBhVXRwdzRUMm9USU15djM3NDJBelp5c01PM3pXREc=.mp3
    • It ends with ".mp3"

For original content download URLs:

  • /download/{encrypted v1 signature}/

And the encrypted signature contains a JSON object like this:

{
"file_url": "https://storage-minio-local.zoo.localhost/path/to/original/file.pdf",
"file_type": "pdf"
}

Example URL https://felini.localhost/download/ZFlDSlpIQ2xnMisvLzl2VEpOS3ZIS3lhUzB6RjJGa0kvRXlLSHo1Z0RhdFJ3RzhFekwwc0JpeHJDNnM1NEJtTTNwYllHcmVlQ2F5RFRidjJCcTd6K29JcHNFT09tcXQ2eit5bC93dFUvNGVNVGUrcytBQVp4eHpXdEI5aDZSVytnMWlEOHRRbE9Rbi9nSnlPZGRUT0hHTDhDMXJUZk11dnN5MWpBbzQxZ2ZHanFTbDd0K2JPMzBad000M2dVU0FESDd6cGVkREZMSTZ6MnY2Y0l3Qi9MdjYrUGNyOHZnR3ByYklpRE1OVEJpZUttak43RmIyS2hRUzJDbkFzR1h4QjFwM0VhaG15cUdVc2dibkpqNHh5c2Nmai9XZy8zbUdCd2F4V1AyRjVUSjRxU1ZIUm1NK0g2d1AydllSZmtXdGZrUU4vS21Mb3BHU2VUR0ZaaktQNVp2eTdNUGpEZmRFVWEzelBOejNDZWFDY3p4OS9oK25zZkZtSS9TamhBZlNUNEw0M3p1SXdKZ1lXL1VEQUFIL3hUbnBpVW5IZGN3ZVltYm1aY0ZaSEdjYXduSWp3NWRyeG9uVG94YnYxVHcwN2YwRTRMZ2tmUVEvYlRGMlZYZ1VGbExYTjg3Tlp5c3hKbHZWSzhSbjY0eTFiYytucTREZXJSd2tGeEhlZkhkWXEveVFOdzZiUDU1dkg2dUpnUms4emRseTZFZWY1d0xQVzNwc0ppMVliYXg3YWc0ekNXRzhiLzJvZGJ3UDZxL29YS1c1ZzR4bEpVZGZmM2Y3V0dvRmhYMUlkanJyMnN4OWZ0TmhvRFVnTThPRmFYaXpEYVhpdVNEZ2lPYjg3b1ZIT0xXeVNnK3BNWjVicldNSGhTeDRUWGVvVSsxVmx1V25EMXlFTVhDNDBJM2t1L1MyNytyb2p6UVdTWVdUUWg1aGl4bThxSWxPalRMUnFEbzNkd2gwS0VSb1ZHK29qazZWbGlYTzFKSFMzdmNCZXFTaTcxNFZGL1ZwbktBTXczSkhqUlYvUUI5aUdMaW94MERzVHJYRWJGUjRidHp2SEhVY2Q5cFpFeC9sRFd2Q2VXWmozMDdGQzVDWmxvNnBnanc0OWhpS014TVl4OVRocGlUVjh4d3plZEE9PQ==.pdf

v2 signature

This version uses AES-256-GCM encryption, with a key that is shared between farfalla and micelio.

Using GCM has the advantage that the resulting encrypted string is slightly shorter, which is very useful as v2 signatures contain much more information.

For regular reader sessions, these are the URLs used:

  • /v2/pdf/{volpe_host}/{encrypted v2 signature}/ wip
  • /v2/pdf-download/{volpe_host}/{encrypted v2 signature}/ wip
  • /v2/epub/{volpe_host}/{encrypted v2 signature}/ wip
  • /v2/epub-download/{volpe_host}/{encrypted v2 signature}/ wip
  • /v2/audio/{volpe_host}/{encrypted v2 signature}/ wip
  • /v2/audio-download/{volpe_host}/{encrypted v2 signature}/ wip

Where volpe_host can be any one of:

  • farfalla
  • fenice
  • fenice_download

And the encrypted signature contains a JSON object like this:

{
"signature": {
// This is the `pla_publication_manifest_version`
"version": "2.1.0"
},
"volpe": {
"host": "farfalla"
},
"tenant": {
"id": 123
},
"issue": {
"id": 456,
"file_type": "pdf",
"file_path": "https://storage-minio-local.zoo.localhost/path/to/processed/issue/files/"
},
"session": {
"is_preview": true,
"reason_to_access": "preview_issue",
"reason_to_access_value": null,
"access_expires_at": null
},
"user": {
"id": 789,
"email": "user@example.com",
"external_id": "ext-12345"
}
}
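A TypeScript shape for this payload, with a shallow validation step, could look like the sketch below. The field types are inferred from the single example payload above (nullable fields whose non-null type is unknown are typed `unknown`), and the assumption that micelio accepts only version 2.1.0 is not stated in this document. `parseV2Payload` is a hypothetical name.

```typescript
// Types inferred from the example payload; not an authoritative schema
interface V2SignaturePayload {
  signature: { version: string }; // pla_publication_manifest_version
  volpe: { host: string };
  tenant: { id: number };
  issue: { id: number; file_type: string; file_path: string };
  session: {
    is_preview: boolean;
    reason_to_access: string;
    reason_to_access_value: unknown;
    access_expires_at: unknown;
  };
  user: { id: number; email: string; external_id: string | null };
}

const SUPPORTED_VERSIONS = new Set(["2.1.0"]); // assumption

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Hypothetical shallow validation of a decrypted v2 signature
function parseV2Payload(json: string): V2SignaturePayload {
  const data: unknown = JSON.parse(json);
  if (!isObject(data)) throw new Error("Payload is not a JSON object");
  const signature = data.signature;
  const version = isObject(signature) ? signature.version : undefined;
  if (typeof version !== "string") throw new Error("Missing signature.version");
  if (!SUPPORTED_VERSIONS.has(version)) {
    throw new Error(`Unsupported pla_publication_manifest_version: ${version}`);
  }
  // The IDs feed the IV derivation, so they must be present
  for (const key of ["tenant", "issue", "user"]) {
    const section = data[key];
    if (!isObject(section) || typeof section.id !== "number") throw new Error(`Missing ${key}.id`);
  }
  return data as unknown as V2SignaturePayload;
}
```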

Example URLs:

  • EPUB root
    • https://felini.localhost/v2/epub/farfalla/MC8KhoTPvpG34TP2=5ojC2hkNd9qzlavnH8N1VoGa21Z0b5C_46bQYvPnLdRy2ERzX2S-ep9NSSwOzmpR6yiDMgfCy3_oP_OVNYeGw95g6NGkivVi8GCmsh0Hhdt8wcCj1GZ1VnZpyn3MmJMZWjuCXTq84JNBs7GbazcqHWQgD2ICmn0EHDzKOzXONZ4sdXHLT3uPvsovxs7B3049n2_metEThXOtx5UHPSTI3VR3EbFlSPYtfsffY-CnnBKGUp1tn4q-MgHGWxRM3Vun_mTfQI87F33dcAiG55AWxaD_aJ_PSWbn2Cr8kVelWGIdweZY8K7MX2XYVUTwt_XH68W5W0cVDML2pPM-PMOOxu0If_HEaMSKcwrnDTulcXRRT_TgARBYM2QouGwNedXj0KpYBfoq4UpWxW62YZ-4MeIcbsE0I3SZzsJyOSs-w-OQlrvPSB5-itEvJE0O1dt_GTxM7Jvseu1ISpFsVnuuGXGDiqk79ivOtODyX0Pz44m0voUxQjjPYkXnJRKcJbVEE9BgdpFARJnAjL4FYNdOnS8MvqoNJkeQJp60lqd0PTCQkg=tNtGzltH52n3hKeVVMeewQ/OPS/main2.xml
  • Audiobook
    • https://felini.localhost/v2/audio/farfalla/ig8_WgDfH22fFz0y=U1oTxf5kWN-etmxfCK9H6R6lmTBBDVFegzTf6-WUHFl8FVQeWxoJZMOZBCGu4AndLud5hXBWRAoOBdhEsENSYeR7Ss5mJT_FTlUUV2kBQ-F3fcunfbBG3dN77GERnV5M3r2_ug_wG13mQi7yUYHwC4eLt3Iibf5_JtAjTJallfmY2tIb1mxHA6KcjL0IdfJpcfgeW0_45t4g3-rwovJu8ZjGmdyo7arphF0-EaU5xLueYFQlo0UJRO34icrMnYmfvwwE7hwRk9XKxzQPsF9N5d3I2trCFYIkUIs48QdgELkmGKb7u8hIj5_v0mRrBNMCukg0DfIuW7pdkzxsnxnnmaTxHHMJzMh_DSxZaRze9SJfjZFIiLWCt5eQyibjoq_l7WFbuOT5RfWRURiCkdfCadpXUoT8GL8QD19j_aHuvsoN9vNTqQwtPUtt5bTgtwwxG4Qfd_gghZIhfoKovhUg-N-QaQ5StaPJjllHXfExCS-kjshZzF7kWUuNeam0IMqBPSuYJeIWaGCCFuX9W5ZktBjqkEFz-nvWmsvk-Y6B0DK1LBQBuOMNDLo1VAYM9y9RZB4=GtjNDL2MF8h5kIQGfPqWoQ
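The v2 token format itself is not specified in this section. Judging only from the two example URLs above, a v2 signature appears to be three base64url segments joined by `=`: a 16-character segment (12 bytes, the usual AES-GCM IV size), a long middle segment, and a 22-character segment (16 bytes, the AES-GCM tag size). The Node.js sketch below assumes that layout and a 32-byte shared key; both are inferences, and `splitV2Signature`, `decryptV2Signature` and `encryptV2Signature` are hypothetical.

```typescript
import { createCipheriv, createDecipheriv } from "node:crypto";

// Assumed layout, inferred from the example URLs: {iv}={ciphertext}={tag}, unpadded base64url
interface V2Token {
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function splitV2Signature(token: string): V2Token {
  const parts = token.split("=");
  if (parts.length !== 3) throw new Error("Expected three '='-separated segments");
  const [iv, ciphertext, tag] = parts.map((part) => Buffer.from(part, "base64url"));
  if (iv.length !== 12 || tag.length !== 16) throw new Error("Unexpected IV or tag length");
  return { iv, ciphertext, tag };
}

function decryptV2Signature(token: string, key: Buffer): string {
  const { iv, ciphertext, tag } = splitV2Signature(token);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // a tampered signature or wrong key makes final() throw
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Counterpart in the same assumed format (farfalla's side), useful for tests
function encryptV2Signature(payload: string, key: Buffer, iv: Buffer): string {
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(payload, "utf8"), cipher.final()]);
  return [iv, ciphertext, cipher.getAuthTag()].map((b) => b.toString("base64url")).join("=");
}
```

Using `=` as a separator works because unpadded base64url output never contains that character.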

Monitoring

  1. farfalla:
    • For when volpe consumes online in farfalla
    • It will work with both v1 and v2, meaning encrypted and unencrypted content
    • Monitoring here.
  2. farfalla_pwa:
    • For when farfalla's PWA downloads content so that volpe can consume it offline
    • Only with v1, unencrypted content
    • Monitoring here.
  3. fenice:
    • For when volpe consumes online in fenice
    • It will work with both v1 and v2, meaning encrypted and unencrypted content
    • Monitoring here.
  4. fenice_download:
    • For when fenice downloads encrypted content for volpe to consume offline
    • This route always uses v2, because encrypted content always requires a v2 signature
    • Monitoring here.
  5. fenice_download_legacy:

farfalla

farfalla is in charge of generating the publication download responses and building the required tarballs.

When farfalla receives a request to download a publication, it first checks if the downloadable tarballs are already generated. If not, it triggers the generation. After triggering the generation, it will return the response that contains the list of tarballs to download.

A single publication can consist of many files:

  • EPUB
    • As it's essentially a web page, it may contain many HTML and image files, potentially hundreds.
  • PDF
    • Each page of a PDF is converted into:
      • A high-resolution JPEG, called "large"
      • A low-resolution JPEG, called "thumb"
      • A text layer, which enables text selection.
        • This layer is almost always generated.
      • An annotations layer, which contains all the internal and external links of the page.
        • This layer is only generated if the page has links, be it to other pages (e.g., an index) or to websites.
    • Since each page produces between two and four files, a 100-page PDF results in 200 to 400 files, and a 1000-page PDF can reach 4000 files.
  • Audiobook
    • Each chapter is its own MP3 file, and there are audiobooks with many hundreds of chapters.

Also, the total amount of data that makes up a single publication may vary from less than 100KB to a few GB. The most extreme cases are very long audiobooks and PDFs with many hundreds or thousands of pages.

So, overall, we deal with potentially thousands of files and up to a few GB of total data per publication.

At the same time, we need to build the tarballs with the encrypted files as quickly as possible: users expect a download to start within a few seconds at most.

These tarballs are stored in Cloudflare R2 and automatically expire 48 hours after being stored. This is another reason why farfalla first checks if the tarballs are already generated. More information here.

Tarballs Generation

In order to do this as fast as possible, we split the publication into smaller chunks, and each has its own tarball.

Currently:

  • EPUB
    • One single tarball.
  • PDF
    • A tarball every 50 pages.
  • Audiobook
    • A tarball every 10 chapters.
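The split above can be sketched as a small planner that turns a page or chapter count into tarball ranges. This is an illustration only: `planTarballs` and the range shape are hypothetical, "units" means pages for PDFs and chapters for audiobooks, and for EPUBs the single range simply covers everything.

```typescript
type PublicationType = "epub" | "pdf" | "audiobook";

// Chunk sizes from the list above; EPUBs always produce a single tarball
const UNITS_PER_TARBALL: Record<"pdf" | "audiobook", number> = { pdf: 50, audiobook: 10 };

interface TarballRange {
  index: number;
  first: number; // 1-based, inclusive
  last: number; // 1-based, inclusive
}

// Hypothetical planner: splits pages (PDF) or chapters (audiobook) into tarball ranges
function planTarballs(type: PublicationType, unitCount: number): TarballRange[] {
  if (unitCount <= 0) return [];
  if (type === "epub") return [{ index: 0, first: 1, last: unitCount }];
  const size = UNITS_PER_TARBALL[type];
  const ranges: TarballRange[] = [];
  for (let first = 1; first <= unitCount; first += size) {
    ranges.push({ index: ranges.length, first, last: Math.min(first + size - 1, unitCount) });
  }
  return ranges;
}
```

With these sizes, a 1000-page PDF yields 20 tarballs of 50 pages each, which can all be generated in parallel.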

The tarball generation requires the following steps:

  1. Check the publication type.
  2. Decide the total amount of tarballs and which files will be contained in each.
  3. Trigger the parallel generation of the tarballs.
  4. Each parallel generation will do the following steps, also in parallel:
    1. Download each file.
    2. Check if it needs to be encrypted.
      1. If it needs to be encrypted, encrypt it and mark it as ready.
      2. If it does not need to be encrypted, just mark it as ready.
    3. After all files are ready, assemble the tarball.
    4. Upload the tarball to Cloudflare R2, where it lasts up to 48 hours.
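The concurrency structure of steps 3 and 4 can be sketched as follows, assuming injected helpers for downloading, encrypting, assembling and uploading. All of them are hypothetical stand-ins (the real farfalla code is not shown here); the sketch only illustrates that files inside a tarball are processed concurrently and the tarball is assembled once every file is ready.

```typescript
// Hypothetical dependencies; real download, encryption, tar and R2 code is not shown here
interface TarballDeps {
  download(path: string): Promise<Uint8Array>;
  needsEncryption(path: string): boolean;
  encrypt(data: Uint8Array): Promise<Uint8Array>;
  assemble(files: Map<string, Uint8Array>): Promise<Uint8Array>; // builds the .tar bytes
  upload(name: string, tar: Uint8Array): Promise<void>; // e.g. to R2 (48-hour lifetime)
}

// Step 4: every file of one tarball is downloaded and (if needed) encrypted concurrently
async function buildTarball(name: string, paths: string[], deps: TarballDeps): Promise<void> {
  const entries = await Promise.all(
    paths.map(async (path): Promise<[string, Uint8Array]> => {
      const data = await deps.download(path);
      const ready = deps.needsEncryption(path) ? await deps.encrypt(data) : data;
      return [path, ready];
    }),
  );
  // Only assemble once all files are ready
  const tar = await deps.assemble(new Map(entries));
  await deps.upload(name, tar);
}

// Step 3: all tarballs are generated in parallel
async function buildAllTarballs(plan: Map<string, string[]>, deps: TarballDeps): Promise<void> {
  await Promise.all([...plan].map(([name, paths]) => buildTarball(name, paths, deps)));
}
```

`Promise.all` starts every download at once; for publications with thousands of files, a real implementation would likely cap concurrency.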

The tarballs themselves must follow this naming convention:

{UUIDv4}-{timestamp}.tar

* This ☝️ naming convention needs to be confirmed during implementation. We still need to define things such as:

  • Do we want an easy reference to the tenant or issue in the name?
  • Are we going to keep a log of the tars generated for each tenant+issue combination?
    • It would be a temporary log, cleaned up after 7 days. We keep it a few days longer than the tarballs last, in case we need the information for debugging.
  • Also, how do the tar files expire?
    • If a publication is split into 10 tar files, only 3 of them might expire. We need to regenerate them beforehand, ensuring the whole set expires atomically.
    • We could also have a tar file that lists the other tars, so we only need to check that one file to validate whether the set has expired and needs to be regenerated, or is still ready. This should be done during consumption, not by a cron job: it's not useful to regenerate something that won't be used.
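The timestamp format is still open, as the note above says. Purely as an illustration, assuming Unix seconds and the 48-hour R2 lifetime, names following `{UUIDv4}-{timestamp}.tar` could be generated and checked like this (`tarballName`, `isExpired` and `needsRegeneration` are hypothetical):

```typescript
import { randomUUID } from "node:crypto";

// Assumption: "timestamp" is Unix time in seconds
function tarballName(now: Date = new Date()): string {
  return `${randomUUID()}-${Math.floor(now.getTime() / 1000)}.tar`;
}

const TARBALL_NAME = /^([0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})-(\d+)\.tar$/;

// A tarball counts as expired once the TTL has elapsed since the timestamp in its name
function isExpired(name: string, now: Date = new Date(), ttlSeconds = 48 * 3600): boolean {
  const match = TARBALL_NAME.exec(name);
  if (!match) throw new Error(`Unexpected tarball name: ${name}`);
  return Math.floor(now.getTime() / 1000) - Number(match[2]) >= ttlSeconds;
}

// Atomic expiry: if any tarball of a publication is expired, regenerate the whole set
function needsRegeneration(names: string[], now: Date = new Date()): boolean {
  return names.some((name) => isExpired(name, now));
}
```

Because R2 deletes objects on its own schedule, a real check would likely use a safety margin shorter than 48 hours.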

fenice

This is what fenice does:

  1. Downloads the files listed by farfalla.
  2. Persists the files securely in a directory provided by the OS that is not accessible by the user.
    • The files fenice downloads are plain tarballs that contain the encrypted files of the publication.
    • fenice downloads those tarballs in parallel and uses a retry mechanism.
      • That retry mechanism is important because a tarball may not be ready yet when fenice first tries to download it, since farfalla may still be generating it.
  3. Unpacks the tarballs in the same directory, preparing them to be served to volpe.
  4. When opening the publication, either online or offline, fenice does these two things:
    • Spins up a local HTTP server and serves the individual files to volpe.
      • From volpe's point of view, it's the exact same scenario as when the files are served directly from micelio.
      • The files are individually served. So, for example, volpe never receives a packaged EPUB but local URLs to the EPUB's inner files.
    • Constructs a publication payload and sends it to volpe in the offline_issue property with all the information required, including the URLs using the local HTTP server.
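fenice's own code is not part of this document. As a TypeScript sketch only, the parallel download with retries from step 2 could look like this; the attempt count, exponential backoff delays and the `fetchTarball` callback are assumptions, and `withRetry` and `downloadAll` are hypothetical names.

```typescript
// Hypothetical retry helper with exponential backoff (500 ms, 1 s, 2 s, ... by default)
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error; // e.g. the tarball is not generated yet
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Download every tarball in parallel, each with its own retries
async function downloadAll(
  urls: string[],
  fetchTarball: (url: string) => Promise<Uint8Array>,
): Promise<Uint8Array[]> {
  return Promise.all(urls.map((url) => withRetry(() => fetchTarball(url))));
}
```

Retrying per tarball means one slow chunk does not force the others to be downloaded again.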

Unpacking the tarballs

This step is considered part of the download, and it is required for the download to finish successfully.

fenice might receive a list of tarballs to download, in which case it will download and unpack all of them, merging the files in a target directory.

For example:

  • tarball 1 contains "large/page1.jpg", "large/page2.jpg", "thumb/page2.jpg".
  • tarball 2 contains "thumb/page1.jpg", "thumb/page3.jpg", "large/page3.jpg".

The resulting directory structure will be:

  • download-ID/
    • large/
      • page1.jpg
      • page2.jpg
      • page3.jpg
    • thumb/
      • page1.jpg
      • page2.jpg
      • page3.jpg

In other words, the files of a single directory may be spread across several tarballs, and fenice has to rebuild the intended final directory structure.

It's worth mentioning that fenice won't even load if the device is rooted, so downloading content on a rooted device isn't possible.
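A sketch of the merge rule, modeling each unpacked tarball as a map from relative path to bytes (a hypothetical in-memory representation; fenice writes to disk). Rejecting duplicate paths and paths that would escape the download directory are assumptions of this sketch, not documented behavior; `mergeTarballs` is a hypothetical name.

```typescript
// Each unpacked tarball: relative path -> file bytes (hypothetical representation)
type TarballContents = Map<string, Uint8Array>;

// Merge all tarballs into one tree keyed by relative path;
// the result does not depend on which tarball a file came from
function mergeTarballs(tarballs: TarballContents[]): Map<string, Uint8Array> {
  const merged = new Map<string, Uint8Array>();
  for (const tarball of tarballs) {
    for (const [path, data] of tarball) {
      // Refuse entries that would escape the download directory
      if (path.startsWith("/") || path.split("/").includes("..")) {
        throw new Error(`Unsafe path in tarball: ${path}`);
      }
      if (merged.has(path)) throw new Error(`Duplicate file across tarballs: ${path}`);
      merged.set(path, data);
    }
  }
  return merged;
}
```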

Loading volpe

volpe is implemented as a client-side web app: it loads from a single index.html and a number of JavaScript and CSS files.

In fenice, volpe is loaded inside a WebView and from a local HTTP server managed by fenice.

There are two key security aspects of the local HTTP server:

HTTPS

To ensure the security of the content and user's privacy, it's crucial that volpe uses HTTPS for all requests. That includes both:

  • Requests loading volpe's own files.
  • Requests made by volpe to load publication's files offline.
    • These are served by the same local HTTP server that serves volpe itself.

X-FeniceBase-Version

fenice injects a header called X-FeniceBase-Version into the WebView.

The value of this header is the base fenice version used to build that particular app. The app version might be 103.0.5, but the base fenice version used to build it could be 1.22.1.

Prompting users to upgrade

This is an area of fenice that is neither designed nor implemented yet.

In an effort to minimize the usage of legacy pla_publication_manifest_version versions, we need end users to keep fenice updated. This has proven to be very difficult, and it's not uncommon for apps to take months to be updated.

In order to minimize those scenarios, drastically speed up our iteration on the security aspect, and gain some control over the process, we need fenice to prompt users to upgrade to the latest version.

We might also consider stopping support for some fenice versions, and in those cases, we need fenice to emphatically explain the situation to the user and suggest updating.

This will probably mean we need a new endpoint in farfalla that checks fenice's version and decides if:

  • It needs to be updated.
  • It can continue to be used, leaving the update up to the user.
  • fenice needs to forcefully delete all downloaded content immediately.
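A possible shape for the decision this proposed farfalla endpoint would make, keyed on the `X-FeniceBase-Version` header value (e.g. 1.22.1). The three non-"ok" outcomes map to the list above; the `VersionPolicy` thresholds, action names and the numeric dotted-version comparison are all assumptions, since this area is not designed yet.

```typescript
type FeniceAction = "ok" | "suggest_update" | "require_update" | "wipe_downloads";

// Hypothetical policy; no thresholds are defined in this document
interface VersionPolicy {
  latest: string;
  minimumSupported: string;
  wipeBelow: string | null;
}

// Numeric comparison of dotted versions: "1.10.0" > "1.9.0"
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Decide what fenice should do, based on its base version
function decideAction(baseVersion: string, policy: VersionPolicy): FeniceAction {
  if (policy.wipeBelow !== null && compareVersions(baseVersion, policy.wipeBelow) < 0) return "wipe_downloads";
  if (compareVersions(baseVersion, policy.minimumSupported) < 0) return "require_update";
  if (compareVersions(baseVersion, policy.latest) < 0) return "suggest_update";
  return "ok";
}
```

Comparing version components numerically avoids the string-ordering trap where "1.10.0" would sort before "1.9.0".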

Other fenice security measures

  1. fenice won't load if the device is rooted
