Content Import Overview

The Content Import system is a browser-based bulk import dashboard that lets tenants upload metadata spreadsheets and content files (PDF, EPUB, audio) to create products in Farfalla. Users upload files directly to S3 via presigned URLs and provide a metadata spreadsheet that maps each row to a content file. The system then validates the rows, batches them, and sends them to Farfalla's /api/v3/content/bulk endpoint.

Architecture

Data Model

Tables

| Table | Purpose |
|---|---|
| imports | Top-level import session |
| import_files | Uploaded content files stored in S3 (PDF, EPUB, MP3, etc.) |
| import_records | One row per spreadsheet row sent to the API (processing log) |
| import_products | One row per unique product, identified by external_id (analogous to OnixIssue) |

Relationships

  • ContentIntake (type=import) has many Import
  • Import has many ImportFile and ImportRecord
  • ImportRecord belongs to both Import and ContentIntake
  • ImportProduct belongs to ContentIntake, Team, and tracks the latest Import that touched it
  • ImportProduct is created when ProcessImportBatch successfully ingests a record; updated by ProcessFileUpdateBatch on file updates
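
The relationships above can be sketched as plain data classes. This is an illustrative Python sketch, not the application's models (the services referenced in this document are PHP); every field name below, including the foreign-key columns such as import_id and latest_import_id, is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImportFile:
    id: int
    import_id: int  # assumed foreign key to the owning Import
    filename: str


@dataclass
class ImportRecord:
    id: int
    import_id: int  # belongs to Import
    content_intake_id: int  # and also to ContentIntake


@dataclass
class Import:
    id: int
    content_intake_id: int  # ContentIntake (type=import) has many Import
    files: List[ImportFile] = field(default_factory=list)
    records: List[ImportRecord] = field(default_factory=list)


@dataclass
class ImportProduct:
    external_id: str  # identifies the unique product
    content_intake_id: int
    team_id: int
    latest_import_id: Optional[int] = None  # latest Import that touched it
```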

Status Enums

ImportStatus

Used for both imports.status and import_records.status.

| Value | Label | Description |
|---|---|---|
| draft | Draft | Import created, files being uploaded |
| discovered | Discovered | Records parsed, awaiting processing |
| on-queue | Processing | Batch dispatched to queue |
| done | Done | Successfully processed |
| failed | Failed | Processing failed |
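
As a rough illustration of how the stored values relate to the display labels, here is a minimal Python sketch of the enum. It is not the application's code; the class layout and the label property are assumptions, while the values and labels are taken from the table above. Note that the stored value on-queue is displayed as "Processing".

```python
from enum import Enum


class ImportStatus(Enum):
    """Status values shared by imports.status and import_records.status."""

    DRAFT = "draft"
    DISCOVERED = "discovered"
    ON_QUEUE = "on-queue"
    DONE = "done"
    FAILED = "failed"

    @property
    def label(self) -> str:
        # Human-readable label for the stored value.
        return _LABELS[self]


# Looked up at call time, so defining it after the class is fine.
_LABELS = {
    ImportStatus.DRAFT: "Draft",
    ImportStatus.DISCOVERED: "Discovered",
    ImportStatus.ON_QUEUE: "Processing",
    ImportStatus.DONE: "Done",
    ImportStatus.FAILED: "Failed",
}
```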

ImportFileStatus

| Value | Label | Description |
|---|---|---|
| uploading | Uploading | Browser upload in progress via S3 |
| completed | Completed | File successfully stored in S3 |
| failed | Failed | Upload failed |

ImportRecordIgnore

Applied to failed records to categorize the failure.

| Value | Label | Description |
|---|---|---|
| client-error | Client Error | 4xx from Farfalla or per-record error |
| server-error | Server Error | 5xx from Farfalla after retries |
| general-error | General Error | Unexpected failure |
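
One way to read the categories above is as a mapping from the outcome of the Farfalla call to an ignore reason. The Python sketch below illustrates that reading only; the function name classify_failure and its parameters are hypothetical, and it assumes it is called once retries have already been exhausted.

```python
from enum import Enum
from typing import Optional


class ImportRecordIgnore(Enum):
    CLIENT_ERROR = "client-error"
    SERVER_ERROR = "server-error"
    GENERAL_ERROR = "general-error"


def classify_failure(
    http_status: Optional[int], per_record_error: bool = False
) -> ImportRecordIgnore:
    """Pick a category for a failed record.

    http_status is the Farfalla response code, or None if no response arrived.
    per_record_error marks a record rejected individually by the API.
    """
    if per_record_error or (http_status is not None and 400 <= http_status < 500):
        return ImportRecordIgnore.CLIENT_ERROR
    if http_status is not None and 500 <= http_status < 600:
        # Assumed to be reached only after retries have failed.
        return ImportRecordIgnore.SERVER_ERROR
    # Anything else (exception, timeout, unexpected status) is a general error.
    return ImportRecordIgnore.GENERAL_ERROR
```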

SpreadsheetFormat

| Value | Label |
|---|---|
| publicala | Publica.la |
| vitalsource | VitalSource |

Key Services

  • SpreadsheetParser (app/Services/Import/SpreadsheetParser.php): Stream-parses XLSX/XLS/CSV files via Spatie SimpleExcel. Auto-detects the spreadsheet format by matching the headers against the known formats, with a 60% match threshold.
  • ImportValidator (app/Services/Import/ImportValidator.php): Validates each row for required fields (name/title, file type) and matches the row's file URL to uploaded ImportFile records by case-insensitive basename. External URLs pass through without matching.
  • ImportS3Service (app/Services/Import/ImportS3Service.php): Generates presigned PUT URLs (60-minute TTL) for browser uploads and GET URLs (24-hour TTL) for API consumption. S3 key pattern: imports/{contentIntakeId}/{importId}/{filename}.
  • ImportContentData (app/DataTransferObject/ImportContentData.php): Immutable DTO that maps spreadsheet rows to the Farfalla bulk API payload. The factory methods fromPublicalaRow() and fromVitalSourceRow() handle format-specific field mapping.
  • ProcessImportBatch (app/Jobs/ProcessImportBatch.php): Queue job that sends batches of records to Farfalla, handles rate limiting, errors, and retries, and self-chains until all records are processed.
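
The header-based format detection described for SpreadsheetParser could look roughly like the Python sketch below. It is an illustration, not the PHP implementation: the header sets in KNOWN_FORMATS are invented for the example (the real column names are not listed here), and it assumes the 60% threshold means the share of a format's known headers found in the sheet.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical header sets, for illustration only.
KNOWN_FORMATS: Dict[str, Set[str]] = {
    "publicala": {"name", "file_url", "file_type", "external_id", "description"},
    "vitalsource": {"title", "isbn", "file_name", "format", "publisher"},
}

MATCH_THRESHOLD = 0.60


def normalize(header: str) -> str:
    """Lower-case a header and replace spaces so 'File URL' equals 'file_url'."""
    return header.strip().lower().replace(" ", "_")


def detect_format(headers: Iterable[str]) -> Optional[str]:
    """Return the best-matching format name, or None if none reaches the threshold."""
    present = {normalize(h) for h in headers}
    best_format, best_score = None, 0.0
    for name, expected in KNOWN_FORMATS.items():
        score = len(expected & present) / len(expected)
        if score >= MATCH_THRESHOLD and score > best_score:
            best_format, best_score = name, score
    return best_format
```

With these example header sets, a sheet containing four of the five VitalSource headers is detected as vitalsource, while a sheet with only two of them matches nothing and returns None.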
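
The case-insensitive basename matching described for ImportValidator can be sketched as follows. This is an illustrative Python version under stated assumptions: the function names are hypothetical, and a value is treated as an external URL when it has an http or https scheme, which the document does not specify.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional
from urllib.parse import urlparse


def is_external_url(value: str) -> bool:
    # Assumption: only http(s) links count as external URLs.
    return urlparse(value).scheme in ("http", "https")


def match_uploaded_file(file_ref: str, uploaded_names: Iterable[str]) -> Optional[str]:
    """Resolve a spreadsheet file reference to an uploaded filename.

    External URLs are returned unchanged; None means no uploaded file matched.
    """
    if is_external_url(file_ref):
        return file_ref
    # Index uploaded files by lower-cased basename.
    index = {PurePosixPath(name).name.lower(): name for name in uploaded_names}
    # Accept Windows-style separators in the spreadsheet value as well.
    wanted = PurePosixPath(file_ref.replace("\\", "/")).name.lower()
    return index.get(wanted)
```

Returning None rather than raising lets the caller decide whether an unmatched file is a validation error for that row.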
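
The self-chaining behaviour of ProcessImportBatch can be pictured with a small in-memory simulation. This Python sketch only illustrates the pattern: the batch size, the function names, and the use of a deque as a stand-in for the queue are all assumptions, and rate limiting and retries are left out.

```python
from collections import deque
from typing import Callable, Deque, List

BATCH_SIZE = 2  # hypothetical; the real batch size is not stated in this document


def process_import_batch(
    pending: List[str],
    send_batch: Callable[[List[str]], None],
    queue: Deque[List[str]],
) -> None:
    """Send one batch, then enqueue a follow-up job for whatever is left."""
    batch, remaining = pending[:BATCH_SIZE], pending[BATCH_SIZE:]
    if batch:
        send_batch(batch)
    if remaining:
        # The "self-chain": the job dispatches another instance of itself.
        queue.append(remaining)


def run_worker(records: List[str]) -> List[List[str]]:
    """Tiny stand-in for a queue worker; returns the batches that were sent."""
    sent: List[List[str]] = []
    queue: Deque[List[str]] = deque([records])
    while queue:
        process_import_batch(queue.popleft(), sent.append, queue)
    return sent
```

Chaining one job per batch keeps each job short and lets a failure affect only the batch in flight rather than the whole import.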