Processing Flow

End-to-end flow of a content import, from file upload to completion.

1. File Upload

The browser uploads content files directly to S3 using presigned PUT URLs.

  1. User selects files (PDF, EPUB, MP3, JPG, JPEG, PNG) in the drag-and-drop zone.
  2. For each file, Livewire generates a presigned PUT URL via ImportS3Service and creates an ImportFile record with status uploading.
  3. The browser uploads files in parallel (concurrency controlled by import.upload_concurrency, default 5).
  4. On completion, the browser calls fileUploadCompleted() which sets the ImportFile status to completed.

The Import record is created in draft status when the first presigned URL is requested.

Limits: Content files max 500 MB each. Allowed extensions: .pdf, .epub, .mp3, .jpg, .jpeg, .png.
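
The limits above can be expressed as a simple pre-upload check. This is an illustrative Python sketch, not the actual PHP/Livewire implementation; the function name and error strings are hypothetical.

```python
# Illustrative sketch of the documented upload limits.
# The real checks live in the PHP/Livewire layer; names here are hypothetical.
ALLOWED_EXTENSIONS = {".pdf", ".epub", ".mp3", ".jpg", ".jpeg", ".png"}
MAX_CONTENT_FILE_BYTES = 500 * 1024 * 1024  # 500 MB per content file

def validate_content_file(filename: str, size_bytes: int) -> list:
    """Return validation errors for an upload candidate (empty list = OK)."""
    errors = []
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        errors.append("extension %r not allowed" % (ext or "(none)"))
    if size_bytes > MAX_CONTENT_FILE_BYTES:
        errors.append("file exceeds the 500 MB limit")
    return errors
```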

2. Metadata Upload

The metadata upload has two entry paths:

Drag & drop (auto-detection):

  1. User drags a spreadsheet file (XLSX, XLS, or CSV, max 10 MB) onto the drop zone.
  2. SpreadsheetParser streams the file using Spatie SimpleExcel (memory stays under ~3 MB regardless of file size).
  3. Headers are normalized to lowercase and compared against known formats using a 60% header match threshold.
  4. If a format is detected, headers are validated against it and a 5-row preview is shown.
  5. If no format reaches the threshold, a modal prompts the user to select the format manually. Headers are then re-validated against the selected format.
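
The 60% threshold logic can be sketched as a scoring pass over known header sets. The header lists below are hypothetical examples, not the actual column definitions, and the exact scoring formula in SpreadsheetParser is assumed.

```python
def detect_format(headers, known_formats, threshold=0.6):
    """Pick the known format whose expected headers best match the
    normalized spreadsheet headers; return None below the threshold."""
    normalized = {h.strip().lower() for h in headers}
    best_name, best_score = None, 0.0
    for name, expected in known_formats.items():
        expected = {e.lower() for e in expected}
        score = len(normalized & expected) / len(expected)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

When no format scores at or above the threshold, the caller falls back to the manual-selection modal.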

Browse file (manual selection):

  1. User clicks "Browse file" → a format selection modal appears before the file picker opens.
  2. User selects the format (Publica.la or VitalSource).
  3. The native file picker opens, user selects the spreadsheet file.
  4. The file is parsed with the pre-selected format — auto-detection is skipped.
  5. Headers are validated against the chosen format and a 5-row preview is shown.

3. Validation

ImportValidator checks each parsed row:

| Check | Rule |
| --- | --- |
| Name | Required — checks name (Publica.la) or title (VitalSource) |
| File type | Must be one of: pdf, epub, audio, physical (after normalization via FileTypeNormalizer) |
| Duplicate | If the row has an external_id (ISBN or VBK ID) that already exists in import_products for the same content_intake_id and team, it is flagged as already imported |
| File URL | If the value is an external URL (passes FILTER_VALIDATE_URL), it is accepted as-is. Otherwise, the basename is matched case-insensitively against uploaded ImportFile.original_name. Physical products skip file matching entirely |

Results are split into valid and errors arrays. If there are errors, the first 50 are shown and the user can choose "Continue with valid rows" to proceed without the invalid rows.
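
A minimal sketch of the valid/errors split, covering only the name and file-type checks (the duplicate and file URL rules need database and S3 state). The function and field names are illustrative, not ImportValidator's actual API.

```python
def validate_rows(rows, name_field="name"):
    """Split parsed rows into valid rows and per-row error entries.
    Only the name and file-type checks are sketched here."""
    VALID_TYPES = {"pdf", "epub", "audio", "physical"}
    valid, errors = [], []
    for index, row in enumerate(rows, start=1):
        row_errors = []
        if not row.get(name_field):
            row_errors.append("%s is required" % name_field)
        if row.get("file_type") not in VALID_TYPES:
            row_errors.append("invalid file type")
        if row_errors:
            errors.append({"row": index, "errors": row_errors})
        else:
            valid.append(row)
    return valid, errors
```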

File Matching Logic

The system uses three strategies to resolve the file url column:

  1. External URL — Value passes FILTER_VALIDATE_URL → used directly as the file URL in the API call.
  2. Local file match — The basename is extracted via PHP's basename() (everything after the last /), then compared case-insensitively (strcasecmp) against ImportFile.original_name. This means the value can contain directory paths — only the filename portion is used for matching. A presigned GET URL is generated at processing time.
  3. Physical product — File type is physical → file matching is skipped entirely.

Examples:

| Spreadsheet Value | Strategy | Result |
| --- | --- | --- |
| https://cdn.example.com/book.pdf | External URL | Passed through to API |
| my-book.pdf | Local match | Matched to uploaded my-book.pdf in S3 |
| My-Book.PDF | Local match | Matched (case-insensitive) |
| uploads/tenant/catalog/978-3-16-148410.epub | Local match | Basename 978-3-16-148410.epub matched |
| (empty, file type is physical) | Physical skip | No file URL needed |
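
The three strategies can be sketched as an ordered resolver. Python's urlparse only approximates PHP's FILTER_VALIDATE_URL (shown here for the common http(s) case), and the function name and return shape are illustrative assumptions.

```python
import posixpath
from urllib.parse import urlparse

def resolve_file_source(value, file_type, uploaded_names):
    """Apply the three documented strategies: physical skip, external URL
    passthrough, then case-insensitive basename match."""
    if file_type == "physical":
        return ("skip", None)                 # no file URL needed
    parsed = urlparse(value or "")
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("external", value)            # passed through to the API
    base = posixpath.basename(value or "")    # everything after the last /
    for name in uploaded_names:
        if name.lower() == base.lower():      # like strcasecmp
            return ("local", name)            # presigned GET URL generated later
    return ("unmatched", None)
```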

4. Record Creation

  1. Valid rows are bulk-inserted into import_records in chunks of 500, each with status discovered.
  2. The Import status changes to on-queue.
  3. A ProcessImportBatch job is dispatched to the import-processing queue.
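
The 500-row chunking in step 1 amounts to slicing the valid rows before each bulk insert; a minimal sketch (helper name assumed):

```python
def chunked(rows, size=500):
    """Yield successive slices of at most `size` rows, matching the
    500-row bulk inserts into import_records."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```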

5. Batch Processing

ProcessImportBatch processes records in configurable batches (default 50 records per job).

Job Configuration

  • Queue: import-processing
  • Tries: 3
  • Timeout: 120 seconds
  • Middleware: WithoutOverlapping (lock key = import ID, expires after 5 min, non-blocking)

Processing Steps

  1. Fetch next batch of discovered records (limit: batch_size).
  2. Mark batch as on-queue with sent_at = now().
  3. Build ImportContentData DTOs for each record, resolving file URLs (external passthrough or presigned GET URL from S3).
  4. POST to {tenant_domain}/api/v3/content/bulk with { items: [...] }.
  5. Handle the response (see below).
  6. If more discovered records remain, dispatch a new ProcessImportBatch (self-chaining).
  7. When no records remain, evaluate final import status.
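
The steps above form a self-chaining loop. This control-flow sketch uses five hypothetical callables standing in for the Laravel job's collaborators; it shows the shape of the loop, not the actual job code.

```python
def process_import_batch(fetch_batch, send_batch, handle_response,
                         dispatch_next, evaluate_final_status):
    """One run of the self-chaining batch job (control flow only)."""
    batch = fetch_batch()                 # next 'discovered' records
    if not batch:
        evaluate_final_status()           # no work left: settle import status
        return
    response = send_batch(batch)          # POST to the bulk content endpoint
    handle_response(batch, response)      # per-record results / retry logic
    dispatch_next()                       # chain another job for the remainder
```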

Response Handling

| Response | Action |
| --- | --- |
| 429 | Reset batch to discovered, re-dispatch with 30s delay |
| 5xx | Cache-based retry tracking (max retries, default 3). If retries remain: reset and re-dispatch with 30s delay. If exhausted: mark batch as failed with server-error |
| 4xx | Mark batch as failed with client-error |
| 200 | Process per-record results: success=true → done + ingested_at + create ImportProduct; success=false → failed + client-error + error message |
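
The table above maps cleanly to a status dispatcher; a sketch with illustrative action names (the real job tracks retries in the cache, which is abstracted here as a counter):

```python
def plan_for_response(status, retries_used, max_retries=3):
    """Map an HTTP status to the documented retry/fail action.
    Returns an (action, detail) pair; names are illustrative."""
    if status == 200:
        return ("process_results", None)
    if status == 429:
        return ("reset_and_redispatch", "30s delay")
    if 500 <= status < 600:
        if retries_used < max_retries:
            return ("reset_and_redispatch", "30s delay")
        return ("fail_batch", "server-error")
    if 400 <= status < 500:
        return ("fail_batch", "client-error")
    return ("fail_batch", "unexpected-status")
```

Note that 429 never consumes retry budget, while 5xx responses do.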

Final Status Evaluation

When no discovered or on-queue records remain:

  • All failed → Import status = failed
  • Any succeeded → Import status = done
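
In other words, a single successful record is enough to finish as done; a sketch (function name assumed):

```python
def final_import_status(done_count, failed_count):
    """Any successful record makes the import 'done'; only an import
    with zero successes ends as 'failed'."""
    return "done" if done_count > 0 else "failed"
```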

Failed Job Recovery

If the job itself fails (exception), on-queue records are reset to discovered so they can be retried. The RescueStaleImports command runs every 3 minutes to re-dispatch jobs for imports stuck in on-queue for more than 5 minutes.

6. Progress Tracking

  • The UI polls every 3 seconds (wire:poll.3s="getProgress") during processing.
  • Progress shows processed count / total count with a percentage bar.
  • When 100% is reached, the UI pauses briefly before transitioning to the completed view.
  • The completed view shows final stats: total done, total failed.
  • Import history (ImportHistory component) polls every 5 seconds but only while there are in-progress imports.
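
The percentage shown by the bar is a simple processed/total ratio; a sketch with a zero-total guard (the real rounding behavior is assumed):

```python
def progress_percent(processed, total):
    """Percentage for the progress bar; the zero-total guard covers
    an import polled before any records exist."""
    if total == 0:
        return 0
    return int(processed / total * 100)
```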

Error Handling Summary

| Error Source | Behavior | User Visibility |
| --- | --- | --- |
| Validation error | Row excluded from import | Shown before import starts |
| 429 rate limit | Transparent retry with 30s delay | None (progress stalls) |
| 5xx server error | Up to 3 retries, then server-error | Record marked failed |
| 4xx client error | Immediate client-error on batch | Record marked failed |
| Per-record failure | Individual record marked failed with API message | Visible in report/history |
| Job crash | Records reset to discovered, rescued by cron | Temporary stall |