Processing Flow
End-to-end flow of a content import, from file upload to completion.
1. File Upload
The browser uploads content files directly to S3 using presigned PUT URLs.
- User selects files (PDF, EPUB, MP3, JPG, JPEG, PNG) in the drag-and-drop zone.
- For each file, Livewire generates a presigned PUT URL via `ImportS3Service` and creates an `ImportFile` record with status `uploading`.
- The browser uploads files in parallel (concurrency controlled by `import.upload_concurrency`, default 5).
- On completion, the browser calls `fileUploadCompleted()`, which sets the `ImportFile` status to `completed`.
The `Import` record is created in `draft` status when the first presigned URL is requested.
Limits: Content files max 500 MB each. Allowed extensions: .pdf, .epub, .mp3, .jpg, .jpeg, .png.
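The bounded-concurrency upload described above can be sketched as follows. This is an illustrative Python model of the browser-side behaviour, not the actual front-end code; `upload_file` is a stand-in for the HTTP PUT to the presigned S3 URL:

```python
from concurrent.futures import ThreadPoolExecutor

UPLOAD_CONCURRENCY = 5  # mirrors import.upload_concurrency (default 5)

def upload_file(name: str, presigned_url: str) -> str:
    # Stand-in for the PUT request to the presigned S3 URL.
    return f"{name}: completed"

def upload_all(files: dict[str, str]) -> list[str]:
    # files maps original filename -> presigned PUT URL.
    # At most UPLOAD_CONCURRENCY uploads run at the same time.
    with ThreadPoolExecutor(max_workers=UPLOAD_CONCURRENCY) as pool:
        return list(pool.map(upload_file, files.keys(), files.values()))
```

The pool size caps in-flight uploads the same way the front end caps concurrent PUTs, while `pool.map` preserves input order for the completion callbacks.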
2. Metadata Upload
The metadata upload has two entry paths:
Drag & drop (auto-detection):
- User drags a spreadsheet file (XLSX, XLS, or CSV, max 10 MB) onto the drop zone.
- `SpreadsheetParser` streams the file using Spatie SimpleExcel (memory stays under ~3 MB regardless of file size).
- Headers are normalized to lowercase and compared against known formats using a 60% header-match threshold.
- If a format is detected, headers are validated against it and a 5-row preview is shown.
- If no format reaches the threshold, a modal prompts the user to select the format manually. Headers are then re-validated against the selected format.
Browse file (manual selection):
- User clicks "Browse file" → a format selection modal appears before the file picker opens.
- User selects the format (Publica.la or VitalSource).
- The native file picker opens, user selects the spreadsheet file.
- The file is parsed with the pre-selected format — auto-detection is skipped.
- Headers are validated against the chosen format and a 5-row preview is shown.
3. Validation
`ImportValidator` checks each parsed row:
| Check | Rule |
|---|---|
| Name | Required — checks `name` (Publica.la) or `title` (VitalSource) |
| File type | Must be one of `pdf`, `epub`, `audio`, `physical` (after normalization via `FileTypeNormalizer`) |
| Duplicate | If the row has an `external_id` (ISBN or VBK ID) that already exists in `import_products` for the same `content_intake_id` and team, it is flagged as already imported |
| File URL | An external URL (passes `FILTER_VALIDATE_URL`) is accepted as-is. Otherwise, the basename is matched case-insensitively against uploaded `ImportFile.original_name`. Physical products skip file matching entirely. |
Results are split into `valid` and `errors` arrays. If there are errors, the user sees the first 50 and can choose "Continue with valid rows" to proceed without the invalid rows.
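A simplified sketch of the per-row checks in the table above (the file-URL check is covered separately under "File Matching Logic"). The function and field names are illustrative, not the actual `ImportValidator` API:

```python
ALLOWED_TYPES = {"pdf", "epub", "audio", "physical"}

def validate_row(row: dict, existing_ids: set[str]) -> list[str]:
    """Return a list of error messages; an empty list means the row is valid."""
    errors = []
    # Name: Publica.la uses "name", VitalSource uses "title".
    if not (row.get("name") or row.get("title")):
        errors.append("Name is required")
    # File type must already be normalized (FileTypeNormalizer equivalent).
    if row.get("file_type") not in ALLOWED_TYPES:
        errors.append("Unsupported file type")
    # Duplicate check on external_id (ISBN or VBK ID) within the same intake/team.
    ext_id = row.get("external_id")
    if ext_id and ext_id in existing_ids:
        errors.append("Already imported")
    return errors
```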
File Matching Logic
The system uses three strategies to resolve the `file url` column:
- External URL — value passes `FILTER_VALIDATE_URL` → used directly as the file URL in the API call.
- Local file match — the basename is extracted via PHP's `basename()` (everything after the last `/`), then compared case-insensitively (`strcasecmp`) against `ImportFile.original_name`. This means the value can contain directory paths — only the filename portion is used for matching. A presigned GET URL is generated at processing time.
- Physical product — file type is `physical` → file matching is skipped entirely.
Examples:
| Spreadsheet Value | Strategy | Result |
|---|---|---|
| `https://cdn.example.com/book.pdf` | External URL | Passed through to API |
| `my-book.pdf` | Local match | Matched to uploaded `my-book.pdf` in S3 |
| `My-Book.PDF` | Local match | Matched (case-insensitive) |
| `uploads/tenant/catalog/978-3-16-148410.epub` | Local match | Basename `978-3-16-148410.epub` matched |
| (empty, file type is `physical`) | Physical skip | No file URL needed |
4. Record Creation
- Valid rows are bulk-inserted into `import_records` in chunks of 500, each with status `discovered`.
- The `Import` status changes to `on-queue`.
- A `ProcessImportBatch` job is dispatched to the `import-processing` queue.
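The chunked bulk insert can be sketched as follows; the insert itself is a stand-in, and only the chunk size of 500 comes from the description above:

```python
from itertools import islice

CHUNK_SIZE = 500  # rows per bulk INSERT into import_records

def chunked(rows, size=CHUNK_SIZE):
    # Yield successive lists of at most `size` rows.
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_insert(rows) -> int:
    inserted = 0
    for chunk in chunked(rows):
        # Stand-in for the actual bulk INSERT; each record would be
        # created with status "discovered".
        inserted += len(chunk)
    return inserted
```

Chunking keeps each INSERT statement bounded regardless of how many rows the spreadsheet contains.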
5. Batch Processing
`ProcessImportBatch` processes records in configurable batches (default 50 records per job).
Job Configuration
- Queue: `import-processing`
- Tries: 3
- Timeout: 120 seconds
- Middleware: `WithoutOverlapping` (lock key = import ID, expires after 5 min, non-blocking)
Processing Steps
- Fetch the next batch of `discovered` records (limit: `batch_size`).
- Mark the batch as `on-queue` with `sent_at = now()`.
- Build `ImportContentData` DTOs for each record, resolving file URLs (external passthrough or presigned GET URL from S3).
- POST to `{tenant_domain}/api/v3/content/bulk` with `{ items: [...] }`.
- Handle the response (see below).
- If more `discovered` records remain, dispatch a new `ProcessImportBatch` (self-chaining).
- When no records remain, evaluate the final import status.
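The request body sent to the bulk endpoint has the shape `{ items: [...] }`. A minimal sketch, where the per-item field names are assumptions for illustration (the real fields come from the `ImportContentData` DTO):

```python
def build_bulk_payload(records: list[dict]) -> dict:
    """Build the body POSTed to {tenant_domain}/api/v3/content/bulk.
    Item field names here are illustrative, not the actual DTO schema."""
    return {
        "items": [
            {
                "name": r["name"],
                "file_type": r["file_type"],
                # External URL passed through, or a presigned GET URL from S3.
                "file_url": r.get("file_url"),
            }
            for r in records
        ]
    }
```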
Response Handling
| Response | Action |
|---|---|
| 429 | Reset batch to `discovered`, re-dispatch with 30s delay |
| 5xx | Cache-based retry tracking (max retries, default 3). If retries remain: reset and re-dispatch with 30s delay. If exhausted: mark batch as `failed` with `server-error` |
| 4xx | Mark batch as `failed` with `client-error` |
| 200 | Process per-record results: `success=true` → `done` + `ingested_at` + create `ImportProduct`; `success=false` → `failed` + `client-error` + error message |
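The dispatch in the table above reduces to a small decision function. A sketch, with string labels standing in for the actual job actions:

```python
def response_action(status: int, retries_used: int, max_retries: int = 3) -> str:
    """Map an HTTP status to the batch-handling action from the table above."""
    if status == 429:
        # Rate limited: always reset and retry after a delay.
        return "reset-and-redispatch-30s"
    if status >= 500:
        # Server errors get a bounded, cache-tracked number of retries.
        if retries_used < max_retries:
            return "reset-and-redispatch-30s"
        return "fail-batch-server-error"
    if status >= 400:
        # Client errors are not retried.
        return "fail-batch-client-error"
    return "process-per-record-results"  # 200
```

Note the ordering matters: 429 is checked before the generic 4xx branch, since it is the one client-class status that is retried.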
Final Status Evaluation
When no `discovered` or `on-queue` records remain:
- All records failed → Import status = `failed`
- Any record succeeded → Import status = `done`
Failed Job Recovery
If the job itself fails with an exception, `on-queue` records are reset to `discovered` so they can be retried. The `RescueStaleImports` command runs every 3 minutes and re-dispatches jobs for imports stuck in `on-queue` for more than 5 minutes.
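The staleness check behind the rescue command can be sketched as a filter over in-flight imports; the dict shape is illustrative, and only the `on-queue` status and the 5-minute cutoff come from the description above:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)

def stale_imports(imports: list[dict], now: datetime) -> list[int]:
    """Return IDs of imports stuck on-queue longer than STALE_AFTER.
    Sketch of the check RescueStaleImports runs every 3 minutes."""
    return [
        i["id"]
        for i in imports
        if i["status"] == "on-queue" and now - i["sent_at"] > STALE_AFTER
    ]
```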
6. Progress Tracking
- The UI polls every 3 seconds (`wire:poll.3s="getProgress"`) during processing.
- Progress shows processed count / total count with a percentage bar.
- When 100% is reached, a brief delay is applied before transitioning to the completed view.
- The completed view shows the final stats: total done, total failed.
- Import history (the `ImportHistory` component) polls every 5 seconds, but only while there are in-progress imports.
Error Handling Summary
| Error Source | Behavior | User Visibility |
|---|---|---|
| Validation error | Row excluded from import | Shown before import starts |
| 429 rate limit | Transparent retry with 30s delay | None (progress stalls) |
| 5xx server error | Up to 3 retries, then `server-error` | Record marked `failed` |
| 4xx client error | Immediate `client-error` on batch | Record marked `failed` |
| Per-record failure | Individual record marked `failed` with the API error message | Visible in report/history |
| Job crash | Records reset to `discovered`, rescued by cron | Temporary stall |