Crawler Detection
Search and recommendation features automatically block crawlers to prevent bots from triggering expensive database queries.
Why Block Crawlers?
Storefront search and recommendations require complex queries:
- Full-text search across issues, taxonomies, and metadata
- Recommendation algorithms based on user behavior and content relationships
When search engine crawlers (Googlebot, Bingbot, etc.) index a storefront, they follow every link—including search forms and recommendation widgets. Without protection, a single crawl session could trigger thousands of expensive queries.
Affected Features
| Feature | Why Blocked |
|---|---|
| storefront_search | Prevents bots from executing full-text search queries |
| storefront_search_terms | Blocks taxonomy term searches |
| storefront_recommended_issues | Avoids recommendation algorithm execution |
How It Works
The system uses LaravelCrawlerDetect to identify bot User-Agents during feature resolution:
Request → Feature Check → Crawler Detection → Allow/Block
- Request comes in for a protected feature
- Feature resolver checks platform-level status first (global kill switch)
- If platform-enabled, checks for crawler via LaravelCrawlerDetect
- If crawler detected → feature returns disabled
- Otherwise → normal feature resolution continues
This happens in the feature access trait (app/Traits/CanAccessFeature.php), after the platform-level check but before tenant-specific resolution.
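The ordering above can be sketched in a few lines. This is a language-agnostic illustration written in Python, not the PHP code in the trait: the function name `resolve_feature`, its parameters, and the `tenant_resolver` callback are hypothetical and exist only to show the order of the checks. It also assumes the crawler check applies only to the features listed under Affected Features.

```python
from typing import Callable

# Features listed in the "Affected Features" table.
CRAWLER_PROTECTED_FEATURES = frozenset({
    "storefront_search",
    "storefront_search_terms",
    "storefront_recommended_issues",
})


def resolve_feature(
    feature: str,
    platform_enabled: bool,
    is_crawler: bool,
    tenant_resolver: Callable[[str], bool],
) -> bool:
    """Return True if the feature should be available for this request.

    Hypothetical sketch of the check order, not the real trait code.
    """
    # 1. The platform-level kill switch wins over everything else.
    if not platform_enabled:
        return False
    # 2. Crawlers are stopped before tenant-specific resolution runs.
    if feature in CRAWLER_PROTECTED_FEATURES and is_crawler:
        return False
    # 3. Otherwise, normal tenant-specific resolution continues.
    return tenant_resolver(feature)
```

Because the crawler check sits before step 3, a blocked request never reaches the tenant-specific lookup, which is where the cost saving comes from in this sketch.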
Bypass Mechanism
Some legitimate tools need to access these features while appearing as crawlers:
| Header | Purpose |
|---|---|
| x-farfalla-bypass-crawler-detection | Monitoring tools (PageSpeed Insights, Oh Dear) |
| X-CustomFenice-Tenant-Id | Fenice desktop app (uses custom User-Agent) |
These headers are checked in RequestMacrosProvider, which adds the isCrawler() macro to the Request class.
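A minimal sketch of how such a check might combine the bypass headers with User-Agent detection is shown below, again in Python for illustration rather than as the actual macro. Two things here are assumptions: that the mere presence of a bypass header (regardless of its value) is enough to skip detection, and the `BOT_PATTERN` regex, which is a toy stand-in for the much larger pattern list a crawler-detection library would use.

```python
import re
from typing import Mapping

# Header names from the table above, lowercased for comparison.
BYPASS_HEADERS = (
    "x-farfalla-bypass-crawler-detection",
    "x-customfenice-tenant-id",
)

# Toy stand-in for a crawler library's User-Agent patterns (assumption).
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_crawler(headers: Mapping[str, str]) -> bool:
    """Hypothetical equivalent of the isCrawler() request macro."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Assumption: the presence of either bypass header skips detection.
    if any(name in normalized for name in BYPASS_HEADERS):
        return False
    user_agent = normalized.get("user-agent", "")
    return bool(BOT_PATTERN.search(user_agent))
```

Checking the bypass headers first means a monitoring tool with a bot-like User-Agent is treated as a normal visitor, while everything else falls through to User-Agent matching.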
SEO Impact
None. Crawlers don't need search functionality to index content:
- Content pages are directly accessible via URLs and internal links
- Search results are dynamic and not meant for indexing anyway
- Recommendations are personalized and would vary per-request
Google and other search engines discover content by following links, not by using site search. The actual content (issues, collections, etc.) remains fully indexable.
Related Code
| File | Purpose |
|---|---|
| TenantResolverUnifiedQueryService | Resolves feature availability when tenant context is established |
| app/Traits/CanAccessFeature.php | Feature resolution with crawler check |
| app/Providers/RequestMacrosProvider.php | isCrawler() macro and bypass logic |