Structure of the ONIX Intake
Here we present the information that we collect from the ONIX files that are sent to us for storage in Publica.la. We will indicate the type of data received, its relationship with the ONIX standard, the extractor responsible for obtaining such information, the required data, and the information that will be stored.
How to read an ONIX
ONIX (ONline Information eXchange) is the XML-based standard, maintained by EDItEUR, that publishers use to exchange book metadata. It carries the information our platforms need about each product. To read an ONIX file, we walk through its structure and extract the elements described in this document.
The ONIX standard allows many variants for expressing the same information. For that reason, we document exactly where each value is taken from and which requirements it must meet. These requirements are what determine whether an ONIX file is compatible with our platform.
The official reference documentation for the codes can be consulted in full on the following site: https://ns.editeur.org/onix/en
Example ONIX Fragment:
```xml
<DescriptiveDetail>
  <Contributor>
    <SequenceNumber>1</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NameIdentifier>
      <NameIDType>01</NameIDType>
      <IDTypeName>HCP Author ID</IDTypeName>
      <IDValue>7421</IDValue>
    </NameIdentifier>
    <NameIdentifier>
      <NameIDType>16</NameIDType>
      <IDValue>0000000121479135</IDValue>
    </NameIdentifier>
    <NamesBeforeKey>Maj</NamesBeforeKey>
    <KeyNames>Sjöwall</KeyNames>
    <BiographicalNote><p><strong>Maj Sjöwall</strong> was born in Stockholm in 1935. She is a poet, novelist and translator, and is best known for the ten <em>Martin Beck</em> novels she wrote with husband Per Wahlöö.</p></BiographicalNote>
  </Contributor>
  <Contributor>
    <SequenceNumber>2</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NameIdentifier>
      <NameIDType>01</NameIDType>
      <IDTypeName>HCP Author ID</IDTypeName>
      <IDValue>7422</IDValue>
    </NameIdentifier>
    <NameIdentifier>
      <NameIDType>16</NameIDType>
      <IDValue>0000000121222604</IDValue>
    </NameIdentifier>
    <NamesBeforeKey>Per</NamesBeforeKey>
    <KeyNames>Wahlöö</KeyNames>
    <BiographicalNote><p><strong>Per Wahlöö</strong> was born in Göteborg. After graduating from the University of Lund in 1946, he worked as a journalist, covering criminal and social issues for a number of newspapers and magazines. In the 1950s, Wahlöö became involved with radical political causes, activities that resulted in his deportation from Franco’s Spain in 1957. After returning to Sweden, he wrote a number of television and radio plays, and was managing editor of several magazines, before becoming a full-time writer.</p><p>He is best known for the series of ten <em>Martin Beck</em> novels he wrote with wife Maj Sjöwall, which they completed immediately before his death in 1975.</p></BiographicalNote>
  </Contributor>
</DescriptiveDetail>
```
The example shows the DescriptiveDetail.Contributor structure. This is where our platform obtains information about contributors such as the author and the narrator.
Within each Contributor, our platform checks ContributorRole against the values allowed for the target field (for example, A01 for author). This validation ensures that each contributor is stored in the correct field.
A file can include several Contributor elements, as in the example above. When it does, we iterate over all of them and extract the information from each one.
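A minimal Python sketch of that iteration, using the standard-library xml.etree.ElementTree on a trimmed copy of the example. It assumes a namespace-free fragment. Real ONIX 3.0 files usually declare a namespace, which the tag names passed to findall would then need to include. contributors_by_role is a hypothetical helper, not the name of a Publica.la extractor.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free copy of the example: identifiers and notes removed.
FRAGMENT = """
<DescriptiveDetail>
  <Contributor>
    <SequenceNumber>1</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NamesBeforeKey>Maj</NamesBeforeKey>
    <KeyNames>Sjöwall</KeyNames>
  </Contributor>
  <Contributor>
    <SequenceNumber>2</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NamesBeforeKey>Per</NamesBeforeKey>
    <KeyNames>Wahlöö</KeyNames>
  </Contributor>
</DescriptiveDetail>
"""


def contributors_by_role(descriptive_detail: ET.Element) -> dict:
    """Group every Contributor under each ContributorRole code it carries."""
    groups = {}
    for contributor in descriptive_detail.findall("Contributor"):
        # ONIX allows more than one ContributorRole per Contributor.
        for role in contributor.findall("ContributorRole"):
            code = (role.text or "").strip()
            if code:
                groups.setdefault(code, []).append(contributor)
    return groups


groups = contributors_by_role(ET.fromstring(FRAGMENT.strip()))
print({role: len(items) for role, items in groups.items()})  # {'A01': 2}
```

Grouping first keeps the per-field extractors simple. Each extractor only has to look up the role codes it accepts.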
Reference data obtained from ONIX
| Information in Publica.la | Format | Extractor | ONIX Tag |
|---|---|---|---|
| author | array | Contributors | DescriptiveDetail.Contributor |
| narrator | array | Narrators | DescriptiveDetail.Contributor |
| description | string \| null | Synopsis | CollateralDetail.TextContent |
| external_id | string \| null | ProductIdentifiers | ProductIdentifier |
| country | array \| null | countryOfPublication (function) | PublishingDetail.CountryOfPublication |
| lang | array \| null | Language | DescriptiveDetail.Language |
| bisac | array | Bisac | DescriptiveDetail.Subject |
| name | string \| null | Title | DescriptiveDetail.TitleDetail |
| prices | array \| null | Prices | ProductSupply |
| type | array \| null | [ 'epub', 'audio', 'pdf' ] | ['EPUB', 'MP3', 'PDF'] |
| publishing_status | string | PublishStatus | PublishingDetail.PublishingStatus |
| published_at | date \| null | PublishDate | PublishingDetail.PublishingDate |
| publisher | array | Publisher | PublishingDetail.Publisher |
| publishing_group | array \| null | PublishingGroup | PublishingDetail.Publisher |
| sales_rights | array \| null | SalesRights | PublishingDetail.SalesRights |
| keywords | array | Keywords | DescriptiveDetail.Subject |
author
Format: array
We get the information from: DescriptiveDetail.Contributor
Information extractor in Publica.la: Contributors
| Tag required | Allowed values |
|---|---|
| ContributorRole | 'A01' |
We search for the book's contributors in the DescriptiveDetail.Contributor tag and keep only those whose ContributorRole is in the allowed list ['A01']. If no contributor meets this requirement, an empty list is returned. For each matching contributor, the name is taken from the first of these elements that is present, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.
Section documentation: (17) - Contributor role code - A01
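The following Python sketch shows the role filter and the name search order described above. It works on a namespace-free fragment parsed with xml.etree.ElementTree. The function names are hypothetical. The way KeyNames and NamesBeforeKey are combined (given names first, separated by a space) is an assumption, because the text does not specify it. Passing allowed_roles=("E03", "E07") applies the same search to the narrator field described next.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def contributor_name(contributor: ET.Element) -> Optional[str]:
    """Return the first non-empty name, trying the elements in the documented order."""

    def text(tag: str) -> Optional[str]:
        value = contributor.findtext(tag)
        return value.strip() if value and value.strip() else None

    key_names = text("KeyNames")
    before_key = text("NamesBeforeKey")
    candidates = [
        text("PersonNameInverted"),
        # Assumption: KeyNames and NamesBeforeKey are joined in natural order.
        f"{before_key} {key_names}" if key_names and before_key else None,
        text("PersonName"),
        key_names,
        text("CorporateName"),
    ]
    return next((name for name in candidates if name), None)


def extract_contributors(descriptive_detail: ET.Element,
                         allowed_roles=("A01",)) -> List[str]:
    """Names of contributors with an allowed ContributorRole; empty list if none match."""
    names = []
    for contributor in descriptive_detail.findall("Contributor"):
        roles = {(r.text or "").strip() for r in contributor.findall("ContributorRole")}
        if roles.isdisjoint(allowed_roles):
            continue
        name = contributor_name(contributor)
        if name:
            names.append(name)
    return names


# Hypothetical sample: one author (A01) and one reader (E07).
SAMPLE = """<DescriptiveDetail>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <NamesBeforeKey>Maj</NamesBeforeKey>
    <KeyNames>Sjöwall</KeyNames>
  </Contributor>
  <Contributor>
    <ContributorRole>E07</ContributorRole>
    <PersonName>Example Narrator</PersonName>
  </Contributor>
</DescriptiveDetail>"""

detail = ET.fromstring(SAMPLE)
print(extract_contributors(detail))                  # ['Maj Sjöwall']
print(extract_contributors(detail, ("E03", "E07")))  # ['Example Narrator']
```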
narrator
Format: array
We get the information from: DescriptiveDetail.Contributor
Information extractor in Publica.la: Contributors
| Tag required | Allowed values |
|---|---|
| ContributorRole | 'E03', 'E07' |
We check the DescriptiveDetail.Contributor tag and keep only contributors whose ContributorRole is in the allowed list ['E03', 'E07']. If no contributor has one of these values, an empty list is returned. For each matching contributor, the name is taken from the first of these elements that is present, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.
Section documentation: (17) - Contributor role code - E03 or E07
description
Format: string | null
We get the information from: CollateralDetail.TextContent
Information extractor in Publica.la: Synopsis
| Tag required | Allowed values |
|---|---|
| ContentAudience | '00', '03' |
| TextType | '02', '03' |
| Text.@attributes.textformat | '02', '06' |
| Tag not allowed | Values |
|---|---|
| Territory | any |
The Synopsis extractor reads the CollateralDetail.TextContent section of the ONIX file. If the section is absent, null is returned. Otherwise, the extractor checks whether it contains a single synopsis or several.
For a single synopsis, three criteria must hold: TextType is allowed (02 or 03), ContentAudience is allowed (00 or 03), and no Territory is defined. If any criterion fails, there is no synopsis and null is returned. If all of them hold, the extractor checks that Text.@attributes.textformat is an allowed text format (02 or 06, according to the CONTENT_FORMAT array) and takes the text from the first element of Text.
For several synopses, each one is checked against the same TextType, ContentAudience and Territory criteria, and those that fail are discarded. The remaining synopses are sorted by the priority established in the TextType array. The single-synopsis processing is then applied to each one in turn until one succeeds, and its text is returned.
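A minimal Python sketch of this selection. It assumes the ONIX XML has already been converted into nested dicts and lists, as the "associative array" wording suggests. In that representation, a tag that appears once is a dict, a repeated tag is a list, and a Text element with attributes looks like {"@attributes": {"textformat": "06"}, 0: "text"}. It also assumes that the order of the allowed TextType values in the table is the priority order. All names are illustrative.

```python
from typing import Any, Optional

# Allowed codes from the tables above. Assumption: TEXT_TYPES order is the priority order.
TEXT_TYPES = ["02", "03"]
CONTENT_AUDIENCES = {"00", "03"}
CONTENT_FORMAT = {"02", "06"}  # allowed values of Text@textformat


def passes_filters(text_content: dict) -> bool:
    """TextType and ContentAudience are allowed and no Territory restriction is set."""
    return (text_content.get("TextType") in TEXT_TYPES
            and text_content.get("ContentAudience") in CONTENT_AUDIENCES
            and "Territory" not in text_content)


def single_synopsis(text_content: dict) -> Optional[str]:
    if not passes_filters(text_content):
        return None
    text = text_content.get("Text")
    # Assumption: Text without attributes is not a dict and is rejected here.
    if not isinstance(text, dict):
        return None
    if text.get("@attributes", {}).get("textformat") not in CONTENT_FORMAT:
        return None
    return text.get(0)  # the first element of Text holds the synopsis string


def extract_synopsis(text_content: Any) -> Optional[str]:
    if not text_content:
        return None
    if isinstance(text_content, dict):  # a single TextContent
        return single_synopsis(text_content)
    candidates = [tc for tc in text_content if passes_filters(tc)]
    candidates.sort(key=lambda tc: TEXT_TYPES.index(tc["TextType"]))
    for candidate in candidates:
        synopsis = single_synopsis(candidate)
        if synopsis is not None:
            return synopsis
    return None


def make(text_type: str, body: str) -> dict:
    """Hypothetical helper that builds one parsed TextContent for the example below."""
    return {"TextType": text_type, "ContentAudience": "00",
            "Text": {"@attributes": {"textformat": "06"}, 0: body}}


print(extract_synopsis([make("03", "Long description"), make("02", "Short description")]))
# Short description
```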
external_id
Format: string | null
We get the information from: ProductIdentifier
Information extractor in Publica.la: ProductIdentifiers
| Tag required | Allowed values |
|---|---|
| ProductIDType | '15', '03' |
The extractor first checks whether there is a single ProductIdentifier or several. For a single one, ProductIDType must be one of the allowed values (15 = ISBN-13 or 03 = GTIN-13). If it is, the value of IDValue is returned; otherwise, null is returned. For several, only the first one is taken and processed as a single value.
Section documentation: (5) - Product Identifier type - 03 or 15
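A Python sketch of the external_id rule. It assumes ProductIdentifier has been parsed into a dict when one is sent and a list of dicts when several are sent. The function name is hypothetical.

```python
from typing import Any, Optional

ALLOWED_ID_TYPES = {"15", "03"}  # 15 = ISBN-13, 03 = GTIN-13 (ONIX code list 5)


def extract_external_id(product_identifier: Any) -> Optional[str]:
    """IDValue of the ProductIdentifier, or None if its ProductIDType is not allowed."""
    if not product_identifier:
        return None
    if isinstance(product_identifier, list):
        # Several identifiers: only the first one is processed, as described above.
        product_identifier = product_identifier[0]
    if product_identifier.get("ProductIDType") not in ALLOWED_ID_TYPES:
        return None
    return product_identifier.get("IDValue")
```

If the rule is read literally, only the first identifier is examined. Under this sketch, a file that lists a proprietary identifier (type 01) before its ISBN therefore yields None.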
country
Format: string | null
We get the information from: PublishingDetail.CountryOfPublication
Information extractor in Publica.la: Function countryOfPublication
The value sent in PublishingDetail.CountryOfPublication is returned as is. If the tag is missing, null is returned.
Section documentation: (91) - Country
lang
Format: string
We get the information from: DescriptiveDetail.Language
Information extractor in Publica.la: Language
We first check whether DescriptiveDetail.Language is sent. If it is not, the language is left undefined. If a single Language element is sent (parsed as an associative array), its LanguageCode is mapped to an ISO 639-1 code, for example 'eng' => 'en', 'ita' => 'it', 'por' => 'pt', 'spa' => 'es'. The full list of supported codes is in the table below.
Otherwise, when several Language elements are sent, we take the first one whose LanguageRole is 01 (language of text) and map its LanguageCode in the same way.
Supported language codes
| ONIX LanguageCode | ISO 639-1 used in Publica.la |
|---|---|
| eng | en |
| ita | it |
| por | pt |
| spa | es |
| fre / fra | fr |
| chi / zho | zh |
| jpn | ja |
| ger | de |
| cat | ca |
| gsw | de |
| pol | pl |
| glg | gl |
| any other | undefined |
Section documentation: (22) - Language
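A Python sketch of the language lookup. It assumes DescriptiveDetail.Language has been parsed into a dict when one Language element is sent and a list of dicts when several are sent. None stands in for undefined, and the function name is hypothetical. The mapping is copied from the table of supported language codes.

```python
from typing import Any, Optional

# ONIX LanguageCode -> ISO 639-1 code, copied from the supported-codes table.
LANGUAGE_MAP = {
    "eng": "en", "ita": "it", "por": "pt", "spa": "es",
    "fre": "fr", "fra": "fr", "chi": "zh", "zho": "zh",
    "jpn": "ja", "ger": "de", "cat": "ca", "gsw": "de",
    "pol": "pl", "glg": "gl",
}


def extract_language(language: Any) -> Optional[str]:
    """ISO 639-1 code for DescriptiveDetail.Language; None means undefined."""
    if not language:
        return None
    if isinstance(language, dict):  # a single Language element: no role check
        return LANGUAGE_MAP.get(language.get("LanguageCode"))
    for entry in language:  # several Language elements
        if entry.get("LanguageRole") == "01":  # 01 = language of text
            return LANGUAGE_MAP.get(entry.get("LanguageCode"))
    return None
```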
bisac
Format: array
We get the information from: DescriptiveDetail.Subject
Information extractor in Publica.la: Bisac
| Tag required | Allowed values |
|---|---|
| SubjectSchemeIdentifier | '10' |
If DescriptiveDetail.Subject is not specified, an empty array is returned. If a single Subject is sent (an associative array), its SubjectSchemeIdentifier must be 10 (BISAC). If it is, the value of SubjectCode is returned; otherwise, an empty array is returned.
If several Subject elements are sent, each one whose SubjectSchemeIdentifier is 10 contributes an entry with a code and a main field, filled from SubjectCode and MainSubject respectively.
Section documentation: (27) - Subject scheme identifier - Bisac
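A Python sketch of the BISAC rule. It assumes DescriptiveDetail.Subject is parsed into a dict for a single Subject and a list of dicts for several. Several details are assumptions: the keys code and main, the list wrapper in the single-Subject case, and treating the presence of the empty MainSubject flag element as main = True. The function name is hypothetical.

```python
from typing import Any, Dict, List


def extract_bisac(subject: Any) -> List[Any]:
    """BISAC codes (SubjectSchemeIdentifier 10) from DescriptiveDetail.Subject."""
    if not subject:
        return []
    if isinstance(subject, dict):  # a single Subject
        if subject.get("SubjectSchemeIdentifier") != "10":
            return []
        # Assumption: the lone SubjectCode is wrapped in a list, since the field is an array.
        return [subject.get("SubjectCode")]
    entries: List[Dict[str, Any]] = []
    for item in subject:  # several Subject elements
        if item.get("SubjectSchemeIdentifier") == "10":
            entries.append({
                "code": item.get("SubjectCode"),
                # Assumption: MainSubject is an empty flag element; its presence marks the main subject.
                "main": "MainSubject" in item,
            })
    return entries
```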
name
Format: string | null
We get the information from: DescriptiveDetail.TitleDetail
Information extractor in Publica.la: Title
If DescriptiveDetail.TitleDetail is not specified, null is returned. If a single TitleDetail is sent (an associative array), its TitleType must be 01 (distinctive title); otherwise, null is returned. If it is 01, the title is the first 255 characters of TitleElement.TitleText when that element is present. If TitleText is absent, the title is the first 255 characters of TitleElement.TitlePrefix followed by TitleElement.TitleWithoutPrefix.
If several TitleDetail elements are sent, the first one is taken and processed as a single TitleDetail, as described above.
Section documentation: (15) - Title type
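A Python sketch of the title rule. It assumes DescriptiveDetail.TitleDetail is parsed into a dict for one TitleDetail and a list for several, each holding a single TitleElement dict. Joining TitlePrefix and TitleWithoutPrefix with a space is an assumption, and the function name is hypothetical.

```python
from typing import Any, Optional

MAX_LENGTH = 255


def extract_title(title_detail: Any) -> Optional[str]:
    """Distinctive title (TitleType 01), capped at 255 characters."""
    if not title_detail:
        return None
    if isinstance(title_detail, list):  # several TitleDetail: only the first is used
        title_detail = title_detail[0]
    if title_detail.get("TitleType") != "01":
        return None
    # Assumption: a single TitleElement parsed as a dict.
    element = title_detail.get("TitleElement") or {}
    if element.get("TitleText"):
        return element["TitleText"][:MAX_LENGTH]
    # Assumption: prefix and remainder are joined with one space.
    parts = [element.get("TitlePrefix"), element.get("TitleWithoutPrefix")]
    combined = " ".join(p for p in parts if p)
    return combined[:MAX_LENGTH] or None  # None when neither part is present
```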
prices
Format: array | null
We get the information from: ProductSupply
Information extractor in Publica.la: Prices
| Tag required | Allowed values |
|---|---|
| CurrencyCode | 'USD', 'BRL', 'ARS', 'CLP', 'PEN', 'UYU', 'COP', 'MXN', 'BOB', 'EUR', 'ALL', 'AMD', 'AZN', 'BYN', 'BAM', 'BGN', 'CZK', 'DKK', 'GEL', 'HUF', 'ISK', 'MDL', 'MKD', 'NOK', 'PLN', 'RON', 'RUB', 'RSD', 'SEK', 'CHF', 'TRY', 'UAH', 'GBP' |
| DiscountCode | '01', '02', '03', '04', 'default' |
If the ONIX has multiple markets with different prices, only the market whose ProductSupply.Market.Territory.RegionsIncluded is WORLD is used. The only supported PriceType is 01 (RRP excluding tax).
Minimum prices by currency (in farfalla):
If the price is 0.0, the product is considered FREE.
Minimum prices are always expressed in USD. The value on the right is the USD minimum that applies to prices in the currency on the left. Currencies that do not appear in this list use the USD minimum.
| Currency | Minimum price (USD) |
|---|---|
| USD | 2.6 |
| BRL | 2.2 |
| ARS | 0.6 |
Section documentation: (58) - Prices
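A Python sketch of the price selection. It assumes ProductSupply is parsed into a dict (one block) or a list (several), and that each block has one SupplyDetail whose Price is a dict or a list. The following are assumptions: dropping prices whose currency is outside the allowed list, ignoring a market other than WORLD even when it is the only one, the output keys, and the helper names. The minimum is only looked up here, because the text does not say what happens to a price below it.

```python
from typing import Any, Dict, List, Optional

# Allowed CurrencyCode values, copied from the table above.
SUPPORTED_CURRENCIES = {
    "USD", "BRL", "ARS", "CLP", "PEN", "UYU", "COP", "MXN", "BOB", "EUR",
    "ALL", "AMD", "AZN", "BYN", "BAM", "BGN", "CZK", "DKK", "GEL", "HUF",
    "ISK", "MDL", "MKD", "NOK", "PLN", "RON", "RUB", "RSD", "SEK", "CHF",
    "TRY", "UAH", "GBP",
}
# Minimum prices in USD, copied from the minimum-price values above.
MINIMUM_USD = {"USD": 2.6, "BRL": 2.2, "ARS": 0.6}


def as_list(value: Any) -> List[Any]:
    """Treat a tag parsed once (dict) and a repeated tag (list) the same way."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def world_prices(product_supply: Any) -> Optional[List[Dict[str, Any]]]:
    """PriceType 01 prices from the ProductSupply whose market is WORLD, else None."""
    for supply in as_list(product_supply):
        territory = (supply.get("Market") or {}).get("Territory") or {}
        # Assumption: a market other than WORLD is skipped even if it is the only one.
        if territory.get("RegionsIncluded") != "WORLD":
            continue
        prices = []
        for price in as_list((supply.get("SupplyDetail") or {}).get("Price")):
            if price.get("PriceType") != "01" or "PriceAmount" not in price:
                continue
            # Assumption: currencies outside the allowed list are dropped.
            if price.get("CurrencyCode") not in SUPPORTED_CURRENCIES:
                continue
            amount = float(price["PriceAmount"])
            prices.append({
                "currency": price["CurrencyCode"],
                "amount": amount,
                "free": amount == 0.0,  # a price of 0.0 means FREE
            })
        return prices
    return None


def minimum_price_usd(currency: str) -> float:
    """USD minimum for a currency; unlisted currencies fall back to the USD value."""
    return MINIMUM_USD.get(currency, MINIMUM_USD["USD"])
```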
type
Format: array | null
We get the information from: 'EPUB', 'MP3', 'PDF'
Information extractor in Publica.la: 'epub', 'audio', 'pdf'
The type is set from the format the publisher sends: EPUB becomes epub, MP3 becomes audio, and PDF becomes pdf.
publishing_status
Format: string
We get the information from: PublishingDetail.PublishingStatus
Information extractor in Publica.la: PublishStatus
If the issue already has the publishing_status pla-content-source-cancelled, that value is kept. Otherwise, the PublishStatus extractor is used. It checks whether PublishingStatus is one of '01', '02', ..., '17'; if it is not, the value '00' (unspecified) is used.
Complete mapping of ONIX code list 64 (status → internal state)
| ONIX code | Internal state label |
|---|---|
| 00 | unspecified |
| 01 | cancelled |
| 02 | forthcoming |
| 03 | postponed-indefinitely |
| 04 | active |
| 05 | withdrawn-from-sale |
| 06 | withdrawn-from-sale |
| 07 | out-of-print |
| 08 | inactive |
| 09 | unknown |
| 10 | remaindered |
| 11 | withdrawn-from-sale |
| 12 | recalled |
| 13 | active-but-not-on-sale |
| 15 | recalled |
| 16 | withdrawn-from-sale |
| 17 | withdrawn-from-sale |
Section documentation: (64) - Publishing status
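A Python sketch of the status mapping with the cancellation override. The mapping is copied from the table above. Two points are assumptions: returning the internal label rather than the code, and sending codes without an entry (such as 14) to unspecified. The function name is hypothetical.

```python
from typing import Optional

CANCELLED_BY_SOURCE = "pla-content-source-cancelled"

# ONIX code list 64 -> internal state, copied from the table above (no entry for 14).
STATUS_MAP = {
    "00": "unspecified", "01": "cancelled", "02": "forthcoming",
    "03": "postponed-indefinitely", "04": "active", "05": "withdrawn-from-sale",
    "06": "withdrawn-from-sale", "07": "out-of-print", "08": "inactive",
    "09": "unknown", "10": "remaindered", "11": "withdrawn-from-sale",
    "12": "recalled", "13": "active-but-not-on-sale", "15": "recalled",
    "16": "withdrawn-from-sale", "17": "withdrawn-from-sale",
}


def publishing_status(onix_code: Optional[str], current_status: Optional[str] = None) -> str:
    """Internal state for PublishingDetail.PublishingStatus."""
    if current_status == CANCELLED_BY_SOURCE:
        return current_status  # a cancellation by the content source is kept
    # Assumption: missing or unmapped codes fall back to 00 = unspecified.
    return STATUS_MAP.get(onix_code, STATUS_MAP["00"])
```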
published_at
Format: date | null
We get the information from: PublishingDetail.PublishingDate
Information extractor in Publica.la: PublishDate
| Tag required | Allowed values |
|---|---|
| PublishingDateRole | '01' |
The extractor checks whether a single PublishingDate is sent (an associative array). If so, its PublishingDateRole must be 01 (publication date); otherwise, null is returned. The date is then read from Date.0 or, if that is absent, from Date, and is formatted as Y-m-d.
If several PublishingDate elements are sent, the first one is taken and processed as a single PublishingDate.
Section documentation: (163) - Publishing date role
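A Python sketch of the published_at rule. It assumes PublishingDate is parsed into a dict (one element) or a list (several). It also assumes Date is either a plain string or, when it carries attributes such as dateformat, a dict of the form {"@attributes": {...}, 0: "20240115"}, which is what the Date.0 path suggests. Finally, it assumes the YYYYMMDD layout of ONIX dateformat 00; other layouts raise ValueError here. The function name is hypothetical.

```python
from datetime import datetime
from typing import Any, Optional


def extract_publish_date(publishing_date: Any) -> Optional[str]:
    """Y-m-d string for a PublishingDate whose PublishingDateRole is 01."""
    if not publishing_date:
        return None
    if isinstance(publishing_date, list):  # several PublishingDate: only the first is used
        publishing_date = publishing_date[0]
    if publishing_date.get("PublishingDateRole") != "01":
        return None
    date = publishing_date.get("Date")
    # Assumption: Date with attributes is parsed as {"@attributes": ..., 0: text}.
    if isinstance(date, dict):
        date = date.get(0)
    if not date:
        return None
    # Assumption: ONIX dateformat 00 (YYYYMMDD); PHP's Y-m-d equals %Y-%m-%d.
    return datetime.strptime(date.strip(), "%Y%m%d").strftime("%Y-%m-%d")
```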
publisher
Format: array
We get the information from: PublishingDetail.Publisher
Information extractor in Publica.la: Publisher
| Tag required | Allowed values |
|---|---|
| PublishingRole | '01' |
If a single Publisher is sent (an associative array), its PublishingRole must be 01 (publisher); otherwise, an empty array is returned. If the role matches, the value is read from PublisherName. If PublisherName is missing, null is returned.
If several Publisher elements are sent, an array is built from their PublisherName values.
Section documentation: (45) - Publishing role
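A Python sketch of the publisher rule. It assumes PublishingDetail.Publisher is parsed into a dict for one Publisher and a list for several. It follows the text literally, so the list branch collects every PublisherName without checking PublishingRole. Wrapping the single name in a list is an assumption. With role="10", the same function covers the publishing_group field described next. The function name and sample names are hypothetical.

```python
from typing import Any, List, Optional


def extract_publishers(publisher: Any, role: str = "01") -> Optional[List[str]]:
    """PublisherName values from PublishingDetail.Publisher for the given PublishingRole."""
    if not publisher:
        return []
    if isinstance(publisher, dict):  # a single Publisher
        if publisher.get("PublishingRole") != role:
            return []
        name = publisher.get("PublisherName")
        # Assumption: a found name is wrapped in a list; a missing name gives None.
        return [name] if name else None
    # Several Publisher elements: every PublisherName is collected, as described.
    return [p["PublisherName"] for p in publisher if p.get("PublisherName")]
```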
publishing_group
Format: array | null
We get the information from: PublishingDetail.Publisher
Information extractor in Publica.la: PublishingGroup
| Tag required | Allowed values |
|---|---|
| PublishingRole | '10' |
If a single Publisher is sent (an associative array), its PublishingRole must be 10 (publishing group); otherwise, an empty array is returned. If the role matches, the value is read from PublisherName. If PublisherName is missing, null is returned.
If several Publisher elements are sent, an array is built from their PublisherName values.
Section documentation: (45) - Publishing role
sales_rights
Format: array | null
We get the information from: PublishingDetail.SalesRights
Information extractor in Publica.la: SalesRights
| Tag required | Allowed values |
|---|---|
| SalesRightsType | '01', '02' |
| Territory | 'RegionsIncluded' => 'WORLD', 'CountriesIncluded' => any, 'RegionsExcluded' => any, 'CountriesExcluded' => any |
If a single SalesRights block is sent (an associative array), the extractor checks that SalesRightsType is '01' or '02' and that Territory => RegionsIncluded is WORLD. If this condition is not met, the extractor reads Territory => RegionsIncluded, CountriesIncluded, RegionsExcluded and CountriesExcluded instead.
If several SalesRights blocks are sent, each one is processed in the same way.
Section documentation: (46) - Sales Rights
keywords
Format: array
We get the information from: DescriptiveDetail.Subject
Information extractor in Publica.la: Keywords
| Tag required | Allowed values |
|---|---|
| SubjectSchemeIdentifier | '20' |
If a single Subject is sent (an associative array), we check that its SubjectSchemeIdentifier is 20 (keywords). If it is, its SubjectHeadingText is returned in an array.
If several Subject elements are sent, the same check is applied to each one, and the matching SubjectHeadingText values are collected into the array.
Section documentation: (27) - Subject scheme identifier
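A Python sketch of the keywords rule. It assumes DescriptiveDetail.Subject is parsed into a dict for one Subject and a list for several. The function name is hypothetical. ONIX files often pack several keywords into one SubjectHeadingText, separated by semicolons. The text does not say whether the extractor splits them, so this sketch keeps each SubjectHeadingText whole.

```python
from typing import Any, List


def extract_keywords(subject: Any) -> List[str]:
    """SubjectHeadingText of every Subject whose SubjectSchemeIdentifier is 20 (keywords)."""
    if not subject:
        return []
    subjects = [subject] if isinstance(subject, dict) else subject
    return [s["SubjectHeadingText"] for s in subjects
            if s.get("SubjectSchemeIdentifier") == "20" and s.get("SubjectHeadingText")]
```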