
Structure of the ONIX Intake

This page describes the information we extract from the ONIX files sent to us for storage in Publica.la. For each field we indicate the data type, the corresponding ONIX element, the extractor responsible for obtaining it, the required values, and what is stored.

How to read an ONIX

ONIX is a metadata standard that carries the information our platforms need. To read an ONIX file, we walk its entire structure and extract the necessary values.

The standard allows many variants, so it is important to document where each value is taken from and which requirements an ONIX file must meet to be compatible with our platform.

The official reference documentation for the codes can be consulted in full on the following site: https://ns.editeur.org/onix/en

Example ONIX Fragment:

<DescriptiveDetail>
  <Contributor>
    <SequenceNumber>1</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NameIdentifier>
      <NameIDType>01</NameIDType>
      <IDTypeName>HCP Author ID</IDTypeName>
      <IDValue>7421</IDValue>
    </NameIdentifier>
    <NameIdentifier>
      <NameIDType>16</NameIDType>
      <IDValue>0000000121479135</IDValue>
    </NameIdentifier>
    <NamesBeforeKey>Maj</NamesBeforeKey>
    <KeyNames>Sjöwall</KeyNames>
    <BiographicalNote><p><strong>Maj Sjöwall</strong> was born in Stockholm in 1935. She is a poet, novelist and translator, and is best known for the ten <em>Martin Beck</em> novels she wrote with husband Per Wahlöö.</p></BiographicalNote>
  </Contributor>

  <Contributor>
    <SequenceNumber>2</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NameIdentifier>
      <NameIDType>01</NameIDType>
      <IDTypeName>HCP Author ID</IDTypeName>
      <IDValue>7422</IDValue>
    </NameIdentifier>
    <NameIdentifier>
      <NameIDType>16</NameIDType>
      <IDValue>0000000121222604</IDValue>
    </NameIdentifier>
    <NamesBeforeKey>Per</NamesBeforeKey>
    <KeyNames>Wahlöö</KeyNames>
    <BiographicalNote><p><strong>Per Wahlöö</strong> was born in Göteborg. After graduating from the University of Lund in 1946, he worked as a journalist, covering criminal and social issues for a number of newspapers and magazines. In the 1950s, Wahlöö became involved with radical political causes, activities that resulted in his deportation from Franco’s Spain in 1957. After returning to Sweden, he wrote a number of television and radio plays, and was managing editor of several magazines, before becoming a full-time writer.</p><p>He is best known for the series of ten <em>Martin Beck</em> novels he wrote with wife Maj Sjöwall, which they completed immediately before his death in 1975.</p></BiographicalNote>
  </Contributor>
</DescriptiveDetail>

In the example, you can see the structure: DescriptiveDetail.Contributor. This is the section where our platform obtains information about the author, the narrator, etc.

Within this section, our platform looks for the value of ContributorRole to verify that it matches the required value. It is important to validate this to ensure that the information received is accurate.

In addition, several Contributors can be sent. If we need to process multiple Contributors, we must iterate and obtain the information for each case present within the ONIX file.
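The iteration described above can be sketched in Python with the standard library XML parser. This is only an illustration, not the platform's extractor: the function name `contributors_with_role` is hypothetical, the fragment is a shortened copy of the example, and real ONIX files usually declare an XML namespace that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example fragment (identifiers and biographies omitted).
ONIX_FRAGMENT = """
<DescriptiveDetail>
  <Contributor>
    <SequenceNumber>1</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NamesBeforeKey>Maj</NamesBeforeKey>
    <KeyNames>Sjöwall</KeyNames>
  </Contributor>
  <Contributor>
    <SequenceNumber>2</SequenceNumber>
    <ContributorRole>A01</ContributorRole>
    <NamesBeforeKey>Per</NamesBeforeKey>
    <KeyNames>Wahlöö</KeyNames>
  </Contributor>
</DescriptiveDetail>
"""


def contributors_with_role(xml_text, allowed_roles):
    """Return the KeyNames of every Contributor whose role is allowed."""
    root = ET.fromstring(xml_text)
    key_names = []
    # Visit every Contributor element, however many were sent.
    for contributor in root.iter("Contributor"):
        role = contributor.findtext("ContributorRole")
        if role in allowed_roles:
            key_names.append(contributor.findtext("KeyNames", default=""))
    return key_names


print(contributors_with_role(ONIX_FRAGMENT, {"A01"}))         # ['Sjöwall', 'Wahlöö']
print(contributors_with_role(ONIX_FRAGMENT, {"E03", "E07"}))  # []
```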

Reference data obtained from ONIX

Information in Publica.la  Format         Extractor                 ONIX Tag
author                     array          Contributors              DescriptiveDetail.Contributor
narrator                   array          Narrators                 DescriptiveDetail.Contributor
description                string | null  Synopsis                  CollateralDetail.TextContent
external_id                string | null  ProductIdentifiers        ProductIdentifier
country                    array | null   function                  PublishingDetail.CountryOfPublication
lang                       array | null   Language                  DescriptiveDetail.Language
bisac                      array          Bisac                     DescriptiveDetail.Subject
name                       string | null  Title                     DescriptiveDetail.TitleDetail
prices                     array | null   Prices                    ProductSupply
type                       array | null   ['epub', 'audio', 'pdf']  ['EPUB', 'MP3', 'PDF']
publishing_status          string         PublishStatus             PublishingDetail.PublishingStatus
published_at               date | null    PublishDate               PublishingDetail.PublishingDate
publisher                  array          Publisher                 PublishingDetail.Publisher
publishing_group           array | null   PublishingGroup           PublishingDetail.Publisher
sales_rights               array | null   SalesRights               PublishingDetail.SalesRights
keywords                   array          Keywords                  DescriptiveDetail.Subject

author

Format: array

We get the information from: DescriptiveDetail.Contributor

Information extractor in Publica.la: Contributors

Tag required     Allowed values
ContributorRole  'A01'

We look for the book's contributors in the DescriptiveDetail.Contributor tag and check that ContributorRole is in the allowed list ['A01']. If it is not, we return an empty list. If this requirement is met, we look for the contributor's name in the following attributes, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.

Section documentation: (17) - Contributor role code - A01
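As a rough sketch of the lookup order above, the following Python code assumes each Contributor has already been parsed into a dict keyed by ONIX tag name. The function names are hypothetical, and the way KeyNames and NamesBeforeKey are joined is an assumption, since the description does not specify it. The same function would serve the narrator field with the roles 'E03' and 'E07'.

```python
def contributor_name(contributor):
    """Resolve a name using the documented attribute order."""
    if contributor.get("PersonNameInverted"):
        return contributor["PersonNameInverted"]
    # Assumption: the two parts are joined as "NamesBeforeKey KeyNames".
    if contributor.get("KeyNames") and contributor.get("NamesBeforeKey"):
        return f"{contributor['NamesBeforeKey']} {contributor['KeyNames']}"
    for tag in ("PersonName", "KeyNames", "CorporateName"):
        if contributor.get(tag):
            return contributor[tag]
    return None


def names_for_roles(contributors, allowed_roles):
    """Collect names of contributors whose ContributorRole is allowed."""
    if not contributors:
        return []
    if isinstance(contributors, dict):  # a single Contributor was sent
        contributors = [contributors]
    names = []
    for contributor in contributors:
        if contributor.get("ContributorRole") in allowed_roles:
            name = contributor_name(contributor)
            if name:
                names.append(name)
    return names


sample = [
    {"ContributorRole": "A01", "NamesBeforeKey": "Maj", "KeyNames": "Sjöwall"},
    {"ContributorRole": "E07", "PersonName": "Example Reader"},
]
print(names_for_roles(sample, ["A01"]))         # ['Maj Sjöwall']
print(names_for_roles(sample, ["E03", "E07"]))  # ['Example Reader']
```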

narrator

Format: array

We get the information from: DescriptiveDetail.Contributor

Information extractor in Publica.la: Contributors

Tag required     Allowed values
ContributorRole  'E03', 'E07'

We check the DescriptiveDetail.Contributor tag and ensure that its ContributorRole is in the allowed list of values: ['E03', 'E07']. If it is not, we return an empty list. If this requirement is met, we look for the contributor's name in the following attributes, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.

Section documentation: (17) - Contributor role code - E03 or E07

description

Format: string | null

We get the information from: CollateralDetail.TextContent

Information extractor in Publica.la: Synopsis

Tag required                 Allowed values
ContentAudience              '00', '03'
TextType                     '02', '03'
Text.@attributes.textformat  '02', '06'

Tag NOT allowed  Values
Territory        any

The synopsis is obtained from the CollateralDetail.TextContent section of the ONIX file. First, we check whether the information exists; if it does not, null is returned. If it exists, we check whether there is a single synopsis or several.

For a single synopsis, we verify that the TextType value is allowed (02 or 03), that the ContentAudience value is allowed (00 or 03), and that Territory is not defined. If any of these criteria is not met, there is no synopsis and null is returned. If they are all met, we check that Text.@attributes.textformat is an allowed text format (02 or 06, according to the CONTENT_FORMAT array) and take the text from the first element of Text.

For multiple synopses, each one is checked against the same TextType, ContentAudience, and Territory criteria, and those that fail are discarded. The remaining synopses are sorted by the priority established in the TextType array, and the single-synopsis processing is applied to each in turn until one can be processed; its text string is taken.
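The selection logic can be sketched as follows. This is an illustration under assumptions: the ONIX data is taken to be already parsed into dicts and lists, the text is assumed to sit under the key "0" of Text next to "@attributes", and the priority order in TEXT_TYPE_PRIORITY is a guess at the TextType array mentioned above. The function names are hypothetical.

```python
TEXT_TYPE_PRIORITY = ["02", "03"]  # assumed order of the TextType array
CONTENT_AUDIENCE = ["00", "03"]
CONTENT_FORMAT = ["02", "06"]


def single_synopsis(item):
    """Return the synopsis text of one TextContent entry, or None."""
    if item.get("TextType") not in TEXT_TYPE_PRIORITY:
        return None
    if item.get("ContentAudience") not in CONTENT_AUDIENCE:
        return None
    if "Territory" in item:  # Territory must not be defined
        return None
    text = item.get("Text")
    if not isinstance(text, dict):
        return None
    if text.get("@attributes", {}).get("textformat") not in CONTENT_FORMAT:
        return None
    return text.get("0")  # first element of Text


def synopsis(text_content):
    """Handle both a single TextContent (dict) and several (list)."""
    if not text_content:
        return None
    if isinstance(text_content, dict):
        return single_synopsis(text_content)
    candidates = [i for i in text_content if i.get("TextType") in TEXT_TYPE_PRIORITY]
    candidates.sort(key=lambda i: TEXT_TYPE_PRIORITY.index(i["TextType"]))
    for item in candidates:
        result = single_synopsis(item)
        if result is not None:
            return result
    return None


entries = [
    {"TextType": "03", "ContentAudience": "00",
     "Text": {"@attributes": {"textformat": "06"}, "0": "Long description"}},
    {"TextType": "02", "ContentAudience": "00",
     "Text": {"@attributes": {"textformat": "02"}, "0": "Short description"}},
]
print(synopsis(entries))  # Short description
```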

external_id

Format: string | null

We get the information from: ProductIdentifier

Information extractor in Publica.la: ProductIdentifiers

Tag required   Allowed values
ProductIDType  '15', '03'

We check whether there is a single identifier or several. For a single value, we verify that ProductIDType is one of the allowed values (15 = ISBN or 03 = GTIN) and take the value of IDValue; otherwise, null is returned. If there are several, the first one is taken and processed as a single value.

Section documentation: (5) - Product Identifier type - 03 or 15
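A minimal sketch of this rule, assuming ProductIdentifier has been parsed into a dict (single) or a list of dicts (multiple); the function name is hypothetical.

```python
ALLOWED_PRODUCT_ID_TYPES = {"15", "03"}


def external_id(product_identifier):
    """Return IDValue when the (first) identifier has an allowed type."""
    if not product_identifier:
        return None
    if isinstance(product_identifier, list):
        # Several identifiers: only the first one is considered.
        product_identifier = product_identifier[0]
    if product_identifier.get("ProductIDType") in ALLOWED_PRODUCT_ID_TYPES:
        return product_identifier.get("IDValue")
    return None


print(external_id({"ProductIDType": "15", "IDValue": "9780000000002"}))  # 9780000000002
```

Under this reading, a file whose first identifier has another type yields null even if an allowed identifier follows, so the order in which identifiers are sent matters.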

country

Format: string | null

We get the information from: PublishingDetail.CountryOfPublication

Information extractor in Publica.la: Function countryOfPublication

The value sent in PublishingDetail.CountryOfPublication is taken; if it is not provided, null is returned.

Section documentation: (91) - Country

lang

Format: string

We get the information from: DescriptiveDetail.Language

Information extractor in Publica.la: Language

We verify that DescriptiveDetail.Language is sent; if not, the language is set to undefined. If the information arrives as an associative array, we map its LanguageCode to the two-letter code used in Publica.la (for example 'eng' => 'en', 'ita' => 'it', 'por' => 'pt', 'spa' => 'es').

Otherwise, we take the first entry whose LanguageRole has the value 01 and map its LanguageCode in the same way. The full mapping is listed under Supported language codes.

Supported language codes

ONIX LanguageCode  ISO 639-1 used in Publica.la
eng                en
ita                it
por                pt
spa                es
fre / fra          fr
chi / zho          zh
jpn                ja
ger                de
cat                ca
gsw                de
pol                pl
glg                gl
any other          undefined

Section documentation: (22) - Language
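The mapping above can be expressed as a lookup table. The sketch below assumes the Language node is already parsed into a dict or a list of dicts, uses a hypothetical function name, and uses Python's None to stand in for "undefined".

```python
LANGUAGE_MAP = {
    "eng": "en", "ita": "it", "por": "pt", "spa": "es",
    "fre": "fr", "fra": "fr", "chi": "zh", "zho": "zh",
    "jpn": "ja", "ger": "de", "cat": "ca", "gsw": "de",
    "pol": "pl", "glg": "gl",
}


def language(language_node):
    """Map the ONIX LanguageCode to the two-letter code, or None."""
    if not language_node:
        return None
    if isinstance(language_node, dict):
        entry = language_node
    else:
        # Several Language entries: take the first with LanguageRole 01.
        entry = next(
            (item for item in language_node if item.get("LanguageRole") == "01"),
            None,
        )
        if entry is None:
            return None
    return LANGUAGE_MAP.get(entry.get("LanguageCode"))


print(language({"LanguageRole": "01", "LanguageCode": "spa"}))  # es
```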

bisac

Format: array

We get the information from: DescriptiveDetail.Subject

Information extractor in Publica.la: Bisac

Tag required             Allowed values
SubjectSchemeIdentifier  '10'

If DescriptiveDetail.Subject is not specified, an empty array is returned. If the value is an associative array, we check that SubjectSchemeIdentifier is equal to 10. If so, the value of SubjectCode is taken; otherwise, an empty array is returned.

If it is not an associative array, we check that SubjectSchemeIdentifier is equal to 10 for each entry. For those that match, an array with the code and main structure is built, filled in from SubjectCode and MainSubject respectively.

Section documentation: (27) - Subject scheme identifier - Bisac
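One possible reading of these rules is sketched below. The exact shape of the stored entries is not spelled out in the description, so the return shapes here are assumptions: a one-element list for a single Subject, and a list of code/main dicts for several, where "main" records whether MainSubject was present. The function name is hypothetical.

```python
BISAC_SCHEME = "10"


def bisac(subject):
    """Extract BISAC codes from a parsed DescriptiveDetail.Subject node."""
    if not subject:
        return []
    if isinstance(subject, dict):  # a single Subject was sent
        if subject.get("SubjectSchemeIdentifier") == BISAC_SCHEME and subject.get("SubjectCode"):
            return [subject["SubjectCode"]]
        return []
    # Several Subjects: keep only the BISAC ones.
    return [
        {"code": item.get("SubjectCode"), "main": "MainSubject" in item}
        for item in subject
        if item.get("SubjectSchemeIdentifier") == BISAC_SCHEME
    ]


subjects = [
    {"MainSubject": "", "SubjectSchemeIdentifier": "10", "SubjectCode": "FIC022000"},
    {"SubjectSchemeIdentifier": "20", "SubjectHeadingText": "crime"},
]
print(bisac(subjects))  # [{'code': 'FIC022000', 'main': True}]
```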

name

Format: string | null

We get the information from: DescriptiveDetail.TitleDetail

Information extractor in Publica.la: Title

If DescriptiveDetail.TitleDetail is not specified, null is returned. If the value is an associative array, we check TitleType: if it is not equal to 01, null is returned. Otherwise, the first 255 characters of TitleElement.TitleText are taken, if present. If TitleText is absent, the first 255 characters of TitleElement.TitlePrefix and TitleElement.TitleWithoutPrefix are taken instead.

If it is not an associative array, the first element is taken and processed as an associative array (as described above).

Section documentation: (15) - Title type
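A short sketch of the title rule, assuming TitleDetail is parsed into a dict or a list of dicts. The function name is hypothetical, and joining TitlePrefix and TitleWithoutPrefix with a single space is an assumption.

```python
MAX_TITLE_LENGTH = 255


def title(title_detail):
    """Return the distinctive title (TitleType 01), truncated to 255 characters."""
    if not title_detail:
        return None
    if isinstance(title_detail, list):
        title_detail = title_detail[0]  # only the first TitleDetail is used
    if title_detail.get("TitleType") != "01":
        return None
    element = title_detail.get("TitleElement") or {}
    if element.get("TitleText"):
        return element["TitleText"][:MAX_TITLE_LENGTH]
    # Fallback: prefix plus the rest of the title (joined with a space, assumed).
    prefix = element.get("TitlePrefix", "")
    rest = element.get("TitleWithoutPrefix", "")
    combined = f"{prefix} {rest}".strip()
    return combined[:MAX_TITLE_LENGTH] or None


print(title({"TitleType": "01", "TitleElement": {
    "TitlePrefix": "The", "TitleWithoutPrefix": "Laughing Policeman"}}))
# The Laughing Policeman
```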

prices

Format: array | null

We get the information from: ProductSupply

Information extractor in Publica.la: Prices

Tag required  Allowed values
CurrencyCode  'USD', 'BRL', 'ARS', 'CLP', 'PEN', 'UYU', 'COP', 'MXN', 'BOB', 'EUR', 'ALL', 'AMD', 'AZN', 'BYN', 'BAM', 'BGN', 'CZK', 'DKK', 'GEL', 'HUF', 'ISK', 'MDL', 'MKD', 'NOK', 'PLN', 'RON', 'RUB', 'RSD', 'SEK', 'CHF', 'TRY', 'UAH', 'GBP'
DiscountCode  '01', '02', '03', '04', 'default'

If the ONIX has multiple markets with different prices, only the WORLD region is taken (ProductSupply.Market.Territory.RegionsIncluded = WORLD). The only supported PriceType is 01.

Minimum prices by currency (in farfalla):

If the price is 0.0, it is considered FREE.

A currency that does not appear in the list below uses the USD default. The minimum price is always expressed in USD, converted from the currency indicated on the left:

USD = 2.6

BRL = 2.2 (USD)

ARS = 0.6 (USD)

Section documentation: (58) - Prices
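The filtering rules (WORLD market, PriceType 01, allowed currencies, zero meaning free) can be sketched as below. This is an illustration only: it assumes ProductSupply is parsed into dicts and lists with prices under SupplyDetail.Price, the function and helper names are hypothetical, and the minimum-price handling is left out because its behaviour is not described here.

```python
ALLOWED_CURRENCIES = {
    "USD", "BRL", "ARS", "CLP", "PEN", "UYU", "COP", "MXN", "BOB", "EUR", "ALL",
    "AMD", "AZN", "BYN", "BAM", "BGN", "CZK", "DKK", "GEL", "HUF", "ISK", "MDL",
    "MKD", "NOK", "PLN", "RON", "RUB", "RSD", "SEK", "CHF", "TRY", "UAH", "GBP",
}


def as_list(value):
    """Normalise a missing, single, or repeated node into a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def world_prices(product_supply):
    supplies = as_list(product_supply)
    if len(supplies) > 1:
        # Several markets: keep only the one covering the WORLD region.
        supplies = [
            s for s in supplies
            if (s.get("Market") or {}).get("Territory", {}).get("RegionsIncluded") == "WORLD"
        ]
    prices = []
    for supply in supplies:
        for price in as_list((supply.get("SupplyDetail") or {}).get("Price")):
            if price.get("PriceType") != "01":
                continue
            if price.get("CurrencyCode") not in ALLOWED_CURRENCIES:
                continue
            if price.get("PriceAmount") is None:
                continue
            amount = float(price["PriceAmount"])
            prices.append({
                "currency": price["CurrencyCode"],
                "amount": amount,
                "free": amount == 0.0,  # a price of 0.0 is considered FREE
            })
    return prices


world = {
    "Market": {"Territory": {"RegionsIncluded": "WORLD"}},
    "SupplyDetail": {"Price": [
        {"PriceType": "01", "PriceAmount": "9.99", "CurrencyCode": "USD"},
        {"PriceType": "02", "PriceAmount": "5.00", "CurrencyCode": "USD"},
        {"PriceType": "01", "PriceAmount": "0.0", "CurrencyCode": "EUR"},
    ]},
}
print(world_prices(world))
```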

type

Format: array | null

We get the information from: 'EPUB', 'MP3', 'PDF'

Information extractor in Publica.la: 'epub', 'audio', 'pdf'

The value is derived from the format sent to us: EPUB, MP3, or PDF, stored as 'epub', 'audio', or 'pdf' respectively.

publishing_status

Format: string

We get the information from: PublishingDetail.PublishingStatus

Information extractor in Publica.la: PublishStatus

If the issue already has a publishing_status equal to pla-content-source-cancelled, that value is kept. Otherwise, the PublishStatus extractor is used: it checks whether the code is one of '01', '02', ..., '17'. If not, the value '00' = unspecified is used.

Complete ONIX code list 64 mapping (status → internal state)

ONIX code  Internal state label
00         unspecified
01         cancelled
02         forthcoming
03         postponed-indefinitely
04         active
05         withdrawn-from-sale
06         withdrawn-from-sale
07         out-of-print
08         inactive
09         unknown
10         remaindered
11         withdrawn-from-sale
12         recalled
13         active-but-not-on-sale
15         recalled
16         withdrawn-from-sale
17         withdrawn-from-sale

Section documentation: (64) - Publishing status
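The mapping lends itself to a dictionary lookup with a fallback. The sketch below copies the table as given (which has no entry for code 14) and uses a hypothetical function name; any code not in the table falls back to 'unspecified'.

```python
PUBLISHING_STATUS = {
    "00": "unspecified",
    "01": "cancelled",
    "02": "forthcoming",
    "03": "postponed-indefinitely",
    "04": "active",
    "05": "withdrawn-from-sale",
    "06": "withdrawn-from-sale",
    "07": "out-of-print",
    "08": "inactive",
    "09": "unknown",
    "10": "remaindered",
    "11": "withdrawn-from-sale",
    "12": "recalled",
    "13": "active-but-not-on-sale",
    "15": "recalled",
    "16": "withdrawn-from-sale",
    "17": "withdrawn-from-sale",
}


def publishing_status(code):
    """Translate an ONIX publishing status code to the internal label."""
    return PUBLISHING_STATUS.get(code, PUBLISHING_STATUS["00"])


print(publishing_status("04"))  # active
print(publishing_status("99"))  # unspecified
```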

published_at

Format: date | null

We get the information from: PublishingDetail.PublishingDate

Information extractor in Publica.la: PublishDate

Tag required        Allowed values
PublishingDateRole  '01'

We check whether the value is an associative array. If it is, we verify that PublishingDateRole has the value 01; if not, null is returned. We then look for the value in Date.0 and, if it is absent, take the value of Date. The result is formatted as Y-m-d.

If it is not an associative array, the first element is taken and processed as an associative array.

Section documentation: (163) - Publishing date role
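The date handling can be sketched as follows, assuming PublishingDate is parsed into a dict or list of dicts and that Date holds either a plain string or a dict with the text under "0". The incoming YYYYMMDD format is an assumption (other ONIX date formats are not handled here), and the function name is hypothetical.

```python
from datetime import datetime


def published_at(publishing_date):
    """Return the publication date (role 01) as a Y-m-d string, or None."""
    if not publishing_date:
        return None
    if isinstance(publishing_date, list):
        publishing_date = publishing_date[0]  # only the first entry is used
    if publishing_date.get("PublishingDateRole") != "01":
        return None
    raw = publishing_date.get("Date")
    if isinstance(raw, dict):
        raw = raw.get("0")  # Date.0 when the element carries attributes
    if not raw:
        return None
    try:
        # Assumption: the date arrives as YYYYMMDD.
        return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:
        return None


print(published_at({"PublishingDateRole": "01", "Date": "19680101"}))  # 1968-01-01
```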

publisher

Format: array

We get the information from: PublishingDetail.Publisher

Information extractor in Publica.la: Publisher

Tag required    Allowed values
PublishingRole  '01'

If it is an associative array, we check that PublishingRole is equal to 01. If it is not, an empty array is returned. If it is, the value is read from PublisherName; if PublisherName is not found, null is returned.

If it is not an associative array, an array is built from the distinct values of PublisherName.

Section documentation: (45) - Publishing role

publishing_group

Format: array | null

We get the information from: PublishingDetail.Publisher

Information extractor in Publica.la: PublishingGroup

Tag required    Allowed values
PublishingRole  '10'

If it is an associative array, we check that PublishingRole is equal to 10. If it is not, an empty array is returned. If it is, the value is read from PublisherName; if PublisherName is not found, null is returned.

If it is not an associative array, an array is built from the distinct values of PublisherName.

Section documentation: (45) - Publishing role
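Since publisher and publishing_group differ only in the PublishingRole they require, both can be illustrated with one function that takes the role as a parameter. This is a sketch under assumptions: the Publisher node is parsed into a dict or list of dicts, the role filter is assumed to apply to each entry of a list as well, a missing node is assumed to yield an empty array, and the function name is hypothetical.

```python
def publisher_names(publisher, role):
    """Collect PublisherName values for the given PublishingRole."""
    if not publisher:
        return []
    if isinstance(publisher, dict):  # a single Publisher was sent
        if publisher.get("PublishingRole") != role:
            return []
        name = publisher.get("PublisherName")
        return [name] if name else None
    names = []
    for entry in publisher:
        name = entry.get("PublisherName")
        # Keep distinct names only, preserving the order in the file.
        if entry.get("PublishingRole") == role and name and name not in names:
            names.append(name)
    return names


publishers = [
    {"PublishingRole": "01", "PublisherName": "Example Press"},
    {"PublishingRole": "10", "PublisherName": "Example Group"},
    {"PublishingRole": "01", "PublisherName": "Example Press"},
]
print(publisher_names(publishers, "01"))  # ['Example Press']   (publisher)
print(publisher_names(publishers, "10"))  # ['Example Group']   (publishing_group)
```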

sales_rights

Format: array | null

We get the information from: PublishingDetail.SalesRights

Information extractor in Publica.la: SalesRights

Tag required     Allowed values
SalesRightsType  '01', '02'
Territory        'RegionsIncluded' => 'WORLD', 'CountriesIncluded' => any, 'RegionsExcluded' => any, 'CountriesExcluded' => any

If it is an associative array, we check that SalesRightsType is '01' or '02' and that Territory => RegionsIncluded is equal to WORLD. If this condition is not met, we look at Territory => RegionsIncluded, CountriesIncluded, RegionsExcluded, and CountriesExcluded.

If it is not an associative array, the same process is applied to each element as if it were an associative array.

Section documentation: (46) - Sales Rights

keywords

Format: array

We get the information from: DescriptiveDetail.Subject

Information extractor in Publica.la: Keywords

Tag required             Allowed values
SubjectSchemeIdentifier  '20'

If it is an associative array, we check that SubjectSchemeIdentifier is equal to 20 and take its SubjectHeadingText, building an array.

If it is not an associative array, the same process is applied to each element as if it were an associative array.

Section documentation: (27) - Subject scheme identifier
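A minimal sketch of the keyword extraction, assuming DescriptiveDetail.Subject is parsed into a dict or a list of dicts. The function name is hypothetical, and each SubjectHeadingText is kept as sent, without splitting it into individual keywords.

```python
KEYWORD_SCHEME = "20"


def keywords(subject):
    """Collect SubjectHeadingText from Subjects that use scheme 20."""
    if not subject:
        return []
    if isinstance(subject, dict):  # a single Subject was sent
        subject = [subject]
    return [
        item["SubjectHeadingText"]
        for item in subject
        if item.get("SubjectSchemeIdentifier") == KEYWORD_SCHEME
        and item.get("SubjectHeadingText")
    ]


print(keywords([
    {"SubjectSchemeIdentifier": "10", "SubjectCode": "FIC022000"},
    {"SubjectSchemeIdentifier": "20", "SubjectHeadingText": "crime fiction"},
]))  # ['crime fiction']
```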

