Structure of the ONIX Intake

This page describes the information we extract from the ONIX files that are sent to us for storage in Publica.la. For each field we indicate the type of data received, the ONIX tag it comes from, the extractor responsible for obtaining it, the values that are required, and the information that is ultimately stored.

How to read an ONIX

ONIX is an XML-based standard for book metadata, and it carries the information our platforms need. To read an ONIX file, we walk its structure and pick out the tags listed on this page.

The standard allows many variants, so we document exactly where each value is taken from and which requirements a record must meet to be compatible with our platform.

The official reference documentation for the codes can be consulted in full on the following site: https://ns.editeur.org/onix/en

Example ONIX Fragment:

<DescriptiveDetail>
    <Contributor>
        <SequenceNumber>1</SequenceNumber>
        <ContributorRole>A01</ContributorRole>
        <NameIdentifier>
            <NameIDType>01</NameIDType>
            <IDTypeName>HCP Author ID</IDTypeName>
            <IDValue>7421</IDValue>
        </NameIdentifier>
        <NameIdentifier>
            <NameIDType>16</NameIDType>
            <IDValue>0000000121479135</IDValue>
        </NameIdentifier>
        <NamesBeforeKey>Maj</NamesBeforeKey>
        <KeyNames>Sjöwall</KeyNames>
        <BiographicalNote><p><strong>Maj Sjöwall</strong> was born in Stockholm in 1935. She is a poet, novelist and translator, and is best known for the ten <em>Martin Beck</em> novels she wrote with husband Per Wahlöö.</p></BiographicalNote>
    </Contributor>

    <Contributor>
        <SequenceNumber>2</SequenceNumber>
        <ContributorRole>A01</ContributorRole>
        <NameIdentifier>
            <NameIDType>01</NameIDType>
            <IDTypeName>HCP Author ID</IDTypeName>
            <IDValue>7422</IDValue>
        </NameIdentifier>
        <NameIdentifier>
            <NameIDType>16</NameIDType>
            <IDValue>0000000121222604</IDValue>
        </NameIdentifier>
        <NamesBeforeKey>Per</NamesBeforeKey>
        <KeyNames>Wahlöö</KeyNames>
        <BiographicalNote ><p><strong>Per Wahlöö</strong> was born in Göteborg. After graduating from the University of Lund in 1946, he worked as a journalist, covering criminal and social issues for a number of newspapers and magazines. In the 1950s, Wahlöö became involved with radical political causes, activities that resulted in his deportation from Franco’s Spain in 1957. After returning to Sweden, he wrote a number of television and radio plays, and was managing editor of several magazines, before becoming a full-time writer.</p><p>He is best known for the series of ten <em>Martin Beck</em> novels he wrote with wife Maj Sjöwall, which they completed immediately before his death in 1975.</p></BiographicalNote>
    </Contributor>
</DescriptiveDetail>

In the example, you can see the structure: DescriptiveDetail.Contributor. This is the section where our platform obtains information about the author, the narrator, etc.

Within this section, our platform looks for the value of ContributorRole to verify that it matches the required value. It is important to validate this to ensure that the information received is accurate.

In addition, several Contributors can be sent. If we need to process multiple Contributors, we must iterate and obtain the information for each case present within the ONIX file.
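As a rough illustration of that iteration, the sketch below parses a trimmed copy of the fragment with Python's standard xml.etree.ElementTree and keeps only the contributors whose ContributorRole is in an allowed set. The function name contributors_with_role is hypothetical and is not part of the platform's code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example fragment (identifiers and biographies removed).
ONIX_FRAGMENT = """
<DescriptiveDetail>
    <Contributor>
        <SequenceNumber>1</SequenceNumber>
        <ContributorRole>A01</ContributorRole>
        <NamesBeforeKey>Maj</NamesBeforeKey>
        <KeyNames>Sjöwall</KeyNames>
    </Contributor>
    <Contributor>
        <SequenceNumber>2</SequenceNumber>
        <ContributorRole>A01</ContributorRole>
        <NamesBeforeKey>Per</NamesBeforeKey>
        <KeyNames>Wahlöö</KeyNames>
    </Contributor>
</DescriptiveDetail>
"""


def contributors_with_role(xml_text, allowed_roles):
    """Return (NamesBeforeKey, KeyNames) pairs for contributors with an allowed role."""
    root = ET.fromstring(xml_text)
    matches = []
    # A record may carry several Contributor elements, so visit each one.
    for contributor in root.findall("Contributor"):
        if contributor.findtext("ContributorRole") in allowed_roles:
            matches.append(
                (contributor.findtext("NamesBeforeKey"), contributor.findtext("KeyNames"))
            )
    return matches


authors = contributors_with_role(ONIX_FRAGMENT, {"A01"})
narrators = contributors_with_role(ONIX_FRAGMENT, {"E03", "E07"})
```

Here authors holds both writers from the fragment, while narrators is empty because no contributor carries role E03 or E07.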

Reference data obtained from ONIX

Information in Publica.la	Format	Extractor	ONIX Tag
author	array	Contributors	DescriptiveDetail.Contributor
narrator	array	Narrators	DescriptiveDetail.Contributor
description	string | null	Synopsis	CollateralDetail.TextContent
external_id	string | null	ProductIdentifiers	ProductIdentifier
country	array | null	function	PublishingDetail.CountryOfPublication
lang	array | null	Language	DescriptiveDetail.Language
bisac	array	Bisac	DescriptiveDetail.Subject
name	string | null	Title	DescriptiveDetail.TitleDetail
prices	array | null	Prices	ProductSupply
type	array | null	['epub', 'audio', 'pdf']	['EPUB', 'MP3', 'PDF']
publishing_status	string	PublishStatus	PublishingDetail.PublishingStatus
published_at	date | null	PublishDate	PublishingDetail.PublishingDate
publisher	array	Publisher	PublishingDetail.Publisher
publishing_group	array | null	PublishingGroup	PublishingDetail.Publisher
sales_rights	array | null	SalesRights	PublishingDetail.SalesRights
keywords	array	Keywords	DescriptiveDetail.Subject

author

Format: array

We get the information from: DescriptiveDetail.Contributor

Information extractor in Publica.la: Contributors

Tag required	Allowed values
ContributorRole	'A01'

We look up the book's contributors in the DescriptiveDetail.Contributor tag and check that ContributorRole is in the allowed list ['A01']. If it is not, we return an empty list. If it is, we look for the contributor's name in the following attributes, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.

Section documentation: (17) - Contributor role code - A01
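The lookup order can be sketched as follows. This is a minimal illustration, not the Contributors extractor itself: it assumes each contributor has already been parsed into a Python dict (the counterpart of an associative array), the function names are hypothetical, and the way KeyNames and NamesBeforeKey are joined is a guess.

```python
def contributor_name(contributor):
    """Resolve a name using the documented attribute order."""
    if contributor.get("PersonNameInverted"):
        return contributor["PersonNameInverted"]
    key_names = contributor.get("KeyNames")
    names_before_key = contributor.get("NamesBeforeKey")
    if key_names and names_before_key:
        # Assumption: joined in inverted form; the real extractor may differ.
        return f"{key_names}, {names_before_key}"
    for field in ("PersonName", "KeyNames", "CorporateName"):
        if contributor.get(field):
            return contributor[field]
    return None


def extract_contributors(contributors, allowed_roles=("A01",)):
    """Return the names of contributors whose role is allowed, else an empty list."""
    if contributors is None:
        return []
    if isinstance(contributors, dict):
        # A single contributor arrives as one dict rather than a list.
        contributors = [contributors]
    names = []
    for contributor in contributors:
        if contributor.get("ContributorRole") in allowed_roles:
            name = contributor_name(contributor)
            if name:
                names.append(name)
    return names
```

Passing allowed_roles=("E03", "E07") gives the narrator behaviour, since only the allowed role codes differ between the two fields.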

narrator

Format: array

We get the information from: DescriptiveDetail.Contributor

Information extractor in Publica.la: Contributors

Tag required	Allowed values
ContributorRole	'E03', 'E07'

We check the DescriptiveDetail.Contributor tag and ensure that its ContributorRole is in the allowed list of values: ['E03', 'E07']. If it is not, we return an empty list. If it is, we look for the contributor's name in the following attributes, in this order: PersonNameInverted, KeyNames and NamesBeforeKey, PersonName, KeyNames, CorporateName.

Section documentation: (17) - Contributor role code - E03 or E07

description

Format: string | null

We get the information from: CollateralDetail.TextContent

Information extractor in Publica.la: Synopsis

Tag required	Allowed values
ContentAudience	'00', '03'
TextType	'02', '03'
Text.@attributes.textformat	'02', '06'

Tag not allowed	Values
Territory	any

The synopsis is taken from the CollateralDetail.TextContent section of the ONIX file. If the section does not exist, null is returned. If it exists, we check whether it contains a single synopsis or several.

For a single synopsis, three criteria must hold: TextType is allowed (02 or 03), ContentAudience is allowed (00 or 03), and Territory is not defined. If any of them fails, there is no synopsis and null is returned. If all of them hold, we check that Text.@attributes.textformat is an allowed text format (02 or 06, according to the CONTENT_FORMAT array) and take the text from the first element of Text.

For multiple synopses, each one is checked against the same TextType, ContentAudience and Territory criteria, and those that fail are discarded. The remaining synopses are sorted by the priority established in the TextType array, and the single-synopsis function is called on each in turn until one can be processed; its text string is the result.
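The selection logic above could look roughly like the sketch below. It assumes the parsed data uses Python dicts shaped like the associative arrays described here (attributes under "@attributes", the text under key 0), that the order of the TextType list is its priority, and that an unsupported textformat yields null. All names except CONTENT_FORMAT are hypothetical.

```python
ALLOWED_TEXT_TYPES = ["02", "03"]  # assumption: list order is the priority
ALLOWED_AUDIENCES = {"00", "03"}
CONTENT_FORMAT = {"02", "06"}


def meets_criteria(item):
    """TextType and ContentAudience must be allowed, and Territory must be absent."""
    return (
        item.get("TextType") in ALLOWED_TEXT_TYPES
        and item.get("ContentAudience") in ALLOWED_AUDIENCES
        and "Territory" not in item
    )


def single_synopsis(item):
    """Return the text of one TextContent entry, or None if it cannot be used."""
    if not meets_criteria(item):
        return None
    text = item.get("Text")
    if not isinstance(text, dict):
        return None
    if text.get("@attributes", {}).get("textformat") not in CONTENT_FORMAT:
        return None
    return text.get(0)  # first element of Text


def synopsis(text_content):
    if not text_content:
        return None
    if isinstance(text_content, dict):
        return single_synopsis(text_content)
    # Several entries: discard non-matching ones, then try them by priority.
    candidates = [item for item in text_content if meets_criteria(item)]
    candidates.sort(key=lambda item: ALLOWED_TEXT_TYPES.index(item["TextType"]))
    for item in candidates:
        result = single_synopsis(item)
        if result is not None:
            return result
    return None
```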

external_id

Format: string | null

We get the information from: ProductIdentifier

Information extractor in Publica.la: ProductIdentifiers

Tag required	Allowed values
ProductIDType	'15', '03'

We first determine whether one identifier or several were sent. For a single identifier, we verify that ProductIDType is one of the allowed values (15 = ISBN or 03 = GTIN) and return the value of IDValue; otherwise we return null. If there are several, the first one is taken and processed as a single identifier.

Section documentation: (5) - Product Identifier type - 03 or 15
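A minimal sketch of this rule, assuming the identifiers have been parsed into Python dicts (one dict for a single identifier, a list of dicts for several). The function name external_id mirrors the field name and is not the ProductIdentifiers extractor itself.

```python
ALLOWED_ID_TYPES = {"15", "03"}  # values from the table above


def external_id(product_identifier):
    """Return IDValue when the (first) identifier has an allowed ProductIDType."""
    if not product_identifier:
        return None
    if isinstance(product_identifier, list):
        # Several identifiers: only the first one is considered.
        product_identifier = product_identifier[0]
    if product_identifier.get("ProductIDType") in ALLOWED_ID_TYPES:
        return product_identifier.get("IDValue")
    return None
```

Because only the first identifier is inspected, a record whose first ProductIdentifier uses another type yields null even when an ISBN follows later in the list.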

country

Format: string | null

We get the information from: PublishingDetail.CountryOfPublication

Information extractor in Publica.la: Function countryOfPublication

The value sent in PublishingDetail.CountryOfPublication is taken as is. If it is not present, null is returned.

Section documentation: (91) - Country

lang

Format: string

We get the information from: DescriptiveDetail.Language

Information extractor in Publica.la: Language

We verify that DescriptiveDetail.Language is sent; if it is not, the language is set to undefined. If the information arrives as a single associative array, we map its LanguageCode to the two-letter code used in Publica.la (for example 'eng' => 'en', 'ita' => 'it', 'por' => 'pt', 'spa' => 'es').

Otherwise, we take the first entry whose LanguageRole is 01 and map its LanguageCode in the same way.

Supported language codes

ONIX LanguageCode	ISO 639-1 used in Publica.la
eng	en
ita	it
por	pt
spa	es
fre / fra	fr
chi / zho	zh
jpn	ja
ger	de
cat	ca
gsw	de
pol	pl
glg	gl
any other	undefined

Section documentation: (22) - Language
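The mapping table and the single/multiple handling can be sketched like this. The dict reproduces the table above; the function name language and the use of the literal string "undefined" as the fallback are assumptions made for illustration.

```python
LANGUAGE_MAP = {
    "eng": "en", "ita": "it", "por": "pt", "spa": "es",
    "fre": "fr", "fra": "fr", "chi": "zh", "zho": "zh",
    "jpn": "ja", "ger": "de", "cat": "ca", "gsw": "de",
    "pol": "pl", "glg": "gl",
}
UNDEFINED = "undefined"  # assumption: the fallback is stored as this literal


def language(language_node):
    """Map the ONIX LanguageCode to the two-letter code used in Publica.la."""
    if not language_node:
        return UNDEFINED
    if isinstance(language_node, list):
        # Several Language entries: use the first one with LanguageRole 01.
        language_node = next(
            (item for item in language_node if item.get("LanguageRole") == "01"),
            None,
        )
        if language_node is None:
            return UNDEFINED
    return LANGUAGE_MAP.get(language_node.get("LanguageCode"), UNDEFINED)
```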

bisac

Format: array

We get the information from: DescriptiveDetail.Subject

Information extractor in Publica.la: Bisac

Tag required	Allowed values
SubjectSchemeIdentifier	'10'

If DescriptiveDetail.Subject is not specified, an empty array is returned. If the value is a single associative array, we check that SubjectSchemeIdentifier is equal to 10. If so, the value of SubjectCode is obtained; otherwise an empty array is returned.

If it is not an associative array (several subjects were sent), each one is checked for SubjectSchemeIdentifier equal to 10. For those that match, an array with the keys code and main is built, filled from SubjectCode and MainSubject respectively.

Section documentation: (27) - Subject scheme identifier - Bisac

name

Format: string | null

We get the information from: DescriptiveDetail.TitleDetail

Information extractor in Publica.la: Title

If DescriptiveDetail.TitleDetail is not specified, null is returned. If the value is a single associative array, we check TitleType: if it is not equal to 01, null is returned. Otherwise, the first 255 characters of TitleElement.TitleText are taken, if present. If TitleText is absent, the title is built from TitleElement.TitlePrefix and TitleElement.TitleWithoutPrefix, again limited to the first 255 characters.

If it is not an associative array, the first value is taken and processed as described above.

Section documentation: (15) - Title type
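A sketch of the title rule, under stated assumptions: TitleDetail is parsed into a Python dict (or a list of dicts), TitleElement is a single dict, and the prefix and the rest of the title are joined with a space. The function name title and the joining rule are illustrative, not taken from the Title extractor.

```python
MAX_TITLE_LENGTH = 255


def title(title_detail):
    """Return the title (at most 255 characters) or None."""
    if not title_detail:
        return None
    if isinstance(title_detail, list):
        # Several TitleDetail entries: only the first one is processed.
        title_detail = title_detail[0]
    if title_detail.get("TitleType") != "01":
        return None
    element = title_detail.get("TitleElement") or {}
    if element.get("TitleText"):
        return element["TitleText"][:MAX_TITLE_LENGTH]
    # Assumption: prefix and remainder are joined with a single space.
    parts = [element.get("TitlePrefix"), element.get("TitleWithoutPrefix")]
    joined = " ".join(part for part in parts if part)
    return joined[:MAX_TITLE_LENGTH] or None
```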

prices

Format: array | null

We get the information from: ProductSupply

Information extractor in Publica.la: Prices

Tag required	Allowed values
CurrencyCode	'USD', 'BRL', 'ARS', 'CLP', 'PEN', 'UYU', 'COP', 'MXN', 'BOB', 'EUR', 'ALL', 'AMD', 'AZN', 'BYN', 'BAM', 'BGN', 'CZK', 'DKK', 'GEL', 'HUF', 'ISK', 'MDL', 'MKD', 'NOK', 'PLN', 'RON', 'RUB', 'RSD', 'SEK', 'CHF', 'TRY', 'UAH', 'GBP'
DiscountCode	'01', '02', '03', '04', 'default'

If the ONIX has multiple markets with different prices, only the WORLD region is taken (ProductSupply.Market.Territory.RegionsIncluded = WORLD). The only supported PriceType is 01.

Minimum prices by currency (in farfalla):

If the price is 0.0, it is considered FREE.

A currency that does not appear in the list uses the USD default. The minimum is always expressed in USD, regardless of the currency indicated on the left:

USD = 2.6

BRL = 2.2 (USD)

ARS = 0.6 (USD)

Section documentation: (58) - Prices
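The two rules about free and minimum prices can be expressed as a small lookup. This sketch only covers what is stated above (the three listed minimums, the USD fallback, and the 0.0 rule); the function names are hypothetical, and currency conversion and what happens to a price below the minimum are deliberately left out because they are not described here.

```python
# Minimum price per currency, expressed in USD (values from the list above).
MINIMUM_PRICE_USD = {"USD": 2.6, "BRL": 2.2, "ARS": 0.6}


def is_free(amount):
    """A price of 0.0 is treated as FREE."""
    return amount == 0.0


def minimum_price_usd(currency_code):
    """Return the minimum in USD, falling back to the USD entry for unlisted currencies."""
    return MINIMUM_PRICE_USD.get(currency_code, MINIMUM_PRICE_USD["USD"])
```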

type

Format: array | null

We get the information from: 'EPUB', 'MP3', 'PDF'

Information extractor in Publica.la: 'epub', 'audio', 'pdf'

We get the value based on what they send us: EPUB, MP3 or PDF.

publishing_status

Format: string

We get the information from: PublishingDetail.PublishingStatus

Information extractor in Publica.la: PublishStatus

If the issue already has a publishing_status equal to pla-content-source-cancelled, that value is kept. Otherwise, the PublishStatus extractor is used: it checks whether the ONIX value is one of '01', '02', ..., '17'. If it is not, the value '00' = unspecified is used.

Complete ONIX code list 64 mapping (status → internal state)

ONIX code	Internal state label
00	unspecified
01	cancelled
02	forthcoming
03	postponed-indefinitely
04	active
05	withdrawn-from-sale
06	withdrawn-from-sale
07	out-of-print
08	inactive
09	unknown
10	remaindered
11	withdrawn-from-sale
12	recalled
13	active-but-not-on-sale
15	recalled
16	withdrawn-from-sale
17	withdrawn-from-sale

Section documentation: (64) - Publishing status
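The mapping table translates directly into a lookup with a fallback. In this sketch the dict reproduces the table above; the function name publishing_status and its current_status parameter (the status the issue already has) are assumptions made for illustration.

```python
PUBLISHING_STATUS = {
    "00": "unspecified",
    "01": "cancelled",
    "02": "forthcoming",
    "03": "postponed-indefinitely",
    "04": "active",
    "05": "withdrawn-from-sale",
    "06": "withdrawn-from-sale",
    "07": "out-of-print",
    "08": "inactive",
    "09": "unknown",
    "10": "remaindered",
    "11": "withdrawn-from-sale",
    "12": "recalled",
    "13": "active-but-not-on-sale",
    "15": "recalled",
    "16": "withdrawn-from-sale",
    "17": "withdrawn-from-sale",
}
SOURCE_CANCELLED = "pla-content-source-cancelled"


def publishing_status(onix_code, current_status=None):
    """Keep a source-cancelled status; otherwise map the ONIX code, defaulting to 00."""
    if current_status == SOURCE_CANCELLED:
        return SOURCE_CANCELLED
    return PUBLISHING_STATUS.get(onix_code, PUBLISHING_STATUS["00"])
```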

published_at

Format: date | null

We get the information from: PublishingDetail.PublishingDate

Information extractor in Publica.la: PublishDate

Tag required	Allowed values
PublishingDateRole	'01'

We check whether the value is an associative array. If it is, PublishingDateRole must have the value 01; if not, null is returned. We then look for the value in Date.0 and, if that is absent, in Date, and format it as Y-m-d.

If it is not an associative array, the first value is taken and processed as an associative array.

Section documentation: (163) - Publishing date role
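A possible shape for this extraction is shown below. It assumes the parsed data uses Python dicts, that a Date carrying attributes stores its value under key 0 (the Date.0 case), and that the incoming date uses the YYYYMMDD layout; other date formats are not handled. The function name published_at mirrors the field name.

```python
from datetime import datetime


def published_at(publishing_date):
    """Return the publication date as YYYY-MM-DD, or None."""
    if not publishing_date:
        return None
    if isinstance(publishing_date, list):
        # Several PublishingDate entries: only the first one is processed.
        publishing_date = publishing_date[0]
    if publishing_date.get("PublishingDateRole") != "01":
        return None
    raw = publishing_date.get("Date")
    if isinstance(raw, dict):
        # Date carried attributes, so the value sits under key 0 (Date.0).
        raw = raw.get(0)
    if not raw:
        return None
    # Assumption: the value is in YYYYMMDD form.
    return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")
```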

publisher

Format: array

We get the information from: PublishingDetail.Publisher

Information extractor in Publica.la: Publisher

Tag required	Allowed values
PublishingRole	'01'

If it is an associative array, we check that PublishingRole is equal to 01. If it is not, an empty array is returned. If it is, the value is read from PublisherName; if PublisherName is not found, null is used.

If it is not an associative array, an array is built from the different PublisherName values.

Section documentation: (45) - Publishing role

publishing_group

Format: array | null

We get the information from: PublishingDetail.Publisher

Information extractor in Publica.la: PublishingGroup

Tag required	Allowed values
PublishingRole	'10'

If it is an associative array, we check that PublishingRole is equal to 10. If it is not, an empty array is returned. If it is, the value is read from PublisherName; if PublisherName is not found, null is used.

If it is not an associative array, an array is built from the different PublisherName values.

Section documentation: (45) - Publishing role

sales_rights

Format: array | null

We get the information from: PublishingDetail.SalesRights

Information extractor in Publica.la: SalesRights

Tag required	Allowed values
SalesRightsType	'01', '02'
Territory	'RegionsIncluded' => 'WORLD', 'CountriesIncluded' => any, 'RegionsExcluded' => any, 'CountriesExcluded' => any

If it is an associative array, we check that SalesRightsType is '01' or '02' and that Territory => RegionsIncluded is equal to WORLD. If that condition is not met, we look at Territory => RegionsIncluded, CountriesIncluded, RegionsExcluded, and CountriesExcluded.

If it is not an associative array, the same process is applied to each element.

Section documentation: (46) - Sales Rights

keywords

Format: array

We get the information from: DescriptiveDetail.Subject

Information extractor in Publica.la: Keywords

Tag required	Allowed values
SubjectSchemeIdentifier	'20'

If it is an associative array, we search for SubjectSchemeIdentifier equal to 20 and look for SubjectHeadingText within it, creating an array.

If it is not an associative array, the same process is performed for each element as if it were an associative array.

Section documentation: (27) - Subject scheme identifier
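As a minimal sketch of the keyword rule, the function below collects SubjectHeadingText from every subject whose SubjectSchemeIdentifier is 20. It assumes the subjects are parsed into Python dicts, and it returns each SubjectHeadingText unchanged; whether the Keywords extractor splits a heading into individual keywords is not stated here, so no splitting is attempted. The function name keywords mirrors the field name.

```python
KEYWORD_SCHEME = "20"


def keywords(subject):
    """Collect SubjectHeadingText from subjects that use scheme identifier 20."""
    if not subject:
        return []
    if isinstance(subject, dict):
        # A single Subject arrives as one dict rather than a list.
        subject = [subject]
    found = []
    for item in subject:
        heading = item.get("SubjectHeadingText")
        if item.get("SubjectSchemeIdentifier") == KEYWORD_SCHEME and heading:
            found.append(heading)
    return found
```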
