42 changes: 42 additions & 0 deletions content/docs/overview/changelog.md
@@ -6,6 +6,48 @@ sidebar:

This document includes all meaningful changes made to the **Data Package standard**. It does not cover changes made to other documents like Recipes or Guides.

## v2.1

##### `schema.fieldsMatch` (fixed)

[`fieldsMatch`](/standard/table-schema/#fieldsMatch) has been corrected from array to string to match its definition ([#965](https://github.com/frictionlessdata/datapackage/issues/965)).

##### `schema.name` (new)

[`name`](/standard/table-schema/#name) allows specifying a name for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.title` (new)

[`title`](/standard/table-schema/#title) allows specifying a title for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.description` (new)

[`description`](/standard/table-schema/#description) allows specifying a description for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.homepage` (new)

[`homepage`](/standard/table-schema/#homepage) allows specifying a homepage for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.version` (new)

[`version`](/standard/table-schema/#version) allows specifying a version for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.created` (new)

[`created`](/standard/table-schema/#created) allows specifying when a schema was created ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.keywords` (new)

[`keywords`](/standard/table-schema/#keywords) allows specifying keywords for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.contributors` (new)

[`contributors`](/standard/table-schema/#contributors) allows specifying contributors for a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

##### `schema.examples` (new)

[`examples`](/standard/table-schema/#examples) allows specifying a list of illustrative data resources that use a schema ([#961](https://github.com/frictionlessdata/datapackage/pull/961)).

## v2.0

This release includes a rich set of specification improvements to make Data Package a finished product (see [announcement](https://frictionlessdata.io/blog/2023/11/15/frictionless-specs-update/)). All changes were reviewed and accepted by the Data Package Working Group.
145 changes: 145 additions & 0 deletions content/docs/recipes/category-tables.md
@@ -0,0 +1,145 @@
---
title: Category Tables
---

<table>
<tr>
<th>Authors</th>
<td>Kyle Husmann, Jan van der Laan, Albert-Jan Roskam, Phil Schumm</td>
</tr>
</table>

Category Table Resources are Tabular Data Resources that can be referenced in the `categories` property of a field descriptor. This is useful when there are many (e.g., thousands of) categorical levels (e.g., controlled vocabularies such as Medical Subject Headings (MeSH)), when the same `categories` definitions are repeated across many fields (e.g., the same Likert scale applied to a series of items), or when the categorical levels include a significant amount of additional metadata (e.g., a hierarchical ontology such as the International Classification of Diseases (ICD)). Category Table Resources may be shared across data packages to facilitate harmonization, and they provide support for categorical variables (e.g., as in Pandas, R, or Julia) or value labels (e.g., as in Stata, SAS, or SPSS).

## Specification

The Category Table Resource builds directly on the Tabular Data Resource specification. A Category Table Resource `MUST` be a Tabular Data Resource and conform to the [Tabular Data Resource specification](/standard/data-resource/#tabular).

In addition to the requirements of a Tabular Data Resource, Category Table Resources `MUST` have an additional `categoryFieldMap` property of type `object` with the following properties:

- There `MUST` be a `value` property of type `string` that specifies the name of the field in the Category Table Resource containing the values for the categories as they would appear in a focal data resource. The field indicated by `value` `MUST` exist in the Category Table Resource and be of field type `string` or `integer`.

- There `MAY` be an optional `label` property of type `string` that specifies the name of the field in the Category Table Resource containing labels for the categories. When specified, the field indicated by `label` `MUST` exist in the Category Table Resource and be of field type `string`.

- There `MAY` be an optional `ordered` property of type `boolean`. When `ordered` is `true`, implementations `SHOULD` regard the order of appearance of the values in the Category Table Resource as their natural order. When `ordered` is `false`, implementations `SHOULD` assume that the categories do not have a natural order. When the property is not present, no assumption about the ordered nature of the values `SHOULD` be made.

For example, the following is a valid Category Table Resource:

```json
{
  "name": "fruit-codes",
  "type": "table",
  "categoryFieldMap": {
    "value": "code",
    "label": "name",
    "ordered": false
  },
  "schema": {
    "fields": [
      { "name": "code", "type": "string" },
      { "name": "name", "type": "string" }
    ]
  },
  "data": [
    { "code": "A", "name": "Apple" },
    { "code": "B", "name": "Banana" },
    { "code": "C", "name": "Cherry" }
  ]
}
```
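
The `categoryFieldMap` requirements listed above could be checked roughly as follows. This is a minimal sketch, not part of the recipe: the function name `check_category_field_map` and the wording of the messages are hypothetical, and it assumes every schema field declares an explicit `type`.

```python
def check_category_field_map(resource):
    """Return a list of problems with a resource's categoryFieldMap (empty if none)."""
    field_map = resource.get("categoryFieldMap")
    if not isinstance(field_map, dict):
        return ["categoryFieldMap must be an object"]

    # Index the declared schema fields by name. This sketch assumes every
    # field declares an explicit "type".
    fields = {f["name"]: f for f in resource.get("schema", {}).get("fields", [])}
    problems = []

    # `value` is required, must name an existing field of type string or integer.
    value = field_map.get("value")
    if not isinstance(value, str):
        problems.append("value must be a string naming a field")
    elif value not in fields:
        problems.append(f"value field {value!r} does not exist")
    elif fields[value].get("type") not in ("string", "integer"):
        problems.append(f"value field {value!r} must be string or integer")

    # `label` is optional; when present it must name an existing string field.
    if "label" in field_map:
        label = field_map["label"]
        if not isinstance(label, str):
            problems.append("label must be a string naming a field")
        elif label not in fields:
            problems.append(f"label field {label!r} does not exist")
        elif fields[label].get("type") != "string":
            problems.append(f"label field {label!r} must be string")

    # `ordered` is optional; when present it must be a real boolean.
    if "ordered" in field_map and not isinstance(field_map["ordered"], bool):
        problems.append("ordered must be a boolean")

    return problems


# The fruit-codes descriptor from the example above (data omitted).
fruit_codes = {
    "name": "fruit-codes",
    "type": "table",
    "categoryFieldMap": {"value": "code", "label": "name", "ordered": False},
    "schema": {
        "fields": [
            {"name": "code", "type": "string"},
            {"name": "name", "type": "string"},
        ]
    },
}

print(check_category_field_map(fruit_codes))
```

An absent `ordered` key is deliberately left alone here, since the recipe says no assumption should be made in that case.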

## Usage

Category Table Resources are used by providing the `categories` property of a categorical field descriptor with an `object` with the following properties:

- There `MUST` be a `resource` property of type `string` that specifies the name of the Category Table Resource to be used.

- There `MAY` be an optional `package` property of type `string` that specifies the package containing the Category Table Resource to be used. As with the [External Foreign Keys](/recipes/external-foreign-keys/) recipe, the `package` property `MUST` be either a fully qualified HTTP address to a Data Package `datapackage.json` file or a data package name that can be resolved by a canonical data package registry. If omitted, implementations `SHOULD` assume the Category Table Resource is in the current data package.

- There `MAY` be an optional `encodedAs` property of type `string` that specifies whether the values of the focal categorical field reference the `value` or `label` field of the Category Table Resource. When `encodedAs` is `"value"`, the values of the focal categorical field are mapped to the values of the `value` field in the Category Table Resource. When `encodedAs` is `"label"`, the values of the focal categorical field are mapped to the values of the `label` field in the Category Table Resource. When `encodedAs` is omitted, implementations `SHOULD` assume the values of the categorical field are the values of the `value` field in the Category Table Resource.

For example, the following field definition references the `fruit-codes` Category Table Resource defined above, assuming that resource is in the same data package and the field uses the `value`s of the Category Table Resource (in this case, the `code` field of `fruit-codes`):

```json
{
  "name": "fruit",
  "type": "string",
  "categories": {
    "resource": "fruit-codes"
  }
}
```

Alternatively, if the `fruit-codes` Category Table Resource were in an external data package and the field used the Category Table Resource's `label`s to represent its options (in this case, the `name` field of `fruit-codes`), the field definition would be:

```json
{
  "name": "fruit",
  "type": "string",
  "categories": {
    "package": "http://example.com/package.json",
    "resource": "fruit-codes",
    "encodedAs": "label"
  }
}
```
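
The way `encodedAs` selects a field of the Category Table Resource in the two definitions above can be sketched in a few lines. The helper name `lookup_field` is hypothetical, and the sketch assumes that `"value"` and `"label"` are the only accepted values of `encodedAs`. Resolving an external `package` reference is out of scope here.

```python
def lookup_field(categories, category_field_map):
    """Return the name of the Category Table field that the focal values refer to."""
    # When encodedAs is omitted, the focal values are taken to be `value`s.
    encoded_as = categories.get("encodedAs", "value")
    if encoded_as not in ("value", "label"):
        raise ValueError(f"unsupported encodedAs: {encoded_as!r}")
    if encoded_as not in category_field_map:
        raise ValueError(f"categoryFieldMap has no {encoded_as!r} property")
    return category_field_map[encoded_as]


# categoryFieldMap of the fruit-codes resource.
fruit_field_map = {"value": "code", "label": "name", "ordered": False}

# First definition: same package, default encoding.
local_categories = {"resource": "fruit-codes"}
# Second definition: external package, encoded by label.
external_categories = {
    "package": "http://example.com/package.json",
    "resource": "fruit-codes",
    "encodedAs": "label",
}

print(lookup_field(local_categories, fruit_field_map))
print(lookup_field(external_categories, fruit_field_map))
```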

## Constraints

In a Category Table Resource, the field referenced by the `value` property `MUST` be validated with `"required": true` and `"unique": true` field constraints. Similarly, when `label` is specified, the field it references `MUST` be of type `string` and be validated with the `"unique": true` field constraint.

Fields in a focal data resource referencing a Category Table Resource via the `categories` property `MUST` have a type identical to the type of the corresponding `value` field in the Category Table Resource. For example, the following is an invalid reference to the `fruit-codes` Category Table Resource because the `type` of the categorical field being defined is `integer` while the `value` field in the `fruit-codes` Category Table Resource is of type `string`:

```json
{
  "name": "fruit",
  "type": "integer",
  "categories": {
    "resource": "fruit-codes"
  }
}
```
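
Both constraints in this section can be expressed as simple checks over an inline-data resource. The following is only a sketch under stated assumptions: the function names `check_category_rows` and `types_match` are hypothetical, rows are assumed to be objects keyed by field name, a missing value is represented as `None`, and missing labels are ignored when testing uniqueness.

```python
def check_category_rows(rows, field_map):
    """Check the required/unique constraints on the value and label fields."""
    problems = []

    # The value field behaves as if it had "required": true and "unique": true.
    values = [row.get(field_map["value"]) for row in rows]
    if any(v is None for v in values):
        problems.append("value field has missing values")
    present = [v for v in values if v is not None]
    if len(set(present)) != len(present):
        problems.append("value field has duplicate values")

    # The label field, when mapped, behaves as if it had "unique": true.
    if "label" in field_map:
        labels = [row.get(field_map["label"]) for row in rows]
        labels = [lab for lab in labels if lab is not None]
        if len(set(labels)) != len(labels):
            problems.append("label field has duplicate values")

    return problems


def types_match(focal_field, category_resource):
    """True if the focal field's type equals the type of the category value field."""
    value_name = category_resource["categoryFieldMap"]["value"]
    for field in category_resource["schema"]["fields"]:
        if field["name"] == value_name:
            return focal_field.get("type") == field.get("type")
    return False


fruit_codes = {
    "name": "fruit-codes",
    "categoryFieldMap": {"value": "code", "label": "name"},
    "schema": {
        "fields": [
            {"name": "code", "type": "string"},
            {"name": "name", "type": "string"},
        ]
    },
    "data": [
        {"code": "A", "name": "Apple"},
        {"code": "B", "name": "Banana"},
        {"code": "C", "name": "Cherry"},
    ],
}

# The invalid field definition shown above.
invalid_field = {"name": "fruit", "type": "integer", "categories": {"resource": "fruit-codes"}}

print(check_category_rows(fruit_codes["data"], fruit_codes["categoryFieldMap"]))
print(types_match(invalid_field, fruit_codes))
```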

## Internationalization

Alternate translations of the category labels can be provided by way of the [Language Support](/recipes/language-support) recipe. The following example shows how the `fruit-codes` table from the previous examples could be extended to support multiple languages:

```json
{
  "name": "fruit-codes",
  "type": "table",
  "languages": ["en", "nl"],
  "categoryFieldMap": {
    "value": "code",
    "label": "name",
    "ordered": false
  },
  "schema": {
    "fields": [
      { "name": "code", "type": "string" },
      { "name": "name", "type": "string" },
      { "name": "name@nl", "type": "string" }
    ]
  },
  "data": [
    { "code": "A", "name": "Apple", "name@nl": "Appel" },
    { "code": "B", "name": "Banana", "name@nl": "Banaan" },
    { "code": "C", "name": "Cherry", "name@nl": "Kers" }
  ]
}
```
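
A consumer could pick translated labels from such a table along the following lines. This is a sketch, not prescribed behavior: the helper names are hypothetical, the `<field>@<language>` naming is taken from the example above, and falling back to the unsuffixed label field when no translated field exists is an assumption of this sketch.

```python
def label_field_for(resource, language):
    """Return the label field name to use for the given language code."""
    base = resource["categoryFieldMap"]["label"]
    names = {field["name"] for field in resource["schema"]["fields"]}
    candidate = f"{base}@{language}"
    # Assumption: fall back to the unsuffixed label field if no translation exists.
    return candidate if candidate in names else base


def labels_by_value(resource, language):
    """Map each category value to its label in the given language."""
    value_field = resource["categoryFieldMap"]["value"]
    label_field = label_field_for(resource, language)
    return {row[value_field]: row[label_field] for row in resource["data"]}


fruit_codes = {
    "name": "fruit-codes",
    "languages": ["en", "nl"],
    "categoryFieldMap": {"value": "code", "label": "name", "ordered": False},
    "schema": {
        "fields": [
            {"name": "code", "type": "string"},
            {"name": "name", "type": "string"},
            {"name": "name@nl", "type": "string"},
        ]
    },
    "data": [
        {"code": "A", "name": "Apple", "name@nl": "Appel"},
        {"code": "B", "name": "Banana", "name@nl": "Banaan"},
        {"code": "C", "name": "Cherry", "name@nl": "Kers"},
    ],
}

print(labels_by_value(fruit_codes, "nl"))
```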

## Discussion

Being able to define lists of categories in a separate data resource has a number of advantages:

- When there is a large number of categories, it is often easier to maintain them in separate files, such as CSV files. This also keeps the `datapackage.json` file compact and readable for humans.

- The data set in the category table resource can store additional information besides the 'value' and 'label'. For example, the categories could have descriptions or the categories could form a hierarchy.

- It is also possible to store additional metadata in the category table resource. For example, it is possible to indicate the source, license, version and owner of the data resource. This is important for many 'official' category lists where there can be many similar versions maintained by different organisations.

- When different fields use the same categories, they can all refer to the same category table resource. First, this allows the categories to be reused. Second, by referring to the same data resource, the field descriptors can indicate that the categories are comparable between fields.

- It is possible to refer to category table resources in other data packages. This makes it possible, for example, to create centrally maintained repositories of categories.
6 changes: 3 additions & 3 deletions content/docs/standard/data-package.mdx
@@ -3,7 +3,7 @@ title: Data Package
description: A simple container format for describing a coherent collection of data in a single package. It provides the basis for convenient delivery, installation and management of datasets.
sidebar:
order: 1
-profile: /profiles/2.0/datapackage.json
+profile: /profiles/2.1/datapackage.json
authors:
- Rufus Pollock
- Paul Walsh
@@ -129,7 +129,7 @@ Packaged data resources are described in the `resources` property of the package

A root level Data Package descriptor `MAY` have a `$schema` property that `MUST` be a profile as per [Profile](/standard/glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

-The default value is `https://datapackage.org/profiles/1.0/datapackage.json` and the recommended value is `https://datapackage.org/profiles/2.0/datapackage.json`.
+The default value is `https://datapackage.org/profiles/1.0/datapackage.json` and the recommended value is `https://datapackage.org/profiles/2.1/datapackage.json`.

:::note[Backward Compatibility]
If the `$schema` property is not provided but a descriptor has the `profile` property a data consumer `MUST` validate the descriptor according to the [Profiles](https://specs.frictionlessdata.io/profiles/) specification.
@@ -229,7 +229,7 @@ An Array of string keywords to assist users searching for the package in catalog

### `contributors`

-The people or organizations who contributed to this Data Package. It `MUST` be an array. Each entry is a Contributor and `MUST` be an `object`. A Contributor `MUST` have at least one property. A Contributor is `RECOMMENDED` to have `title` property and `MAY` contain `givenName`, `familyName`, `path`, `email`, `roles`, and `organization` properties:
+The people or organizations that contributed to this Data Package. It `MUST` be an array. Each entry is a Contributor and `MUST` be an `object`. A Contributor `MUST` have at least one property. A Contributor is `RECOMMENDED` to have `title` property and `MAY` contain `givenName`, `familyName`, `path`, `email`, `roles`, and `organization` properties:

- `title`: A string containing a name of the contributor.
- `givenName`: A string containing the name a person has been given, if the contributor is a person.
4 changes: 2 additions & 2 deletions content/docs/standard/data-resource.mdx
@@ -3,7 +3,7 @@ title: Data Resource
description: A simple format to describe and package a single data resource such as an individual table or file. The essence of a Data Resource is a locator for the data it describes. A range of other properties can be declared to provide a richer set of metadata.
sidebar:
order: 2
-profile: /profiles/2.0/dataresource.json
+profile: /profiles/2.1/dataresource.json
authors:
- Rufus Pollock
- Paul Walsh
@@ -159,7 +159,7 @@ If a resource has `profile` property that equals to `tabular-data-resource` or `

A root level Data Resource descriptor `MAY` have a `$schema` property that `MUST` be a profile as per [Profile](/standard/glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

-The default value is `https://datapackage.org/profiles/1.0/dataresource.json` and the recommended value is `https://datapackage.org/profiles/2.0/dataresource.json`.
+The default value is `https://datapackage.org/profiles/1.0/dataresource.json` and the recommended value is `https://datapackage.org/profiles/2.1/dataresource.json`.

:::note[Backward Compatibility]
If the `$schema` property is not provided but a descriptor has the `profile` property a data consumer `MUST` validate the descriptor according to the [Profiles](https://specs.frictionlessdata.io/profiles/) specification.
4 changes: 2 additions & 2 deletions content/docs/standard/table-dialect.mdx
@@ -3,7 +3,7 @@ title: Table Dialect
description: Table Dialect describes how tabular data is stored in a file. It supports delimited text files like CSV, semi-structured formats like JSON and YAML, and spreadsheets like Microsoft Excel. The specification is designed to be expressible as a single JSON-compatible descriptor.
sidebar:
order: 3
-profile: /profiles/2.0/tabledialect.json
+profile: /profiles/2.1/tabledialect.json
authors:
- Rufus Pollock
- Paul Walsh
@@ -148,7 +148,7 @@ Database formats is a group of formats accessing data from databases like SQLite

A root level Table Dialect descriptor `MAY` have a `$schema` property that `MUST` be a profile as per [Profile](/standard/glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

-The default value is `https://datapackage.org/profiles/1.0/tabledialect.json` and the recommended value is `https://datapackage.org/profiles/2.0/tabledialect.json`.
+The default value is `https://datapackage.org/profiles/1.0/tabledialect.json` and the recommended value is `https://datapackage.org/profiles/2.1/tabledialect.json`.

### `header`
