Modelling Schemas

A schema describes the structure of the data associated with an Asset. The technology that supports the asset often limits the structural choices for data. For example:

These differences need to be represented in the Open Metadata Types. At the same time, data governance is concerned with the accuracy and appropriate use of individual data values. Governing each data item individually would be very expensive, so data governance practices aim to group like data together so that it can be governed in a consistent way. For this reason, the open metadata types provide a root set of types that all the specific schema structures inherit from. The schema root type is called Schema Element, which is then divided into a Schema Attribute (think of this as a variable) and a Schema Type. The schema type describes the structure of the data associated with the schema attribute.
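The inheritance described above can be sketched as a small class hierarchy. This is only an illustration in Python, not Egeria's actual type definitions: the class names mirror the type names in the text, while the `display_name` field, the `schema_type` reference and the `governed_names` helper are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Root of all schema types (name taken from the text)."""
    display_name: str  # hypothetical property, for illustration only


@dataclass
class SchemaType(SchemaElement):
    """Describes the structure of the data held by an attribute."""


@dataclass
class SchemaAttribute(SchemaElement):
    """Think of this as a variable; its structure comes from a SchemaType."""
    schema_type: Optional[SchemaType] = None


def governed_names(elements: List[SchemaElement]) -> List[str]:
    """Hypothetical helper: any specialised structure can be handled
    uniformly because each one is a SchemaElement."""
    return [element.display_name for element in elements]


string_type = SchemaType(display_name="string")
customer_name = SchemaAttribute(display_name="customerName", schema_type=string_type)
names = governed_names([customer_name, string_type])
```

Because both classes share the same root, code that applies a governance rule only needs to know about `SchemaElement`, which is the motivation the text gives for the common root type.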

In the early versions of Egeria, the schema attribute and the schema type were represented as two separate entities in the open metadata types, with a SchemaTypeForAttribute relationship to connect them together. This is shown in figure 1.

Figure 1: Original model for SchemaAttribute and its SchemaType

However, it became obvious that, since these two elements need to be retrieved together, it is much more efficient if the schema type is represented as a classification for the SchemaAttribute, because classifications are typically stored, distributed and retrieved with their entity. The new classification is called TypeEmbeddedAttribute and it contains all of the properties found in the schema types plus a typeName property to identify the corresponding schema type.

Figure 2 shows the new types for representing a schema attribute and its type.

Figure 2: Collapsing SchemaAttribute and SchemaType into an entity with a classification
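The move from two linked entities to one entity with a classification can be sketched as follows. This is a minimal illustration, not Egeria code: the `Entity` class, the `embed_schema_type` function and the `dataType` and `displayName` properties are assumptions made for the example, while TypeEmbeddedAttribute, SchemaTypeForAttribute and typeName come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    """Hypothetical stand-in for a metadata entity."""
    type_name: str
    properties: Dict[str, str]
    # Classifications travel with their entity: name -> properties.
    classifications: Dict[str, Dict[str, str]] = field(default_factory=dict)


def embed_schema_type(attribute: Entity, schema_type: Entity) -> Entity:
    """Fold a separate schema type entity into a TypeEmbeddedAttribute
    classification on a copy of the attribute."""
    embedded = dict(schema_type.properties)       # all schema type properties
    embedded["typeName"] = schema_type.type_name  # identifies the schema type
    classifications = dict(attribute.classifications)
    classifications["TypeEmbeddedAttribute"] = embedded
    return Entity(
        type_name=attribute.type_name,
        properties=dict(attribute.properties),
        classifications=classifications,
    )


# Original style: two entities that would be joined by SchemaTypeForAttribute.
attribute = Entity("SchemaAttribute", {"displayName": "customerName"})
schema_type = Entity("PrimitiveSchemaType", {"dataType": "string"})  # dataType is illustrative

# New style: a single entity that carries its type information with it.
collapsed = embed_schema_type(attribute, schema_type)
```

After the call, everything needed to describe the attribute and its type sits on one object, so a single retrieval returns both.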

Schema type entities are still used:

Figure 3 shows the use of the schema type:

Figure 3: The SchemaType is still used as the top level element in a schema and for complex structures

Specific SchemaTypes

The root SchemaType and SchemaAttribute are specialized to support different structures. The diagrams show how the structure is represented for a SchemaAttribute on the left and how it is represented as a SchemaType on the right.

Primitives

Primitives are single values such as strings, characters and numbers. They are represented by the PrimitiveSchemaType.

Figure 4: The PrimitiveSchemaType

Literals (Constants)

Literals are fixed values, also known as constants. They are represented by the LiteralSchemaType.

Figure 5: The LiteralSchemaType

Enumerations

Enumerations (Enums) define a list of valid values. The valid values are recorded in a ValidValuesSet linked to an EnumSchemaType.

Figure 6: The EnumSchemaType
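As a rough sketch of this arrangement, the enum type below holds a reference to a separate set of valid values. The class names follow the text; the field names, the `accepts` method and the sample order statuses are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidValuesSet:
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class EnumSchemaType:
    display_name: str
    valid_values: ValidValuesSet  # stands in for the link to the set

    def accepts(self, value: str) -> bool:
        """Hypothetical check: is the value one of the recorded valid values?"""
        return value in self.valid_values.values


order_status = EnumSchemaType(
    display_name="OrderStatus",
    valid_values=ValidValuesSet("OrderStatusValues", ["NEW", "SHIPPED", "CLOSED"]),
)
```

Keeping the values in their own set, rather than inside the enum type, means the same set could in principle be linked from more than one place.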

Linking to a standard schema type

External schema types link to a schema type that is reused in multiple assets; typically it is part of a standard. The use of an external schema type is represented by an ExternalSchemaType.

Figure 7: The ExternalSchemaType
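The reuse idea can be illustrated with two external schema types that point at one shared definition. This is a sketch under assumptions: the `target` field, the attribute names and the country code example are invented for illustration and are not taken from Egeria.

```python
from dataclasses import dataclass


@dataclass
class SchemaType:
    display_name: str


@dataclass
class ExternalSchemaType:
    display_name: str
    target: SchemaType  # hypothetical link to the shared, reusable definition


# One standard definition, described once (illustrative example).
country_code = SchemaType("ISO 3166 country code")

# Two different assets refer to it instead of redefining it.
shipping_country = ExternalSchemaType("shipCountry", country_code)
billing_country = ExternalSchemaType("billCountry", country_code)
```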

Maps

Maps show how one set of values links to another. They are often used for lookup tables. The map is represented by a MapSchemaType that then links to two other SchemaTypes, one for the type of the starting value and the other for the type of the value it is mapped to.

Figure 8: The MapSchemaType
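A map type with its two linked types might be sketched like this. The `map_from` and `map_to` field names and the `describe` helper are assumptions for the example; only MapSchemaType and SchemaType are names from the text.

```python
from dataclasses import dataclass


@dataclass
class SchemaType:
    display_name: str


@dataclass
class MapSchemaType(SchemaType):
    map_from: SchemaType  # type of the starting value
    map_to: SchemaType    # type of the value it is mapped to

    def describe(self) -> str:
        """Hypothetical helper that summarises the mapping."""
        return f"{self.display_name}: {self.map_from.display_name} -> {self.map_to.display_name}"


country_lookup = MapSchemaType(
    display_name="countryLookup",
    map_from=SchemaType("string"),
    map_to=SchemaType("int"),
)
summary = country_lookup.describe()
```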

Alternative types

Some schemas allow multiple choices for the type of an element. This is supported by the SchemaTypeChoice, which links to each of the SchemaTypes that are valid options.

Figure 9: The SchemaTypeChoice
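A choice between types can be sketched as a schema type that holds a list of options. The `options` field, the `option_names` method and the sample identifier types are hypothetical and shown only to make the structure concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaType:
    display_name: str


@dataclass
class SchemaTypeChoice(SchemaType):
    options: List[SchemaType] = field(default_factory=list)

    def option_names(self) -> List[str]:
        """Hypothetical helper listing the permitted types."""
        return [option.display_name for option in self.options]


identifier = SchemaTypeChoice(
    display_name="identifier",
    options=[SchemaType("string"), SchemaType("long")],
)
```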

Structures or Records

It is common for an attribute to consist of a collection of other values. For example, an attribute called employee may consist of multiple values such as employee number, name, address, department, … These types of attribute are represented by the StructSchemaType.

Figure 10: The StructSchemaType

The relationship between the schema attribute and its nested schema attributes is NestedSchemaAttribute. The relationship between the StructSchemaType and its nested schema attributes is AttributeForSchema.
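The two ways of attaching nested attributes can be compared in a short sketch. Relationships are modelled here as simple tuples, which is an assumption for illustration; the function names and the `EmployeeStruct` name are hypothetical, while the relationship names and the employee example come from the text.

```python
from typing import List, Tuple

# (relationship type, parent element, nested attribute)
Relationship = Tuple[str, str, str]


def struct_as_schema_type(struct_name: str, fields: List[str]) -> List[Relationship]:
    """A StructSchemaType entity links to its attributes via AttributeForSchema."""
    return [("AttributeForSchema", struct_name, name) for name in fields]


def struct_as_schema_attribute(attribute_name: str, fields: List[str]) -> List[Relationship]:
    """A schema attribute links to its nested attributes via NestedSchemaAttribute."""
    return [("NestedSchemaAttribute", attribute_name, name) for name in fields]


employee_fields = ["employeeNumber", "name", "address", "department"]

as_type = struct_as_schema_type("EmployeeStruct", employee_fields)
as_attribute = struct_as_schema_attribute("employee", employee_fields)
```

The nested attributes are the same in both cases; only the parent element and the relationship type differ.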

Open Metadata types for connecting schemas to other types of elements:

Open Metadata Types for different types of data structures:

Specializations of the main types of schema structures for particular types of technology. They are used to enable retrieval of technology-specific schema elements. For example, a query for relational columns with a particular characteristic.

APIs that support the definition of schemas:

Other types of information associated with an Asset:



License: CC BY 4.0, Copyright Contributors to the Egeria project.