A projection containing one row per request item.
Each row corresponds to the latest state of each request item in the system, with the following fields available:
Columns:
clientid
campaignid
sendinggroupid
sendinggroupidversion
requestitemrefid
requestitemid
requestrefid
requestid
nhsnumberhash
createdtime
completedtime
completedcommunicationtypes
failedcommunicationtypes
status
failedreason
patientodscode
timestamp
Partitions:
bucket(32, clientid)
month(createdtime)
month(completedtime)
A projection containing one row per request item plan.
Each row corresponds to the latest state of each request item plan in the system, with the following fields available:
Columns:
clientid
campaignid
sendinggroupid
sendinggroupidversion
requestitemrefid
requestitemid
requestrefid
requestid
requestitemplanid
communicationtype
supplier
createdtime
completedtime
sendtime
status
failedreason
contactdetailsource
channeltype
ordernumber
timestamp
Partitions:
bucket(32, clientid)
month(createdtime)
month(completedtime)
An aggregated view of request item plans that have reached DELIVERED or FAILED status.
Dimensions:
clientid
campaignid
sendinggroupid
sendinggroupidversion
communicationtype
supplier
createddate
completeddate
status
failedreason
contactdetailsource
channeltype
Facts:
requestitemcount
Partitions:
month(createddate)
month(completeddate)
An aggregated view of request item plans that have reached DELIVERED or FAILED status.
Includes batch dimensions for specific large-batch use cases such as vaccinations.
Dimensions:
clientid
campaignid
sendinggroupid
sendinggroupidversion
requestrefid
requestid
communicationtype
supplier
createddate
completeddate
status
failedreason
contactdetailsource
channeltype
Facts:
requestitemcount
Partitions:
month(createddate)
month(completeddate)
An aggregated view of request items providing counts for key states and channels.
Dimensions:
clientid
campaignid
sendinggroupid
sendinggroupidversion
createddate
Facts:
requestedcount
deliveredcount
failedcount
completedcount
outstandingcount
nhsappdeliveredcount
emaildeliveredcount
smsdeliveredcount
letterdeliveredcount
nhsappfailedcount
emailfailedcount
smsfailedcount
letterfailedcount
Partitions:
month(createddate)
An aggregated view of request items providing counts for key states and channels.
Includes batch dimensions for specific large-batch use cases such as vaccinations.
Dimensions:
clientid
campaignid
sendinggroupid
sendinggroupidversion
requestrefid
requestid
createddate
Facts:
requestedcount
deliveredcount
failedcount
completedcount
outstandingcount
nhsappdeliveredcount
emaildeliveredcount
smsdeliveredcount
letterdeliveredcount
nhsappfailedcount
emailfailedcount
smsfailedcount
letterfailedcount
Partitions:
month(createddate)
Ingestion queries follow a similar format in order to take account of the characteristics of the underlying transaction_history table.
These are simplified examples of the actual ingestion queries.
This query finds the latest update of each object in the ingestion window. It then inserts a subset of columns into the reporting table if the primary key is not already present, or updates the row if already present but a later version is available.
MERGE INTO <reporting_table> as target
USING (
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (
--Only select last update from sample window
--Use completeddate as indicator of terminal state as a tie-breaker on identical timestamps
partition BY requestitemid ORDER BY
timestamp DESC,
length(coalesce(completeddate, '')) DESC
) AS rownumber
FROM (
SELECT
--Primary key for matching
requestitemid,
--Data column(s)
status,
--Timestamp partitioning/ordering columns
completeddate,
CAST("$classification".timestamp AS BIGINT) AS timestamp
FROM ${source_table}
WHERE (sk LIKE 'REQUEST_ITEM#%') AND
(
-- Moving 1-week ingestion window
DATE(CAST(__year AS VARCHAR) || '-' || CAST(__month AS VARCHAR) || '-' || CAST(__day AS VARCHAR)) >= DATE_ADD('week', -1, CURRENT_DATE)
)
)
)
WHERE rownumber = 1
) as source
ON
--Match on primary key
source.requestitemid = target.requestitemid
--Update if match exists and source has later timestamp
WHEN MATCHED AND (source.timestamp > target.timestamp) THEN UPDATE SET
status = source.status
--Insert if no match exists
WHEN NOT MATCHED THEN INSERT (
requestitemid,
status,
timestamp
)
VALUES (
source.requestitemid,
source.status,
source.timestamp
)
Aggregation queries produce much smaller result sets than the corresponding projection queries by grouping related items together.
Typically aggregation queries operate as a second pass against earlier data projections.
If a new combination of dimensions are found an insert is performed.
If an existing combination of dimensions are found an update is performed only if the total would be increased (this is to prevent totals being reduced as events expire from the sliding ingestion window)
For correct results, the ingestion window must be large enough to encompass all events that correspond to a given combination of dimensions.
MERGE INTO <aggregation_table> as target
USING (
SELECT
--Dimensions
clientid,
sendinggroupid,
--Facts
COUNT(DISTINCT requestitemid) AS requestitemcount
FROM <projection_table>
WHERE (status = 'DELIVERED' OR status = 'FAILED') AND
(
-- Moving 1-month ingestion window
DATE(createdtime) >= DATE_ADD('month', -1, CURRENT_DATE)
)
-- Group by dimension columns
GROUP BY
clientid,
sendinggroupid
) as source
ON
-- Match on dimensions, using COALESCE to match on null values correctly
COALESCE(source.clientid, '') = COALESCE(target.clientid, '') AND
COALESCE(source.sendinggroupid, '') = COALESCE(target.sendinggroupid, '') AND
--If matched, update the fact(s) with a one-way check
WHEN MATCHED AND (source.requestitemcount > target.requestitemcount) THEN UPDATE
SET requestitemcount = source.requestitemcount
--If not matched, insert dimensions and facts
WHEN NOT MATCHED THEN INSERT (
clientid,
sendinggroupid,
requestitemcount
)
VALUES (
source.clientid,
source.sendinggroupid,
source.requestitemcount
)
Raw NHS numbers are not to be exposed in reporting tables. Instead, the following conversion is used:
to_base64(sha256(cast((? || '.' || nhsnumber) AS varbinary))) AS nhsnumberhash
An environment-specific secret key is injected as an execution parameter to prevent precomputation attacks.
The same format should be used in all queries to ensure that the same hash value is always generated for the same NHS number in any given environment.
Data migration queries are executed manually and are held in this repository solely for traceability.
A data migration query is usually very similar to the corresponding ingestion query, and operates in exactly the same way.
Key differences include:
- The sliding time window is omitted so that all available rows in the source table are available
- Historic indiosyncrasies are accounted for (such as some legacy timestamps that were not ISO8601 compliant)
- Migration queries may include specific amendments to rectify historic data quality issues