Querying data

Before you begin, read the OCDS Kingfisher Summarize page to learn about how data is stored in Kingfisher Summarize.

Tables in Kingfisher Summarize can contain summaries of:

  • individual OCDS releases

  • individual OCDS releases from an OCDS record

  • compiled OCDS releases generated by Kingfisher Process

  • compiled OCDS releases published as part of an OCDS record

Since most analysis is much easier to perform on compiled releases, we recommend working with compiled releases to begin with.

List all schemas in the database

Kingfisher Summarize creates database schemas to store summary data.

The following query returns a list of schemas in the database:

SELECT
  schema_name
FROM
  information_schema.schemata;

Set the schema to query

The following query sets the view_data_collection_1257_1259 schema as the first item in the search path:

SET
  search_path
TO
  view_data_collection_1257_1259, public;

Note

Depending on the tool you use to query the database, you might need to run the above query before each other query.

List the collections in the current schema

The following query lists the collections in the current schema, with the name of the data source and the type of data summarized:

SELECT DISTINCT
    collection_id,
    source_id,
    release_type
FROM
    release_summary
JOIN
    collection ON release_summary.collection_id = collection.id
ORDER BY
    collection_id DESC;

The release_type column indicates the type of data stored in the collection:

  • release identifies individual releases

  • compiled_release identifies compiled releases generated by Kingfisher Process

  • record identifies compiled releases published as part of an OCDS record

Get a top-level summary of contracting processes

Top-level summary data is stored in the release_summary table.

Use one of the collection_id values returned by the previous query to filter your results to a single collection. To get summaries of individual releases, use a release collection. To get summaries of entire contracting processes, use a compiled_release or record collection.

The following query returns a top-level summary of the first 3 contracting processes in collection 1259, which is a compiled_release collection.

SELECT
    *
FROM
    release_summary
WHERE
    collection_id = 1259
LIMIT 3;

To learn more about the summaries and aggregates in the release_summary table, refer to the release_summary documentation.

To get data from a different collection, change the collection_id condition.

Calculate the total value of tenders in a collection

Summary data about tenders is stored in the tender_summary table.

The following query calculates the total value of tenders disaggregated by currency and tender status in collection 1259.

SELECT
  tender_value_currency, -- return the currency of the tender value, values in OCDS have an amount and a currency, as datasets may contain values in multiple currencies
  tender_status,
  sum(tender_value_amount)
FROM
  tender_summary
WHERE
  collection_id = 1259
GROUP BY
  tender_value_currency,
  tender_status
ORDER BY
  tender_value_currency,
  tender_status;

To learn more about the summaries and aggregates in the tender_summary table, refer to the tender_summary documentation.

Tip

The tender, awards and contracts objects in OCDS all have a .status field.

Kingfisher Summarize stores these status fields in the tender_summary.tender_status, awards_summary.award_status and contracts_summary.contract_status columns.

Consider which statuses you want to include or exclude from your analysis; for example, you might want to exclude pending and cancelled contracts when calculating the total value of contracts for each buyer.

The OCDS codelist documentation describes the meaning of the statuses for each object.

Calculate the top 10 buyers by award value

Summary data about buyers is stored in the buyer_summary table, and summary data about awards is stored in the award_summary table.

To join summary tables, use the id column, which uniquely identifies a release. To learn more about the relationships between tables, refer to the How tables are related documentation.

The buyer_summary table doesn’t include the buyer’s name; however, the buyer column contains a JSONB blob of the buyer for each contracting process, from which the buyer’s name can be queried.

Most summary tables include a column that contains a JSONB blob of the object to which the summary relates. For example, the award column in awards_summary and the tender column in tender_summary.

The following query calculates the top 10 buyers by award value for collection 1259, disaggregated by currency, and counting ‘active’ awards only:

SELECT
    buyer_identifier,
    buyer -> 'name' AS buyer_name, -- extract the buyer name from the JSON
    award_value_currency,
    sum(award_value_amount) AS award_amount
FROM
    awards_summary
JOIN
    buyer_summary ON awards_summary.id = buyer_summary.id
WHERE
    awards_summary.collection_id = 1259
AND
    awards_summary.award_value_amount > 0 -- filter out awards with no value
AND
    awards_summary.award_status = 'active'
GROUP BY
    buyer_identifier,
    buyer_name,
    award_value_currency
ORDER BY
    award_amount DESC
LIMIT 10;

Check which fields are available

Use the OCDS schema documentation to understand the meaning, structure and format of the fields in OCDS and to identify the fields needed for your analysis.

Coverage of the OCDS schema varies by publisher. Use the field_counts table to check whether the fields needed for your analysis are available.

The following query lists the coverage of each field in the current schema:

SELECT
  *
FROM
  field_counts;

For schemas with multiple collections, use the collection_id column to filter your results for a particular collection.

You can also check the coverage of specific fields or groups of fields by filtering on the path column:

SELECT
  *
FROM
  field_counts
WHERE
  path IN ('tender/value/amount', 'tender/procurementMethod');