OCDS Kingfisher Summarize

OCDS Kingfisher Summarize is used to create SQL tables that summarize the OCDS data in collections from OCDS Kingfisher Process.

(If you are viewing this on GitHub, open the full documentation for additional details.)

How it works

Kingfisher Summarize runs SQL statements to create SQL schemas, containing tables and views, which summarize the OCDS data in specified collections from Kingfisher Process.

A SQL schema is a set of SQL tables in a common namespace. It is not a set of constraints, as in XML Schema or JSON Schema.

The schemas are created in the database used by Kingfisher Process, and the schemas’ names start with summary_. (The default public schema contains the tables created by Kingfisher Process.)
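
Because of this naming convention, the summary schemas can be found from within PostgreSQL. The following is a minimal sketch that relies only on the summary_ prefix described above and on PostgreSQL's pg_namespace system catalog. It lists schema names only, and assumes no unrelated schema uses the same prefix.

```sql
-- List every schema whose name starts with "summary_".
-- The underscore is escaped because "_" is a single-character wildcard in LIKE.
SELECT nspname AS schema_name
FROM pg_catalog.pg_namespace
WHERE nspname LIKE 'summary\_%' ESCAPE '\'
ORDER BY nspname;
```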

Typical usage

Create a schema

Once Kingfisher Summarize is installed, use the add command to create a schema that summarizes one or more collections of your choice. (This command might take a long time to run, so you might want to run it in a terminal multiplexer like tmux.)

Once it’s done, you can query the tables it created.

Query its tables

As documented for the add command, the schema you created has a name starting with summary_, such as summary_collection_123, summary_collection_4_5_6 or summary_the_name. To learn more about each table in the schema, refer to the Database tables.
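
To see which tables and views a schema contains before reading their documentation, you can query PostgreSQL's information_schema. This is a sketch that uses summary_collection_123 as an example schema name. Substitute the name of your own schema.

```sql
-- List the tables and views in one summary schema.
-- Only relations the current user can access are shown, and
-- materialized views (if any) do not appear in information_schema.tables.
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'summary_collection_123'
ORDER BY table_name;
```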

To query a table in the schema you created, prefix the table name with the schema name and a period. For example:

SELECT * FROM summary_collection_123.release_summary;

Instead of typing the schema name every time, you can set PostgreSQL’s search_path to a comma-separated list of schemas in which to search for tables. For example, if you want to query both a Kingfisher Summarize schema and Kingfisher Process’ tables, run this statement first:

SET search_path = summary_collection_123, public;

You can then run statements like:

SELECT * FROM release_summary;
SELECT * FROM collection;
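
A plain SET statement applies only to the current database session, so it needs to be run again after reconnecting. To check or undo the setting, standard PostgreSQL statements such as the following can be used:

```sql
-- Show the schemas currently searched, in order.
SHOW search_path;

-- Restore the default search path for this session.
RESET search_path;
```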

Remove the schema

Once you no longer need the schema, remove it using the remove command to free up disk space. (You can re-create it at any time using the add command.)
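
To estimate how much disk space a schema occupies before removing it, a query like the one below can help. It is a rough sketch built on PostgreSQL's system catalogs, not a feature of Kingfisher Summarize, and summary_collection_123 is again an example name.

```sql
-- Approximate on-disk size of all tables in one schema,
-- including their indexes and TOAST data.
SELECT pg_size_pretty(sum(pg_total_relation_size(c.oid))) AS schema_size
FROM pg_catalog.pg_class AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'summary_collection_123'
  AND c.relkind IN ('r', 'm');  -- ordinary tables and materialized views
```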

List all schemas

To get a list of schemas created by yourself or others, use the index command. It reports:

  • The name of each schema

  • The IDs of the collections that it summarizes

  • The note provided by the user who created it

That’s it! Feel free to browse the documentation below.