Edit SQL files¶
You should be familiar with SQL and the Open Contracting Data Standard. You don’t need to know Python, as there’s no need to touch Kingfisher Summarize’s Python files, only its SQL files.
This how-to guide will walk you through the steps of editing SQL files (if you haven’t already, please follow the Setup guide):
Make your changes to the SQL files
Run the tests, to make sure your changes were successful and didn’t break anything else.
pytest
To merge your changes, push your changes to GitHub and make a pull request
Make changes¶
Example: Add a column¶
We want to add the description
values of the Tender
and Award
objects to the tender_summary and awards_summary views. (Note: This is already done.)
Find the SQL file to change.
The
tender_summary.sql
file contains theCREATE VIEW tender_summary
statement.
Find the SQL statement to change.
The
tender_summary
view selects from thetender_summary_no_data
table.
Add the
description
field to theSELECT
clause for thetender_summary_no_data
table.You can see the other OCDS fields in the statement. Add it alongside those.
CREATE TABLE tender_summary_no_data AS SELECT r.id, r.release_type, r.collection_id, r.ocid, r.release_id, r.data_id, tender ->> 'id' AS tender_id, tender ->> 'title' AS tender_title, tender ->> 'status' AS tender_status, tender ->> 'description' AS tender_description, -- OUR ADDITION ...
Do the same for the table summarizing the
Award
object, by editing theSELECT
clause for theawards_summary_no_data
table in theawards_summary.sql
file.... award ->> 'title' AS award_title, award ->> 'status' AS award_status, award ->> 'description' AS award_description, -- OUR ADDITION ...
Example: Add an aggregate¶
We want to add the number of Document
objects (in total and for each documentType
value) across all Planning
objects to the release_summary view. (Note: This is already done.)
tender_document_documenttype_counts
and total_tender_documents
columns already exist for Tender
objects. We can follow their example to add planning_document_documenttype_counts
and total_planning_documents
columns.
This example demonstrates how temporary (tmp_*
) tables are used to build final tables.
The
tender_document_documenttype_counts
term occurs in theagg_tender.sql
file, which populates atmp_tender_documents_aggregates
table with that column. Following this template, we create this file:CREATE TABLE tmp_planning_documents_aggregates AS SELECT id, jsonb_object_agg(coalesce(documentType, ''), documentType_count) planning_document_documenttype_counts FROM ( SELECT id, documentType, count(*) documentType_count FROM planning_documents_summary GROUP BY id, documentType) AS d GROUP BY id; CREATE UNIQUE INDEX tmp_planning_documents_aggregates_id ON tmp_planning_documents_aggregates (id);
Next, the
tmp_tender_documents_aggregates
term occurs in therelease_summary.sql
file, whichJOIN
s the table into therelease_summary_no_data
table. Following this template, we add this clause in that file:LEFT JOIN tmp_planning_documents_aggregates USING (id)
Next, the
total_tender_documents
term occurs in therelease_summary.sql
file, in aJOIN
clause. Following this template, we add this clause in that file:LEFT JOIN ( SELECT id, documents_count AS total_planning_documents FROM planning_summary) AS planning_summary USING (id)
Finally, drop the
tmp_
table in thesql/final/drop.sql
file:DROP TABLE tmp_planning_documents_aggregates;
Note
The order of the JOIN
s controls the order of the columns in the table.
Review changes¶
Review your changes by comparing to the initial summaries you created when loading data. Create new summaries:
./manage.py add 1 "Review new column" --name review_new_column
Then, check that the data is as you expect it to be. (If you’re viewing the data in a PostgreSQL client, don’t forget to refresh it.)
Update documentation¶
The tests won’t pass if you don’t document the new columns!
Edit the
docs.sql
file to add comments on the new columns:The comments should be in the same order as the corresponding columns in the tables. You can use other comments for similar columns as a template.
-- For the "Add a column" example ... COMMENT ON COLUMN %1$s.tender_id IS 'Value of the ``id`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_title IS 'Value of the ``title`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_status IS 'Value of the ``status`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_description IS 'Value of the ``description`` field in the tender object'; -- OUR ADDITION ... -- For the "Add an aggregate" example COMMENT ON COLUMN %1$s.total_planning_documents IS 'Length of the ``planning/documents`` array'; COMMENT ON COLUMN %1$s.planning_document_documenttype_counts IS 'JSONB object in which each key is a unique ``documentType`` value and each value is its number of occurrences in the ``planning/documents`` array';
Run the add command (replacing
COLLECTION_ID
below):./manage.py add COLLECTION_ID docs
Review your changes.
In your PostgreSQL client, Look at relevant tables’ schemas to check that new comments appear.
Update the documentation files (replacing
NAME
below):
./manage.py dev docs-table-ref NAME
Merge your changes¶
If your changes are for your own use only, you’re done!
If you want to share your changes with others:
Create a new branch in your git repository and commit your changes:
git checkout -b my-changes git commit -a -m 'Add X column to Y table'
Push the changes to GitHub:
git push -u origin my-changes
Follow the link in the output to create a pull request. The maintainers will assign your pull request for review, and merge it as appropriate.
To apply your changes to existing schema created by Kingfisher Summarize, see Upgrade Kingfisher Summarize.