Edit SQL files#
You should be familiar with SQL and the Open Contracting Data Standard. You don’t need to know Python, as there’s no need to touch Kingfisher Summarize’s Python files, only its SQL files.
This how-to guide will walk you through the steps of editing SQL files (if you haven’t already, please follow the Setup guide):
Make your changes to the SQL files
Run the tests, to make sure your changes were successful and didn’t break anything else.
pytest
To merge your changes, push your changes to GitHub and make a pull request
Make changes#
Example: Add a column#
We want to add the description values of the Tender and Award objects to the tender_summary and awards_summary views. (Note: This is already done.)
Find the SQL file to change.
The
tender_summary.sqlfile contains theCREATE VIEW tender_summarystatement.
Find the SQL statement to change.
The
tender_summaryview selects from thetender_summary_no_datatable.
Add the
descriptionfield to theSELECTclause for thetender_summary_no_datatable.You can see the other OCDS fields in the statement. Add it alongside those.
CREATE TABLE tender_summary_no_data AS SELECT r.id, r.release_type, r.collection_id, r.ocid, r.release_id, r.data_id, tender ->> 'id' AS tender_id, tender ->> 'title' AS tender_title, tender ->> 'status' AS tender_status, tender ->> 'description' AS tender_description, -- OUR ADDITION ...
Do the same for the table summarizing the
Awardobject, by editing theSELECTclause for theawards_summary_no_datatable in theawards_summary.sqlfile.... award ->> 'title' AS award_title, award ->> 'status' AS award_status, award ->> 'description' AS award_description, -- OUR ADDITION ...
Example: Add an aggregate#
We want to add the number of Document objects (in total and for each documentType value) across all Planning objects to the release_summary view. (Note: This is already done.)
tender_document_documenttype_counts and total_tender_documents columns already exist for Tender objects. We can follow their example to add planning_document_documenttype_counts and total_planning_documents columns.
This example demonstrates how temporary (tmp_*) tables are used to build final tables.
The
tender_document_documenttype_countsterm occurs in theagg_tender.sqlfile, which populates atmp_tender_documents_aggregatestable with that column. Following this template, we create this file:CREATE TABLE tmp_planning_documents_aggregates AS SELECT id, jsonb_object_agg(coalesce(documentType, ''), documentType_count) planning_document_documenttype_counts FROM ( SELECT id, documentType, count(*) documentType_count FROM planning_documents_summary GROUP BY id, documentType) AS d GROUP BY id; CREATE UNIQUE INDEX tmp_planning_documents_aggregates_id ON tmp_planning_documents_aggregates (id);
Next, the
tmp_tender_documents_aggregatesterm occurs in therelease_summary.sqlfile, whichJOINs the table into therelease_summary_no_datatable. Following this template, we add this clause in that file:LEFT JOIN tmp_planning_documents_aggregates USING (id)
Next, the
total_tender_documentsterm occurs in therelease_summary.sqlfile, in aJOINclause. Following this template, we add this clause in that file:LEFT JOIN ( SELECT id, documents_count AS total_planning_documents FROM planning_summary) AS planning_summary USING (id)
Finally, drop the
tmp_table in thesql/final/drop.sqlfile:DROP TABLE tmp_planning_documents_aggregates;
Note
The order of the JOIN s controls the order of the columns in the table.
Review changes#
Review your changes by comparing to the initial summaries you created when loading data. Create new summaries:
./manage.py add 1 "Review new column" --name review_new_column
Then, check that the data is as you expect it to be. (If you’re viewing the data in a PostgreSQL client, don’t forget to refresh it.)
Update documentation#
The tests won’t pass if you don’t document the new columns!
Edit the
docs.sqlfile to add comments on the new columns:The comments should be in the same order as the corresponding columns in the tables. You can use other comments for similar columns as a template.
-- For the "Add a column" example ... COMMENT ON COLUMN %1$s.tender_id IS 'Value of the ``id`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_title IS 'Value of the ``title`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_status IS 'Value of the ``status`` field in the tender object'; COMMENT ON COLUMN %1$s.tender_description IS 'Value of the ``description`` field in the tender object'; -- OUR ADDITION ... -- For the "Add an aggregate" example COMMENT ON COLUMN %1$s.total_planning_documents IS 'Length of the ``planning/documents`` array'; COMMENT ON COLUMN %1$s.planning_document_documenttype_counts IS 'JSONB object in which each key is a unique ``documentType`` value and each value is its number of occurrences in the ``planning/documents`` array';
Run the add command (replacing
COLLECTION_IDbelow):./manage.py add COLLECTION_ID docs
Review your changes.
In your PostgreSQL client, Look at relevant tables’ schemas to check that new comments appear.
Update the documentation files (replacing
NAMEbelow):
./manage.py dev docs-table-ref NAME
Format SQL files#
We use pg_format to consistently format SQL files. On macOS, using Homebrew, install it with:
brew install pgformatter
Then, run:
find . -name '*.sql' -exec pg_format -f 1 -p '%1\$s' -o {} {} \;
Merge your changes#
If your changes are for your own use only, you’re done!
If you want to share your changes with others:
Create a new branch in your git repository and commit your changes:
git checkout -b my-changes git commit -a -m 'Add X column to Y table'
Push the changes to GitHub:
git push -u origin my-changes
Follow the link in the output to create a pull request. The maintainers will assign your pull request for review, and merge it as appropriate.
To apply your changes to existing schema created by Kingfisher Summarize, see Upgrade Kingfisher Summarize.