Skip to main content

Using a database index for report performance 

Archived

5 Tasks

5 mins

Visible to: All users
Advanced English
This content is now archived and is no longer updated. Progress is not calculated. Pega Cloud instances are disabled, and badges are no longer awarded. Click here to continue your progress in the latest version.

Scenario

Front Stage wants to investigate the impact of expanding its address search capability to include filtering on street addresses that start with a certain number. They are aware of a source of actual address information that they can use: OpenAddress . Front Stage is concerned that, over time as the Address table grows larger, and with greater use of address search, query performance could impact their ability to efficiently process cases.

The following table provides the credentials you need to complete the exercise.

Role Operator ID Password
Admin Admin@Booking rules
pgAdmin4 pega pega

Your assignment:

You will analyze the possible performance impact of expanding the address search capability to include filtering on street addresses. To do this you will first need to insert a large amount of test data into the FSG-Data-Address table, customerdata.fsg_data_address. Then you will test a street filter condition to be used with the Haversine formula. To do this you do not need to modify the HaversineFormula Connect-SQL rule. Instead you will execute the modified query from PostSQL pdAdmin4 tool. In order to analyze the query, you will prefix the query using EXPLAIN (ANALYZE true, COSTS true, FORMAT json). You will first execute the query and note the result. Then, you will add a database index to the street column. Then execute the query again and compare the result from your first run. 

Detailed Tasks

1 Detailed steps

Download  Address data

Note: Excercise.zip  is provided with this exercise. You may need to download the file. Transfer the ZIP file to $/home/architect/Desktop directory. You can also unzip files using the File Manager.

2 Load CSV address data into the PostgreSQL database

Within pgAdmin4, launch the Query Tool. Open SQL file fsg_data_address_import.sql from the Exercise folder and execute.

select count(*) from customerdata.fsg_data_address to verify that all the records loaded successfully.

3 Analyze and improve query performance

Instead of modifying the Haversine formula by adding a filter criterion such as AND street LIKE '1 ROGERS STR%', it is enough to perform a simple query within pgAdmin4 against the address table such as the one shown below.

EXPLAIN (ANALYZE true, COSTS true, FORMAT json)
SELECT pinguid AS pyGUID,
reference AS Reference,
isfor AS IsFor,
street AS Street,
city AS City,
state AS State,
postalcode AS PostalCode,
country AS Country,
latitude AS Latitude,
longitude AS Longitude
FROM customerdata.fsg_data_address
WHERE street like '1 ROGERS STR%';

In the “Explain” tab hover over the icon. On the right you should see the statistics for the Sequential Scan that was performed.

Next select Indexes beneath pr_fsg_data_address, right-click, and select Create. Provide an index name such as idx_street or StreetIDX. In the Definition table select btree then add the street column to the index. Choose any Operator class such as text_pattern_ops.

Execute the query again. This time you should see significantly faster performance due to an Index Scan being used.

If you want, do the same for a modified version of the Haversine formula. The Explain graphic will be different.

EXPLAIN (ANALYZE true, COSTS true, FORMAT json)
SELECT pyGUID AS pyGUID,
Reference AS Reference,
IsFor AS IsFor,
Street AS Street,
City AS City,
State AS State,
PostalCode AS PostalCode,
Country AS Country,
Latitude AS Latitude,
Longitude AS Longitude,
Distance AS Distance
FROM (
SELECT z.pyguid AS pyGUID,
z.reference AS Reference,
z.isfor AS IsFor,
z.street AS Street,
z.city AS City,
z.state AS State,
z.postalcode AS PostalCode,
z.country AS Country,
z.latitude AS Latitude,
z.longitude AS Longitude,
p.radius,
p.distanceunit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) AS Distance
FROM customerdata.fsg_data_address AS z
JOIN (
SELECT 42.0 AS latpoint,
-71.0 AS longpoint,
30.0 AS radius,
69.0 AS distanceunit ) AS p ON 1 = 1 ) AS d
WHERE distance <= radius
AND street LIKE '1 ROGERS STR%'
ORDER BY distance
LIMIT 15

Table Index

 

Note:  Attempting to index the latitude and longitude columns will not improve performance since these are numeric. pgAdmin4 does not support the definition of indexes for numeric columns, Forcing an index on a numeric column using DDL syntax, for example, CREATE INDEX idx_lat ON customerdata.fsg_data_address(latitude), will not work; the index will be ignored.

4 Verify your work

Indexed columns should result in significantly improved performance readings.

5 Solution RAP

 

Openaddr Collected US Northeast File

 



Available in the following mission:

We'd prefer it if you saw us at our best.

Pega Academy has detected you are using a browser which may prevent you from experiencing the site as intended. To improve your experience, please update your browser.

Close Deprecation Notice