Improve query performance with index

WePython 2 Months+


I am looking to create an index for a database to acheive better performance. I am using this query:

SELECT DISTINCT t1.* FROM current_order
AS t1 LEFT JOIN
receipt AS t2 USING (paper_id,subscriber_id)
WHERE t2.id NOT IN (SELECT id
FROM receipt WHERE paid_till_date > Now())
UNION
SELECT current_order.* FROM receipt RIGHT JOIN
current_order USING (paper_id, subscriber_id)
Where receipt.id IS NULL ORDER BY
subscriber_id, paper_id

Tables are:

 PAPER
    id
    name

 SUBSCRIBER
    id
    name
    address
    suburb
    state
    postcode
    round_id

 CURRENT ORDER
    paper_id
    subscriber_id

 ROUND
    id
    name
    paperboy

 RECEIPT
    id
    receipt_date
    paid_till_date
    paper_id
    subscriber_id

I understand that primary keys are already indexed, and that i should use an index where WHERE clauses and ORDER BY clauses are used frequently. So i'm assuming that the subscribers Name would be a good one to index? And also the receipt date?

Any tips appreciated

-----------------Answer-----------------

The original query

SELECT DISTINCT t1.* 
  FROM current_order AS t1 
    LEFT JOIN receipt AS t2 USING (paper_id,subscriber_id)
  WHERE t2.id NOT IN (
    SELECT id
      FROM receipt 
      WHERE paid_till_date > Now()
    )
UNION
  SELECT current_order.* 
    FROM receipt 
      RIGHT JOIN current_order USING (paper_id, subscriber_id)
    WHERE receipt.id IS NULL 
    ORDER BY subscriber_id, paper_id

is more complex than necessary.

The base tables are

CURRENT ORDER
  paper_id
  subscriber_id 

and

RECEIPT
  id
  receipt_date
  paid_till_date
  paper_id
  subscriber_id 

The OP stated that receipt.id is a primary key and I think that (paper_id,subscriber_id) is the primary key of current order.

Both queries

SELECT t1.* 
  FROM current_order AS t1 
    LEFT JOIN receipt AS t2 USING (paper_id,subscriber_id)
  WHERE t2.id NOT IN (
    SELECT id
      FROM receipt 
      WHERE paid_till_date > Now()
    )

and

SELECT t1.* 
  FROM current_order AS t1 
    LEFT JOIN receipt AS t2 USING (paper_id,subscriber_id)
  WHERE paid_till_date <= Now()
    or paid_till_date  is NULL

are equivalent: The resultset f both queries contains exactly the following tuples:

all tuples from current_order where there is a RECEIPT with the same (paper_id,subscriber_id) and paid_till_date <= Now()
all tuples from current_order where there is a RECEIPT with the same (paper_id,subscriber_id) and paid_till_date is NULL
all tuples from current_order where there is no RECEIPT with the same (paper_id,subscriber_id) (and therefore paid_till_date  is NULL)

Therefore the queries are equivalent. And they are also equivalent if an DISTINCT is added to the select clause.

The resultset of the query

  SELECT current_order.* 
    FROM receipt 
      RIGHT JOIN current_order USING (paper_id, subscriber_id)
    WHERE receipt.id IS NULL 
    ORDER BY subscriber_id, paper_id

contains the following tuples

all tuples from current_order where there is no RECEIPT with the same (paper_id,subscriber_id) (and therefore receipt.id is NULL)

So the original query can be changed to the simpler query

SELECT DISTINCT t1.* 
  FROM current_order AS t1 
    LEFT JOIN receipt AS t2 USING (paper_id,subscriber_id)
  WHERE paid_till_date <= Now()
    or paid_till_date  is NULL

Perhaps an index on (paper_id,subscriber_id,paid_till_date) will be usefull. The query uses only columns found in the index (receipt.id is not used anymore). An Index on (paper_id,subscriber_id) of "current order" already exists because this is the primary key.


Previous : How database size affects performance: Theory vs reality
Next : Oracle.exe (SHAD) process keeps increasing