SQL Caching and Table Partitioning (Follow-up)

Published in

ITNEXT

6 min readNov 20, 2022

Following up on my previous article, Techniques for Optimising SQL Queries, I’m going to go through two more topics for improving the performance of your queries and database.

Caching SQL queries result set
Table Partitioning
— What is Partitioning
— What are the types of Partitioning
— Why Partition
— Partitions and Indexes
— Partition Maintenance

Caching SQL queries result set

We have two levels of caching in SQL Database. The first level cache is enabled by default whereas the second level cache has to be configured. You can read more about first and second level caching in this article by Darshan Dalwadi.

Since the first level cache is enabled by default, the only thing we can do is to change the cache driver to use array, files or any other type based on the technology or ORM that we’re using to connect and work with the database.

Second level cache on the other hand, since it’s disabled, we have to enable it and configure it according to our needs. This can be configured with MemCached which is described by Mina Ayoub, Redis with a quick intro provided by DLT Labs, Riak or any other service which we can use as a cache which is compatible with SQL.

In this article from Sudheer Sandu you’ll find a detailed explanation of caching in general, including database caching.

Table Partitioning

What is Partitioning?

Partitioning divides a table into smaller logical parts called “partitions”. Still looks like a table. Partitions are defined with a CREATE or ALTER statement.

CREATE TABLE Sales ( saleDate date, … ) 
  PARTITION BY KEY(saleDate) PARTITIONS 16;

ALTER TABLE Sales 
  ADD PARTITION (date = '2016-05-14');

SQL engine knows how the table was divided into smaller parts and uses this information to speed up queries. Operations on many smaller parts are often faster than on one big table.

What are the types of Partitioning?

In total, there are 5 types of partitioning and two of the types can also have other partitions in an existing partition. All this partition types are explained below.

RANGE partitioning assigns rows to partitions based on column values falling within a given range.

CREATE TABLE Table (
  id INT NOT NULL,
  firstname VARCHAR(25) NOT NULL,
  joined DATE NOT NULL
)
PARTITION BY RANGE( id) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300)
);

LIST partitioning is similar to partitioning by RANGE, except that the partition is selected based on columns matching one of a set of discrete values.

CREATE TABLE Table (val INT)
PARTITION BY LIST(val)(
  PARTITION myFirstPart VALUES IN (1,3,5),
  PARTITION MyPart VALUES IN (2,4,6),
  .........
);

HASH partitioning is when a partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table. The function may consist of any expression valid in SQL that yields a non-negative integer value. An extension to this type, LINEAR HASH, is also available.

CREATE TABLE Sales (
  id INT NOT NULL,
  store_name VARCHAR(30),
  store_id INT
)
PARTITION BY HASH(store_id) PARTITIONS 4;

KEY partitioning is similar to partitioning by HASH, except that only one or more columns to be evaluated are supplied, and the SQL server provides its own hashing function. These columns can contain other than integer values, since the hashing function supplied by SQL guarantees an integer result regardless of the column data type. An extension to this type, LINEAR KEY, is also available.

CREATE TABLE Sales ( order_date date, ... )
  PARTITION BY KEY(order_date) PARTITIONS 4;

COLUMN partitioning is a variant of RANGE and LIST partitioning. COLUMNS partitioning enables the use of multiple columns in partitioning keys. All of these columns are taken into account both for the purpose of placing rows in partitions and for the determination of which partitions are to be checked for matching rows in partition pruning.

CREATE TABLE rcx (
  a INT,
  b INT,
  c CHAR(3),
  d INT
)
PARTITION BY RANGE COLUMNS(a,d,c) (
  PARTITION p0 VALUES LESS THAN (5,10,'ggg'),
  PARTITION p1 VALUES LESS THAN (10,20,'mmm'),
  PARTITION p2 VALUES LESS THAN (15,30,'sss'),
  PARTITION p3 VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)
);

Sub-partitioning, Range and List partitioning can be sub-partitioned with key and has. In the following image it’s shown a range by month sub-partitioned by region. We see different combinations on how we can query the table to get faster results. If we query the full table, it needs 3 minutes to scan and return a result, but if we query only the month of September for all regions, we get a result in 15 seconds.

Why Partition?

By partitioning tables we can delete data only on a specific partition, besides fetching. The only downside is that it only works with range and list partitioning.

We get results faster for non-indexed columns. This is due to the SQL optimiser being aware of the partitioning expression and can eliminate partitions while scanning the table when the columns used in the partitioning expression are in the queries where condition.

We can reduce or even eliminate indexes (indexes are still needed for partition definition)!

Adding indexes can be faster since they are added on only on a a part of the table and all the optimisations can be done by partition.

If data is only being added to one partition then you can OPTIMISE only that partition instead of running OPTIMISE on the whole table.

Partitions and Indexes

We have to types of indexes used in a table partition.

Unique Indexes
We need to have unique indexes partitions and every column used in a partitioning expression for a table must be part of every unique key on that table.
This does not mean you must have only unique keys, if you do, then every one of them must include all of the values used in the partitioning expression!
Non-Unique indexes
We can also have non-unique indexes on a partitioned table. The partition engine will execute a separate non-parallel index lookup on each partition ! Performance may be OK with a very small number (4) of partitions, but gets really bad with large numbers of partitions. If you must have non-unique indexes, keep the number of partitions low (<16). Too many partitions, over 124, start to really slow down non-unique index lookup

A well designed partitioned table has few or even NO indexes! It should always have less than the non-partitioned table since it needs to re-design indexes!

Partition Maintenance

Table partitioning is not something that is done once and forget about it. It needs to be maintained day in and day out.

We need to fix indexes accordingly, if needed.

Run optimise and analyse process on partitions on a regular schedule, a nightly job for example for each partition which will run for 5–10 minutes.

CREATE TABLE Sale (
  saleID INT AUTO_INCREMENT PRIMARY KEY, 
  salesDate TIMESTAMP, 
  storeID smallint, 
  amount decimal(10,2)
);
Alter table Sale Partition by hash(saleID) partitions 7;
Alter table sale optimize partition P1;

Insert, Update, and Delete will be done as usual and by default will target the last partition

Try to keep the number to 16 partitions or less, 8 or less best for reasons mentioned above.

Hope you find this information useful and feel free to let me know your thoughts.

If you want to get in touch with me, you can follow or reach out on any of my socials or through the contact form found on my personal site.