We recently moved the infrastructure of a live SaaS product from Windows Azure Cloud to Amazon AWS. While we’re still pleased with the decision and are reaping cost saving benefits, we wanted to see if we could push our optimization further. Managing our hosting costs went down, but one line item in the AWS bill was still exceptionally large in comparison to others - the Amazon RDS for SQL Server.
To be fair, we were happy with Amazon RDS for SQL Server – it’s stable, works well, abundance of tools available and the team has deep knowledge with the tech. And while cost was manageable in the near future, we saw potential challenges long term due to our data architecture decisions. You see, we chose a sharded multi-tenancy database strategy and so far we utilize two shards (RDS instances), where data is split to shards by customer:
- 1st shard – r3.2xlarge (8 vCPU, 61 Memory GiB), hosts 450 Customers, Database size – 35 GB, $911.04 / month
- 2nd shard – r3.xlarge (4 vCPU, 30.5 Memory GiB), hosts 100 Customers, Database size – 18 GB, $448.22 / month
The product is successful and the customer count is constantly growing, so we can foresee the need to spin up the 3rd and 4th shards in near future. Each fully utilized shard would cost us an additional $911.04 per month. Since we’re trying to keep our hosting expenses lower than 5% of revenue, such growth tendencies are not satisfactory based on our current strategy. The answer now obvious – look for cheaper and/or open source alternatives.
Two alternatives were considered: MySQL and PostgreSQL. Due to maturity, current development velocity, and future potential PostgreSQL was chosen as the next database engine.
The following three hurdles had to be overcome in order to start using PostgreSQL:
- Create/migrate database schema objects to PostgreSQL from MSSQL
- Implement PostgreSQL support in our applications
- Migrate data to PostgreSQL
Database schema changes
We lucked out. Since day one our strategy has been to keep the database tiny. Our business logic was stored in the software layer, while the database was responsible only for data storage. The most common objects in our database are tables, views, and so on (primary keys, foreign keys, indexes, etc.).
We also had several stored procedures. Rewriting these required quite a different understanding of how a batch of SQL statements is executed in PostgreSQL. First of all, there is no definition such as „stored procedure” in PostgreSQL. There are functions that has have return clause and that return clause determines whether a function is a „function“ or a „stored procedure“. Another difference is that in MSSQL stored procedure local variables can be declared anywhere, however in PostgreSQL functions local variables have to be declared at the beginning of the function. Obviously it's only a syntax difference that requires a habit change.
Other schema objects were migrated without much complications:
Tables. Some of the data types don’t match directly, so they had to be changed to corresponding PostgreSQL data types. Below is a list of data types which we had to change:
- NVARCHAR(100) -> CHARACTER VARYING(100)
- NVARCHAR(MAX) -> TEXT
- DATETIME -> TIMESTAMP WITHOUT TIME ZONE
- BIT -> BOOLEAN
- DECIMAL(18, 7) -> NUMERIC(18, 7)
- FLOAT -> DOUBLE PRECISION
- IDENTITY -> SERIAL
View syntax is the same as in MSSQL. Minor note: views can not be altered once metadata of a view is changed (added / removed column or changed column data type). In that case you have to drop a view and recreate it once again.
Primary keys. Concept is the same, however in PostgreSQL primary key is always clustered. Meanwhile in MSSQL there is a choice whether primary key is clustered or not – if primary key is non-clustered, then any one other index could be created as clustered. It also should be noted that in MSSQL no matter what kind of primary key is created (clustered or non-clustered), an index will be created as well (respectively clustered or non-clustered).
Foreign keys. Syntax and behavior are very similar. Differences show up if there is a need to disable/enable foreign keys for better performance in specific cases. In MSSQL you can disable any foreign key with following syntax:
ALTER TABLE Foo NOCHECK CONSTRAINT FK_Name
In PostgreSQL you have to foresee the need to disable/enable foreign key on creation time, because the foreign key needs to be flagged as deferrable. Checking of constraints that are deferrable can be postponed until the end of the transaction using the SET CONSTRAINTS command.
Indexes. There are several main differences between MSSQL and PostgreSQL in terms of indexes.
There are no clustered indexes in PostgreSQL. All indexes are non-clustered by default.
There are no indexes with included columns in PostgreSQL. There are two choices when migrating indexes to PostgreSQL: add included columns to index key column list or exclude included columns from index completely. Both options were used in our product migration depending on each case correspondingly.
Overall, indexes in PostgreSQL have to be more accurate and be created more responsibly if you want an index to be used in the query execution plan. Attention should be paid to column order in indexes, because an index is most efficient when there are constraints on the leading (leftmost) columns.
Boolean columns are inefficient for use in index key columns because selectivity of Boolean columns is very low. Our product uses a soft-delete strategy, so almost every table has an “IsDeleted” column and queries have a “...WHERE IsDeleted = false”. Instead of adding an “IsDeleted” column to index key columns list, we use a filtered index which leads to smaller index sizes. For example:
CREATE INDEX IDX_Job_CompanyId ON Job (CompanyId) WHERE IsDeleted = false;
Business layer changes can be split into two categories: nHibernate changes and HQL (or plain SQL) changes.
nHibernate changes. Majority of our business layer uses nHibernate ORM. Most changes were in configuration. Instead of using MsSql2012Dialect we switched to PostgreSQLDialect, changed connection strings and installed Npgsql driver nuget package. Additionally, we had to rewrite several registered functions, which we use via extended dialect (e.g. AddYears, AddDays, etc.).
HQL (or plain SQL) changes. Most software out in the world has some outdated code in it – our product is not an exception. We had to adapt several dozens of SQL statements to make them compatible with PostgreSQL. I will not go into detail with each bit of SQL which we had to replace, but will mention most common we faced:
- Multiple SQL statements in one query must be separated with semi-colons. In Microsoft SQL Server 2014 T-SQL statement terminator is still optional, even if Microsoft highly recommends the use of a semi-colon because in future versions it will become required.
- Boolean (Bit) values in MSSQL are 0 and 1. In PostgreSQL they have to be replaced with true / false or the equivalent of “t” and “f”.
- String concatenation in MSSQL is “SELECT FirstName + LastName FROM...”. In PostgreSQL equivalent is “SELECT FirstName || LastName FROM...”.
- There is no ISNULL function in PostgreSQL. Use COALESCE instead.
In addition to changed nHibernate configuration and plain SQL adoption for PostgreSQL, there are a couple of important database behavior discrepancies which should be taken into account:
Case sensitivity. In MSSQL case sensitivity is configurable via the database or table column collation. In PostgreSQL, however, it’s not so straight-forward and there is no way how to set database case sensitivity. There are several techniques how to make constraints case insensitive:
- Instead of LIKE constraint use ILIKE (Insensitive like)
- Use LOWER/UPPER functions for equality constraints, e.g. “… WHERE LOWER(Name) = LOWER(‘John’)”
- Use CITEXT extension. The CITEXT extension allows you to create case insensitive columns, so you can avoid things like the lower function. In fact, this extension calls the lower function automatically so that you don’t have to do it manually.
Order By with NULLs. Nulls are ordered differently on MSSQL and PostgreSQL. In MSSQL nulls are ordered first and in PostgreSQL nulls are last by default. PostgreSQL does have the flexibility to choose how nulls should be ordered. The NULLS FIRST and NULLS LAST options can be used to determine whether nulls appear before or after non-null values in the sort ordering, e.g.:
SELECT Id, Name FROM Foo ORDER BY Name NULLS FIRST;
Migration step by step
Once the database schema and application were configured for PostgreSQL, we were ready to move over the data. Our business can tolerate a couple of hours of downtime on off-peak time (especially on weekends), so we shut down the app and copied the data in the most simplest method we could figure out. Data migration was performed with the following steps:
- Created RDS instances with initial database schema (tables, views and functions).
- Deployed all websites with PostgreSQL changes to production. App_Offline.htm file was included to notify users of a migration taking place and the fact that no data could be read or written at the time.
- Exported data from MSSQL to CSV files table by table using the BCP tool. Database sizes were relatively small – 1st shard was about 35GB in total, so export to CSV files worked well for us. BCP does have one pitfall – it doesn’t support export to CSV with UTF-8 encoding. So we had to export to CSV using UTF-16 encoding and later convert it to UTF-8.
- Changed CSV file encoding from UTF-16 to UTF-8 using iconv tool.
- Imported CSV files to corresponding tables using PSQL (PostgreSQL interactive terminal) COPY command.
- Created foreign keys and indexes. These objects were created after data migration due to better performance.
- Removed App_Offline.htm from all websites. The next request into the application caused IIS to load the application and app-domain again, and operations continued along as normal.
Overall websites were down for less than two hours and as soon as App_Offline.htm file was removed the system started to operate as usual. Several minor bugs were found in the hours following migration, but were immediately fixed prior to business start on Monday.
While the migration itself was well planned and did go smooth we had concerns about PostgreSQL performance with real load on peak times. We ran jMeter load tests before migration and were happy with the query execution time, so production performance was the last test. Our concerns were reasonable, because as mentioned before, indexes don’t fit one to one as they were tuned on Microsoft SQL Server. During post-migration week we kept an eye on slow performing queries and polished indexes to catch up to MSSQL performance levels and sometimes even achieved better execution times (picture below).
Additionally, we realized that PostgreSQL doesn’t utilize the same amount of memory as we were accustomed with MSSQL. So we had to tune up PostgreSQL RDS instance configuration to gain better performance. Dave’s Head blog post pointed us in the right direction regarding PostgreSQL configuration on Amazon RDS. After configuration changes we decided to keep the same memory size as we had on MSSQL databases (for future growth), but increased second shard vCPU from 4 to 8 – changed instance type from r3.xlarge to m4.2xlarge.
Three weeks have passed since our migration. Our monthly bill now contains “Amazon RDS for PostgreSQL” with following prices:
- 1st shard – r3.2xlarge (8 vCPU, 61 Memory GiB), Database size – 35 GB, hosts 450 Customers, $463.55 / month
- 2nd shard – m4.2xlarge (8 vCPU, 32 Memory GiB), Database size – 18 GB, hosts 100 Customers, $366.46 / month
While current $530 saving per month are not life-changing, this architectural decision will become very advantageous as we keep spinning up new database shards. PostgreSQL works reliably and is stable on peak time. Picture below shows CPU utilization of both shards.
While there are financial advantages, there are also disadvantages in terms of tools. If you are used to working with SQL Server Management Studio, then PostgreSQL GUI tools might appear a bit… rough. PostgreSQL GUI tools are not as good as SSMS, but you can freely choose from several alternatives (even if none of them are perfect). Our team members do not have a favorite yet, so each decides based on needs: PgAdmin, HeidiSQL, DataGrip, EMS.
Our application serves ~3,500 active users on a variety of desktop and mobile devices. While not massive, it’s a great platform for us to flex our muscles and adopt technologies that cut costs, improve performance, and help us continue scaling the business. While current cost savings may not be game-changing, we believe long term this is the right investment to grow the infrastructure responsibly and efficiently.