I got absolutely thrashed by some PostgreSQL performance issues this past week. I didn't just solve the problem, I now better understand Postgres. I'm better equipped to diagnose the next performance skirmish that comes my way. And I'm paying the lesson forward with this article.

Let me give you an idea of how bad it was. I had a query deleting 50k records on one table in ~100ms. I then had a similar query deleting the same number of records from another table that would run for ~30 minutes. 10,000x slower! That's a nope.

After diagnosing and addressing the issue, I got that second delete query down to about 1 second. Much better! Toward the end of the post we'll see how to make it even faster than that.

But before I reconstruct a minimal example that reproduces the problem and get into the details, here is the short version: foreign keys are essential for enforcing the shape and integrity of our data, indexes are essential for querying that data efficiently, and we can combine the two to start getting the most out of our database. I don't tend to think of foreign key constraints as impacting performance, but Postgres does not automatically index the referencing column of a foreign key. Without that index, we're bound to run into some really sneaky performance gotchas.

This minimal example is also a real-world example. Most software systems have users, and those users need to be given a variety of roles. These roles help the system determine the access and permissions of each user. If you want to skip over the example setup, you can jump right to the details.

First is the users table. It doesn't feature strongly in the example, so I'll leave it at the id column. Then we have the roles table, which can contain global roles like admin and support as well as roles tied to specific resources like team_owner and team_collaborator. Creating a relationship between the two is the users_roles join table. A record in this table tying my user record to the admin role would tell the system that I am an admin. Another record in this table tying my user record to a team_collaborator role for the Red team would tell you that, in addition to being an admin, I am a collaborator of the Red team.
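Roughly, the tables can be sketched like this. This is a minimal sketch; the exact column types and constraint details are illustrative rather than definitive:

```sql
create table users (
  -- nothing beyond an id is needed for this example
  id bigint generated by default as identity primary key
);

create table roles (
  id bigint generated by default as identity primary key,
  name text not null,
  -- resource_type/resource_id are null for global roles like admin,
  -- and point at a specific resource for roles like team_collaborator
  resource_type text,
  resource_id bigint
);

create table users_roles (
  -- the foreign keys that tie a user to a role
  user_id bigint references users (id),
  role_id bigint references roles (id)
);
```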
Here is some SQL that will seed that schema with enough data to reproduce the performance issue. The specifics of how this works are out of scope for this post. If that interests you, let me know and I'll write a follow up. The full queries are embedded in these code blocks, but you can also check out the code in this repo.

```sql
-- create 50,000 users
-- (empty select list: every column falls back to its default)
insert into users
select from generate_series(1, 50000);

-- create 100,000 roles, starting with 50,000 global roles
-- (the specific global role names here are illustrative)
insert into roles (name, resource_type, resource_id)
select (array['admin', 'support'])[floor(random() * 2)::int + 1], null, null
from generate_series(1, 50000);

-- then 50,000 team collaborator roles spread across 1,000 teams
insert into roles (name, resource_type, resource_id)
select 'team_collaborator', 'Team', floor(random() * 1000 + 1)
from generate_series(1, 50000);

-- create 500,000 connections between users and roles
-- start with 225,000 random global roles
insert into users_roles (user_id, role_id)
select floor(random() * 50000 + 1), floor(random() * 50000 + 1)
from generate_series(1, 225000);

-- then 50,000 for the team collaborator role
insert into users_roles (user_id, role_id)
select floor(random() * 50000 + 1), floor(50000 + g.id)
from generate_series(1, 50000) as g (id);

-- then another 225,000 random global roles
insert into users_roles (user_id, role_id)
select floor(random() * 50000 + 1), floor(random() * 50000 + 1)
from generate_series(1, 225000);
```

With this data in place, I could reproduce the painfully slow delete against the roles table. With the foreign key constraint out of the picture, though, the delete is quite speedy (a sketch of that experiment follows below). Notice I immediately rolled these changes back. This was to highlight the impact of the constraint. This isn't the solution.
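In case you want to run that experiment yourself, it went something like the following. This is a sketch: the constraint name assumes Postgres's default naming convention, and the particular delete being measured is my assumption (check `\d users_roles` in psql for the real constraint name):

```sql
begin;

-- temporarily remove the foreign key guarding users_roles.role_id
-- (name assumes the default <table>_<column>_fkey convention)
alter table users_roles drop constraint users_roles_role_id_fkey;

-- with no constraint to check, a bulk delete like this one flies;
-- the 50,000 team_collaborator roles are a plausible target
delete from roles where name = 'team_collaborator';

-- undo it all: this was a diagnostic, not the fix
rollback;
```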
Foreign key constraints are essential to solid schema design, and they are one of my favorite features of Postgres. They help ensure that the data relationships between two tables are always intact (that's called referential integrity). Because of this assurance, I never have to worry that data is going to be inadvertently orphaned.

How is the foreign key constraint having such an impact on performance? For every single row you tell Postgres to delete from the roles table, it is going to first check with all dependent tables to make sure it is okay to do that. If the foreign key value isn't being used by any records in the associated table(s), then the delete can proceed. If even just one record depends on that value, the foreign key constraint is going to flag that and abort the delete action. So, even though we preemptively deleted all the related data from users_roles, Postgres still must check at the moment of the delete that that is still true.
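Conceptually, each of those per-row checks behaves like the following lookup. This is an illustration of the behavior, not the literal internal query, and the role id is a placeholder:

```sql
-- for every roles row being deleted, Postgres effectively asks:
-- does anything in users_roles still reference this role?
select 1
from users_roles
where role_id = 12345  -- placeholder for the id being deleted
limit 1;
```

Without an index on users_roles.role_id, answering that question means walking the 500,000-row join table, once for each of the 50,000 roles being deleted. That multiplication is where the 30 minutes went.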
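And that is exactly why the short version up top pairs foreign keys with indexes: give Postgres an index on the referencing column and each of those checks becomes a cheap index lookup instead of a scan. A minimal sketch of that fix (the index name is my own choice):

```sql
create index index_users_roles_on_role_id on users_roles (role_id);
```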