Architecting a Roles & Permissions System Using Rails, GraphQL & React

We recently rolled out our own permissions system within the Atrium platform. Below is how we did it, along with some general thoughts on permissions.

In this post:

  • Goals for our permissions system
  • Naming conventions
  • Why permissions should be positive
  • Bundling permission sets into roles
  • Do authorization checks at the permission-level, not the role-level
  • Store role-permission mappings in the code, not the database
  • Soft release permission checks by not enforcing them initially
  • A failing permission check shouldn’t break co-located queries
  • Roll a React component that performs permission checks

 

Goals for our permissions system

Goal 1: Security
Our server should only provide data that the user is permitted to see.

Goal 2: UI control
Different types of users need different user interfaces.

 

Naming conventions

A permission is an action that a user is allowed to do. It should be named according to the feature it enables. At Atrium, we have permissions like:

update_personal_account
view_company_profile
start_hiring_request
update_company_user_permissions

 

Why permissions should be positive

Permissions should be positive, allowing users to access specified features. A negative permission would be something like cannot_update_personal_account—preventing users from accessing specified features. You want exclusively positive permissions for two reasons:

  1. If a user has a mix of both positive and negative permissions, the permission check logic would become confusing. If a user has both cannot_do_x and can_do_x, how do you decide which permission should win out?
  2. If you have negative permissions, it’s not clear if the absence of a negative permission should grant the corresponding positive permission.

Mixing positive and negative permissions will overcomplicate the permission check logic.

 

Bundling permission sets into roles

Think about the UI for assigning a user a set of permissions: Atrium has 22 permissions. Scrolling through many checkboxes is a burden for users and creates unwanted friction.

We solve this problem by bundling permissions into roles:

 

With 3 options to choose from instead of 22, there is far less friction in our UI.

However, there is a trade-off here: we gain usability (fewer options to choose from) at the cost of flexibility (users can’t customize their permission settings).

What if a user needed to be granted can_view_X only? In that case, we’d have to create a new role that contained only can_view_X, which is about 2 hours of work.

We think the trade-off is worth it.

A good way to buy yourself a bit of flexibility is to allow a user to have many roles. A user’s set of permissions, in that case, is the union of all their roles’ permissions. Atrium didn’t do this out of the gate, but our users requested it within a few weeks.
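
To make the union concrete, here is a minimal Ruby sketch. The ROLE_PERMISSIONS hash, the permission lists in it, and the permissions_for_roles helper are assumptions made for illustration; they are not Atrium's actual code.

```ruby
require 'set'

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
  'hiring_member'  => %w[view_company_profile start_hiring_request],
  'billing_member' => %w[view_company_profile view_billing_method]
}.freeze

# Returns the union of the permissions granted by each of the given roles.
# Roles that are not in the mapping contribute no permissions.
def permissions_for_roles(role_names)
  role_names.reduce(Set.new) do |permissions, role_name|
    permissions | ROLE_PERMISSIONS.fetch(role_name, [])
  end
end

permissions = permissions_for_roles(%w[hiring_member billing_member])
puts permissions.to_a.sort.inspect
```

Because the result is a set, a permission shared by several roles (view_company_profile here) appears only once.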

Our system of grouping permissions into predefined roles isn’t infinitely scalable, though. Atrium’s codebase is 2 years old and we have three roles. After 5 years, we’ll probably have 8-9 roles. At around the 15-role mark, our UI could get hard to use, and we may need a different approach.

Letting a user have many roles can end up in the redundant state of having roles like admin and billing_member. The billing_member role, in this case, is redundant because our admin role encompasses all permissions.

To prevent this, we added an after_commit hook on the Role model which destroys all of a user’s non-admin roles if an admin role gets created or updated.

# ruby

class Role < ApplicationRecord
  # The callback name must match the method defined below.
  after_commit :destroy_users_other_roles_if_admin, on: [:create, :update]

  private

  def destroy_users_other_roles_if_admin
    users_roles = Role.where(user_id: user_id)
    user_has_admin_role = users_roles.any? { |role| role.name == 'admin' }

    # User has an 'admin' role, so destroy the user's other roles.
    if user_has_admin_role
      users_non_admin_roles = users_roles.reject { |role| role.name == 'admin' }
      users_non_admin_roles.each(&:destroy)
    end
  end
end

 

Do authorization checks at the permission-level, not the role-level

Doing checks at the permission-level rather than at the role-level is crucial. It makes your codebase way more resilient when you update roles or create new roles. Imagine the two approaches below:

  1. Authorization check based on roles:
# ruby

if user.has_role?(['hiring_member', 'billing_member'])
  documents_summary_feature
else
  raise_permission_error
end

 

That would be tedious and error-prone. Over time, we may want to change the mappings between roles and permissions. Right now we may want “Hiring Members” to be able to “view_documents_summary,” but in the future we may decide to retract that permission from a “Hiring Member.”

Performing checks based on permissions, instead of roles, is more resilient against these mapping changes. If you check based on permissions, you only have to update the file that handles the mappings between roles and permissions. That’s one file to update. If you check based on roles, you have to revisit every permission check call site to make sure your logic still makes sense. That’s potentially many files to update.

 

2. Authorization check based on permission:

# ruby

if user.has_permission? 'view_documents_summary'
  documents_summary_feature
else
  raise_permission_error
end

 

If you change what permissions belong to a given role, under the permission-level check approach, there are no further steps needed.

Store role-permission mappings in the code, not the database

At Atrium, we store the user’s roles in the database, but we store the mappings between each role and its corresponding permissions in the codebase.

The other option we considered was to store the role-permission mappings in the database using a one-to-many relationship between a roles table and a permissions table.

Keeping the role-permission mappings in the code provides the following benefits:

  1. No backfill scripts required when updating a role. If you store your permissions in the database, then every time you change the role-permission mappings you have to run a backfill script. But if you store the mappings in the code, just deploy and—boom—you’re done.
  2. No scaling problems. Imagine that each user has 3 roles, on average, and each role has 15 permissions. Every time you create a new user, you’ll have to write 45 records to the database. If you have 50,000 users, that will mean your permission table will have:

50,000 users × 3 roles × 15 permissions = 2,250,000 permissions

You’ll end up with a fairly large table that gets queried multiple times per user session. This isn’t the end of the world necessarily. But you can save yourself performance problems if you keep the role-permission mappings in the code from the outset.
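
As a rough illustration of keeping the mappings in code, the sketch below assumes a hypothetical RolePermissions module and a UserSketch struct standing in for the User model. The module name, the struct, and the permission lists per role are all assumptions; only the role names would be read from the database.

```ruby
# Hypothetical module: the permission lists per role are illustrative.
module RolePermissions
  MAPPINGS = {
    'admin'          => %w[update_personal_account view_company_profile
                           start_hiring_request update_company_user_permissions],
    'hiring_member'  => %w[update_personal_account view_company_profile
                           start_hiring_request],
    'billing_member' => %w[update_personal_account view_company_profile]
  }.freeze

  # Permissions for a single role; unknown roles grant nothing.
  def self.for_role(role_name)
    MAPPINGS.fetch(role_name, [])
  end
end

# Stand-in for the User model: it only knows the names of its roles.
UserSketch = Struct.new(:role_names) do
  # True if any of the user's roles grants the permission.
  def has_permission?(permission)
    role_names.any? do |role_name|
      RolePermissions.for_role(role_name).include?(permission)
    end
  end
end

user = UserSketch.new(%w[billing_member])
puts user.has_permission?('view_company_profile') # true
puts user.has_permission?('start_hiring_request') # false
```

In a setup like this, retracting a permission from a role means editing one line of MAPPINGS and deploying. No rows change, and call sites that use has_permission? stay untouched.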

At some point in Atrium’s future, the time may come when we have to support custom roles—i.e. roles with bespoke sets of permissions. At that point, we’ll have to create a permissions table so we can store the permissions associated with each custom role.

But one step at a time.

 

Soft release permission checks by not enforcing them initially

At Atrium, we perform a permission check on a GraphQL field using a gql_permission_check helper which raises an exception if the permission check fails.

  field :hire_advisor_document, Types::HireAdvisorDocumentType do
    argument :id, !types.ID, 'ID for HireAdvisorDocument'
    resolve ->(_, args, ctx) {
      # Check that the current user has permission view_hiring_dashboard
      Permissions.gql_permission_check(
        current_user: ctx[:current_user],
        permission_to_check: 'view_hiring_dashboard'
      )

      HireAdvisorDocument.find_by(id: args[:id])
    }
  end

 

If you are releasing a new permissions system into an existing codebase, we recommend soft releasing.

 

In the first few days after release, a failing permission check should raise an exception in development only. In production, a failing permission check should whisper the distressing news to Bugsnag/Sentry but should not raise an exception.

class Permissions
  #...

  def self.gql_permission_check(current_user:, permission_to_check:)
    # If the client user has permission, then return.
    if current_user.has_permission?(permission_to_check: permission_to_check)
      return
    end

    # User does not have permission!
    if Rails.env.production?
      ping_bugsnag 'Permission Denied'
    else
      raise GraphQL::UnauthorizedError, "Permission Denied"
    end
  end
end

 

Seriously—what are the chances that you put the correct permission restriction on every single field the first time around?

Once you are confident that you have put the appropriate permission checks on each GraphQL field—i.e. your Bugsnag feed is blissfully devoid of Permission Denied messages—then you can hard-enforce your permission checks.

class Permissions
  #...

  def self.gql_permission_check(current_user:, permission_to_check:)
    # If the client user has permission, then return.
    if current_user.has_permission?(permission_to_check: permission_to_check)
      return
    end

    raise GraphQL::UnauthorizedError, "Permission Denied"
  end
end

 

Oh—and as you’re working on the soft release, we recommend creating a ticket to hard-enforce the permission check. Otherwise, the bug hunting agency you hire a few months later to find security loopholes will point out your negligence and you’ll feel a bit foolish.

 

A failing permission check shouldn’t break co-located queries

Hot take: use GraphQL::UnauthorizedError.

If you have co-located GraphQL queries like this…

query myOverzealousData {
    customer {
         id
    }
    privateData {
         id
         secret
    }
}

 

…then you have two approaches:

1. Bad approach, using anything other than GraphQL::UnauthorizedError

When a GraphQL query requests data that the current user does not have permission to view, any co-located GraphQL queries also fail.

Result: ERROR

2. Good approach, using GraphQL::UnauthorizedError

Only the field the user is not permitted to see resolves to null. The co-located queries still return their data.

Result:

customer = {
  id: 1
}
privateData = null
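
The difference between the two results can be simulated in plain Ruby. This sketch does not use graphql-ruby: UnauthorizedSketchError stands in for GraphQL::UnauthorizedError, and the resolver methods, the execute_query helper, and the view_private_data permission are hypothetical names chosen for illustration.

```ruby
# Stand-in for GraphQL::UnauthorizedError so the sketch runs without the gem.
class UnauthorizedSketchError < StandardError; end

# Hypothetical resolver for the customer field: no permission required.
def resolve_customer(_user)
  { 'id' => 1 }
end

# Hypothetical resolver for the privateData field: requires a permission.
def resolve_private_data(user)
  unless user[:permissions].include?('view_private_data')
    raise UnauthorizedSketchError, 'Permission Denied'
  end

  { 'id' => 7, 'secret' => 'hidden' }
end

# Resolves each top-level field independently. A field that raises the
# unauthorized error becomes nil; the remaining fields still resolve.
def execute_query(user)
  resolvers = {
    'customer'    => :resolve_customer,
    'privateData' => :resolve_private_data
  }

  resolvers.each_with_object({}) do |(field, resolver), data|
    begin
      data[field] = send(resolver, user)
    rescue UnauthorizedSketchError
      data[field] = nil
    end
  end
end

result = execute_query({ permissions: [] })
puts result.inspect
```

If resolve_private_data raised any other error class, the rescue would not catch it and the whole call would fail, which mirrors the "Result: ERROR" case above.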

 

Roll a React component that performs permission checks

We made a PermissionsQuery component that takes an array of permissions to check, pings our server and then passes the verdict of each permission check down to its children. This is how you invoke PermissionsQuery:

const permissionsToCheck = [PermissionType.UpdateRecords, PermissionType.ViewBillingMethod]

const HomeContainer = () => {
  return (
    <PermissionsQuery permissionsToCheck={permissionsToCheck}>
      {(permissions: PermissionsQueryOutputType) => {
        // Example response:
        // permissions = {
        //   results: { update_records: true, view_billing_method: false },
        //   permissionCheckLoading: false
        // }
        return <Home permissionCheckResults={permissions.results} />
      }}
    </PermissionsQuery>
  )
}

 

The container is only responsible for ascertaining the verdict of each permission check. It is up to the child to work out what should be rendered depending on the permissionCheckResults.

 

This rollout gives our clients the ability to easily tailor the access and features appropriate for each of their users. With the permissions system, our clients can empower certain team members to manage activities, like hiring, without also granting them access to sensitive corporate documents—allowing for more efficient and logical workflows.

Well, you’ve been a wonderful reader. Swing by again soon to read more about what we’re working on.

  • Thanks for the nice article!

    > A failing permission check shouldn’t break co-located queries

    This is easier said than done because then all auth-checked fields would have to be nullable right? Also, in the client, how do you differentiate null-because-unauthorized vs null-because-actually-null?