While evaluating Gerrit, which we have since adopted, I initially set it up with HTTP authentication handled by the proxy in front of the Gerrit server. That was fine as a first step, but it is not desirable long term. I decided LDAP was the right direction, since we host private repos and a public OpenID provider would not help. Something about the users created under HTTP auth prevented them from authenticating via LDAP. This is the conversion story.



Gerrit does HTTP auth by looking for a header on the incoming request that tells it the username. It simply trusts the front end and this header, and creates the user when they first show up if one doesn't already exist. On that first login, Gerrit asks for an email address to associate with the account.

When I converted over to LDAP, I could not log in through the web interface with the users that already existed. Interestingly, I could still do everything over SSH using the key already loaded into the account. The error log said:

[HTTP-80] WARN  com.google.gerrit.server.account.AccountManager : Email [EMAIL] is already assigned to account 1000001; cannot create external ID gerrit:[USER] with the same email for account 1000001.
[HTTP-80] WARN  com.google.gerrit.httpd.auth.ldap.LdapLoginServlet : '[USER]' failed to sign in com.google.gerrit.server.account.AccountException: Email '[EMAIL]' in use by another account

I found this rather vexing because it doesn't provide anywhere near enough information to understand the problem. I knew it was related to the difference between how HTTP and LDAP users are authenticated, but not how.

The docs are good when you know what you're looking for, but since no amount of search engine finesse was yielding a hint as to what the problem was, I had to go into system engineer mode.

The version of Gerrit

There's plenty of auth-related chatter, but it all predates 2.16, the version in question. It seems that around 2.15 user information was moved from the ReviewDb SQL database into NoteDb, which is stored in Git. So all the chatter about using SQL or gsql to address the situation was a dead end.

External ID

The one takeaway was that the problem is all about external ids. I looked through the command line tools, but only found ls-groups, ls-projects, ls-members, and the very interesting create-account and ls-user-refs. The last one is worth understanding. None of these CLI tools helped, though.

To understand more about external ids, I found it useful to clone All-Users and look at refs/meta/external-ids. That ref is where NoteDb stores the external ids, as individual files named by the SHA of the id, which makes it impossible for two entities to have the same external id.
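As a rough illustration of that naming scheme, the sketch below hashes an external id string with SHA-1 to get the name of the file that would hold it. It assumes the name is the SHA-1 of the full scheme:value key (for example mailto:gituser@example.com) with no trailing newline, and that sha1sum or shasum is installed. extid_note_name is a made-up helper, not a Gerrit tool. Git may shard these names into subdirectories, so treat the result as something to search for in the checkout rather than an exact path.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-1 of an external id key such as
# "mailto:gituser@example.com". Assumption: the file in
# refs/meta/external-ids is named after the hash of this exact string.
extid_note_name() {
    if command -v sha1sum >/dev/null 2>&1; then
        printf '%s' "$1" | sha1sum | cut -d' ' -f1
    else
        # macOS and some BSDs ship shasum instead of sha1sum
        printf '%s' "$1" | shasum -a 1 | cut -d' ' -f1
    fi
}

# Prints a 40-character hex name for the example identity.
extid_note_name 'mailto:gituser@example.com'
```

Because the name is derived only from the id string, a second attempt to register the same id would land on the same file.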

Here's how that goes:

  • clone the repo
    • git clone gerrit:All-Users
  • fetch the ref
    • git fetch origin refs/meta/external-ids
  • check out the FETCH_HEAD (which is the external-ids ref)
    • git checkout -b extid FETCH_HEAD

You can actually make changes here. To do so, you have to add an Access entry for the ref "refs/meta/*" that allows Push for Administrators. I messed around with some throwaway users, though I don't suggest doing that on a Gerrit you want to keep: users are permanent in Gerrit, as a data consistency precaution against removing a user that has activity associated with it. You can only disable a user.

You can also see the contents of a user:

git fetch origin refs/users/01/1000001
git checkout FETCH_HEAD

The directory will now contain account.config, authorized_keys (if any) and preferences.config. The two .config files are Git-style config files with, unremarkably, the settings of the user. Again, these can be changed here with a little work similar to the above, but there's a better way...
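The ref fetched above follows a pattern: a two-digit shard directory followed by the account id. A minimal sketch of building that ref name, assuming the shard is the last two digits of the id (which matches refs/users/01/1000001); user_ref is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: build the All-Users ref for a numeric account id.
# Assumption: the shard directory is the account id modulo 100, zero
# padded to two digits.
user_ref() {
    id=$1
    printf 'refs/users/%02d/%d\n' "$((id % 100))" "$id"
}

user_ref 1000001   # prints refs/users/01/1000001
user_ref 1000014   # prints refs/users/14/1000014
```

The output can be passed straight to git fetch origin in place of the hand-typed ref.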

REST calls to the rescue

The REST API is much more capable than any of the other tools provided. Its calls are simple enough to use once you sort out a few details.

REST basics for Gerrit

This can all be done with curl, nice. For a given user (admin) with a password (letmein), to look at a user's info:

curl -u admin:letmein -X GET 'https://code-review/a/accounts/self'
)]}'
{
  "_account_id": 1000014,
  "name": "admin",
  "username": "admin"
}

At first I thought the )]}' was a bug, but it's intentional: Gerrit prefixes its JSON responses with it to guard against cross-site script inclusion. In this case self is literally the calling user 'admin'. That last path segment can be replaced with 'admin' or '1000014' for the same result.
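The )]}' line has to go before the response can be fed to a JSON parser. A small sketch of one way to do that with sed; strip_gerrit_prefix is a hypothetical helper name, and it only touches the first line, and only when that line is exactly the prefix:

```shell
#!/bin/sh
# Hypothetical helper: drop a leading )]}' line from a Gerrit REST
# response read on stdin, leaving plain JSON on stdout. Input without
# the prefix passes through unchanged.
strip_gerrit_prefix() {
    sed "1{/^)]}'\$/d;}"
}

# Local demonstration with a canned response, no server needed.
printf '%s\n' ")]}'" '{"username": "admin"}' | strip_gerrit_prefix
```

In practice the curl call above would be piped into the function, for example by appending "| strip_gerrit_prefix" to the command.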

The part we're really interested in is the external ids:

curl -u admin:letmein -X GET \
 'https://code-review/a/accounts/admin/external.ids'
)]}'
[
  {
    "identity": "gerrit:admin",
    "trusted": true,
    "can_delete": true
  },
  {
    "identity": "username:admin",
    "trusted": true
  }
]

This is a user that works with LDAP. I created this user by changing the type in [auth] from HTTP to LDAP, creating an admin user in LDAP, then logging in through the browser. That worked with no problems. The issue was the users from the HTTP days. Here is one of those old users:

)]}'
[
  {
    "identity": "mailto:gituser@example.com",
    "email_address": "gituser@example.com",
    "trusted": true,
    "can_delete": true
  },
  {
    "identity": "username:gituser",
    "trusted": true
  },
  {
    "identity": "gerrit:gituser",
    "trusted": true,
    "can_delete": true
  }
]

Comparing the two users, the difference is the identity "mailto:gituser@example.com", which is the email provided during the first login via the HTTP proxy that was doing Basic Auth. And that's the problem. I wasn't 100% sure, but I'd exhausted all other options and looked at literally every bit of user information available in All-Users and the REST API (which told the same story, though the REST API told it more clearly and safely). So, unless there was something hidden away saying user A authd by HTTP and B by LDAP, that one identity had to be it.
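The comparison can be done by eye, but with more than a handful of accounts it helps to pull out just the identity values. A minimal sketch, assuming the pretty-printed layout shown in the listings above (one "identity" field per line); list_identities is a hypothetical helper, and a real JSON parser would be more robust:

```shell
#!/bin/sh
# Hypothetical helper: print one identity per line from an external.ids
# response on stdin. Lines without an "identity" field, including the
# )]}' prefix, are ignored.
list_identities() {
    sed -n 's/^[[:space:]]*"identity":[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Canned response copied from the old HTTP-era user; keep only the
# mailto: entries, which the LDAP-created user does not have.
list_identities <<'EOF' | grep '^mailto:'
)]}'
[
  {
    "identity": "mailto:gituser@example.com",
    "email_address": "gituser@example.com",
    "trusted": true,
    "can_delete": true
  },
  {
    "identity": "username:gituser",
    "trusted": true
  },
  {
    "identity": "gerrit:gituser",
    "trusted": true,
    "can_delete": true
  }
]
EOF
```

For the sample above this prints the single line mailto:gituser@example.com.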

Removing the culprit identity

Now that I was comfortable using the REST interface, I examined how to delete an external identity. It involves two steps: create a JSON file containing a list of the external ids to be removed, then use curl to POST it. My file, called data.json, looks like this:

[ "mailto:gituser@example.com" ]

The curl to push this thing over the finish line is:

curl --header 'Content-type: application/json' \
 -u admin:letmein \
 -d @data.json \
 -X POST \
 'https://code-review/a/accounts/gituser/external.ids:delete'
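When several accounts need the same cleanup, the request body can be generated instead of hand-edited. A minimal sketch, assuming each argument is a full identity string exactly as the external.ids listing shows it (scheme and value together) and contains no double quotes or backslashes; extid_delete_body is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print a JSON array holding the given identities,
# the shape of request body that the delete call takes.
# Assumption: no JSON escaping is needed for the identity strings.
extid_delete_body() {
    sep=''
    printf '['
    for identity in "$@"; do
        printf '%s"%s"' "$sep" "$identity"
        sep=', '
    done
    printf ']\n'
}

# Prints ["mailto:gituser@example.com"]
extid_delete_body 'mailto:gituser@example.com'
```

Redirecting the output to data.json gives a file that can be used with the curl command above. Re-running the external.ids query afterwards is a cheap way to confirm the identity is gone.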

Conclusion

Gerrit is a conceptually big system, and it is remarkably flexible and powerful. It leverages Git in ways I've not seen before. Getting the system running and then resolving this issue took more time than I'd have liked. Still, I appreciate the work and ideas that are in Gerrit, and I think it's worth the initial investment.