5 tools to support distributed sysadmin teams



Remotely distributed system administration teams provide around-the-clock coverage without anyone losing sleep, and they can draw from a global talent pool. Our team, the OpenStack global infrastructure team, relies on these five open source tools to communicate and to coordinate our work.

We also add in a few more provisos:

  • we must do our work in public
  • we work for different companies
  • we must use open source software for everything we do

The following five open source tools allow us to meet these requirements while remaining a high-functioning team.

1. Text-based chat

We use Internet Relay Chat (IRC), provided to us by the freenode IRC network, which runs on an open source Internet Relay Chat daemon (IRCd). A vast array of open source clients is available for connecting to it. Our team channel (a chat room, in IRC terms) is the soul of our team. We discuss ongoing issues and challenges, come up with solutions, inform each other of changes in progress, post project status changes and alerts, and have a bot that reports changes submitted to our infrastructure for review. The IRC channel we use is fully public, and we maintain a web server that publishes the channel logs where anyone can view them.

The following is a quick snapshot of the channel one morning:

<clarkb> hrm no world dump on that failure?
<openstackgerrit> Anita Kuno proposed openstack-infra/storyboard: Add example commands for the Timeline api https://review.openstack.org/337854
<openstackgerrit> Victor Ryzhenkin proposed openstack-infra/project-config: Add openstack/fuel-plugin-murano-tests project https://review.openstack.org/332151
<clarkb> its definitely an io error of some sort
<clarkb> possibly run out of disk space?
<therve> The df output looks normal...
<greghaynes> or, is it writing out to tmpfs?

It takes some getting used to, but once you’re familiar with the flow of the channel and working with the team, these conversations and logs become an invaluable resource for keeping up with our work. You can learn more about IRC, from commands to etiquette guidelines, by reading VM (Vicky) Brasseur’s pair of articles, Getting started with IRC and An IRC quickstart guide.
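The `<nick> message` layout of the snapshot above is regular enough to process with a few lines of code. The following is only a rough sketch, assuming logs are stored one message per line in exactly that layout; the function names are hypothetical and this is not the tooling our log server uses. It splits each line into a nick and a message, then counts how many messages each nick posted.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed layout: "<nick> message text", as in the channel snapshot.
LINE_PATTERN = re.compile(r"^<(?P<nick>[^>]+)>\s+(?P<text>.*)$")


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (nick, text) for a chat line, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("nick"), match.group("text")


def messages_per_nick(lines: Iterable[str]) -> Counter:
    """Count messages per nick, skipping lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


SNAPSHOT = [
    "<clarkb> hrm no world dump on that failure?",
    "<clarkb> its definitely an io error of some sort",
    "<therve> The df output looks normal...",
]

if __name__ == "__main__":
    for nick, count in messages_per_nick(SNAPSHOT).most_common():
        print(nick, count)
```

A summary like this can help when catching up on a busy channel, though reading the log itself remains the best way to follow a discussion.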

There are times when a quick voice call is valuable for high-bandwidth communication, so we also run an Asterisk system to support Voice over IP (VoIP) calls.

Running a private IRCd within a company or other organization is common. A variety of open source options are available; review them with project activity and security track records in mind to select one that fits your needs. If your team has tried IRC but prefers tooling with a more modern interface and features, like logs built into the interface that are available even when you’re not directly connected, you may want to look into Mattermost, an alternative to proprietary SaaS messaging. You can read more about it in Charlie Reisinger’s article on how his organization replaced IRC with Mattermost.

2. Etherpad

Etherpads are hosted collaborative text editors that allow a group to edit a document together in real time. Our team uses these for a variety of purposes: collaborating on announcements that go out to the entire project, sharing ideas, talks and topics for our in-person gatherings, writing out maintenance and upgrade plans, and working through tasks during maintenance windows.

Etherpad in action.

The use of an Etherpad typically goes hand-in-hand with our collaboration on IRC and our mailing list: the Etherpad is a place to keep track of longer-lived, asynchronous notes while we hold the broader discussion directly with each other. We use the open source Etherpad Lite in our infrastructure.

3. Pastebin

A pastebin allows you to paste a large amount of text and returns a URL that you can easily share with members of the team. For our team, that means the ability to share snippets of logs with teammates who don’t have access to the servers, share configuration files, and share instructions for performing a task that has not yet been documented. We prefer a pastebin to pasting large amounts of text into an IRC channel, and to the overhead of an Etherpad for text that makes more sense as read-only.

There are several open source options for running your own pastebin. We’re currently using a version of LodgeIt that we maintain ourselves. Tip: If you’re running a public pastebin, use a robots.txt file to stop search engines from indexing the pastes. They rarely contain anything worth indexing, and keeping them out of search results helps keep the spammers off your pastebin.
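To make the robots.txt tip concrete, here is a minimal sketch of a file that asks all crawlers to stay away from every path, plus a quick check of it using Python's standard `urllib.robotparser` module. The hostname `paste.example.org`, the paste URL, and the `is_crawlable` helper are hypothetical placeholders, not part of our deployment, and a robots.txt is only a request that well-behaved crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt that asks every crawler to skip the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical paste URL, used only for illustration.
    paste_url = "https://paste.example.org/show/12345"
    print(is_crawlable(ROBOTS_TXT, paste_url, "Googlebot"))
```

Serving the two lines in `ROBOTS_TXT` at the `/robots.txt` path of the pastebin host is all the tip requires; the Python check is just a way to confirm the rules say what you intended.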

4. GNU Screen

Officially known as a terminal multiplexer, GNU Screen allows you to run a command, or multiple commands, inside of a screen terminal session and keep applications running after you log out. This is particularly valuable when we run a rare, long-lived, manually triggered command that would fail if we lost our connection, such as reindexing our code review system or performing certain upgrades. Many of the people on our team also use it to keep their IRC clients running 24/7.

Most interestingly, we use GNU Screen sessions for training other root members of our team on system administration tasks that we haven’t yet automated or documented. Several users on a system can attach to a screen session at the same time for a collaborative terminal session. We can then walk new members of the team through access to our password vault, or share the procedures for rare or complicated maintenance tasks. It also has built-in tooling to keep a log of the entire session.
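As a rough illustration of the two uses described above, the sketch below builds the GNU Screen command lines for starting a long-running command in a detached session and for joining a session that someone else already has open. It relies on Screen's `-d -m` (start detached), `-S` (name the session), and `-x` (attach to a session that is not detached) options. The session name `reindex`, the script `./reindex.sh`, and the helper functions are hypothetical, and the sketch assumes everyone attaches as the same Unix user.

```python
import shlex
from typing import List


def start_detached(session: str, command: List[str]) -> List[str]:
    """Build the argv that runs `command` in a new, detached screen session."""
    # -d -m starts the session detached; -S gives it a name to attach to later.
    return ["screen", "-dmS", session] + command


def attach_shared(session: str) -> List[str]:
    """Build the argv that joins a session others may already be attached to."""
    # -x attaches without detaching the people who are already connected.
    return ["screen", "-x", session]


if __name__ == "__main__":
    # Hypothetical session and script names, printed rather than executed.
    print(shlex.join(start_detached("reindex", ["./reindex.sh"])))
    print(shlex.join(attach_shared("reindex")))
```

The functions only return argument lists so they can be inspected or passed to `subprocess.run` once you are satisfied with them; in day-to-day use most people simply type the equivalent commands at a shell.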

While it works for us, some criticize GNU Screen for being a bit lean on modern features. Alternatives such as tmux and Byobu also exist.

5. Git

Git was invented by Linus Torvalds for managing Linux kernel development. Git is now the go-to revision control system for open source projects. It may go without saying that systems administration teams should be using some kind of version control for all changes to their infrastructure, but this is particularly important for a distributed team. With our team spanning time zones, it can be hard to catch up on eight hours of discussion. With Git you can see all changes made to systems by reading commit logs, and learn who changed things.
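For example, catching up after a shift could look something like the sketch below, which asks `git log` for the commits made in the last eight hours and prints who changed what. The repository path, the `Commit` type, and the function names are hypothetical; the `%h`, `%an`, `%s`, and `%x09` (tab) placeholders are standard `git log` pretty-format codes.

```python
import subprocess
from typing import List, NamedTuple


class Commit(NamedTuple):
    short_hash: str
    author: str
    subject: str


# Abbreviated hash, author name, and subject line, separated by tabs.
LOG_FORMAT = "%h%x09%an%x09%s"


def parse_log(output: str) -> List[Commit]:
    """Turn tab-separated `git log` output into Commit records."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        short_hash, author, subject = line.split("\t", 2)
        commits.append(Commit(short_hash, author, subject))
    return commits


def recent_commits(repo_path: str, since: str = "8 hours ago") -> List[Commit]:
    """Run `git log` in repo_path and return commits newer than `since`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log",
         f"--since={since}", f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)


if __name__ == "__main__":
    # "." is a placeholder; point this at a checkout of your own repository.
    for commit in recent_commits("."):
        print(f"{commit.short_hash}  {commit.author}: {commit.subject}")
```

Parsing is kept separate from running Git so that the output handling can be checked without a repository at hand.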

It also allows us to more easily roll back to prior states, or at least see what prior states existed before possibly disruptive changes were made. When documenting changes more formally for team consumption, the commit history also allows us to describe what changes were made and why.

Tip: Always make sure you include why a change was made in your commit message. We can typically figure out what was changed by reading the commit itself, but the rationale can be trickier to track down after a change is made, particularly several months later when few people remember.
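A small check can nudge people toward this habit. The sketch below is one possible approach, not something our team necessarily runs: it tests whether a commit message has a subject line, a blank separator line, and at least one non-empty body line where the rationale would go. The `has_body` function is hypothetical, and no script can judge whether the body really explains the why; it can only notice when there is no body at all.

```python
def has_body(message: str) -> bool:
    """Return True if the message has a subject, a blank line, and a body."""
    # Ignore the comment lines Git adds to the commit message template.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    if len(lines) < 3:
        return False
    subject, separator, *body = lines
    return (
        bool(subject.strip())
        and not separator.strip()
        and any(line.strip() for line in body)
    )


if __name__ == "__main__":
    # Hypothetical commit message used only for illustration.
    example = (
        "Raise log retention to 30 days\n"
        "\n"
        "Two weeks was too short to debug failures reported after a release.\n"
    )
    print(has_body(example))
```

A function like this could be called from a `commit-msg` hook, which Git invokes with the path to the file holding the proposed message, but code review by teammates remains the real safeguard.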

Learn more about the OpenStack infrastructure, and the other tools we use to keep the massive OpenStack development community running, by visiting the OpenStack Project Infrastructure documentation.


