Escalation

From Dreamwidth Notes

Revision as of 21:20, 26 December 2016

Technical support escalation

Listed from lowest to highest.

  1. File a support request
  2. File a ticket for developers
  3. Support
  4. Senior volunteers/Non-sysadmin staff
  5. Senior staff (Rah)
  6. Sysadmin staff (Mark or Robby)

User in peril phone tree

Who to contact first:

Listed from most direct to least direct.

  1. Terms of Service/Senior staff (Rah, Mark)
  2. Anyone with contact info for Terms of Service/Senior staff (such as senior volunteers, non-sysadmin staff)
  3. Anyone with contact info for senior volunteers/non-sysadmin staff
  4. File a Terms of Service report with "SUICIDE" or other similarly descriptive word in the subject

What ToS needs to know:

  1. What is the username?
  2. What sort of peril? Why are they in danger?
    1. Links to entries, even locked entries, can save time for ToS.

What ToS needs to find:

  1. Is there reason to believe their life may be in imminent danger?
  2. (if so) What is their physical location?

In case of a user with suicide plans, it can help if friends:

  • Reach out to them
  • Tell them they care
  • Tell them they would miss them
  • Tell them there is help
  • Refer them to help lines

Implementation:

  1. Consult the User in Peril phone tree.
  2. Whoever is closest to the top of the phone tree should activate it. Multiple people can activate it if necessary.
    1. Announce that you're present and whose contact info you have.
    2. Report back to the coordinating channel whom you've contacted and by which method.
    3. Use multiple methods to contact, such as Twitter DM *and* text message *and* phone call.
  3. Someone should take responsibility for interviewing the person who is reporting the user in peril. Ideally this would be a senior volunteer or someone else with a strong commitment to confidentiality. Preferably, this should not be someone who is actively working the phone tree.
  4. All information about the user in peril should be kept to PM and privileged channels, for their privacy.
  5. Ask for the username of the user in peril. (This should be given in PM.)
  6. Ask what type of peril (if it's not already apparent).
  7. Ask how they know this.
  8. Ask for any links -- this can save time for ToS if these are already gathered. (This should be given in PM.)
  9. Ask for physical location. (This should be given in PM.)
  10. (if you have the username) Review the user's profile to see if they have set up a link for a post with contact information. If there is one, pass it to ToS privately.
  11. Ask the person if they are in communication with the user.
  12. If they are, see the helpful talking points.
  13. If the user is a member of any particular groups, there may be specialized resources for members of that group in crisis. The user's interests list can help here.
  14. If no one can contact anyone from Terms of Service in a timely manner, direct the person reporting to file a ToS ticket, with all the relevant information.
  15. Pass information to ToS as soon as they arrive, username and type of peril first, evidence for peril second, and everything else third.
  16. As ToS arrives, they can take point on interviewing the person reporting.
  17. Once the situation is handed off to ToS, de-escalate the channel. @rahaeli maintains a stash of kitten pictures which can be helpful for this.
  18. Some people in channel may be adversely affected. If you can, remain around to chat with anyone who seems to need someone.
  19. Adrenaline after-effects are fun! Check embodiment: food, drink, warmth, stretch/physical motion, medication, bathroom/washing, human contact, sleep.

Downtime

If the site is down, this wiki page may not be available. If that is the case, use your best judgment. Otherwise, try to follow these escalation procedures.

Before escalation, try to do some basic troubleshooting:

  • Is it down for you?
  • Can you load other internet pages?
  • What errors are you getting?
  • On refresh, do you get a working site?
  • Does DownForEveryoneOrJustMe.com think it's down?
  • Can other people in IRC load it? Are there multiple reports?
    • What does Fig say? (The command "fig, is dw down?" will tell the bot to check stuff on the back end.)
  • How long has it been going on?
  • Is it persistent or intermittent?

What is "Down"?

  • Complete downtime for everyone (in a way that isn't blatantly Someone Else's Fault, such as a Big Yellow Digger Error that takes out half a continent)
  • Significant site performance degradation for everyone
  • Can you load the main page? https://www.dreamwidth.org / http://www.dreamwidth.org
  • Can you load the main page of your journal?
  • Can you load your reading page?
  • Can you load the main page of the other journal or community in question?
  • Site not responding, or failing to load entirely, with a browser error message (What error message is it? Investigate user and local connectivity issues.)
  • Something responds, but whatever it is isn't serving you Dreamwidth pages (What is it? Cloudflare, Amazon ELB, or something else?)
    • Does it look like a Dreamwidth-side caching layer, like Varnish, Perlbal, or nginx? (ESCALATE. As of 2016-ish, Cloudflare should be it.)
  • A page that looks like a Dreamwidth page is giving you an error message and not the content you expected (investigate further or escalate)
  • Dreamwidth pages load, with content, but in a strangely pantsless fashion (if this happens, the page is probably missing its stylesheets; this is most often caused by browser or internet problems on the user's end)
  • Dreamwidth pages with content load, but a specific (and often uncommon) task fails. (File a ticket.)
  • Is it a Cloudflare outage?
  • Do subsequent refreshes also fail to result in a working site? (Stuff failing approximately 1/6 of the time can mean that one of the webservers is hosed in some hilarious way.)
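The "do subsequent refreshes also fail?" check can be done by hand, but a rough script makes an intermittent pattern (such as roughly 1 in 6 loads failing) easier to spot and report. The following is a minimal, hypothetical Python sketch, not an official Dreamwidth or Fig tool, assuming only the standard library: it requests a page a fixed number of times and reports what fraction of the loads failed. The function names `load_ok`, `failure_rate`, and `describe` and the default of 12 attempts are made up for illustration. It only detects network errors and HTTP error statuses; a Dreamwidth-styled error page served with a normal status still counts as a success, so it does not replace looking at the page yourself.

```python
import http.client
import urllib.request
from typing import Callable


def load_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status.

    urlopen raises HTTPError (a subclass of OSError) for 4xx/5xx responses,
    and URLError or timeouts (also OSError subclasses) for network problems.
    Malformed responses raise http.client.HTTPException instead.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        return False


def failure_rate(url: str, attempts: int = 12,
                 fetch: Callable[[str], bool] = load_ok) -> float:
    """Load url `attempts` times and return the fraction of failed loads.

    Keep `attempts` small so the check doesn't add load to a struggling site.
    `fetch` can be swapped out (e.g. for testing without a network).
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not fetch(url))
    return failures / attempts


def describe(rate: float) -> str:
    """Turn a failure fraction into a rough description for an IRC report."""
    if rate == 0:
        return "loads every time"
    if rate == 1:
        return "never loads"
    return "intermittent"
```

For example, running `describe(failure_rate("https://www.dreamwidth.org"))` and the same call against the http:// address gives a quick answer to post in IRC. A failure rate near 1/6 that persists across several runs matches the pattern described above, but report the number along with what you saw rather than treating it as a diagnosis.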

If others are able to reach the site:

  • Clear your browser cache, restart your browser, restart your computer (if possible). Also consider rebooting any network hardware, or if using WiFi, trying a different wireless network.
  • File a support request, including any error messages you may be getting. (See Filing support requests for instructions.)
    • Consider including a traceroute.

Intermittent errors:

  • Is it for everyone, or just some people?
  • Can you discern a pattern to who gets it (if some), or when people get it (if everyone)?

If it is down for everybody:

  • Locate a senior volunteer or staff member (their IRC cloaks will contain "Delegate" or "Staff"); this can often be done with a simple inquiry of the form "Any staff or senior volunteers on duty? SOMETHING'S ON FIRE."
    • Try paging kareila, azurelunatic, kaberett.
  • Senior volunteers or non-Mark staff will confirm the error, and file/assist with filing support tickets and/or bug reports if warranted.
  • After confirming that it is a hair-on-fire situation, Mark will be paged.

Other Site Weirdnesses

If something is not working as you expect it to, it is OK to poll IRC to see whether it is also broken for everyone else.

If it is working for everyone else:

  • Check the DW FAQ to see if the issue is in there
  • File a Support request.
    • This helps staff track issues, and allows Support to do further troubleshooting

If it is not working for everyone else:

  • Ping kareila and let her know and/or mention it in #dreamwidth-support
  • then file the Support request

Filing support requests

Even if you report an error you are having in IRC, please also file a support request if you can. Support requests help staff remain accountable (so issues do not get forgotten) and keep track of how many people are experiencing a particular problem.

  • If you can reach Dreamwidth at all, file a support request from the web form.
  • If you cannot reach Dreamwidth, email support (at) dreamwidth (dot) org with a full description of what you are trying to do, what you expect will happen, what is happening instead, and the full text of any error messages you are getting.
  • Include steps to reproduce the problem
    • This could be as simple as "when I post an entry, it explodes" or as complex as a step-by-step walk-through of everything you do, including URLs and possibly screenshots.
    • The latter is preferred, but the former is ok too.

IRC problems

Sometimes there are problems in IRC. Note that this applies to moderated channels.

  • Is it a situation where you (or someone without portfolio) saying something in that channel would help? For example, saying "I'm not comfortable with this topic" or "We don't use that kind of language in here" or deflecting the topic to something that is not currently on fire?
  • Does it require intervention from an IRC op? If so, (discreetly or otherwise) page someone who can attain that role. (It can help to holler both by role (can op up in #dreamwidth) and by name (almost anybody with /staff/ or /delegate/ in their cloak).)
    • PM people who can op up; if urgent, all of them in order of likely-to-be-there
    • holler in another channel which has people who can op up but doesn't have the individuals involved with the problem
  • Is it long-term and not time-sensitive? Emailing denise is a good idea, even if you were able to de-escalate the situation, so that it's on record in case it ever comes up again.
  • If there is an urgent channel problem and none of #dreamwidth's own ops are around, seek help in #freenode.

Who ya gonna call?

Classifications for people, and the sorts of emergency where these people might be called in.

Founders

An actual emergency concerning Dreamwidth-in-general (super rare).

Sysadmins

The server is on fire.

  • Mark
  • Robby

Terms of Service

A Dreamwidth user's life is in peril.

People with Terms of Service privs

Anti-Spam

Stuff is getting super spammed up.

People with power to work on Spam support requests

  • azurelunatic (who may well page someone higher on the list)
  • Any Terms of Service person.

Members of the antispam community

Technical Support

There is something very wrong with the site, and people need information or instructions.

People who can page other people

  • denise can perhaps be reached via:
  • mark can perhaps be reached via:
  • jennifer can perhaps be reached via:
  • azurelunatic can perhaps be reached via:
    • fhocutt
    • jd (DW private message)
    • shadowspar
    • Silver_Adept (et al)
    • Woggy
  • kaberett can perhaps be reached via:
    • me_and
  • Any member of dw_lounge may have contact information to page any other member.
  • Anyone with dreamwidth/delegate in their Freenode cloak is likely (but not guaranteed) to have extra ways to contact people.