Difference between revisions of "Bot Policy"

From Dreamwidth Notes
(Updated to most recent version on the site. Retained link to the wiki page on compatible clients, which may or may not be up to date.)
Line 1: Line 1:
This page outlines a proposal for the Dreamwidth Studios "Bot Policy", the rules that third party applications and tools must abide by in order to prevent being blocked from accessing the Dreamwidth servers.
+
This is a mirror of the Dreamwidth Studios "Bot Policy", the rules that third party applications and tools must abide by in order to prevent being blocked from accessing the Dreamwidth servers.  
  
'''This document is a proposal and subject to change.'''
+
'''The official and most up to date version is always at [https://www.dreamwidth.org/site/bot Dreamwidth's bot policy] page.'''
  
== General Purpose Guidelines ==
+
== General Guidelines ==
  
Things to keep in mind that are just generally useful:
+
The overall guidelines for bot/tool authors are pretty simple:
  
* Good clients send a proper user-agent (or other info string) that includes a contact email address.
+
* Good clients send a proper user-agent (or other info string) that includes a contact email address. That will let us contact you if there are any problems.
* Be kind to the system, try to rate limit your requests, don't abuse the servers.
+
* Be kind to the system! Rate-limit your requests, and try to cache data whenever possible.
* DW staff and volunteers are more than happy to work with you to do neat projects, fix problems in the servers, or tell you if something is good or not. Just ask!
+
* Don't screen-scrape the HTML/BML output of the site. If you can't do something you want to do through an API, let us know, and we'll try to add an API for it.
 +
* Try to avoid asking for users' passwords. If there's something you can't do through the protocol or through our APIs, let us know, and we'll add it. If you have to ask for users' passwords, you should tell people what you'll do with them, and warn users to change their passwords before and after they access your client if they're worried.
  
 
== Client Applications ==
 
  
A [[Compatible_clients|client application]] is defined as a tool that runs under the control of an end user.  For example, the Semagic client, jbackup, ljMigrate, and similar tools that are run under someone's control for personal use.
+
A [[Compatible_clients|client application]] is defined as a tool that runs under the control of an end user (whether as a downloadable program or an in-browser object).
  
These tools are generally unrestricted. We may, from time to time, have to restrict some of the API methods they use (such as syncitems, getevents) if there is too much load on the site. Generally speaking, though, we will not do this unless it's absolutely necessary to protect the good functioning of the site.
+
Generally, we don't restrict these tools. From time to time, we may need to restrict some of the API methods they use, if there's too much load on the site. We won't do this unless it's absolutely necessary to preserve the proper functioning of the site, and only on a temporary basis.
  
 
== Third Party Sites/Utilities ==
 
  
Things that fall into this category are more restricted, as they have the potential to do great harm individually. We are here to serve our users (which includes client applications), but third party sites that scrape data sources are not our primary function.
+
We place more restrictions on things that fall into this category, since they have the potential to seriously impact site load. We're here to serve our users (which includes client applications), but third-party sites that scrape data sources aren't our primary audience.
  
Generally speaking, if your site is going to be very small and hit DW little (this term is purposefully vague), you can go right ahead and do it. If it becomes a problem we will contact you and let you know that your particular access is turning out to be hard for us.
+
Generally speaking, if your site is going to be small and access Dreamwidth infrequently, you can go right ahead and do it. (Yes, these terms are purposefully vague -- use your best judgement, and contact us with any questions.) If your use becomes a problem, we'll contact you and let you know that your particular tool is turning out to be a load problem. This is one of the reasons why it's so critical to include contact information in your user agent.
  
On the other hand, if you intend to run a service (or your service gets popular) and you are doing dozens or hundreds of requests, then you should contact us and let us know what you are doing so we can make sure we have the proper resources to support your site.
+
If you intend to run a service (or your service gets popular), and you're making frequent requests, contact us and let us know. This will let us examine your tool and see what changes we can make on our end to better support your use, and what changes we can suggest in your tool to be gentler on the site.
 
+
We ask that all third party sites access Dreamwidth APIs on a special domain: '''b.dreamwidth.org'''.  If you use this domain for your XML-RPC endpoints and web access, you will have full ability to access the site, but we will be able to separate which traffic is bot traffic and which traffic is user traffic.  In case the site becomes overloaded, we can turn off this domain to mitigate third party traffic.
+
 
+
(Note: this domain doesn't work yet.  This page is just a proposal, it will be implemented sometime, if the proposal is well received.)
+
  
 
== IP Rate Limiting ==
 
  
From time to time you may find your IP address temporarily banned if we determine that your site is hitting the servers too hard. In this case, please contact us and let us know which IP has been banned so we can determine how best to serve your needs.
+
From time to time, you may find your IP address temporarily banned, if we determine that your site is hitting our servers too frequently or causing too much of a traffic or load spike. In this case, please contact us and let us know the IP that's been banned so we can work out a solution.
  
 
[[Category: Dreamwidth.org]]
 
 
[[Category: Data Sources]]
 

Revision as of 20:55, 7 May 2017

This is a mirror of the Dreamwidth Studios "Bot Policy": the rules that third-party applications and tools must follow to avoid being blocked from accessing the Dreamwidth servers.

The official and most up-to-date version is always at Dreamwidth's bot policy page (https://www.dreamwidth.org/site/bot).

General Guidelines

The overall guidelines for bot/tool authors are pretty simple:

  • Good clients send a proper user-agent (or other info string) that includes a contact email address. That will let us contact you if there are any problems.
  • Be kind to the system! Rate-limit your requests, and try to cache data whenever possible.
  • Don't screen-scrape the HTML/BML output of the site. If you can't do something you want to do through an API, let us know, and we'll try to add an API for it.
  • Try to avoid asking for users' passwords. If there's something you can't do through the protocol or through our APIs, let us know, and we'll add it. If you have to ask for users' passwords, you should tell people what you'll do with them, and warn users to change their passwords before and after they access your client if they're worried.
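
The first two guidelines can be sketched in a few lines of Python. This is only an illustration, not something the policy prescribes: the names `build_user_agent` and `PoliteClient`, the user-agent layout, the one-second minimum interval, and the five-minute cache lifetime are all assumptions chosen for the example. The policy itself gives no numbers, so pick values that suit your tool and ask Dreamwidth if in doubt.

```python
import time
import urllib.request


def build_user_agent(tool_name, version, contact_email):
    """Return a user-agent string naming the tool and a contact address.

    The exact layout is an assumption; the policy only asks that a
    contact email address be included.
    """
    return f"{tool_name}/{version} (contact: {contact_email})"


class PoliteClient:
    """Hypothetical helper: spaces out requests and caches responses."""

    def __init__(self, user_agent, min_interval=1.0, cache_ttl=300.0,
                 clock=time.monotonic, sleep=time.sleep, fetch=None):
        self.user_agent = user_agent
        self.min_interval = min_interval  # seconds between requests (assumed)
        self.cache_ttl = cache_ttl        # seconds a cached body stays fresh (assumed)
        self._clock = clock
        self._sleep = sleep
        self._fetch = fetch or self._http_get
        self._last_request = None         # time of the most recent real request
        self._cache = {}                  # url -> (time fetched, body)

    def _http_get(self, url):
        # Send the identifying user-agent header with every request.
        request = urllib.request.Request(
            url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.cache_ttl:
            return cached[1]              # fresh cache hit: no request sent
        if self._last_request is not None:
            wait = self.min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)         # rate limit: pause before the next request
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = (self._last_request, body)
        return body
```

A tool would create one `PoliteClient` with its own user agent and route every API request through `get`, so that repeated lookups of the same URL are served from the cache and distinct requests are spaced at least `min_interval` seconds apart.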

Client Applications

A client application is defined as a tool that runs under the control of an end user (whether as a downloadable program or an in-browser object).

Generally, we don't restrict these tools. From time to time, we may need to restrict some of the API methods they use, if there's too much load on the site. We won't do this unless it's absolutely necessary to preserve the proper functioning of the site, and only on a temporary basis.

Third Party Sites/Utilities

We place more restrictions on things that fall into this category, since they have the potential to seriously impact site load. We're here to serve our users (which includes client applications), but third-party sites that scrape data sources aren't our primary audience.

Generally speaking, if your site is going to be small and access Dreamwidth infrequently, you can go right ahead and do it. (Yes, these terms are purposefully vague -- use your best judgement, and contact us with any questions.) If your use becomes a problem, we'll contact you and let you know that your particular tool is turning out to be a load problem. This is one of the reasons why it's so critical to include contact information in your user agent.

If you intend to run a service (or your service gets popular) and you're making frequent requests, contact us and let us know. This will let us examine your tool and see what changes we can make on our end to better support your use, and what changes we can suggest in your tool to be gentler on the site.

IP Rate Limiting

From time to time, you may find your IP address temporarily banned if we determine that your site is hitting our servers too frequently or causing too large a traffic or load spike. In this case, please contact us and tell us which IP address has been banned so we can work out a solution.
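
One way for a tool to avoid making a load problem worse is to back off when requests start failing, and to stop entirely after a few attempts rather than retrying in a tight loop. The Python sketch below illustrates that idea under stated assumptions: the policy does not say how a ban appears to a client, so treating any `OSError` (which includes `urllib` connection and HTTP errors) as a possible refusal is a guess, and the names `fetch_with_backoff` and `GivingUp`, the 60-second starting delay, and the doubling factor are all hypothetical.

```python
import time


class GivingUp(Exception):
    """Raised when the tool stops retrying instead of hammering the server."""


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=60.0,
                       sleep=time.sleep):
    """Call fetch(url), doubling the wait after each failed attempt.

    `fetch` is any callable that returns the response body or raises
    OSError on failure (an assumption made for this sketch).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:
            if attempt == max_attempts:
                # Stop here: if the failures are a ban, a person should
                # contact Dreamwidth rather than let the tool keep retrying.
                raise GivingUp(
                    f"no success after {max_attempts} attempts for {url}"
                ) from exc
            sleep(delay)   # wait before the next attempt
            delay *= 2     # exponential backoff
```

Once `GivingUp` is raised, the sensible next step is the one the policy describes: contact Dreamwidth with the affected IP address instead of retrying.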