CZ:Bots

From Citizendium
Revision as of 16:09, 16 January 2011 by imported>Daniel Mietchen (another year)

As of January 2011, Citizendium does not have any official policy on running scripts or bots (e.g. by means of the Python wikipedia robot framework, as explained here), though they have been run on occasion, and more frequently in recent months.

This page is intended to help draft such a policy. Feel free to rearrange or comment as you see fit. For background, see these two discussions in the Citizendium Forum.

Contact during the drafting phase: Daniel Mietchen

The main points the policy should address:

  1. We need more than one Citizen to administer this: we all have real-life obligations, but someone who took the time to set up a bot script should be able to get a prompt reaction.
  2. No bot should run without a bot account (which requires a clear procedure for applying for such accounts). One possible exception: a bot may be run from a user account if its application has been properly filed and has received no opposition within a week. For scripts, a documented request by another Citizen may well be sufficient justification.
  3. One script per bot account (except for some well-defined minor jobs maybe that could be performed by a maintenance bot or even from some user accounts).
  4. Source code has to be posted on CZ before application for the corresponding bot account.
  5. Bots should be run such that their edits can be undone by an existing bot; the command for doing so must be specified in the application. For scripts, this is probably too much to demand, so they are limited to single runs or to fewer than 500 edited pages over the course of one month (note: this number is defined at CZ:Bot threshold).
  6. The bot approval should include a statement on the expected traffic volume and scope of the bot.
  7. The bot approval period should allow for some test runs. If more than 5 test edits are made, they have to be labeled as such in the edit summary. The commands used always have to be documented.
  8. The edit summary should include a link to the Community Feedback page (example).
  9. Turning user-run bot jobs into cron jobs should be a valid option for well-tested scripts, but this would require involvement of someone who actually has access to the servers.
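
Points 5, 7 and 8 could be checked mechanically before a run. The following is only a minimal sketch in plain Python, not part of any existing bot framework. The function names, the `[test]` label, the summary layout and the feedback link text are all hypothetical. The value 500 is copied from this draft; the authoritative number is the one at CZ:Bot threshold.

```python
# Hypothetical helpers illustrating draft points 5, 7 and 8.
SCRIPT_EDIT_LIMIT = 500        # assumed; the draft defers to CZ:Bot threshold
UNLABELED_TEST_EDIT_MAX = 5    # more test edits than this must be labeled


def script_run_allowed(pages_edited_this_month, pages_planned):
    """Return True if a script stays under the monthly page limit (point 5)."""
    return pages_edited_this_month + pages_planned < SCRIPT_EDIT_LIMIT


def build_edit_summary(task, feedback_link, is_test=False, planned_test_edits=0):
    """Compose an edit summary with a feedback link (point 8).

    Test edits are labeled once more than five of them are planned (point 7).
    """
    prefix = ""
    if is_test and planned_test_edits > UNLABELED_TEST_EDIT_MAX:
        prefix = "[test] "  # hypothetical label; the draft does not fix a wording
    return f"{prefix}{task} (feedback: {feedback_link})"


if __name__ == "__main__":
    # The link target below is a placeholder, not a confirmed page name.
    print(script_run_allowed(pages_edited_this_month=120, pages_planned=300))
    print(build_edit_summary("fix typos", "[[Community Feedback page]]",
                             is_test=True, planned_test_edits=8))
```

Whether the limit should count pages or individual edits, and whether it applies per script or per operator, is left open by the draft; the sketch counts pages per script.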

Housekeeping bot

A Housekeeping Bot (discussion) account will be used for scripts (fewer than 500 edits) and for bots that will only be used once. The account will remain blocked until it is needed.

Wikipedia's bot considerations

Wikipedia's bot policy includes some failsafe features that restrict bots' functions. CZ's policy might want to consider them as well. The following is copied directly from Wikipedia:

In order for a bot to be approved, its operator should demonstrate that it:

  • is harmless
  • is useful
  • does not consume resources unnecessarily
  • performs only tasks for which there is consensus
  • carefully adheres to relevant policies and guidelines
  • uses informative messages, appropriately worded, in any edit summaries or messages left for users

The bot account's user page should identify the bot as such using the {{bot}} tag. The following information should be provided on, or linked from, both the bot account's userpage and the approval request:

  • Details of the bot's task (or tasks)
  • Whether the bot is manually assisted or runs automatically
  • When it operates (continuously, intermittently, or at specified intervals), and at what rate
  • The language and/or program that it is running

While performance is not generally an issue, bot operators should recognize that a bot making many requests or editing at a high speed has a much greater effect than the average contributor. Operators should be careful not to make unnecessary Web requests, and be conservative in their editing speed. Developers will inform the community if performance issues of any significance do arise, and in such situations, their directives must be followed.

  • Bots in trial periods, and approved bots performing all but the most trivial or urgent tasks, should be run at a rate that permits review of their edits when necessary.
  • Unflagged bots should edit more slowly than flagged bots, as their edits are visible in user watchlists.
  • The urgency of a task should always be considered; tasks that do not need to be completed quickly (for example, renaming categories) can and should be accomplished at a slower rate than those that do (for example, reverting vandalism).
  • Bots' editing speed should be regulated in some way; subject to approval, bots doing non-urgent tasks may edit approximately once every ten seconds, while bots doing more urgent tasks may edit approximately once every five seconds.
  • Bots editing at a high speed should operate more slowly during peak hours (1200–0400 UTC), and days (middle of the week, especially Wednesdays and Thursdays) than during the quietest times (weekends). Traffic statistics are available.
  • Bots' editing speed may also be adjusted based on slave database server lag; this allows bots to edit more quickly during quiet periods while slowing down considerably when server load is high. This can be achieved by appending an extra parameter to the query string of each requested URL; see mw:Manual:Maxlag parameter for more details.
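
The speed rules quoted above can be sketched as a small throttle, with the `maxlag` parameter added to each request URL. This is an illustration using only the Python standard library, under stated assumptions. The class and function names are hypothetical, the URL is a placeholder, and the default `maxlag` value of 5 seconds is an assumed setting that should be checked against mw:Manual:Maxlag parameter.

```python
import time
from urllib.parse import urlencode

NON_URGENT_INTERVAL = 10.0  # seconds between edits, from the quoted policy
URGENT_INTERVAL = 5.0       # seconds between edits for urgent tasks


def edit_interval(urgent):
    """Pick the minimum delay between two edits."""
    return URGENT_INTERVAL if urgent else NON_URGENT_INTERVAL


def with_maxlag(base_url, params, maxlag_seconds=5):
    """Build a request URL whose query string carries the maxlag parameter."""
    query = dict(params)
    query["maxlag"] = maxlag_seconds  # assumed default, adjust as needed
    return base_url + "?" + urlencode(query)


class Throttle:
    """Blocks so that successive calls to wait() are at least `interval` apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(edit_interval(urgent=False))
    # Placeholder endpoint; no request is actually sent in this sketch.
    url = with_maxlag("https://example.org/w/api.php", {"action": "query"})
    throttle.wait()  # call once before each edit
    print(url)
```

A bot that receives a lag error from the server would additionally have to pause and retry; that handling is omitted here.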

Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted. When such content is required, download database dumps instead. Bots that require access to run queries on Wikipedia databases may be run on the toolserver; such processes are outside the scope of this policy.

Further points

  1. Should there be a distinction between automated edits that concern (i) contents, (ii) page formatting or (iii) page contextualization?
