CZ:Bots

From Citizendium
imported>Daniel Mietchen
(slight expansion)
imported>D. Matt Innis
(from wikipedia)
Revision as of 09:17, 29 September 2009

As of September 2009, Citizendium does not have any official policy on running scripts or bots (e.g. by means of the Python Wikipedia robot framework, as explained here), though such scripts and bots have been run on occasion, and more frequently in recent months.

This page is intended to help draft such a policy. Feel free to rearrange or comment as you see fit. For background, see these two discussions in the Citizendium Forum.

Contact during the drafting phase: Daniel Mietchen

The main points the policy should address:

  1. More than one Citizen is needed to administer this: we all have real-life obligations, but someone who has taken the time to set up a bot script should receive a prompt response.
  2. No bot may run without a bot account (though we need a clear procedure for applying for these accounts). Alternatively, any bot could be allowed to run from a user account if its application has been properly filed and has received no objection within a week.
  3. One script per bot account (except perhaps for some well-defined minor jobs that could be performed by a maintenance bot or even from user accounts).
  4. Source code has to be posted on CZ before applying for the corresponding bot account.
  5. Bots should be run such that their edits can be undone by an existing bot, and the command for doing so has to be specified in the application. For scripts, this is probably too much to demand, so they are limited to single runs and to fewer than 500 edited pages over the course of one month (note: this number is defined at CZ:Bot threshold).
  6. The bot approval should include a statement on the expected traffic volume and scope of the bot.
  7. The bot approval period should allow for some test runs, which have to be labeled as such in the edit summary, with the commands used documented.
  8. Turning user-run bot jobs into cron jobs should be a valid option for well-tested scripts, but this would require involvement of someone who actually has access to the servers.
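To illustrate how points 5 and 7 might be checked on the operator's side, here is a minimal Python sketch. It is not part of the Python Wikipedia robot framework: the class `MonthlyPageLimit`, the function `edit_summary`, the "[test run]" label and the use of calendar months (rather than a rolling 30 days) are all hypothetical choices. The limit of 500 comes from point 5, and the authoritative number is the one recorded at CZ:Bot threshold.

```python
from datetime import date

# From point 5: scripts are limited to fewer than 500 edited pages per month.
# The authoritative number is the one recorded at CZ:Bot threshold.
BOT_THRESHOLD = 500


class MonthlyPageLimit:
    """Hypothetical tracker of distinct pages a script edits per calendar month."""

    def __init__(self, limit: int = BOT_THRESHOLD) -> None:
        self.limit = limit
        self.pages: dict[tuple[int, int], set[str]] = {}

    def may_edit(self, title: str, day: date) -> bool:
        seen = self.pages.get((day.year, day.month), set())
        # Re-editing an already counted page adds no new page; a new page is
        # allowed only while the total stays strictly below the limit.
        return title in seen or len(seen) + 1 < self.limit

    def record(self, title: str, day: date) -> None:
        self.pages.setdefault((day.year, day.month), set()).add(title)


def edit_summary(description: str, test_run: bool) -> str:
    """Label test-run edits in the summary (the label format is an assumption)."""
    return f"[test run] {description}" if test_run else description
```

A script would call `may_edit` before saving each page and `record` after a successful save, stopping once `may_edit` returns False.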

Wikipedia's bot considerations

Wikipedia's bot policy includes some failsafe features that restrict bots' functions. CZ's policy might want to consider them as well. The following is copied directly from Wikipedia:

In order for a bot to be approved, its operator should demonstrate that it:

  • is harmless
  • is useful
  • does not consume resources unnecessarily
  • performs only tasks for which there is consensus
  • carefully adheres to relevant policies and guidelines
  • uses informative messages, appropriately worded, in any edit summaries or messages left for users

The bot account's user page should identify the bot as such using the {{bot}} tag. The following information should be provided on, or linked from, both the bot account's userpage and the approval request:

  • Details of the bot's task (or tasks)
  • Whether the bot is manually assisted or runs automatically
  • When it operates (continuously, intermittently, or at specified intervals), and at what rate
  • The language and/or program that it is running
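The information listed above could be kept in one structured record and rendered onto the bot's user page. The sketch below is hypothetical: the class `BotDeclaration`, its field names and the section layout are illustrative only, and it assumes that CZ has a `{{bot}}` template like Wikipedia's.

```python
from dataclasses import dataclass


@dataclass
class BotDeclaration:
    """Hypothetical record of the details a bot operator is asked to publish."""
    tasks: list[str]
    manually_assisted: bool
    schedule: str  # e.g. "continuously", "intermittently", "daily"
    rate: str      # e.g. "about one edit every ten seconds"
    language: str  # language and/or program the bot runs on

    def to_wikitext(self) -> str:
        """Render the declaration as user-page wikitext, starting with {{bot}}."""
        mode = "manually assisted" if self.manually_assisted else "runs automatically"
        lines = ["{{bot}}", "== Tasks =="]
        lines += [f"* {task}" for task in self.tasks]
        lines += [
            "== Operation ==",
            f"* Mode: {mode}",
            f"* Schedule: {self.schedule}, at {self.rate}",
            f"* Language/program: {self.language}",
        ]
        return "\n".join(lines)
```

Rendering the same record on both the user page and the approval request keeps the two consistent.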

While performance is not generally an issue, bot operators should recognize that a bot making many requests or editing at a high speed has a much greater effect than the average contributor. Operators should be careful not to make unnecessary Web requests, and be conservative in their editing speed. Developers will inform the community if performance issues of any significance do arise, and in such situations, their directives must be followed.

  • Bots in trial periods, and approved bots performing all but the most trivial or urgent tasks, should be run at a rate that permits review of their edits when necessary.
  • Unflagged bots should edit more slowly than flagged bots, as their edits are visible in user watchlists.
  • The urgency of a task should always be considered; tasks that do not need to be completed quickly (for example, renaming categories) can and should be accomplished at a slower rate than those that do (for example, reverting vandalism).
  • Bots' editing speed should be regulated in some way; subject to approval, bots doing non-urgent tasks may edit approximately once every ten seconds, while bots doing more urgent tasks may edit approximately once every five seconds.
  • Bots editing at a high speed should operate more slowly during peak hours (1200–0400 UTC), and days (middle of the week, especially Wednesdays and Thursdays) than during the quietest times (weekends). Traffic statistics (dead link) are available.
  • Bots' editing speed may also be adjusted based on slave database server lag; this allows bots to edit more quickly during quiet periods while slowing down considerably when server load is high. This can be achieved by appending an extra parameter to the query string of each requested URL; see Manual:Maxlag parameter on MediaWiki.org for more details.
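The rate guidance above can be sketched as a small helper that picks a delay between edits and adds the maxlag parameter to request URLs. This is a minimal sketch under stated assumptions. The intervals (10 s for non-urgent edits, 5 s for urgent ones) and the 1200–0400 UTC peak window come from the quoted text. The slowdown factor of 2, the treatment of Saturdays and Sundays as off-peak, the function names and the default maxlag value of 5 seconds are assumptions. The helper only modifies the URL; it does not handle the server's response when the lag limit is exceeded.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Intervals from the quoted guidance, in seconds between edits.
NON_URGENT_INTERVAL = 10.0
URGENT_INTERVAL = 5.0
# Assumption: the quoted text asks for slower editing at peak times but gives no factor.
PEAK_SLOWDOWN = 2.0


def is_peak(now: datetime) -> bool:
    """Peak = a weekday hour in the 1200-0400 UTC window, which wraps past midnight.

    Treating Saturday and Sunday as off-peak is an interpretation of
    "the quietest times (weekends)". `now` must be timezone-aware.
    """
    utc = now.astimezone(timezone.utc)
    if utc.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return utc.hour >= 12 or utc.hour < 4


def edit_delay(urgent: bool, now: datetime) -> float:
    """Seconds to wait before the next edit."""
    base = URGENT_INTERVAL if urgent else NON_URGENT_INTERVAL
    return base * PEAK_SLOWDOWN if is_peak(now) else base


def with_maxlag(url: str, maxlag: int = 5) -> str:
    """Return `url` with maxlag=<seconds> in its query string, replacing any existing value."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "maxlag"]
    query.append(("maxlag", str(maxlag)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A bot loop would call `time.sleep(edit_delay(urgent, datetime.now(timezone.utc)))` between edits and pass every request URL through `with_maxlag`.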

Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted. When such content is required, download database dumps instead. Bots that require access to run queries on Wikipedia databases may be run on the toolserver; such processes are outside the scope of this policy.

Further points

  1. Should there be a distinction between automated edits that concern (i) contents, (ii) page formatting or (iii) page contextualization?

See also