CZ:Bots: Difference between revisions

imported>Daniel Mietchen
m (linking to existing bot policies on other wikis)
imported>John Stephenson
(seealso)
 
(17 intermediate revisions by 3 users not shown)
Line 1: Line 1:
As of September 2009, Citizendium does not have any official policy on running scripts or bots by means of the [http://pywikipediabot.sourceforge.net/ Python wikipedia robot framework], as explained [http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot here], though they have been run on occasion, and more frequently in recent months.
{{seealso|CZ:Bot status}}
 
As of January 2011, Citizendium does not have any official policy on running [[CZ:Script|scripts]] or [[CZ:Bot|bots]] — e.g. by means of the [http://pywikipediabot.sourceforge.net/ Python wikipedia robot framework], as explained [http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot here] — though they have been run on occasion, and more frequently in recent months.


This page is intended to help draft such a policy. Feel free to rearrange or comment as you see fit. For background, see these [http://forum.citizendium.org/index.php/topic,2752.msg21953.html two] [http://forum.citizendium.org/index.php/topic,2850.msg23214.html#msg23214 discussions] in the [http://forum.citizendium.org/ Citizendium Forum].
Line 6: Line 8:


The main points the policy should address:
# we need more than one Citizen to administer this - we all have real-life obligations, but someone who took the time to write a bot script should be able to receive prompt reaction  
# We need more than one Citizen to administer this - we all have real-life obligations, but someone who took the time to set up a bot script should be able to receive prompt reaction  
# no bot run without bot account (however, we need a clear procedure how to apply for these accounts), though one could think of a solution in which any bot is allowed to be run from a user account if its application has been properly filed but received no opposing reaction within a week
# No bot run without bot account (however, we need a clear [[CZ:Application for bot accounts|procedure how to apply for these accounts]]), though one could think of a solution in which any bot is allowed to be run from a user account if its application has been properly filed but received no opposing reaction within a week. For scripts, [[CZ:Bot status|documented requests]] by another Citizen may well be enough justification.
# one script per bot account (except for some well-defined minor jobs maybe that could be performed by a maintenance bot or even from some user accounts).
# One script per bot account (except for some well-defined minor jobs maybe that could be performed by a maintenance bot or even from some user accounts).
# source code has to be posted on CZ before application for the corresponding bot account.
# Source code has to be posted on CZ before application for the corresponding bot account.
# Bots should be run such that they can be undone by an existing bot, the command for which would have to be specified upon application. For scripts, this is probably too much to demand, so they should be limited in scope.
# Bots should be run such that they can be undone by an existing bot, the command for which would have to be specified upon application. For scripts, this is probably too much to demand, so they are limited to single runs or to less than {{:CZ:Bot threshold}} (note: this number is defined at [[CZ:Bot threshold]]).
# the bot approval should include a statement on the traffic volume and scope of the bot.
# The bot approval should include a statement on the expected traffic volume and scope of the bot.
# turning user-run bot jobs into cron jobs should be a valid option for well-tested scripts, but this would require involvement of someone who actually has access to the servers.
# The bot approval period should allow for some test runs. If more than 5 test edits are made, they have to be labeled as such in the edit summary. The used commands always have to be documented.
#The edit summary should include a link to the Community Feedback page ([http://en.citizendium.org/wiki?title=James_Jones_%28disambiguation%29/Definition&curid=100134024&diff=100626948&oldid=100613973 example]).
# Turning user-run bot jobs into cron jobs should be a valid option for well-tested scripts, but this would require involvement of someone who actually has access to the servers.
 
==Housekeeping bot==
A [[User:Housekeeping Bot|Housekeeping Bot]] [[User Talk:Housekeeping Bot|(discussion)]] account will be used for scripts (less than 500 edits) and for bots that will only be used once.  The account will be blocked until it's needed.
 
==Wikipedia's bot considerations==
Wikipedia's bot policy includes some failsafe features that restrict bots' functions. CZs policy might want to consider them as well.  The following is a direct copy and paste from wikipedia:
 
In order for a bot to be approved, its operator should demonstrate that it:
:* is harmless
:* is useful
:* does not consume resources unnecessarily
:* performs only tasks for which there is [[Wikipedia:Consensus|consensus]]
:* carefully adheres to relevant [[Wikipedia:Policies and guidelines|policies and guidelines]]
:* uses informative messages, appropriately worded, in any edit summaries or messages left for users
 
The bot account's [[Wikipedia:User page|user page]] should identify the bot as such using the {{tl|bot}} tag. The following information should be provided on, or linked from, both the bot account's userpage and the approval request:
:* Details of the bot's task (or tasks)
:* Whether the bot is manually assisted or runs automatically
:* When it operates (continuously, intermittently, or at specified intervals), and at what rate
:* The language and/or program that it is running
 
While performance is [[Wikipedia:Don't worry about performance|not generally an issue]], bot operators should recognize that a bot making many requests or editing at a high speed has a much greater effect than the average contributor. Operators should be careful not to make unnecessary Web requests, and be conservative in their editing speed. [[mw:Developers|Developers]] will inform the community if performance issues of any significance do arise, and in such situations, their directives must be followed.
 
:* Bots in trial periods, and approved bots performing all but the most trivial or urgent tasks, should be run at a rate that permits review of their edits when necessary.
:* Unflagged bots should edit more slowly than flagged bots, as their edits are visible in user watchlists.
:* The urgency of a task should always be considered; tasks that do not need to be completed quickly (for example, renaming [[Wikipedia:Categorization|categories]]) can and should be accomplished at a slower rate than those that do (for example, reverting [[Wikipedia:Vandalism|vandalism]]).
:* Bots' editing speed should be regulated in some way; subject to approval, bots doing non-urgent tasks may edit approximately once every ten seconds, while bots doing more urgent tasks may edit approximately once every five seconds.
:* Bots editing at a high speed should operate more slowly during peak hours (1200–0400 UTC), and days (middle of the week, especially Wednesdays and Thursdays) than during the quietest times (weekends). [http://toolserver.org/~leon/stats/reqstats/page.php Traffic statistics] {{deadlink}} are available.
:* Bots' editing speed may also be adjusted based on slave database server lag; this allows bots to edit more quickly during quiet periods while slowing down considerably when server load is high. This can be achieved by appending an extra parameter to the query string of each requested URL; see [[mw:Manual:Maxlag parameter]] for more details.
 
Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted. When such content is required, download [http://download.wikimedia.org/ database dumps] instead. Bots that require access to run queries on Wikipedia databases may be run on the [[m:Toolserver|toolserver]]; such processes are outside the scope of this policy.
 
==Further points==
#Should there be a distinction between automated edits that concern (i) contents, (ii) page formatting or (iii) page contextualization?


==See also==
*Bot policies at the [http://en.wikipedia.org/wiki/Wikipedia:Bots English Wikipedia], [http://de.wikipedia.org/wiki/Wikipedia:Bots German Wikipedia], [http://en.wikiversity.org/wiki/Wikiversity:Bots English Wikiversity]
{{Technical Help}}

Latest revision as of 17:13, 26 February 2021

See also: CZ:Bot status

As of January 2011, Citizendium does not have any official policy on running scripts or bots — e.g. by means of the Python wikipedia robot framework, as explained here — though they have been run on occasion, and more frequently in recent months.

This page is intended to help draft such a policy. Feel free to rearrange or comment as you see fit. For background, see these two discussions in the Citizendium Forum.

Contact during the drafting phase: Daniel Mietchen

The main points the policy should address:

  1. We need more than one Citizen to administer this: we all have real-life obligations, but someone who took the time to set up a bot script should be able to receive a prompt reaction.
  2. No bot should be run without a bot account (however, we need a clear procedure for applying for these accounts), though one could think of a solution in which any bot may be run from a user account if its application has been properly filed but has received no opposing reaction within a week. For scripts, documented requests by another Citizen may well be enough justification.
  3. One script per bot account (except perhaps for some well-defined minor jobs that could be performed by a maintenance bot or even from some user accounts).
  4. Source code has to be posted on CZ before applying for the corresponding bot account.
  5. Bots should be run such that they can be undone by an existing bot, the command for which would have to be specified upon application. For scripts, this is probably too much to demand, so they are limited to single runs or to fewer than 500 edited pages over the course of one month (note: this number is defined at CZ:Bot threshold).
  6. The bot approval should include a statement on the expected traffic volume and scope of the bot.
  7. The bot approval period should allow for some test runs. If more than 5 test edits are made, they have to be labeled as such in the edit summary. The commands used always have to be documented.
  8. The edit summary should include a link to the Community Feedback page (example).
  9. Turning user-run bot jobs into cron jobs should be a valid option for well-tested scripts, but this would require involvement of someone who actually has access to the servers.
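
Points 5 and 7 above contain two numeric rules that a script could check before it saves anything. The following is a minimal sketch in Python, not an agreed implementation: the function names, the constant names and the "[test edit]" label are hypothetical, and the value 500 is simply the number currently given at CZ:Bot threshold.

```python
# Illustrative only: the names and the "[test edit]" label are assumptions,
# not part of any agreed Citizendium policy or tool.

SCRIPT_PAGE_LIMIT = 500   # pages per month, as currently defined at CZ:Bot threshold
UNLABELED_TEST_EDITS = 5  # more test edits than this must be labeled


def script_run_allowed(pages_edited_this_month, pages_planned):
    """Return True if a script run stays below the monthly page threshold."""
    return pages_edited_this_month + pages_planned < SCRIPT_PAGE_LIMIT


def edit_summary(base_summary, is_test_run, planned_test_edits):
    """Label the summary if the test run exceeds the unlabeled limit."""
    if is_test_run and planned_test_edits > UNLABELED_TEST_EDITS:
        return "[test edit] " + base_summary
    return base_summary
```

A run that would reach or exceed the threshold would, under this reading of point 5, need a proper bot account instead of a script run.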

Housekeeping bot

A Housekeeping Bot (discussion) account will be used for scripts (fewer than 500 edits) and for bots that will only be used once. The account will be blocked until it is needed.

Wikipedia's bot considerations

Wikipedia's bot policy includes some failsafe features that restrict bots' functions. CZ's policy might want to consider them as well. The following is copied directly from Wikipedia:

In order for a bot to be approved, its operator should demonstrate that it:

  • is harmless
  • is useful
  • does not consume resources unnecessarily
  • performs only tasks for which there is consensus
  • carefully adheres to relevant policies and guidelines
  • uses informative messages, appropriately worded, in any edit summaries or messages left for users

The bot account's user page should identify the bot as such using the {{bot}} tag. The following information should be provided on, or linked from, both the bot account's userpage and the approval request:

  • Details of the bot's task (or tasks)
  • Whether the bot is manually assisted or runs automatically
  • When it operates (continuously, intermittently, or at specified intervals), and at what rate
  • The language and/or program that it is running

While performance is not generally an issue, bot operators should recognize that a bot making many requests or editing at a high speed has a much greater effect than the average contributor. Operators should be careful not to make unnecessary Web requests, and be conservative in their editing speed. Developers will inform the community if performance issues of any significance do arise, and in such situations, their directives must be followed.

  • Bots in trial periods, and approved bots performing all but the most trivial or urgent tasks, should be run at a rate that permits review of their edits when necessary.
  • Unflagged bots should edit more slowly than flagged bots, as their edits are visible in user watchlists.
  • The urgency of a task should always be considered; tasks that do not need to be completed quickly (for example, renaming categories) can and should be accomplished at a slower rate than those that do (for example, reverting vandalism).
  • Bots' editing speed should be regulated in some way; subject to approval, bots doing non-urgent tasks may edit approximately once every ten seconds, while bots doing more urgent tasks may edit approximately once every five seconds.
  • Bots editing at a high speed should operate more slowly during peak hours (1200–0400 UTC), and days (middle of the week, especially Wednesdays and Thursdays) than during the quietest times (weekends). Traffic statistics (dead link) are available.
  • Bots' editing speed may also be adjusted based on slave database server lag; this allows bots to edit more quickly during quiet periods while slowing down considerably when server load is high. This can be achieved by appending an extra parameter to the query string of each requested URL; see mw:Manual:Maxlag parameter for more details.
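
The rate limits and the maxlag parameter described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example URL, the `peak_factor` slowdown and the default `maxlag` value of 5 seconds are hypothetical choices, not values taken from either wiki's policy. Only the ten-second and five-second intervals and the 1200–0400 UTC peak window come from the text above.

```python
import time
from urllib.parse import urlencode

NON_URGENT_DELAY = 10  # seconds between edits for non-urgent tasks
URGENT_DELAY = 5       # seconds between edits for urgent tasks


def is_peak_hour(hour_utc):
    """Peak hours are 1200-0400 UTC, a window that wraps past midnight."""
    return hour_utc >= 12 or hour_utc < 4


def edit_delay(urgent, hour_utc, peak_factor=2):
    """Seconds to wait between edits; peak_factor is a hypothetical slowdown."""
    base = URGENT_DELAY if urgent else NON_URGENT_DELAY
    return base * peak_factor if is_peak_hour(hour_utc) else base


def with_maxlag(base_url, params, maxlag_seconds=5):
    """Append the maxlag parameter to the query string of a request URL."""
    query = dict(params, maxlag=maxlag_seconds)
    return base_url + "?" + urlencode(query)


def run_edits(edits, urgent=False, sleep=time.sleep):
    """Call each edit function in turn, pausing between edits."""
    for i, do_edit in enumerate(edits):
        if i:
            sleep(edit_delay(urgent, time.gmtime().tm_hour))
        do_edit()
```

For example, `with_maxlag("https://example.org/w/api.php", {"action": "query"})` yields a URL ending in `?action=query&maxlag=5`. When the server reports that its lag exceeds the requested value, a well-behaved bot would wait and retry rather than repeat the request immediately; that retry loop is left out of this sketch.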

Bots that download substantial portions of Wikipedia's content by requesting many individual pages are not permitted. When such content is required, download database dumps instead. Bots that require access to run queries on Wikipedia databases may be run on the toolserver; such processes are outside the scope of this policy.

Further points

  1. Should there be a distinction between automated edits that concern (i) contents, (ii) page formatting or (iii) page contextualization?

See also

  • Bot policies at the English Wikipedia, German Wikipedia, English Wikiversity
