Wikipedia:Request a query

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

The database replicas do not have access to page content, so queries which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.

Flooding AFD

[edit]

@Czarking0 has an idea about how to measure AFD nominations, specifically focused on how many editors open a disproportionate number of AFD nominations. The context is a discussion about AI/LLM users potentially flooding AFD in the future.

This would require knowing the name of the editor (or some sort of unique identifier that's consistent within the set) who created any given AFD subpage + the creation date for that subpage (e.g., I created Wikipedia:Articles for deletion/Theopathy on July 3rd, so the needed information would be something like "WhatamIdoing – 20250703012900").

But rather than doing Czarking0's original analytical plan, which sounds like a lot of work, could you look at a random sample of weeks in a year, and tell us what the most number of AFD subpages created by any individual editor is? If nobody's normally creating more than 10 or 20 in one week, I think we can safely skip the "hard work" part and propose a limit based on common sense. WhatamIdoing (talk) 01:07, 9 September 2025 (UTC)[reply]

I think there is a lot value in common sense stats. Out of professional pride, I will note there are a ton of caveats for estimating maximums which I would have to re-familiarize myself with to do a true statistical study. In terms of sampling, I think we could draw meaningful results from eight random weeks in each of the last five years. That is also a small enough amount of data that I could do it by hand if needed. Czarking0 (talk) 01:18, 9 September 2025 (UTC)[reply]
Five years takes us back to August/September of 2020, which was during the Covid lockdowns. Editing patterns were off that year. Could we do 10 random non-overlapping weeks in the last four years? WhatamIdoing (talk) 01:22, 9 September 2025 (UTC)[reply]
I think I can save you some steps? quarry:query/97012 has, for each week of the year, everyone who's created more than 10, along with the total afds created that week, starting in 2020. I don't think a random sampling of weeks would have given you a good picture, whether or not you picked one of the weeks where a single user created a full quarter of the afds (there were several). But if you really want the full datasets, let me know. —Cryptic 02:46, 9 September 2025 (UTC)[reply]
It looks like there are 1,661 records showing an editor creating 10+ AFDs in a single (calendar) week.
The highest in that dataset is 135 in a single week, which means an average of 20 per day should have no effect on anyone. Only nine editors ever listed more than 70 in a single week, which means that an average of 10 per day should have almost no effect on anyone. (The underlying goal is to establish a "no flooding" rule before anyone gets the idea to turn an LLM-based bot loose on AFD.)
@Czarking0, I wonder if you think that more evaluation would really be helpful, or if we should just go with a simple "max 70 noms per week" proposal. WhatamIdoing (talk) 06:46, 14 September 2025 (UTC)[reply]
I think this sort of data gets you to something you could build consensus around regardless of my opinion. It really comes down to what does the community consider flooding? 10 in a week does not seem like flooding to me (I think I have done that). 70 in a week does seem like flooding. I agree that additional analysis is mostly academic but I am unsure what the community would think. Czarking0 (talk) 16:21, 14 September 2025 (UTC)[reply]
I've asked people what they think 'flooding' is at Wikipedia:Village pump (idea lab)#Quick straw poll about AFD flooding. (Neither of us have done 10 in the same week during the last five years, but I agree with you that 10 in a week feels very reasonable. I might even accept 10 in a day.) WhatamIdoing (talk) 19:55, 14 September 2025 (UTC)[reply]

List of new WikiProject pages

[edit]
Resolved

I'd like a query that will find pages meeting these criteria:

  • Begin with Wikipedia:WikiProject
  • Are not subpages
  • Have been created recently (e.g., last few months, this calendar year)

WhatamIdoing (talk) 01:20, 9 September 2025 (UTC)[reply]

I recently fixed the one at Database Reports that basically does this but only looks at the past 30 days (from recentchanges). Check this out: Wikipedia:Database reports/New WikiProjects. If you need to go back further, I could rewrite it. Let me know. Stefen 𝕋ower's got the power!!1! GabGruntwerk 02:15, 9 September 2025 (UTC)[reply]
Thanks! I looked back through the history to find some others. It looks like in the last year or so, about 10 pages have been created, and probably more than half of those could be deleted at MFD. I'll take the list to WT:COUNCIL. WhatamIdoing (talk) 02:51, 9 September 2025 (UTC)[reply]
Looking at the database report, I'd add to my original criteria: "4. Not a redirect (including 'not a soft redirect')" and "5. Not already deleted." WhatamIdoing (talk) 02:59, 9 September 2025 (UTC)[reply]
In the previous report, it included a deleted one only because I asked for it to be deleted after the report was generated. Anyway, I think these criteria are mostly doable, although I'm not sure offhand if the database supports checking for soft redirects. I can probably look at this tomorrow. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:17, 9 September 2025 (UTC)[reply]
WhatamIdoing: Check out this query. It goes back six months, and should keep out both hard and soft redirects (removing soft redirects was easier than I initially thought). Also, there should never be any deleted ones in the list. Let me know if this works for you. Stefen 𝕋ower's got the power!!1! GabGruntwerk 04:25, 10 September 2025 (UTC)[reply]
Yes, that looks good. WhatamIdoing (talk) 23:08, 10 September 2025 (UTC)[reply]
(Wouldn't using the page creation log be easier and faster? Plus, it wouldn't limit you to 30 days' age. —Cryptic 03:05, 9 September 2025 (UTC))[reply]
Perhaps. I didn't challenge myself to make the existing report better like I did with "WikiProjects by human changes". I literally just fixed the database bug in it. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:10, 9 September 2025 (UTC)[reply]
Hrm. Neither of these are going to find pages created somewhere else and later moved to Wikipedia:WikiProject Whatever, like Wikipedia:WikiProject Sustainable Development was in July. I found that by starting at page and working back to a creation timestamp, but that wouldn't work either for a page created long ago and only recently moved. And the move log is miserable to work with, though since we only care about the current location, it's at least feasible. —Cryptic 03:23, 9 September 2025 (UTC)[reply]
That looks like the only one, which isn't really surprising. I didn't attempt to look more than one move back in the query, but there were few enough to check manually; all the other moves this year were either from other WP:WikiProject* titles or were reverts of moves away from those. —Cryptic 03:43, 9 September 2025 (UTC)[reply]
In that case, I could check for "Wikiproject" as well as "WikiProject". Since we show redirects, we would pick up where it originally came from. That may not resolve all cases, but resolving all cases might be harsh, performance-wise. Stefen 𝕋ower's got the power!!1! GabGruntwerk 03:45, 9 September 2025 (UTC)[reply]
The case I expected to find (and didn't initially) was a page created in userspace or draft:, edited a bit, then moved to WP:WikiProject*. Going back a bit further to the start of 2023, that's happened at (at least) Wikiproject Africa environment, Draft:Wikipedia:WikiProject Australian Transport, User:LoomCreek/sandbox/Wikiproject Housing and Tenant Rights, and User:Zazpot/WikiProject Risk. There aren't enough pages currently at Wikipedia:WikiProject% that querying them is time-intensive; the main issue is that it's very difficult to look more than one move back to omit the innocuous moves like WP:WikiProject Ireland->Baldonnel, County Dublin->WP:WikiProject Ireland but not ones like [[(some hypothetical title in userspace)]]->WP:WikiProject Abandoned Articles->WP:WikiProject Abandoned articles. —Cryptic 03:58, 9 September 2025 (UTC)[reply]
Based on my current understanding (and I could be mistaken), this is resolved in the database with rev_parent_id=0 after a page move. So if you're looking for when a WikiProject enters the project namespace, you know that it has become a "new" WikiProject. When it's a draft or in user space, it's in development (or just a user's brainstorming), and we don't care about those for our purposes. Stefen 𝕋ower's got the power!!1! GabGruntwerk 09:16, 10 September 2025 (UTC)[reply]
rev_parent_id isn't completely robust; it's not updated when history is merged or split. So, for example, there are three revisions of 2024 Minas Gerais road crash with rev_parent_id=0 (1264606347, 1264609936, 1264650842) and none of Home library service. The way to get around that is to JOIN revision ON rev_id = (SELECT rev_id FROM revision WHERE /*something with a composite index with rev_timestamp, usually rev_page=some page id*/ ORDER BY rev_timestamp ASC LIMIT 1), which looks like it's less efficient than looking at rev_parent_id but is actually much more so, and always finds the earliest visible revision at a given page (or other constraints).
That's admittedly rare, but it does happen. The more serious problem with your query 97050 is that a page that has more than a six-month delay between creation and their move to a WikiProject title will never show up on it. —Cryptic 15:03, 10 September 2025 (UTC)[reply]
Since we're talking about WikiProjects, I'm not sure why we would focus on article examples. Given WikiProjects go through these type of changes, are they of concern to us here? If a WikiProject is merged or split, that is occurring normally per a community decision, likely well after they are "new", and a new WikiProject created from a split would show up as a new page.
Per the last point, I had to pick a cut-off point per the request. It can easily be extended to longer periods. Also, we're looking at new pages starting with 'WikiProject'. If there is a move to a WikiProject title during the six months, it should be captured. I don't see the serious problem here, but if you can rework the query to demonstrate the issue, that would show me what I'm missing here. I am teachable. :) Stefen 𝕋ower's got the power!!1! GabGruntwerk 19:19, 10 September 2025 (UTC)[reply]
Also, I've written a subquery like you show for a different report to find an original creator of an article, like when it was a draft or in their user space (so I can show the true creator), but since we're only looking at new pages in the project space, this case seems awfully rare. Stefen 𝕋ower's got the power!!1! GabGruntwerk 19:34, 10 September 2025 (UTC)[reply]
Wikipedia:WikiProject Risk was in userspace for more than six years before being moved to its current title. It's conceivable that some day, a page created in 2002 could be moved to a WikiProject title. What you're querying for is pages currently at a WP:WikiProject title whose first revision was within the last six months. That's inevitably going to eventually miss newly-blue WikiProject titles no matter what time cutoff you pick. Since there's generally going to be few results, I'd think it would be more preferable to have false positives (by including all recent moves, even if they were quickly reverted like the one from WP:WikiProject Ireland above; or even if merely from one WikiProject title to another, like the one from Wikipedia:WikiProject Abandoned Articles, also above) than false negatives.
I picked the anomalous rev_parent_id examples because those were the first illustrative ones I found going through my own deletion log and looking for "history merge" and "history split". If you want examples more relevant to the question at hand, then Wikipedia:WikiProject Finance & Investment, Wikipedia:WikiProject Germany, Wikipedia:WikiProject Protected areas, and Wikipedia:WikiProject Women Do News all have two revisions each with rev_parent_id = 0; and Wikipedia:WikiProject Rosblofnari has none. —Cryptic 00:08, 11 September 2025 (UTC)[reply]
Per the first point, I don't see significant damage from ignoring these rare cases. After all, we'll see these projects pop up in the "WikiProjects by human changes" and "WikiProjects by changes" reports. But if you want to modify the query I wrote that WhatamIdoing requested, feel free.
Per the second point, I will review the implications for the New WikiProjects report and revise the SQL if necessary. Note this wasn't originally my code, and my effort was focused on making sure it didn't fail due to database changes, rather than making it smarter. But I will look at it. Thank you for your ideas! Stefen 𝕋ower's got the power!!1! GabGruntwerk 00:20, 11 September 2025 (UTC)[reply]

Hot articles

[edit]

I'm trying to come up with a query for the most edited pages with a talk page category (say Category:Horse breeds task force articles) in the last 7 days. This is for use in Module:Database reports/Hot articles which currently supports providing a PageAssessments-recognized project as input but not a category. (Prior art: https://github.com/kaldari/hotarticles/blob/master/runbot.php, but seems a bit obfuscated.) – SD0001 (talk) 15:36, 14 September 2025 (UTC)[reply]

The github version jumps through a lot of hoops trying to make things more efficient, but it doesn't matter for the common case and I don't... think... it would help much for cases with many pages in the category and very, very many revisions in the articles. Especially since it only looks at recentchanges anyway, so can only ever see revisions from the last 30 days. (Also, it can go by talk pages linked from a template instead of in a category, but that difference is even smaller now.) quarry:query/97157 for a version that looks at revision, so it can go back further than 30 days; the change for the template version is trivial - categorylinks to templatelinks and cl_* to tl_* - and if it really does get bad performance with ancient articles and you don't need to look past the recentchanges period, that's straightforward too. —Cryptic 16:42, 14 September 2025 (UTC)[reply]
Here is a simplified version that goes by the talk page category, but I have been using a more sophisticated approach for my version of Hot Articles that still goes by the PageAssessments project name. Stefen 𝕋ower's got the power!!1! GabGruntwerk 16:49, 14 September 2025 (UTC)[reply]
Hunh. I had thought FETCH FIRST 10 ROWS ONLY was an Oracleism; no idea that Mariadb supported it. —Cryptic 17:13, 14 September 2025 (UTC)[reply]
This was something I discovered when hunting down a way to do proper ranking, not wanting to leave out articles that have the same number of edits at the end of the list. I guess it can be called a "10+" result. Stefen 𝕋ower's got the power!!1! GabGruntwerk 17:24, 14 September 2025 (UTC)[reply]
I'd probably have flailed around with something based on DENSE_RANK(). This is a lot more elegant. —Cryptic 17:27, 14 September 2025 (UTC)[reply]
(Which oracle?) WhatamIdoing (talk) 19:56, 14 September 2025 (UTC)[reply]
Oracle Database by Oracle Corporation. Stefen 𝕋ower's got the power!!1! GabGruntwerk 20:18, 14 September 2025 (UTC)[reply]
Also here is an implementation of my version of Hot Articles which includes the specific query in production (trimmed down a bit from the one in Quarry). Stefen 𝕋ower's got the power!!1! GabGruntwerk 17:28, 14 September 2025 (UTC)[reply]
Might be worthwhile to go with a recentchanges-based query for the convenient access to rc_bot. You can sort of get at it from revision, but only by looking at the editor's current groups; and that's not guaranteed to be accurate as of when the edit was made. —Cryptic 18:09, 14 September 2025 (UTC)[reply]
Thanks all. I set up Module:Database reports/Hot articles by category based on Cryptic's query. I wanted to add it as an option within Module:Database reports/Hot articles but that proved too difficult due to the different number of columns – the by-category listing doesn't contain rating. – SD0001 (talk) 04:40, 24 September 2025 (UTC)[reply]

Query to find articles using both Template:use dmy dates and Template:use mdy dates (and similar)

[edit]

This is a request for a query to find articles using both Template:use dmy dates and Template:use mdy dates, motivated by this conversation. I know that it is possible to use "hastemplate" searches for this, but I'd like to incorporate it into a database report.

I would also like to include similar conflicts, such as articles that contain both {{Use British English}} and {{Use American English}}, so a general-purpose query that lists pairs of such templates would be helpful, so that I can expand the report as more conflicting pairs of templates are identified.

Please let me know if this request is not specified well enough. – Jonesey95 (talk) 21:01, 15 September 2025 (UTC)[reply]

quarry:query/97196. —Cryptic 22:03, 15 September 2025 (UTC)[reply]
Super. And thanks for tweaking Wikipedia:Database reports/Multiple "Use" templates. I was making the exact same edit after reading the documentation at {{Database report}}. – Jonesey95 (talk) 22:47, 15 September 2025 (UTC)[reply]