Wikipedia talk:Wikipedia articles written in the greatest number of languages

Lots of work necessary

I think this is an important component of Wikipedia statistics, but the page obviously needs a lot of development. The goal is to determine the top 50 or 100 articles by multilingual representation across all the Wikipedia sites, i.e. the articles covered by the most Wikipedias. This would require an automated program that could evaluate the number of languages each Wikipedia article is available in. Ideally, it would evaluate every article in every Wikipedia, since it's possible (although improbable) that an article not covered in the English Wikipedia is covered in a vast number of other Wikipedias. As of now, the page just shows a sample of articles that were chosen because, hypothetically, they would be available in many different Wikipedia languages. The counts on the page were determined by copying and pasting an article's language bar into a Word document, then using Word Count to check the number of lines (i.e. the number of languages the article is covered in). If you do this, make sure to add 1 to the total, as the language bar does not include the viewing language.

The title is also pretty clunky and is certainly subject to change. Thanks for your help. There are some great Wikipedia statistics pages, and I hope this can become one of them. KBurchfiel (talk) 01:22, 9 July 2011 (UTC)[reply]
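The manual Word-count procedure above can be automated: the MediaWiki API exposes the same interlanguage links through its langlinks property. Below is a minimal sketch, not a finished tool; the counting helper works offline on an API "pages" entry, the fetch function follows the standard query API, and real bots would also need to send a proper User-Agent header, which is omitted here.

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def count_languages(page):
    """Count the languages an article is available in from an API
    "pages" entry: the langlinks list plus 1 for the viewing
    language, mirroring the add-1 step of the manual method."""
    return len(page.get("langlinks", [])) + 1

def fetch_language_count(title):
    """Ask the MediaWiki API for a title's interlanguage links."""
    params = urlencode({
        "action": "query", "format": "json", "prop": "langlinks",
        "titles": title, "lllimit": "max", "redirects": 1,
    })
    with urlopen(f"{API}?{params}", timeout=10) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return count_languages(page)

# Offline example: a page entry with two interlanguage links
# counts as three languages in total.
sample = {"langlinks": [{"lang": "de"}, {"lang": "fr"}]}
print(count_languages(sample))  # → 3
```

Like the Word-count method, this counts only languages linked from the English article, so an article absent from the English Wikipedia would still be missed.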

And in the meantime, if you can think of an article that might rank among or atop this list, you're welcome to add it. KBurchfiel (talk) 01:25, 9 July 2011 (UTC)[reply]

Wikipedia namespace rather than article space?

Shouldn't this be in Wikipedia: namespace rather than article space? It is of mostly (granted, not entirely) internal interest, and, as it stands, is entirely WP:OR. LadyofShalott 01:27, 9 July 2011 (UTC)[reply]


You would know better than I would. Feel free to move it to that namespace if it belongs there. I agree that it's of internal interest and that it's original research, but hopefully the latter can change as the page develops. KBurchfiel (talk) 05:22, 10 July 2011 (UTC)[reply]

OK, I'll go ahead and move it then. If enough external sources can be found to establish notability as a regular article, the location question can be revisited at that point. LadyofShalott 00:18, 11 July 2011 (UTC)[reply]

Thanks for the move--I appreciate your help in finding the page a good home. :-) KBurchfiel (talk) 07:38, 11 July 2011 (UTC)[reply]

Suggestion: Limit to 200+ languages?

Unless the task is automated, it could be useful to set a higher threshold if we plan to make a somewhat complete list at any point. P G-A G (talk) 20:09, 3 July 2023 (UTC)[reply]

I wouldn't pick a single threshold. To be listed on 120 Wikipedias is very significant for a book, but not very significant for a chemical element. Even 240 Wikipedias isn't a lot for a country, although I think we ought to include all the countries anyway, just so people can see how they compare to each other. - Burner89751654 (talk) 18:35, 31 August 2024 (UTC)[reply]

What is with the number of David Woodard articles??

The page gets a couple hundred views a day, yet the article exists in 320 languages, beating people like Michael Jackson and Jesus Christ himself. Is there a reason for this insane number of translations? Because I'm curious. XanderK09 (talk) 21:32, 30 August 2024 (UTC)[reply]

I don't know. But someone on Reddit pointed out that a lot of the translations (it looks like between 100 and 200, out of 320 total) were all made by the same account, whose list of edited Wikipedias can be seen here: Special:CentralAuth/Swmmng. It's possible the same guy also created the other translations using different accounts. I assume that's David himself doing it for promotion, or a fan of his doing it as a tribute, or some guy doing it as a joke. And normally I'd say Wikipedia shouldn't be cluttered up with things like this. But I think we have enough room for one such article, which is worth keeping out of admiration for the effort and the ingenuity that went into it. - Burner89751654 (talk) 00:13, 31 August 2024 (UTC)[reply]
@XanderK09 and Burner89751654: I just completed an investigation into the David Woodard case; it was brought up to this number of articles by a large network of users and IP proxies (which I suspect was operated by Woodard himself), attempting to spam him to as many wikis as possible. The global stewards have already deleted 235 pages, and other wikis are now being made aware of the issue for possible deletion discussions. I disagree that it is worth keeping out of admiration for the effort and the ingenuity that went into it; we should not tolerate spam and self-promotion. Just because it's high effort doesn't make it a good effort (indeed, many of the articles were stubs and crap machine translations). --Grnrchst (talk) 09:04, 1 July 2025 (UTC)[reply]

Reformatting list

What would people think of reformatting the list so that it was in the order rank-article-languages-category-subcategory-last updated, so that it looked like this? I did it in user space to take advantage of the visual editor's table formatting, but if people are down, we can copy it into this one. Ranking would only need to go up to 50 or so, unless there's a way to automate it that I don't realize. MW(tc) 07:06, 10 May 2025 (UTC)[reply]

A Script I've Made to Automate Updating

I made a Python script to update the list automatically. I have a file, "articles.csv", with all of the articles and their categories and subcategories. The script, "wiki_langs.py", then runs through each article, sees how many languages it is available in, and formats it all into a wikitable like the one that appears in "Wikipedia:Wikipedia articles written in the greatest number of languages". It also automatically sorts the articles from the most languages to the least. The user then copies and pastes that table into this Wikipedia article. Here are the files if anyone wants to run it for themselves. Note that there are issues with how Wikipedia formats the text, so I recommend people copy and paste from the source code of this talk page, not from the talk page directly. Masktapeisawesome (talk) 02:28, 27 May 2025 (UTC)[reply]

Edit: I updated both files to fix bugs. If you previously copied and pasted the older files into your computer system, please copy and paste again. Masktapeisawesome (talk) 23:03, 27 May 2025 (UTC)[reply]

The code contains a list of all the records in the table. That list should be updated each time the code is run, since records are frequently being added to the table. Insert the list near the beginning of the code in place of the XXX. Each record should list the category, subcategory, and article name, with commas between the fields, the article name in quotation marks, and each record on its own line. So, for example, if the code only contained the first six records, its start would look like this:

"articles.csv":

Category,Subcategory,Article
Place,Country,"Turkey"
Place,Country,"United States"
Place,Country,"Japan"
Place,Country,"Russia"
Place,Country,"Finland"
Business,Website,"Wikipedia"

"wiki_langs.py":

If you copy the table into Excel, you can aggregate the records by typing this formula into cell G2 and copying it down (the CHAR(10) adds the line break after each record):

=G1&D2&","&E2&","""&B2&""""&CHAR(10)

The full result will then be in column G in the last row. However, the entire table holds too many characters to fit into a single cell in Excel, so you might have to aggregate it in two or three pieces (one for each 1,000 rows or so). - Burner89751654 (talk) 04:28, 16 June 2025 (UTC)[reply]
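An equivalent of that Excel aggregation can be sketched in Python, which sidesteps the cell-size limit entirely. The sample rows below are just the six records shown earlier; the output format matches what csv.DictReader in wiki_langs.py expects, with one record per line.

```python
# Build the articles.csv text from (Category, Subcategory, Article)
# rows: comma-separated fields, the article name in quotation marks,
# one record per line.
rows = [
    ("Place", "Country", "Turkey"),
    ("Place", "Country", "United States"),
    ("Place", "Country", "Japan"),
    ("Place", "Country", "Russia"),
    ("Place", "Country", "Finland"),
    ("Business", "Website", "Wikipedia"),
]

lines = ["Category,Subcategory,Article"]
for category, subcategory, article in rows:
    lines.append(f'{category},{subcategory},"{article}"')

csv_text = "\n".join(lines)
print(csv_text.splitlines()[1])  # → Place,Country,"Turkey"
```

Writing csv_text straight to articles.csv avoids the copy-and-paste step altogether.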
Code

articles.csv:

Category,Subcategory,Article
XXX

wiki_langs.py:

import csv
import requests
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, as_completed

INPUT_FILE = 'articles.csv'
OUTPUT_FILE = 'wikipedia_languages_table.txt'
DATE_FORMAT = '%Y/%m/%d'
TODAY = datetime.today().strftime(DATE_FORMAT)

SESSION = requests.Session()

def get_language_count(title):
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "format": "json",
        "prop": "langlinks",
        "titles": title.strip(),
        "lllimit": "max",
        "redirects": 1
    }
    resp = SESSION.get(url, params=params, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    pages = data.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError(f"No page data for title '{title}'")

    page = next(iter(pages.values()))

    if "missing" in page:
        raise ValueError(f"Article '{title}' not found")

    langlinks = page.get("langlinks", [])
    return len(langlinks) + 1  # English + others

def main():
    with open(INPUT_FILE, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)

    enriched_rows = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        future_to_index = {
            executor.submit(get_language_count, row['Article']): idx
            for idx, row in enumerate(rows, start=1)
        }
        for future in as_completed(future_to_index):
            idx = future_to_index[future]
            title = rows[idx - 1]['Article']
            try:
                count = future.result()
            except Exception as e:
                print(f"{idx}: {title} -> ERROR: {e}")
                count = 0
            rows[idx - 1]['LanguageCount'] = count
            enriched_rows.append(rows[idx - 1])
            print(f"{idx}: {title} -> {count} languages")

    # Sort by LanguageCount DESC, Category ASC, Subcategory ASC, Article ASC
    enriched_rows.sort(
        key=lambda r: (-r['LanguageCount'], r['Category'], r['Subcategory'], r['Article'])
    )

    # Assign ranks with tie handling
    wikitable = [
        '== Examples of Wikipedia articles with high language representation ==',
        '{| class="wikitable sortable"',
        '|-',
        '! #',
        '! Article',
        '! Languages',
        '! Category',
        '! Subcategory',
        '! Last Updated'
    ]

    prev_count = None
    rank = 0
    actual_index = 0  # Total rows seen
    for row in enriched_rows:
        actual_index += 1
        if row['LanguageCount'] != prev_count:
            rank = actual_index
            prev_count = row['LanguageCount']

        wikitable.append('|-')
        wikitable.append(
            f"| {rank}\n"
            f"| [[{row['Article']}]]\n"
            f"| {row['LanguageCount']}\n"
            f"| {row['Category']}\n"
            f"| {row['Subcategory']}\n"
            f"| {TODAY}"
        )

    wikitable.append('|}')  # Close table

    with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
        f.write('\n'.join(wikitable))

    print(f"Table written to {OUTPUT_FILE}")

if __name__ == "__main__":
    main()

Not sure it really works

Look at en:Barcelona. It says it has 199 more articles, so 200 in total. Hence, it should be here. It isn't.

--77.75.179.1 (talk) 23:20, 22 June 2025 (UTC)[reply]

Feel free to add it. - Burner89751654 (talk) 23:45, 22 June 2025 (UTC)[reply]

List unnecessarily long

I think we should limit the list to articles with 100+ languages, because it is becoming unnecessarily long (3,086 entries as of now), and it will only get even longer in the future. Realtinek (talk) 14:07, 24 June 2025 (UTC)[reply]