IGF 2024 WS #119 AI for Multilingual Inclusion

    Organizer 1: Claire van Zwieten, Internet Society Foundation
    Organizer 2: Shifa Sorene Assefa, UNECA
    Organizer 3: Jesse Nathan Kalange, Internet Society Uganda Chapter
    Organizer 4: Athanase Bahizire, Youth IGF DRC

    Speaker 1: Abraham Selby, Technical Community, African Group
    Speaker 2: Ida Padikuor Na-Tei, Private Sector, African Group
    Speaker 3: Shifa Sorene Assefa, Intergovernmental Organization, Intergovernmental Organization

    Moderator

    Jesse Nathan Kalange, Civil Society, African Group

    Online Moderator

    Athanase Bahizire, Technical Community, African Group

    Rapporteur

    Claire van Zwieten, Civil Society, Western European and Others Group (WEOG)

    Format

    Classroom
    Duration (minutes): 60
    Format description: We aim for this session to be led by two individuals but open to conversation from the participants. Having the two leads at the front, with everyone else seated, will lend itself to the kind of conversation and input we are seeking from our participants. The classroom layout, somewhere between a roundtable and a theater, promotes open communication between speakers and participants, while allowing speakers to maintain order by calling on those who would like to speak.

    Policy Question(s)

    What measures should be taken to address the digital language divide and ensure equal access to AI technologies for speakers of all languages? How can we ensure that AI models trained on multilingual data represent diverse languages, cultures, and dialects without perpetuating biases through discriminatory language models and biased content moderation algorithms? How can governments, industry, academia, and language communities collaborate to advance multilingual AI research and development?

    What will participants gain from attending this session? Participants will gain insight into how the use of large language models can increase Internet inclusion by allowing users to access content in their local language. Digital literacy campaigns are only as useful as their participants’ Internet access, and participants will gain an understanding of how multilingual inclusion through AI can increase collaboration, cooperation, and access in and between communities of various languages.

    Description:

    One of the challenges of expanding Internet access is the availability of content in local languages. Working towards the sub-theme of Advancing Human Rights and Inclusion in the Digital Age, the AI for Multilingual Inclusion workshop will discuss the expansion of Internet access through greater language inclusion. Through the use of multilingual AI systems, we can engage digitally isolated populations and grant them equal access to information. In turn, this bolsters digital literacy education efforts by making the Internet’s content available to everyone.

    Expected Outcomes

    We expect participants of this session to leave with a greater understanding of how lack of access due to linguistic differences can lead to a host of societal issues and foster disconnection in communities. Those who attend will be able to contribute more confidently to AI conversations in the future, particularly when discussing how AI can increase content accessibility. Furthermore, feedback from the session will be turned into a report and made available to the IGF community. Some of the questions or comments from the audience will influence the Internet Society Alumni Network's priorities and events.

    Hybrid Format: The session will alternate between on-site speakers and online speakers, ensuring equal opportunities for participation and interaction. The onsite moderator will start the session by introducing the onsite speakers and then give the floor to the online moderator to introduce the online speakers. Each speaker will be allocated 5 minutes for introductory remarks, followed by 15 minutes of questions from both onsite and online participants. The speakers will then engage in 15 minutes of moderated discussion, followed by 10 minutes for questions and a wrap-up by the onsite and online moderators. To increase engagement and participation during the session, we will use dynamic presentations and other online tools and platforms, including real-time polls and question-and-answer activities, as well as a shared Google Doc to collect further insights and comments that may not be addressed due to time constraints.

    Key Takeaways (* deadline at the end of the session day)

    Artificial Intelligence is an important asset needed to increase the accessibility of the Internet.

    The only way to solve the disparity before we leave the AI hype age is to create more local language content to increase data sets of minority languages.

    By promoting multilingualism, one is also promoting access. We need materials and support in multiple languages to make Internet multilingualism a reality.

    Call to Action (* deadline at the end of the session day)

    If you speak a language underrepresented on the Internet, it is crucial that you find ways to get your language on the Internet. Write a blog, talk to your family about your lineage, and post content however and whenever you can in your local language. Increasing the data sets used to train AI in various languages is virtually the only way to improve AI's ability to learn those languages.

    Session Report (* deadline 9 January)

    Summary 

    Workshop #119, titled AI for Multilingual Inclusion, covered AI and how it can be used to make the Internet more accessible. Over 7,100 languages are spoken around the world today, yet English, which is spoken by only roughly 17% of the world, prevails as the Internet’s dominant language. The disparity between English-speaking and non-English-speaking Internet users creates a harmful gap in access. How can non-English speakers enjoy the full benefits of the Internet if they have no means of translating the Internet’s content into their language? What effect does that diminished access have on people’s ability to use the opportunities provided by the Internet? We do not have stable figures to quantify how great this problem is, but we do know that the rise of AI will play a major role in reducing its harms.

     

    This session aimed to discuss the role of AI in translating the Internet, what benefits could be gained from this process, and how the Internet community can contribute to making multilingualism through AI a reality for all. We not only met this goal, but also had the privilege of hearing perspectives from different stakeholder groups in the audience.

     

    Discussion

    The discussion started with the speakers explaining the concepts of multilingualism and AI, before linking them together to show how they can support each other. We discussed the roles and unique challenges of different stakeholder groups in achieving this goal, and unanimously agreed that the multistakeholder model of Internet Governance would yield the best outcomes.

     

    Academia: 

    An audience member representing academia shared the need for increased minority language representation in academia. AI needs data to train on, so discussing, writing, and publishing materials in minority languages will play a huge role in AI’s ability to translate the Internet. Academics should not only do this themselves, but also promote it among their colleagues and students to increase the quantity and quality of trainable data.

     

    Minority Language Speaking Communities:

    Multiple audience members highlighted the importance of minority language speakers taking ownership of their language and content. Athanase Bahizire, one of the speakers, also suggested that stakeholders seize this opportunity now, before the AI hype cycle dries up. That way, minority language speakers would be able to capitalize on resources, public attention, and open source technology while the topic is fresh. Minority language communities are encouraged to discuss their linguistic history with their families and to write blogs or other content on the Internet in their language.

     

    Governments: 

    Governments also play a role in making sure that their national languages are thoroughly represented on the Internet. Furthermore, AI represents a unique opportunity for national leaders. Not only can AI help translate the Internet’s content to increase access to materials, it can also play a role in preserving or even revitalizing a dead language. One speaker, Claire van Zwieten, highlighted the Navajo Nation, an indigenous tribe of the United States. They have done incredible work digitizing their language as a means of preserving it and encouraging new learners, as many of their native speakers are aging.

    Another reason that governments and civil society organizations should play a significant role in this process is their valuable, intimate knowledge of their national culture and linguistic diversity. For example, Ethiopia has over 80 nations and over 80 languages. If AI could translate the Internet's content into all of those languages, their speakers would have much greater access to the opportunities provided by the Internet.

    Feedback 

    One piece of feedback was that we needed to take digital equity into account. For communities to document their language online, they must first have Internet access. This feedback underscores how vital Internet access is to economic and social opportunity, but also how that opportunity is worth little if one cannot understand the Internet’s content due to linguistic barriers.