Computer Science ›› 2024, Vol. 51 ›› Issue (12): 223-233. doi: 10.11896/jsjkx.240400077

• Artificial Intelligence •

Large Language Model-based Method for Mobile App Accessibility Enhancement

MA Qimin1, LI Xiangmin1,2, ZHOU Yaqian1

  1 School of Computer Science and Technology, Fudan University, Shanghai 200438, China
  2 Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438, China
  • Received: 2024-04-11 Revised: 2024-08-19 Online: 2024-12-15 Published: 2024-12-10
  • Corresponding author: ZHOU Yaqian (ZhouYaqian@fudan.edu.cn)
  • Author contact: (19210240035@fudan.edu.cn)
  • About author: MA Qimin, born in 1998, postgraduate. Her main research interests include natural language processing and information search.
    ZHOU Yaqian, born in 1976, Ph.D., associate professor, is a member of CCF (No. 14944M). Her main research interests include natural language processing and multimodal language models.

Abstract: Mobile application accessibility refers to the degree to which a mobile application is designed and implemented so that any user can easily access and use it. However, only a small fraction of the vast number of applications in the domestic mobile application market support accessibility features, which conflicts with the aspiration of the large and growing elderly and visually impaired populations to enjoy the benefits of the digital age and bridge the digital divide. Large language models (LLMs) have shown great potential for achieving human-level intelligence and, when guided by prompt engineering, can perform simple logical reasoning and decision-making. In addition, shortening the interaction path is one of the most intuitive ways to enhance mobile application accessibility. Inspired by these observations, we propose an LLM-based method for enhancing mobile application accessibility that applies accessibility services and LLMs in a novel way while balancing security, automation, and intelligence. We implement a mobile application accessibility tool, AccessLink, which perceives and operates the graphical user interface of mobile applications in a non-invasive manner and with user authorization. On this basis, we develop an automated dataset construction method and conduct experiments on the constructed dataset with the large models GPT-3.5, GPT-4.0, Tongyi Qianwen, and Baichuan, demonstrating the effectiveness of the proposed method.

Key words: Large language model, Android, Mobile application, Accessibility, Natural language processing

CLC Number: 

  • TP311