Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Contents of the journal's Multilingual Computing Advanced Technology section
    Computer Science    2022, 49 (1): 7-8.   DOI: 10.11896/jsjkx.qy20220101
    Meta Knowledge Intelligent Systems on Resolving Logic Paradoxes
    Jeffrey ZHENG
    Computer Science    2022, 49 (1): 9-16.   DOI: 10.11896/jsjkx.210700023
    Professor Q.S. Gao (academician of the Chinese Academy of Sciences) published New Fuzzy Set Theory in 2006 to explore possible ways of removing paradoxes from fuzzy logic. In 2009, he published Foundation of Unified Linguistics (Science Press), which provides theoretical support for computational multiple linguistics. These two monographs are the most valuable gems of his creative academic work. In memory of Professor Gao, ten years after his death, this paper uses the new vector logic and variant construction to describe the latest developments in meta knowledge construction that build on his legacy. Starting from vector logic, conjugate structure, the meta knowledge model and other advanced mechanisms, a critical requirement is to use modern logic and mathematics to guarantee that a complex system is consistent and dynamic without paradoxes, and so to rule out the case in which the system contains any logic paradox. From a classification and adjudication viewpoint, paradoxes fall into two categories: logic paradoxes and semantic paradoxes. Using the conjugate ring, the paper systematically resolves the single-surface property of the Möbius ring into four colored bands, which makes it possible for this construction to resolve a series of intrinsic logic paradoxes in geometry, topology and logic. The conjugate ring provides a complete solution to Möbius-type paradoxes in general. Corresponding structures include many abstract systems, such as the I Ching, differential geometry, geometric topology, global variation and optimization. Having resolved Möbius-type paradoxes in topology, geometry and logic, the meta knowledge model can naturally establish the key modules needed to support complex natural and artificial knowledge systems. Starting from classical logic, typical components are listed, such as classical logic, finite automata, the Turing machine and the von Neumann architecture. With vector logic, conjugate structure and variant construction as key paradox-free components, a series of architectures can be established to support a quantum Turing machine, a vector machine on multiple complex functions, complex intelligent systems, and an analysis system for unified linguistics. This is an initial step for the meta knowledge model toward future complex intelligent systems.
    Incorporating Language-specific Adapter into Multilingual Neural Machine Translation
    LIU Jun-peng, SU Jin-song, HUANG De-gen
    Computer Science    2022, 49 (1): 17-23.   DOI: 10.11896/jsjkx.210900005
    Multilingual neural machine translation (mNMT) uses a single encoder-decoder model to translate between multiple language pairs. mNMT can encourage knowledge transfer among related languages, improve low-resource translation and enable zero-shot translation. However, existing mNMT models are weak at modeling language diversity and perform poorly on zero-shot translation. To solve these problems, we first propose a variable-dimension bilingual adapter based on the existing adapter architecture. Bilingual adapters are inserted between every two Transformer sub-layers to extract language-pair-specific features, and the language-pair-specific capacity of the encoder or the decoder can be adjusted by changing the inner dimension of the adapters. We then propose a shared monolingual adapter to model the unique features of each language. Experiments on the IWSLT dataset show that the proposed model substantially outperforms the multilingual baseline model, and that the monolingual adapter improves zero-shot translation without degrading multilingual translation performance.
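    To make the role of the adapter's inner dimension concrete, here is a minimal sketch of a generic residual bottleneck adapter (down-projection, ReLU, up-projection, residual connection) kept per language pair. The class name, the zero initialization of the up-projection, the language codes and the pure-Python matrix code are illustrative assumptions, not the authors' implementation; real adapters usually also include layer normalization and biases, which are omitted here.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: y = x + W_up(relu(W_down x)).

    inner_dim sets how much extra capacity one language pair receives."""

    def __init__(self, model_dim: int, inner_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.model_dim = model_dim
        self.inner_dim = inner_dim
        # Down-projection: model_dim -> inner_dim (small random init).
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(model_dim)]
                       for _ in range(inner_dim)]
        # Up-projection: inner_dim -> model_dim, zero-initialized so the
        # adapter starts out as an identity mapping.
        self.w_up = [[0.0] * inner_dim for _ in range(model_dim)]

    def num_params(self) -> int:
        # Two weight matrices; biases omitted for brevity.
        return 2 * self.model_dim * self.inner_dim

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.w_down, x)]  # ReLU
        delta = matvec(self.w_up, hidden)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


# One adapter per (source, target) pair; a larger inner_dim means more capacity.
adapters = {
    ("de", "en"): BottleneckAdapter(model_dim=8, inner_dim=2),
    ("en", "de"): BottleneckAdapter(model_dim=8, inner_dim=4),
}
x = [0.5] * 8
y = adapters[("de", "en")](x)  # identical to x until w_up is trained
```

    Doubling the inner dimension doubles the adapter's parameter count, which is one simple way to give some language pairs more dedicated capacity than others.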
    Similarity-based Curriculum Learning for Multilingual Neural Machine Translation
    YU Dong, XIE Wan-ying, GU Shu-hao, FENG Yang
    Computer Science    2022, 49 (1): 24-30.   DOI: 10.11896/jsjkx.210800254
    Multilingual neural machine translation (MNMT) with a single model has drawn increasing attention due to its ability to handle multiple languages. However, the current multilingual translation paradigm does not exploit the similarity between languages, which has already been shown to help multilingual translation. In addition, training a multilingual model is usually very time-consuming because of the huge amount of training data. To address these problems, we propose a similarity-based curriculum learning method that improves both overall performance and convergence speed. We propose two hierarchical criteria for measuring similarity: one ranks different languages (inter-language) using singular vector canonical correlation analysis, and the other ranks the sentences within a particular language (intra-language) using cosine similarity. We also propose a curriculum learning strategy that uses the validation-set loss as the criterion for switching curricula. We conduct experiments on balanced and unbalanced IWSLT multilingual datasets and on the Europarl corpus. The results demonstrate that the proposed method outperforms strong multilingual translation systems and reduces training time by up to 64%.
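    The intra-language ranking and the validation-loss-driven curriculum switch can be sketched as follows. This is a minimal illustration under stated assumptions: sentences are compared by cosine similarity against a hypothetical `reference` vector (for example, a centroid of sentence embeddings), each stage trains on a growing prefix of the ranked data, and a stage ends when validation loss fails to improve for `patience` evaluations. None of these specifics come from the paper, and the inter-language SVCCA ranking is not sketched.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_similarity(embeddings: List[List[float]], reference: List[float]) -> List[int]:
    """Sentence indices ordered from most to least similar to `reference`."""
    return sorted(range(len(embeddings)),
                  key=lambda i: cosine(embeddings[i], reference),
                  reverse=True)


def curriculum_subset(order: List[int], stage: int, num_stages: int) -> List[int]:
    """Stage k trains on the first (k + 1) / num_stages of the ranked data."""
    cutoff = math.ceil(len(order) * (stage + 1) / num_stages)
    return order[:cutoff]


class ValidationLossScheduler:
    """Move to the next curriculum stage once validation loss has failed to
    improve for `patience` consecutive evaluations (hypothetical rule)."""

    def __init__(self, num_stages: int, patience: int = 2) -> None:
        self.num_stages = num_stages
        self.patience = patience
        self.stage = 0
        self.best = math.inf
        self.bad_evals = 0

    def step(self, val_loss: float) -> int:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience and self.stage < self.num_stages - 1:
                self.stage += 1
                self.bad_evals = 0
        return self.stage


order = rank_by_similarity([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], reference=[1.0, 0.0])
sched = ValidationLossScheduler(num_stages=3, patience=2)
stages = [sched.step(loss) for loss in [5.0, 4.0, 4.1, 4.2, 3.9, 3.95, 3.96]]
```

    Here `order` is `[0, 2, 1]`, and the scheduler advances to stage 1 after two non-improving evaluations and to stage 2 after two more.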
    Survey of Mongolian-Chinese Neural Machine Translation
    HOU Hong-xu, SUN Shuo, WU Nier
    Computer Science    2022, 49 (1): 31-40.   DOI: 10.11896/jsjkx.210900006
    Machine translation is the process of using a computer to convert one language into another. Thanks to its deeper understanding of semantics, neural machine translation has become the mainstream machine translation method and has achieved remarkable results on many translation tasks with large-scale aligned corpora, but translation quality for some low-resource languages is still unsatisfactory. Mongolian-Chinese machine translation is currently one of the main low-resource machine translation research topics in China. Translation between Mongolian and Chinese is not merely conversion between two languages but also communication between two nations, so it has attracted wide attention in China and abroad. This paper describes the development and current research status of Mongolian-Chinese neural machine translation, then selects frontier methods from recent Mongolian-Chinese neural machine translation research, including data augmentation methods based on unsupervised and semi-supervised learning, reinforcement learning, adversarial learning, transfer learning, and neural machine translation methods assisted by pre-trained models, and briefly introduces each of them.
    Construction Method of Parallel Corpus for Minority Language Machine Translation
    LIU Yan, XIONG De-yi
    Computer Science    2022, 49 (1): 41-46.   DOI: 10.11896/jsjkx.210900012
    The training performance of neural machine translation depends heavily on the scale and quality of the parallel corpus. Unlike for some major languages, the construction of high-quality parallel corpora between Chinese and minority languages has lagged behind. Existing minority-language parallel corpora are mostly built with automatic sentence alignment and web resources, which limits their domain coverage and quality. Although high-quality parallel corpora can be built manually, relevant experience and methods are lacking. From the perspective of machine translation practitioners and researchers, this article introduces a cost-effective method for manually constructing parallel corpora between minority languages and Chinese, including its overall goals, implementation process, engineering details and final results. The article gathers the experience accumulated during construction and summarizes methods and suggestions for building parallel corpora from minority languages to Chinese. Finally, this paper successfully constructs 0.5 million high-quality parallel corpora from Persian to Chinese, Hindi to Chinese and Indonesian to Chinese. Experimental results confirm the quality of the constructed corpora, which improve the performance of minority-language neural machine translation models.
    Latest Development of Multilingual Speech Recognition Acoustic Model Modeling Methods
    CHENG Gao-feng, YAN Yong-hong
    Computer Science    2022, 49 (1): 47-52.   DOI: 10.11896/jsjkx.210900013
    With the rapid development of multimedia and communication technology, the amount of multilingual speech data on the Internet is increasing. Speech recognition technology is at the core of media analysis and processing. Quickly extending it from a few major languages such as Chinese and English to many more languages has become a prominent open problem for improving multilingual processing capabilities. This article summarizes the latest progress in acoustic modeling and discusses the breakthroughs that traditional speech recognition technology needs when moving from a single language to multiple languages. The latest end-to-end speech recognition technology is also exploited to build a keyword spotting system, which achieves favorable performance. The approaches are as follows: 1) a multilingual hierarchical and structured acoustic modeling method; 2) multilingual acoustic modeling based on language classification information; 3) end-to-end keyword spotting based on frame-synchronous alignments.
    Study on Keyword Search Framework Based on End-to-End Automatic Speech Recognition
    YANG Run-yan, CHENG Gao-feng, LIU Jian
    Computer Science    2022, 49 (1): 53-58.   DOI: 10.11896/jsjkx.210800269
    In the past decade, end-to-end automatic speech recognition (ASR) frameworks have developed rapidly. End-to-end ASR not only shows characteristics very different from those of traditional ASR based on hidden Markov models (HMMs) but also achieves strong performance. As a result, end-to-end ASR is increasingly popular and has become another major type of ASR framework. Because end-to-end ASR cannot provide accurate keyword timestamps and confidence scores, a keyword search (KWS) framework based on end-to-end ASR and frame-synchronous alignment is proposed and verified experimentally on a Vietnamese dataset. First, utterances are decoded by an end-to-end ASR system to obtain N-best hypotheses. Next, a dynamic-programming-based alignment algorithm is applied to each ASR hypothesis and to per-frame phoneme probabilities, provided by a phoneme classifier trained jointly with the ASR model, to compute timestamps and confidence scores for each word in the N-best hypotheses. Then, the final KWS result is obtained by detecting keywords within the N-best hypotheses and removing duplicate keyword occurrences according to their timestamps and confidence scores. Experimental results on a Vietnamese conversational telephone speech dataset show that the proposed KWS system achieves an F1 score of 77.6%, a relative improvement of 7.8% over the traditional HMM-based KWS system. The proposed system also provides reliable keyword confidence scores.
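    The alignment step can be illustrated with a minimal monotonic Viterbi alignment of a hypothesis's phoneme sequence to per-frame phoneme log posteriors, followed by a word-level timestamp and a mean-posterior confidence. This is a sketch under stated assumptions, not the authors' algorithm: every frame is assigned to some phoneme (silence and blank frames are ignored), each phoneme gets at least one frame, the confidence is simply the average posterior over the word's frames, and `frame_shift=0.01` (10 ms frames) is a hypothetical default. The duplicate-removal step is not shown.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def align(phones: List[int], log_probs: List[List[float]]) -> List[Tuple[int, int]]:
    """Monotonic Viterbi alignment of a phoneme sequence to frames.

    log_probs[t][k] is the log posterior of phoneme k at frame t. Every
    phoneme gets at least one consecutive frame and every frame is used.
    Returns an inclusive (start_frame, end_frame) pair per phoneme."""
    T, N = len(log_probs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phoneme")
    score = [[NEG_INF] * N for _ in range(T)]
    moved = [[False] * N for _ in range(T)]  # True: phoneme n starts at frame t
    score[0][0] = log_probs[0][phones[0]]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG_INF
            moved[t][n] = move > stay
            score[t][n] = max(stay, move) + log_probs[t][phones[n]]
    # Backtrack from the last phoneme at the last frame.
    starts, ends = [0] * N, [0] * N
    n = N - 1
    ends[n] = T - 1
    for t in range(T - 1, 0, -1):
        if moved[t][n]:
            starts[n] = t
            n -= 1
            ends[n] = t - 1
    return list(zip(starts, ends))


def word_stamp(segments, phones, log_probs, first, last, frame_shift=0.01):
    """Start time, end time (seconds) and mean-posterior confidence of a word
    covering phonemes first..last. frame_shift=0.01 assumes 10 ms frames."""
    posteriors = []
    for n in range(first, last + 1):
        s, e = segments[n]
        posteriors.extend(math.exp(log_probs[t][phones[n]]) for t in range(s, e + 1))
    start = segments[first][0] * frame_shift
    end = (segments[last][1] + 1) * frame_shift
    return start, end, sum(posteriors) / len(posteriors)


phones = [0, 1, 2]
favored = [0, 0, 1, 1, 2, 2]  # toy posteriors: 0.9 on the favored phoneme
log_probs = [[math.log(0.9 if k == f else 0.05) for k in range(3)] for f in favored]
segments = align(phones, log_probs)  # [(0, 1), (2, 3), (4, 5)]
start, end, conf = word_stamp(segments, phones, log_probs, first=1, last=2)
```

    In this toy case a word spanning the last two phonemes gets a start time of 0.02 s, an end time of 0.06 s and a confidence of about 0.9.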
    Query-by-Example with Acoustic Word Embeddings Using wav2vec Pretraining
    LI Zhao-qi, LI Ta
    Computer Science    2022, 49 (1): 59-64.   DOI: 10.11896/jsjkx.210900007
    Query-by-Example is a popular keyword detection approach when speech resources are scarce. It can build a well-performing keyword query system even when few labeled speech resources are available and no pronunciation dictionary exists. In recent years, neural acoustic word embeddings have become a commonly used Query-by-Example method. In this paper, we propose using wav2vec pre-training to optimize a neural acoustic word embedding system based on bidirectional long short-term memory (BLSTM) networks. On a dataset extracted from Switchboard, directly replacing Mel-frequency cepstral coefficient (MFCC) features with features extracted by the wav2vec model yields relative improvements of 11.1% in average precision and 10.0% in precision-recall break-even point. We then try several methods of fusing wav2vec and MFCC features to extract the embedding vectors. Compared with using wav2vec features alone, the fusion approach improves average precision and precision-recall break-even point by a relative 5.3% and 2.5%, respectively.
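    As a rough illustration of the pipeline around the embedding model, the sketch below shows one simple fusion option (frame-level concatenation of two feature streams) and Query-by-Example scoring by cosine distance between a query embedding and candidate-segment embeddings. The BLSTM encoder that would produce the embeddings is out of scope, the vectors are toy values, and concatenation, the `threshold` parameter and the function names are assumptions; the abstract does not specify which fusion methods were tried.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fuse_frames(w2v: List[List[float]], mfcc: List[List[float]]) -> List[List[float]]:
    """Frame-level concatenation of two feature streams (one possible fusion).

    Assumes both streams share a frame rate; surplus frames are dropped."""
    n = min(len(w2v), len(mfcc))
    return [w2v[t] + mfcc[t] for t in range(n)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def qbe_search(query: Sequence[float],
               segments: Dict[str, Sequence[float]],
               threshold: float) -> List[Tuple[str, float]]:
    """Return (segment_id, distance) for segments within `threshold` of the
    query embedding, closest first."""
    scored = [(sid, cosine_distance(query, emb)) for sid, emb in segments.items()]
    return sorted((s for s in scored if s[1] <= threshold), key=lambda s: s[1])


fused = fuse_frames([[1.0], [2.0], [3.0]], [[0.1, 0.2], [0.3, 0.4]])
hits = qbe_search([1.0, 0.0],
                  {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]},
                  threshold=0.5)
```

    Segment `a` matches the query exactly, `c` is within the threshold at a distance of about 0.29, and `b` is rejected.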
    Survey of Multilingual Question Answering
    LIU Chuang, XIONG De-yi
    Computer Science    2022, 49 (1): 65-72.   DOI: 10.11896/jsjkx.210900003
    Multilingual question answering is one of the research hotspots in natural language processing; it aims to enable models to return correct answers by understanding questions and texts given in different languages. With the rapid development of machine translation technology and the wide application of multilingual pre-training in natural language processing, multilingual question answering has also developed rapidly. This paper first systematically reviews current multilingual question answering methods, divides them into feature-based, translation-based, pre-training-based and dual-encoder-based methods, and introduces the use and characteristics of each. It also discusses current tasks related to multilingual question answering, divides them into text-based and multimodal tasks, and gives the basic definition of each. Moreover, this paper summarizes the dataset statistics, evaluation metrics and multilingual question answering methods involved in these tasks. Finally, it discusses future research directions for multilingual question answering.
    Improving Low-resource Dependency Parsing Using Multi-strategy Data Augmentation
    XIAN Yan-tuan, GAO Fan-ya, XIANG Yan, YU Zheng-tao, WANG Jian
    Computer Science    2022, 49 (1): 73-79.   DOI: 10.11896/jsjkx.210900036
    Dependency parsing aims to identify the syntactic dependencies between words in a sentence. It can provide syntactic features that improve model performance on tasks such as information extraction, automatic question answering and machine translation. The size of the training data has a significant impact on the performance of a dependency parsing model: a lack of training data causes serious unknown-word and overfitting problems. This paper proposes several data augmentation strategies for low-resource dependency parsing. Synonym substitution effectively expands the training data and alleviates the unknown-word problem, while multiple Mixup-based augmentation strategies effectively alleviate overfitting and improve the generalization ability of the model. Experimental results on the Universal Dependencies (UD) treebanks show that the proposed methods effectively improve Thai, Vietnamese and English dependency parsing with small-scale training corpora.
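    The two augmentation ideas can be sketched generically: synonym substitution changes word forms while leaving the gold heads and dependency labels reusable, and standard Mixup interpolates two inputs and their soft labels with a coefficient drawn from Beta(alpha, alpha). What exactly the paper mixes (embeddings, hidden states, head or label distributions) is not stated in the abstract, so the vectors below, the `synonyms` dictionary and `alpha=0.2` are illustrative assumptions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def synonym_substitute(tokens: List[str], synonyms: Dict[str, List[str]],
                       prob: float, rng: random.Random) -> List[str]:
    """Replace each word that has synonyms with probability `prob`.

    Only word forms change, so the original heads and dependency labels can
    be reused unchanged for the new sentence."""
    out = []
    for tok in tokens:
        if tok in synonyms and rng.random() < prob:
            out.append(rng.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          alpha: float = 0.2,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """Standard Mixup: interpolate inputs and soft labels with lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)
# Hypothetical synonym dictionary; the tree of "a big dog" carries over as-is.
new_tokens = synonym_substitute(["a", "big", "dog"], {"big": ["large"]}, prob=1.0, rng=rng)
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

    A small `alpha` concentrates `lam` near 0 or 1, so most mixed examples stay close to one of the originals, which is a common default choice for Mixup.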