Featuring the Ljubljana-based Jožef Stefan Institute (IJS) as the coordinator, research institutes from Germany, Spain, Croatia and China, as well as the Slovenian Press Agency (STA) and contributions from Bloomberg and The New York Times, the three-year project is estimated to cost almost EUR 5m, EUR 3.5m of which is to come from EU funds.
The idea is to combine scientific insights to achieve a breakthrough in machine-based cross-lingual text understanding, including for the needs of the media, project head Marko Grobelnik, who works in the IJS artificial intelligence lab, told the STA.
The project, detailed information on which is available at www.xlike.org, will draw on the latest findings in computational linguistics, machine learning, text mining and semantic technologies.
The software solutions envisaged would, for instance, make it possible to link reports by individual media with similar reports on related subjects in any language. The detection of plagiarism could be one of the applications.
Another goal is the fast detection of new events from announcements posted on the internet in different languages. This would give editors a tool for identifying new developments, such as spontaneously organised protests, at an early stage.
The project will focus on six languages: English, German, Spanish and Chinese as major languages, and Catalan and Slovenian as smaller languages.
The basic messages of texts will, however, be accessible in more than 100 languages, which is to be achieved with the help of a Wikipedia-based corpus.
The meeting at Bled featured guests and representatives of partners, including the IJS, the Karlsruhe Institute of Technology, Universitat Politècnica de Catalunya, the University of Zagreb, China's Tsinghua University, and iSOCO from Spain.
The next meeting is expected to take place in Barcelona in April, where the goal will be to draw up a detailed plan for the execution of the project.