Eighth International Workshop on Finite-State Methods and Natural Language Processing, Pretoria, South Africa
July 21st - 24th 2009


Finite-State Methods and Natural Language Processing - FSMNLP 2009

Eighth International Workshop

University of Pretoria, South Africa



21-24 July 2009

As in 2008, FSMNLP is merged with the FASTAR (Finite Automata Systems - Theoretical and Applied Research) workshop.



The International Workshop Series of Finite State Methods and Natural Language Processing (FSMNLP) is a forum for researchers and practitioners working on

  1. NLP applications or language/language technology research resources,

  2. theoretical and implementation aspects, or

  3. their combinations

      having obvious relevance or an explicitly discussed relation to Finite-State Methods in NLP.

In the past, seven FSMNLP workshops have been organised (Budapest 1996, Ankara 1998, Helsinki 2001, Budapest 2003, Helsinki 2005, Potsdam 2007, Ispra 2008).

We invite submissions related to all obvious or traditional FSMNLP topics (see, e.g., FSMNLP 2008). In addition to these, the updated list of topics includes some new topics such as:

  1. common interfaces, portability, and shared methods for testing/benchmarking/evaluation of finite-state tools

  2. coping with large alphabets during finite-state compilation and in real-world applications

  3. fixed parameter tractability and narrowness in streamed NLP

  4. conventional/parallel algorithms using/manipulating conventional/stochastic finite-state automata/paths

  5. applications of rational kernels to active/statistical machine learning of finite-state models.


In recognition of its location on the African continent, this year's FSMNLP has Finite-State Methods for Under-Resourced Languages as a special theme. The theme is relevant to finite-state methods

  1. applied to practical tasks such as language survey, elicitation, data collection, computer-aided annotation, morphological description, modeling and normalization,

  2. considering demanding conditions such as linguistic complexity and diversity, scarce resources, research infrastructures, real-time grammar updates,

  3. in language processing fields such as comparative linguistics, field linguistics, applied linguistics, language teaching, and computer-aided translation.

The special theme does not restrict the scope but attempts to draw the attention of contributors to the challenges of computational linguistics in Africa. We hope that the theme teases out promising and useful applications of Finite-State Methods in this context.



To catalyze discussion and participation, three special sessions or subworkshops will be organized, containing presentations of regular papers and extended abstracts presenting ongoing research, tutorials, competitions etc. relating to the following topic areas:

  1. Finite-State Methods for African and other Under-Resourced/Low-Density Languages

    Under-resourced/low-density languages, including many African languages, often require documentation and language development, in particular including methods for field linguistics, localized information technology and basic language resources. However, the conditions for applying large scale statistical methods do not hold in general, while existing knowledge-based methods may not directly generalise to low-density languages. The purpose of this subworkshop is to take a fresh look at new resources, methods, collaboration and innovative approaches for these (clusters of) languages. Submissions may be concerned with the following aspects of the subworkshop topic:

      - project proposals and joint projects
      - basic language resources
      - innovative approaches to language clusters
      - field linguistics and language documentation
      - iterative language description
      - resource-efficient machine learning.

    The local organizers are Laurette Pretorius and Sonja Bosch. The subworkshop is tentatively accompanied by two tutorials given by Kemal Oflazer and Colin de la Higuera (see TUTORIALS AND INVITED TALKS below).

  2. Practical Aspects and Experience of Finite-State Methods and Systems

    The last decade has seen a significant increase in the number of toolkits and implementations of finite state systems for NLP. Work on such implementations has highlighted a wide variety of implementation issues. Unfortunately much of this knowledge remains trade-secret or is only embodied in implementations themselves, and is not presented academically at conferences. This special session focuses on such issues and knowledge and covers the following topics (as they relate to NLP and real-life implementation issues):

      - user interfaces, visualisation, tracing and debugging
      - specification formalisms and languages (e.g. grammars, regular relations, etc.)
      - application programmer interfaces, interchange formats
      - performance, profiling and tuning techniques
      - classification, comparison and evaluation
      - data-structures and representations
      - compression of alphabets, lexicons and rules

  3. Tree Automata and Transducers

    In recent years, applications of formal tree language theory in natural language processing have been on the rise, as witnessed by papers at conferences and in journals on formal language theory, finite automata, natural language processing, and computational linguistics. FSMNLP 2009 will therefore have a special session/subworkshop on tree automata and tree transducers in natural language processing. This subworkshop includes (but is not limited to) the following topics as long as they relate to natural language processing:

      - unweighted and weighted tree languages,
      - unweighted and weighted tree transformations,
      - the formalisms to represent and model them (including a.o. tree grammars, tree automata, tree expressions, tree transducers),
      - expressiveness of such models and representations,
      - relations to synchronous grammars,
      - learning of such models and representations,
      - algorithms for pattern matching, accepting, parsing of tree languages, and
      - large-scale applications (including those in statistical machine translation).

For each of the three subworkshops, a subcommittee of the PC supported by further experts from the field is responsible for the program.


The following invited speakers will present a talk at FSMNLP 2009:

  1. Andre Kempe (Cadege Technologies, Paris, France)

  2. Thomas Hanneforth (University of Potsdam, Germany)

We will have tutorials on the following tentative topics:

  1. Developing Computational Morphology for Low- and Middle-Density Languages by Kemal Oflazer (Sabanci University, Turkey)

  2. Machine Learning with Automata by Colin de la Higuera (Jean Monnet University, Saint-Etienne, France)

  3. OpenFst by Johan Schalkwyk (Google, USA)


During FSMNLP 2009, we hope to announce a small competition / shared task related to machine learning of morphology.


SIGFSM is currently being established as a Special Interest Group in the Association for Computational Linguistics (ACL). If the necessary initial actions are completed before FSMNLP 2009, a SIGFSM business meeting will be held as part of the workshop.



We initially invite submissions of full papers, i.e. scientific contributions presenting new theoretical or experimental results. Papers should present original, unpublished research results and should not be submitted elsewhere simultaneously.

We also invite submissions of extended abstracts containing or describing system descriptions/demos, progress reports/ongoing work, joint projects/project proposals, small focused contributions, negative results, and opinion pieces, related to any of the subworkshop themes or the broader FSMNLP themes.

We particularly invite demo submissions, which should consist of an extended abstract of the technical content with authors (!), full contact information, references, acknowledgements, plus a "script outline" of the presentation and a detailed description of hardware, software and internet requirements.

Note that the early acceptance notification date for full papers may help to keep travel costs for international participants reasonably low. If you come from far away and have only an extended abstract, the abstract can be submitted earlier as if it were a full paper.

Information about the author(s) should be omitted from submitted papers, since the review process will be double blind, except for demo submissions. Submissions are electronic and in PDF format via a web-based submission server.

Authors are encouraged to use Springer LNCS style (Proceedings and Other Multiauthor Volumes) for LaTeX in producing the PDF document. For graph visualization, Vaucanson-G LaTeX style, Graphviz/dot and XFig are recommended. If you use a non-roman script or Microsoft Word, it is advisable to warn the organizers as early as possible. The page limit is 12 pages for full papers and 8 pages for extended abstracts.


The on-site pre-proceedings will be on CD.

The post-proceedings with revised regular papers will be published after the conference in a volume of Lecture Notes in Artificial Intelligence as a part of the LNCS Series by Springer-Verlag.

High-quality extended abstracts may be invited for inclusion in the LNCS post-proceedings, while other extended abstracts may be published as arranged by subworkshop organizers.

In addition, a special journal issue on the topics of the workshop is being planned. We are pleased to announce that a special issue on Finite State Methods and Models in Natural Language Processing will be published in the Journal of Natural Language Engineering in 2011. Extended versions of the workshop papers and abstracts may be submitted to this special issue (the publication involves a second review/selection cycle).


  1. Full paper submissions due: 26 April 2009

  2. Notification of acceptance for full papers: 26 May 2009

  3. Extended abstract submissions due: 17 May 2009

  4. Notification of acceptance for extended abstracts: 14 June 2009

  5. Deadline for inclusion in preproceedings: 28 June 2009