Xml Stream Language

The Transducer Stream Processor language (XSP) provides expressive syntactic constructions for XML and text analysis. These constructions are rules that are applied when their associated activation conditions are satisfied.

Components overview

In Vodoo/Stream, transformation is performed by applying transducers to documents. Transducers are built on the concept of groups, which is similar to the groups defined in Omnimark, for example.

Transducer Architecture

Transducer

A transducer is a set of groups designed for analysis. It is similar to an application, but it is event-driven: main blocks can be specified, and rules fire when their conditions are satisfied. If no rule can fire, an error is raised that reports the visited path and the fragment that could not be matched.


 TRANSDUCER =
   <Transducer syntax=URL?
               extends=URL?
               handler=CLASS-NAME?
               class=ARCHIVE?>
    ( BLOCK-TO-EXECUTE                                               (1)
    | RULE                                                           (2)
    | GROUP                                                          (3)
    | EXTENSION                                                      (4)
    | ...                                                            (5)
    )*
   </Transducer>
        

A transducer can be associated with a handler. The handler provides methods that can be called when rule behaviours are executed. It is specified by a class name, which is looked up in the current class path or in a specified class search space. A transducer describes a way to transform data, and it is itself expressed in XML notation. It is therefore possible to write a transducer for transducers, which gives a bootstrapped and extensible architecture. This is the reason for the two attributes syntax and extends. The syntax attribute specifies the transducer features used to express the transducer. The extends attribute specifies how this transducer of transducers extends an already defined transducer.

  • BLOCK-TO-EXECUTE defines code blocks that are executed when the transformation starts. The end user can define several blocks; they are all executed in declaration order, starting with the first one declared. If no execution block is specified, a default one is created that calls the parser using the default group.
  • RULE defines a basic unit that specifies a conditional selection on fragments and the associated behaviour to execute when those conditions are satisfied.
  • GROUP defines a named group. All rules are managed by a group, so rules declared directly in a transducer also belong to a group: the default group, named *INITIAL*.
  • EXTENSION has the same meaning as the extends attribute.
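
The components listed above could be combined as in the following sketch. It is only an illustration written against the grammar given earlier: the handler class name, the archive URL, and the element and group names are hypothetical, and the top-level `<Parse/>` block is assumed to behave like the default execution block.

```xml
<!-- Hypothetical transducer: class name, archive URL and element names are illustrative -->
<Transducer handler="org.example.ReportHandler" class="file:lib/handlers.jar">

  <!-- Execution block, run when the transformation starts.
       Assumed here to parse the input with the default group. -->
  <Parse/>

  <!-- Rule declared directly in the transducer: it belongs to the default group *INITIAL* -->
  <when>
    <Element name="report"/>
    <Parse with="body"/>
  </when>

  <!-- Named group holding the rules applied to the content of a report -->
  <Group name="body">
    <when>
      <Element name="comment"/>
      <Ignore/>
    </when>
  </Group>

</Transducer>
```

The syntax and extends attributes are left out because they are only needed when a transducer is itself described or extended by another transducer.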

    Group

    Rules are organized into groups. Each group has a name.

    
     GROUP =
       <Group name=CDATA
              inherits=CDATA?
              handler=CLASS-NAME?
              class=ARCHIVE?>
        RULE*
       </Group>
            

    Like a transducer, a group can have its own handler. In that case, the rules of the group use this handler by default. If the group does not specify a handler, the default is the handler given in the transducer specification, if one is defined. The group handler can also be located in a specific class search space. Finally, a group can inherit from another group. In that case, the inherited rule definitions are reused by the new group. This can be useful for error handling, for example, or for skipping characters when parsing text. Be careful: for the moment, inheritance only means that rules are reused. The handler defined for the inherited group is not inherited!

    When a group is defined with a name that is already in use, the existing group and the new one are merged into a single group. The merge preserves declaration order. This feature makes group definitions open.
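
A sketch of inheritance and merging, under stated assumptions: the group and element names are hypothetical, and the inherits attribute is assumed to take the name of the inherited group, which the grammar (CDATA) does not state explicitly.

```xml
<Transducer>

  <!-- Shared rules, meant to be reused by other groups -->
  <Group name="errors">
    <when>
      <Element name="unknown"/>
      <Ignore/>
    </when>
  </Group>

  <!-- "body" reuses the rules of "errors" (rules only, not the handler) -->
  <Group name="body" inherits="errors">
    <when>
      <Element name="title"/>
      <Parse/>
    </when>
  </Group>

  <!-- Same name as the previous group: both declarations are merged into one
       "body" group, and the para rule comes after the title rule -->
  <Group name="body">
    <when>
      <Element name="para"/>
      <Parse/>
    </when>
  </Group>

</Transducer>
```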

    Rule definition

    A rule has two distinct parts. The first, dedicated to matching, describes the conditions to be verified, which depend on the nature of the data. The second explains how the selected data is to be handled.

    
     RULE =
       <when>
        (SELECTION-ELEMENT-CONDITION | SELECTION-TEXT-CONDITION)
        BLOCK-TO-EXECUTE
       </when>
            

    Rule block execution

    The block to execute is the second part of a rule: it states how the selected data is to be handled. It takes one of the following forms.

    
     BLOCK-TO-EXECUTE =
       <Method name=METHOD-NAME handler=CLASS-NAME? class=ARCHIVE?>      (1)
        (<Parameter>CDATA</Parameter>)*
       </Method>
     | <Parse with=CDATA?/>                                              (2)
     | <Ignore/>                                                         (3)
     | ...                                                               (4)
            

    1. Method defines a Java method to be invoked. The method can belong to a specified handler located in a given ARCHIVE (URL). It takes an optional set of string parameters, plus a parameter that denotes the parse context (the arguments, the current handler, and so on). The method implementation decides, using the context, whether and how the element content is parsed.
    2. Parse parses the element body using the current rule group or a specified one. In fact this parse execution was a
    3. Ignore does not parse the element content.
    4. Such blocks can be "easily" extended by the end user. One significant example is the Bean Scripting Framework (BSF) integration, which allows blocks written in languages that implement the BSF application interface.
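
The three built-in block forms could be used as in the sketch below. The method name, handler class, and group and element names are hypothetical, and the with attribute is assumed to take a group name.

```xml
<Group name="body">

  <!-- (1) Method: invoke a Java method of the given handler with one string parameter -->
  <when>
    <Element name="title"/>
    <Method name="emitHeading" handler="org.example.ReportHandler">
      <Parameter>h1</Parameter>
    </Method>
  </when>

  <!-- (2) Parse: parse the element body with the rules of the "inline" group -->
  <when>
    <Element name="para"/>
    <Parse with="inline"/>
  </when>

  <!-- (3) Ignore: do not parse the element content -->
  <when>
    <Element name="comment"/>
    <Ignore/>
  </when>

</Group>
```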

    Filtering Elements

    An Element rule matches a tree node denoted by a fixed stream pattern. The first application of this tree-node notation is the XML stream representation. Element rules can therefore filter and decompose XML fragments on demand, as well as any structure that conforms to the tree-node representation.

    
     SELECTION-ELEMENT-CONDITION =
       <Element name=CDATA?>
        (<Attribute name=CDATA?
                    bind=CDATA?
                    value=CDATA?
                    type=("implied"|"required")?/>)*
       </Element>
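
As an illustration of this grammar, a rule selecting an element by name and attributes might look like the following. The element and attribute names are hypothetical. The reading of type (required means the attribute must be present) and of value (a constraint on the attribute's content) is an assumption, since the text does not define them. The bind attribute is left out for the same reason.

```xml
<when>
  <!-- Selects <section> elements -->
  <Element name="section">
    <!-- Assumed: the id attribute must be present -->
    <Attribute name="id" type="required"/>
    <!-- Assumed: the status attribute is optional and, when given, must be "draft" -->
    <Attribute name="status" value="draft" type="implied"/>
  </Element>
  <!-- Parse the section body with the current group -->
  <Parse/>
</when>
```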
            

    Text Rule

    
     SELECTION-TEXT-CONDITION =
       <Text>REGULAR-EXPRESSION?</Text>
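
A Text rule could then be written as in this sketch. The group and method names are hypothetical, and the regular expression dialect is not specified in this section, so the pattern is kept to character classes that most dialects share.

```xml
<Group name="numbers">
  <when>
    <!-- Matches a run of decimal digits in the text stream -->
    <Text>[0-9]+</Text>
    <!-- Hypothetical handler method that receives the parse context -->
    <Method name="number"/>
  </when>
</Group>
```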