* element in the following fragment for
example::

    two exactly embedded elements

In such a case, the widget first creates two segments with exactly the same
address (since the embedded XML tags are deleted with **Remove markup**);
then, because of **Fuse duplicates**, it tries to fuse them into one. It can
keep only one of the competing annotation values *A* and *B* for the
annotation key *type*; by default, it keeps the value associated with the
element closest to the root of the XML tree, namely *A*. If, on the other
hand, the **Prioritize shallow attributes** option is selected, it keeps the
value of the element closest to the "surface", in our example *B*.

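
The following Python sketch illustrates the two fusion policies on a fragment
of this kind. It is not the widget's actual code: the element name ``a`` is a
hypothetical stand-in (only the attribute values *A* and *B* come from the
text), the fragment is parsed with Python's standard ``xml.etree.ElementTree``
module, and the helper names ``nested_chain`` and ``fuse_annotations`` are
invented for illustration.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical fragment: the element name "a" is an assumption made for
    # illustration; only the values A and B of the "type" attribute come
    # from the text above.
    FRAGMENT = '<a type="A"><a type="B">two exactly embedded elements</a></a>'


    def nested_chain(root):
        """Return the embedded elements from the outermost to the innermost,
        following the first child at each level."""
        chain = [root]
        while len(chain[-1]) > 0:
            chain.append(chain[-1][0])
        return chain


    def fuse_annotations(chain, prioritize_shallow=False):
        """Fuse the attributes of exactly embedded elements into one dict.

        By default the element closest to the root wins; with
        prioritize_shallow=True the element closest to the "surface" wins.
        """
        # Each update overwrites earlier values, so the element that must
        # win is processed last.
        ordered = chain if prioritize_shallow else list(reversed(chain))
        fused = {}
        for element in ordered:
            fused.update(element.attrib)
        return fused


    chain = nested_chain(ET.fromstring(FRAGMENT))
    print(fuse_annotations(chain))                           # {'type': 'A'}
    print(fuse_annotations(chain, prioritize_shallow=True))  # {'type': 'B'}
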
The **Conditions** subsection of the **XML Extraction** section allows the
user to restrict the extraction by specifying conditions on the attributes of
the extracted elements. Each condition is a regular expression that the value
of the given attribute must match. In the list at the top of this subsection,
the columns indicate (a) the attribute concerned, (b) the corresponding
regular expression, and (c) the options associated with this expression. [#]_

In :ref:`figure 2
` above, we have thus limited the extraction to the elements whose *type*
attribute has the value *poem*. If several conditions were defined, all of
them would have to be fulfilled for an element to be extracted. The buttons
on the right allow the user to delete the selected condition (**Remove**) or
to empty the list completely (**Clear All**).

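
As an illustration of the rule that all conditions must be fulfilled, here is
a minimal Python sketch. The list structure, the helper name
``satisfies_conditions`` and the regex ``^poem$`` are assumptions; the sketch
also assumes that an element lacking the attribute fails the condition and
that ``re.search`` is used for matching, neither of which is stated above.

.. code-block:: python

    import re

    # Hypothetical Conditions list: one (attribute, compiled regex) pair per
    # line; the regex "^poem$" is an assumed rendering of "value is poem".
    CONDITIONS = [("type", re.compile(r"^poem$"))]


    def satisfies_conditions(attributes, conditions):
        """Return True only if every condition in the list is fulfilled."""
        for attribute, regex in conditions:
            value = attributes.get(attribute)
            # Assumption: a missing attribute makes the condition fail.
            if value is None or regex.search(value) is None:
                return False
        return True


    print(satisfies_conditions({"type": "poem"}, CONDITIONS))   # True
    print(satisfies_conditions({"type": "prose"}, CONDITIONS))  # False

    # With a second (hypothetical) condition, both must hold.
    both = CONDITIONS + [("lang", re.compile(r"^en"))]
    print(satisfies_conditions({"type": "poem", "lang": "en"}, both))  # True
    print(satisfies_conditions({"type": "poem", "lang": "fr"}, both))  # False
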
The remaining part of the **Conditions** subsection allows the user to add new
conditions to the list. To do so, the attribute in question (**Attribute**)
and the corresponding regular expression (**Regex**) must be specified. The
**Ignore case (i)**, **Unicode dependent (u)**, **Multiline (m)** and **Dot
matches all (s)** checkboxes apply the corresponding options to the regular
expression. Finally, clicking the **Add** button adds the new condition to
the list.

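
The option letters match the names of standard flags of Python's ``re``
module (``re.IGNORECASE``, ``re.UNICODE``, ``re.MULTILINE`` and
``re.DOTALL``), so the checkboxes presumably map onto them. The sketch below
shows how a condition might be compiled from the selected options; the
letter-to-flag table and the ``compile_condition`` helper are hypothetical.
In Python 3, Unicode matching is already the default for ``str`` patterns, so
the ``u`` option has no visible effect there.

.. code-block:: python

    import re

    # Hypothetical mapping from the checkbox letters shown in the interface
    # to the corresponding flags of Python's re module.
    FLAG_FOR_OPTION = {
        "i": re.IGNORECASE,  # Ignore case
        "u": re.UNICODE,     # Unicode dependent
        "m": re.MULTILINE,   # Multiline
        "s": re.DOTALL,      # Dot matches all
    }


    def compile_condition(regex, options=""):
        """Compile a condition's regex with the flags selected by its options."""
        flags = 0
        for letter in options:
            flags |= FLAG_FOR_OPTION[letter]
        return re.compile(regex, flags)


    print(bool(compile_condition(r"^poem$", "i").search("Poem")))  # True
    print(bool(compile_condition(r"^poem$").search("Poem")))       # False
    print(bool(compile_condition(r"a.b", "s").search("a\nb")))     # True
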
The **Options** section allows the user to specify the output segmentation
label. The **Auto-number with key** checkbox makes the program number the
segments of the output segmentation automatically and store each number under
the annotation key specified in the text field on the right. The **Import
annotations** checkbox copies into each output segment every annotation
associated with the corresponding segment of the input segmentation. The
**Merge duplicate segments** checkbox makes the program fuse distinct segments
that have the same address into a single segment; the annotations associated
with the fused segments are copied to the resulting segment. [#]_

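
The interplay of these three options can be sketched in Python as follows.
The segment model (dicts with an ``address`` and ``annotations``), the
function ``build_output`` and its parameter names are hypothetical, as are two
ordering choices: the element's own attributes override imported annotations,
and auto-numbering is applied after merging. When fused segments disagree on a
key, the sketch keeps the value of the last segment, as stated in the
corresponding footnote.

.. code-block:: python

    def build_output(extracted, import_annotations=False,
                     merge_duplicates=False, auto_number_key=None):
        """Apply the Options settings to a list of extracted segments.

        Each item of `extracted` is a dict with an "address" (a (start, end)
        tuple), the element's "attributes" and the "source_annotations" of the
        input segment it was extracted from (hypothetical model).
        """
        output = []
        for item in extracted:
            annotations = {}
            if import_annotations:
                # Import annotations: copy those of the input segment.
                annotations.update(item["source_annotations"])
            # Assumption: the element's own attributes take precedence.
            annotations.update(item["attributes"])
            output.append({"address": item["address"], "annotations": annotations})

        if merge_duplicates:
            # Group segments by address (dicts keep insertion order); a later
            # segment overwrites an earlier value for the same key.
            merged = {}
            for segment in output:
                target = merged.setdefault(
                    segment["address"],
                    {"address": segment["address"], "annotations": {}},
                )
                target["annotations"].update(segment["annotations"])
            output = list(merged.values())

        if auto_number_key:
            # Auto-number with key: 1, 2, 3... in output order.
            for number, segment in enumerate(output, start=1):
                segment["annotations"][auto_number_key] = number
        return output


    # Hypothetical input: two segments share the address (0, 10).
    extracted = [
        {"address": (0, 10), "attributes": {"type": "first"},
         "source_annotations": {"file": "x.xml"}},
        {"address": (0, 10), "attributes": {"type": "second"},
         "source_annotations": {"file": "x.xml"}},
        {"address": (15, 20), "attributes": {"type": "third"},
         "source_annotations": {"file": "x.xml"}},
    ]
    for segment in build_output(extracted, import_annotations=True,
                                merge_duplicates=True, auto_number_key="num"):
        print(segment)
    # {'address': (0, 10), 'annotations': {'file': 'x.xml', 'type': 'second', 'num': 1}}
    # {'address': (15, 20), 'annotations': {'file': 'x.xml', 'type': 'third', 'num': 2}}
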
The **Info** section indicates the number of segments in the output
segmentation, or the reasons why no segmentation is emitted (no input data,
no output segment created, etc.).

The **Send** button triggers the emission of a segmentation to the output
connection(s). When the **Send automatically** checkbox is selected, the
button is disabled and the widget attempts to emit a segmentation
automatically whenever its interface is modified or its input data change
(because a connection is deleted or added, or because modified data are
received through an existing connection).

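
The behaviour of the **Send** button and the **Send automatically** checkbox
can be modelled by a small class. This is a hypothetical sketch, not the
Orange Canvas API: the class, its attributes and the ``on_change`` method are
invented, and ``send`` merely counts emissions instead of computing a
segmentation. The messages it sets are taken from the list of information
messages below.

.. code-block:: python

    class SendLogicSketch:
        """Hypothetical model of the Send / Send automatically behaviour."""

        def __init__(self, send_automatically=False):
            self.send_automatically = send_automatically
            self.info = ""
            self.sent_count = 0

        @property
        def send_button_enabled(self):
            # Selecting "Send automatically" disables the Send button.
            return not self.send_automatically

        def on_change(self, what="Settings were"):
            """Call after any interface change ("Settings were") or input
            change ("Input has")."""
            if self.send_automatically:
                self.send()
            else:
                self.info = f"{what} changed, please click 'Send' when ready."

        def send(self):
            # Stand-in for computing and emitting the output segmentation.
            self.sent_count += 1
            self.info = "Data correctly sent to output."


    manual = SendLogicSketch()
    manual.on_change()
    print(manual.info)        # Settings were changed, please click 'Send' when ready.
    manual.send()
    print(manual.sent_count)  # 1

    auto = SendLogicSketch(send_automatically=True)
    auto.on_change("Input has")
    print(auto.sent_count, auto.send_button_enabled)  # 1 False
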

Messages
--------

Information
~~~~~~~~~~~

*Data correctly sent to output: segments.*
    This confirms that the widget has operated properly.

*Settings were* (or *Input has*) *changed, please click 'Send' when ready.*
    Settings and/or input have changed but the **Send automatically** checkbox
    has not been selected, so the user is prompted to click the **Send**
    button (or equivalently check the box) in order for computation and data
    emission to proceed.

*No data sent to output yet: no input segmentation.*
    The widget instance is not able to emit data to output because it receives
    none on its input channel(s).

*No data sent to output yet, see 'Widget state' below.*
    A problem with the instance's parameters and/or input data prevents it
    from operating properly, and additional diagnostic information can be
    found in the **Widget state** box at the bottom of the instance's
    interface (see `Warnings`_ and `Errors`_ below).


Warnings
~~~~~~~~

*No XML element was specified.*
    The name of an XML element must be entered in the **XML element** field in
    order for computation and data emission to proceed.

*No label was provided.*
    A label must be entered in the **Output segmentation label** field in
    order for computation and data emission to proceed.

*No annotation key was provided for element import.*
    In the advanced settings, the **Import element with key** checkbox has been
    selected and an annotation key must be specified in the text field on the
    right in order for computation and data emission to proceed.

*No annotation key was provided for auto-numbering.*
    The **Auto-number with key** checkbox has been selected and an annotation
    key must be specified in the text field on the right in order for
    computation and data emission to proceed.

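
The checks behind these warnings can be sketched as a simple validation
function. The settings dictionary and its keys are hypothetical, and the
order of the checks (as well as the choice to report only the first failing
one) is an assumption; the message strings are those listed above.

.. code-block:: python

    def first_warning(settings):
        """Return the first applicable warning message, or None if the
        settings allow computation to proceed (hypothetical check order)."""
        if not settings.get("xml_element"):
            return "No XML element was specified."
        if not settings.get("label"):
            return "No label was provided."
        if settings.get("import_element") and not settings.get("import_element_key"):
            return "No annotation key was provided for element import."
        if settings.get("auto_number") and not settings.get("auto_number_key"):
            return "No annotation key was provided for auto-numbering."
        return None


    # Hypothetical settings; "div" and "xml_extract" are made-up values.
    settings = {"xml_element": "div", "label": "xml_extract", "auto_number": True}
    print(first_warning(settings))  # No annotation key was provided for auto-numbering.
    settings["auto_number_key"] = "num"
    print(first_warning(settings))  # None
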

Errors
~~~~~~

*Regex error: (condition #).*
    The regular expression in the *n*-th line of the **Conditions** list is
    invalid.

*XML parsing error (missing closing tag / orphan closing tag).*
    The input XML data couldn't be correctly parsed. Please use an XML
    validator to check the data's well-formedness.

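
Both kinds of errors can be reproduced with Python's standard library, as in
the sketch below. The helper names are hypothetical, and the well-formedness
check relies on ``xml.etree.ElementTree`` rather than on the widget's own
parser, so its verdict may differ in edge cases (for instance, fragments
containing an XML declaration or undefined entities are rejected here). The
fragment is wrapped in a dummy root element so that several top-level
elements are accepted.

.. code-block:: python

    import re
    import xml.etree.ElementTree as ET


    def find_regex_error(condition_regexes):
        """Return the 1-based line number of the first invalid regex in the
        Conditions list, or None if all of them compile."""
        for number, regex in enumerate(condition_regexes, start=1):
            try:
                re.compile(regex)
            except re.error:
                return number
        return None


    def is_well_formed(xml_text):
        """Check well-formedness with Python's standard XML parser."""
        try:
            # The dummy root lets the fragment have several top-level elements.
            ET.fromstring("<root>" + xml_text + "</root>")
        except ET.ParseError:
            return False
        return True


    print(find_regex_error([r"^poem$", r"(unclosed"]))  # 2
    print(is_well_formed("<a>ok</a><b/>"))              # True
    print(is_well_formed("<a><b>missing</a>"))          # False (missing closing tag)
    print(is_well_formed("orphan</a>"))                 # False (orphan closing tag)
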

Examples
--------

* :doc:`Getting started: Converting XML markup to annotations
  `
* :doc:`Cookbook: Convert XML tags to Orange Textable annotations
  `


Footnotes
---------

.. [#] Compared with the advanced interface, note also that in the basic
   interface the options **Prioritize shallow attributes** and
   **Fuse duplicates** are disabled by default.

.. [#] See `Python documentation `_.

.. [#] If the fused segments have distinct values for the same annotation
   key, only the value of the last segment (in the order of the extracted
   segments before fusion) is retained.