R. Ghosh, with most helpful comments from C. Moreton-Smith, ISIS
What is XML?
XML is a markup language for containing and managing information. Its use extends to actively filtering and formatting documents within a network of filter programs and linked files. At first glance an XML document resembles an HTML file, with similar possibilities for linking to both local and remote files. One main difference is that HTML's fixed tags (typically the anchor <a>) are replaced by element names, which can have either local or network-wide significance if suitably defined.
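XML namespaces are one way to give an element name "network-wide significance": a prefix is bound to a URI, and unprefixed names stay local to the file format. The minimal Python sketch below illustrates this with the standard library. It is an assumption made for illustration and not part of the original proposal. The namespace URI, the `sas:` prefix and the `element_names` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <iqdata> has no prefix, so its name is local to
# this file format; <sas:sample> is bound to a namespace URI, which is what
# gives an element name wider (network-wide) significance.
# The URI is an illustrative placeholder, not a registered vocabulary.
DOC = """<SANSdatafile xmlns:sas="http://example.org/hypothetical-sas">
  <sas:sample>teflon</sas:sample>
  <iqdata>2.617993E-04 3.700000E+01 4.301163E+00</iqdata>
</SANSdatafile>"""


def element_names(xml_text):
    """Return the tag of each child of the root element.

    ElementTree expands a prefixed name to "{namespace-uri}localname".
    """
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


if __name__ == "__main__":
    for name in element_names(DOC):
        print(name)
```

The first name printed carries the full URI. Two files using the same prefix for different URIs would therefore not be confused.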
As an exercise, two examples follow: ILL data packed as an XML file, and part of a sasCIF file from Steve King at ISIS, likewise repacked as XML. They show how we might reconcile two ASCII file formats whose primary content, the table of intensities, is identified in the same way in both.
Example of ILL data in XML form
<?xml version = "1.0" encoding = "UTF-8"?>
<SANSdatafile>
<head_sasill><![CDATA[
teflon instrument tests 40 lines+(Q, I(Q), errI(Q))
ILL SANS D11
22100 0 19 1 38 34
1 0 32 0 3 1
rnil 15-Jan-2001 16:42:32
0.0000 ! Theta-0 Detector offset angle
32.0000 ! X0 cms Beam centre
31.5000 ! Y0 cms Beam centre
2.0000 ! Delta-R cms regrouping step
10.0000 ! SD m Sample-detector distance
12.0000 ! Angstroms incident wavelength
8.0000 ! m collimation distance
1.0000 ! concentration
66. ! ISUM central window sum
100000. ! flux monitor counts
:
:
0.0000 ! K sample temperature
0.0000 ! sample transmission
0.0000 ! mm sample thickness
34.9000 ! counting time secs
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
0.0000 ! reserved
19 0 0 0 0 0 0 6
0.100000E+01 0.100000E+04 0.000000E+00 0.100000E+01 0.120000E+01
0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00
]]></head_sasill>
<iqdata columns="3" col1_title="Angstroms-1" col2_title="Intensity" col3_title="Error">
2.617993E-04 3.700000E+01 4.301163E+00
1.062462E-03 6.412500E+01 1.634587E+00
2.107973E-03 1.410135E+03 5.207492E+00
3.167636E-03 1.752197E+03 4.801586E+00
4.189463E-03 7.581771E+02 2.810281E+00
:
:
1.255376E-02 1.486688E+01 2.197023E-01
1.360724E-02 1.204012E+01 1.927716E-01
1.466810E-02 1.026648E+01 1.679423E-01
1.572930E-02 8.808511E+00 1.530585E-01
1.672592E-02 7.862857E+00 1.498843E-01
1.775654E-02 7.079167E+00 1.717455E-01
1.857273E-02 6.913043E+00 2.741200E-01
</iqdata>
</SANSdatafile>
The aim is to reach <iqdata> in the simplest fashion, since it holds the actual results and must be easy to read with existing languages or packages.
Example of sasCIF in XML form
<?xml version = "1.0" encoding = "UTF-8"?>
<SANSdatafile>
<head_sascif><![CDATA[
_sas_intensity.title "SAMPLE: 83396 EMPTY CAN: 83393"
# Next Item : Error in the name or in the value
#entry.id "default_LOQ_sasCIF_tpl"
_audit.creation_date 2000-09-22
_audit.creation_method "part typed, part updated by COLETTE"
_audit.update_record
; 2000-09-27 Many revisions by SMK
2000-11-09 Revised to work with COLETTE o/p routine
2000-11-18 Revised by SMK following changes to sasCIF dictionary by MM
2000-11-22 Corrected by MM
2000-11-23 Revised by SMK
;
loop_ _publ.contact_author_name
_publ.contact_author_email
_publ.contact_author_phone
_publ.contact_author_fax
'King, S M '
'smk@isise.rl.ac.uk '
'+44(0)1235-446437'
'+44(0)1235-445720'
'Bucknall, D G'
'david.bucknall@materials.oxford.ac.uk'
'+44(0)1865-273763'
'+44(0)1865 273789'
'Beckham, H W '
'haskell.beckham@tfe.gatech.edu '
'+1(404)894-4198 '
'+1(404)894-9766 '
_publ.contact_author_address
; ISIS Facility
Rutherford Appleton Laboratory
Chilton
Oxfordshire
OX11 0QX
United Kingdom
;
_diffrn_source.source "ISIS UK"
_diffrn_source.type "spallation source"
_diffrn_source.details "200 uA 800 MeV 50Hz proton synchrotron"
_diffrn_source.power 160.0
_diffrn_source.target
:
:
loop_ _sas_intensity.detc_id
_sas_intensity.detc_resp_raw
"MainDet" DIRECT913_001.FUDGE
"HighDet" DIRECTHAB.983
# q-axis units : 4*Pi sin [ theta ] / Lambda [Angstrom-1]
loop_ _sas_intensity.momentum_transfer
_sas_intensity.intensity
]]></head_sascif>
<iqdata columns="3" col1_title="Angstroms-1" col2_title="Intensity" col3_title="Error">
0.00900 1.456441E+01 4.897460E-01
0.01100 7.079165E+00 1.688033E-01
0.01300 3.699121E+00 8.279238E-02
0.01500 2.112296E+00 5.290537E-02
0.01700 1.338770E+00 3.596483E-02
0.01900 1.018834E+00 2.735320E-02
0.02100 8.131560E-01 2.213864E-02
0.02300 6.901327E-01 1.878839E-02
0.02500 5.894193E-01 1.642621E-02
0.02700 5.418695E-01 1.480873E-02
</iqdata>
</SANSdatafile>
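Reading the <iqdata> table takes little code with a standard XML parser, and the parser does not care which header element comes before the table. The following is a minimal Python sketch. It assumes well-formed files with the layout of the two examples above: a `columns` attribute, titles in `col1_title`, `col2_title` and so on, and one whitespace-separated row per line. The function name `read_iqdata` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_iqdata(xml_text):
    """Return (column_titles, rows) from the <iqdata> child of the root.

    Assumes the layout of the examples: a 'columns' attribute, titles in
    'col1_title', 'col2_title', ..., and one whitespace-separated row per line.
    The header element (head_sasill, head_sascif, ...) is simply ignored.
    """
    root = ET.fromstring(xml_text)
    iq = root.find("iqdata")
    if iq is None:
        raise ValueError("no <iqdata> element under the root")
    ncols = int(iq.get("columns"))
    titles = [iq.get(f"col{i}_title") for i in range(1, ncols + 1)]
    rows = []
    for line in (iq.text or "").splitlines():
        fields = line.split()
        if len(fields) != ncols:
            continue  # skip blank lines and elision markers such as ":"
        rows.append(tuple(float(f) for f in fields))
    return titles, rows


if __name__ == "__main__":
    sample = """<SANSdatafile>
<head_sascif><![CDATA[_sas_intensity.title "SAMPLE: 83396 EMPTY CAN: 83393"]]></head_sascif>
<iqdata columns="3" col1_title="Angstroms-1" col2_title="Intensity" col3_title="Error">
0.00900 1.456441E+01 4.897460E-01
0.01100 7.079165E+00 1.688033E-01
</iqdata>
</SANSdatafile>"""
    titles, rows = read_iqdata(sample)
    print(titles)
    for q, intensity, error in rows:
        print(q, intensity, error)
```

The same function handles both the ILL file and the sasCIF file, because only the <iqdata> element is inspected.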
A CDATA section, opened by <![CDATA[ and terminated by ]]>, tells the parser not to interpret its contents as markup. Special characters such as < and & inside it are therefore taken as literal text.
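The short Python sketch below shows the effect of a CDATA section, using the standard library parser. Inside CDATA, `<` and `&` come through unchanged. Outside it, the same text must be written with entity references, or the document is rejected. The helper `header_text` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def header_text(xml_text):
    """Return the character content of the root element."""
    return ET.fromstring(xml_text).text


# Inside CDATA, "<" and "&" are literal characters, not markup.
with_cdata = "<head><![CDATA[R < 5 cm & T > 0 K]]></head>"
# Outside CDATA the same characters must be written as entity references.
escaped = "<head>R &lt; 5 cm &amp; T &gt; 0 K</head>"
# Unescaped "<" in ordinary content is a well-formedness error.
broken = "<head>R < 5 cm</head>"

if __name__ == "__main__":
    print(header_text(with_cdata))                          # R < 5 cm & T > 0 K
    print(header_text(with_cdata) == header_text(escaped))  # True
    try:
        header_text(broken)
    except ET.ParseError as err:
        print("rejected:", err)
```

This is why the free-form ILL and sasCIF headers are wrapped in CDATA: they can be carried over verbatim without escaping.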
Other aspects of XML include the possibility of including binary sections (could these be NeXus?), and, of course, the filters might be links to data-browser programs.
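The note leaves open how binary sections would be stored. XML itself is text, so one common approach is to encode the bytes as base64 inside an element. That approach is an assumption here and is not taken from the original. The minimal Python sketch below packs floats as little-endian doubles and wraps them this way. The <bindata> element, its attributes and both helper functions are hypothetical.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def pack_binary(values):
    """Wrap float values in a hypothetical <bindata> element as base64 text.

    The values are packed as little-endian IEEE 754 doubles before encoding,
    so the byte layout is fixed regardless of the machine that writes it.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    elem = ET.Element("bindata", {"encoding": "base64", "dtype": "float64-le",
                                  "count": str(len(values))})
    elem.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unpack_binary(xml_text):
    """Inverse of pack_binary: decode the base64 text back to floats."""
    elem = ET.fromstring(xml_text)
    raw = base64.b64decode(elem.text or "")
    count = int(elem.get("count"))
    return list(struct.unpack(f"<{count}d", raw))


if __name__ == "__main__":
    xml_text = pack_binary([1.0, 2.5, -3.0])
    print(xml_text)
    print(unpack_binary(xml_text))  # [1.0, 2.5, -3.0]
```

Base64 inflates the data by about a third. For large detector images, a link to an external file (possibly NeXus) may be the more practical choice.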
If the examples are cut and pasted into files a.xml and b.xml, Internet Explorer 5 can open them and recognise the XML structure.
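A browser is one way to check that the files are well-formed, and the same check can also be scripted. The minimal Python sketch below uses the standard library parser. The function name `well_formedness_error` and the script name in the usage comment are hypothetical. A parser rejects, for example, a closing tag that does not match its opening tag.

```python
import sys
import xml.etree.ElementTree as ET


def well_formedness_error(xml_bytes):
    """Return None if the document is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Usage (script name hypothetical): python check_xml.py a.xml b.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # bytes, so the XML declaration's encoding is honoured
            problem = well_formedness_error(f.read())
        print(path, "OK" if problem is None else "NOT well-formed: " + problem)
```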