2-02-1 DOCLE Report - University of Adelaide

2-02-1
GP Vocabulary Project – Stage-2
DOCLE
Report
Don Walker
April 2004
The Docle System Report
Acknowledgements:
The GP Vocabulary Project is being funded by the
Department of Health and Ageing (DHA) through
the General Practice Computing Group (GPCG)
Publication Date:
April 2004
Contact person:
Dr Don Walker
Department of General Practice
University of Adelaide
South Australia 5005
Email: [email protected]
Don Walker (April 2004)
FOREWORD
The "GP Vocabulary Project – phase-2" links (or “maps”) a subset of GP terms to the
concepts of various systems. The links are made via the terms of the concepts. Thus
“ICD10-AM codes” are linked via their “index terms”.
This is a report about the DOCLE system. It covers its loading into the Poly-browser
and Authoring Tool (PAT) and its subsequent examination, assessment and analysis
as specified for the GP Vocabulary Project – stage-2.
Additional information is available on the web-page of the "GP Vocabulary Project".
It may be found via the home-page of the Department of General Practice at the
University of Adelaide:
http://www.generalpractice.adelaideuni.org/
http://www.generalpractice.adelaideuni.org/nav/res_nav/current/it.shtml#vocab
CONTENTS
1 Introduction
1.1 The GP Vocabulary Project stage-2
1.2 The Docle System
2 Liberties taken
3 Definitions pertinent to the Docle System
4 The supplied data tables
4.1 Terms All
4.2 Reason for encounter
4.3 Primary Key - Secondary Key
4.3.1 Data sample
4.4 Genus to Species
4.4.1 Data Sample
4.5 Species to Genus
4.5.1 Data Sample
5 Preparation
5.1 Expand the “Terms-All” data
5.2 Identification of “Atomic Concepts”
5.3 Creation of a “Unique Docle List”
5.4 Creation of “Preferred-Terms”
5.5 Creation of Concept and Operator usage
5.6 Creation of other “Terms” data
5.7 Creation of CODE data
5.8 Creation of LIST data
5.9 Creation of the Hierarchy
5.9.1 Discussion
5.9.1.1 Level-0 - the “Phylum”
5.9.1.2 Level-1 - the “Genus”
5.9.1.3 Level-2 – the “Species”
5.9.1.4 Docle Orphan Concepts
5.9.1.5 Cyclic relationships in the Genus-to-Species table
5.9.1.6 Poly-hierarchies
5.9.1.7 Cyclic relationships in a poly-hierarchy
5.9.1.8 The “Type” of a Hierarchical relationship
5.9.2 Creation of the Hierarchy Table
5.10 Creation of Non-hierarchical Relationships
5.10.1 Relationship-types in SNOMED
5.10.2 Creation of the Non-hierarchical relationships
5.10.2.1 The Docle Operators
5.10.2.2 Docle Non-Hierarchical Relations
5.11 Allocation of “Unique Identifiers”
6 Loading the Docle System into the Poly-Browser
6.1 Docle’s Appearance in the Poly-Browser
7 Further Reading
8 Analysis
8.1 Supplied data
8.2 Content
8.3 Machine located contents
8.3.1 Finding GP Vocabulary terms
8.3.2 Terms in SNOMED CT that were also in other systems
8.4 Manually located contents
8.4.1 Comparing machine matches and manual linking
8.5 Relevance
8.6 Ease of use and functionality
8.7 Cost and availability
8.8 Strengths and implications
8.9 Weaknesses, consequences and solutions
9 Concluding remarks
10 Creator’s Comments
1 Introduction
This document describes the inclusion of the DOCLE System into the poly-browser
and authoring tool (PAT) and its subsequent examination, assessment and analysis.
The work was part of the GP Vocabulary Project – stage 2.
1.1 The GP Vocabulary Project stage-2
The objectives of stage-2 of the GP Vocabulary Project are to:
1. Trial the mapping of the GP Vocabulary project to other coding systems in use
within Australia (which include where feasible: DOCLE, CATCH, ICD-10-AM,
ICPC2-Plus, and SNOMED-CT).
2. Analyse the structure of these terminologies to determine how well they relate to
each other to assess their suitability for general practice and other parts of the
health sector; and
3. Inform consideration of the process and costs involved in completing a map from
the GP Vocabulary to a standard reference terminology and classifications. To
achieve this, the pilot will need to use appropriate methodologies and tools which
could be applied to an eventual large scale completion of the linking exercise.
1.2 The Docle System
The Docle System is the creation of an active Victorian general practitioner - Dr
Kuang Oon. It is used as the controlled vocabulary in the medical record system
called “Medical Director”.
2 Liberties taken
Some liberties had to be taken with the Docle System. These were necessary to enable
its loading into the Poly-browser. The liberties included the following…
o Some Docle operators were changed to “Docle operator concepts”. These are
discussed below.
o The question-mark character (“?”) was treated as a new concept named
“possible”.
3 Definitions pertinent to the Docle System
The Docle System is somewhat unconventional. Perhaps the following definitions and
explanations may help initial understanding of it…
Term
Definition and Explanation
Docle
A “Docle” is the name given to the unique identifier used by the Docle
System. It generally consists of the first four letters of the name of the
concept it represents. For example the docle for “chest” is “ches”. The
identifier is thus usually understandable.
Atomic Concept
“Atomic concepts” are those that cannot be reduced further. They are
combined by “docle operators” to create “compound concepts”. For
example “Chest” and “pain” are two atomic concepts that form the
compound concept of “Chest pain”. Its docle is the docle for
“chest”, plus the operator for “apropos”, plus the docle for “pain”,
giving “ches@pain”.
Atomic Docle
An “atomic docle” is a single docle that identifies an “atomic concept”. It
does not contain an operator. “ches” and “pain” are each “atomic docles”.
Docle Operator
The “docle operator” is the single punctuation character that joins atomic
concepts to form “compound concepts”. In the above “chest pain”
example it is the “@” character meaning “apropos”.
Compound Concept
A “Compound concept” is one that is made up of two or more “atomic
concepts”.
Compound Docle
A “compound docle” is a docle that uniquely identifies a compound
concept. It contains two or more atomic docles joined by operators.
Genus
“Genus” is the name given by the Docle System to the top level of its
hierarchical concepts. The level above (“phylum”) has not been
implemented. A genus concept is the parent of children that are called its
“species”. (“In the Docle System, a genus is a parent “metaclass” of its
species. It carries the “^” stigma.” (Kuang Oon))
Species
“Species” is the name given by the Docle System to the second level of
its hierarchical concepts. They are the children of their parent “genus”.
Orphan concepts
“Orphan concepts” are those that are unclassified. They therefore have no
parents.
Primary key
The docle “primary-key” is an expression of a docle in “long-hand”. For
example the primary key for the concept “Chest pain” is “chest@pain”.
“Preferred terms” and “secondary keys” may be derived from the
primary-key by using the “docle computer algorithm”.
Secondary key
The docle “secondary-key” is the short-hand form of the primary-key
docle. It is derived by applying the “Docle algorithm” to the primary-key.
Tertiary key
The docle “tertiary key” is the “alias” or “alternative description” for a
docle concept. Aliases represent concept “synonyms”.
Docle algorithm
The “Docle algorithm” converts the “Docle primary-key” to the “Docle
secondary-key” and can generate “preferred terms” from the “Docle
primary-key”. To generate secondary-keys, the Docle algorithm acts
along the following lines:
For one word scenario – the first four characters are used.
For two word scenario – the first four characters of first word plus the
first character of the second word.
For three or more word scenario – the first characters of each word
concatenated.
Table 1: Terms and definitions used to describe the Docle System
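The three rules given for the Docle algorithm can be sketched in Python as below. This is only an illustration of the rules as stated in Table 1, not the Docle System's own implementation. The function names are hypothetical. It assumes that every non-alphanumeric character other than a hyphen is an operator, and that a change of case delimits words (as noted for the “Primary Key - Secondary Key” data). Hyphenated primary keys such as “infl-uenzae” follow a convention that the stated rules do not cover, so the sketch does not reproduce their secondary keys.

```python
import re


def split_words(segment):
    # Case delimits words: "birthControl" -> ["birth", "Control"].
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", segment).split()


def shorten(segment):
    # Apply the one-word, two-word, or three-or-more-word rule to one segment.
    words = split_words(segment)
    if not words:
        return ""
    if len(words) == 1:
        short = words[0][:4]
    elif len(words) == 2:
        short = words[0][:4] + words[1][:1]
    else:
        short = "".join(word[0] for word in words)
    return short.lower()


def secondary_key(primary_key):
    # Assumption: any run of characters that is not a letter, digit or hyphen
    # is an operator. The capturing group keeps the operators in the result,
    # so odd positions hold operators and even positions hold segments.
    parts = re.split(r"([^A-Za-z0-9-]+)", primary_key)
    return "".join(p if i % 2 else shorten(p) for i, p in enumerate(parts))


print(secondary_key("chest@pain"))                    # ches@pain
print(secondary_key("infection<herpesSimplexVirus"))  # infe<hsv
```

The two printed results agree with the docles quoted in this report for “Chest pain” and for the herpes simplex virus infection entry.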
4 The supplied data tables
The Docle System was supplied in five text tables that are described below. The tables
were…
1. Terms All
2. Reason For Encounter (RFE) subset
3. Primary-Key to Secondary-Key
4. Genus to Species
5. Species to Genus
4.1 Terms All
The “Terms All” table contained two fields – one was the “name” of the term and the
other its “docle”. Mixed amongst them were primary-keys, ICD codes, and aliases.
There were 26,704 records. An example of the data is shown below…
Name | Docle
uterinePolypSurgery | surg.uter@polyp
[email protected]@=@rvt | rvt
[email protected]@=@bcs | bcs
icd9@451@[email protected] | infl.vein
arterialBlood@baseExcess | arteb@basee
Anti viral prescription | pres@anti@vira
[email protected]@=@dvt | dvt
arterialBleeding | blee@arte
infection<ascaris@lumb-ricoides | infe<asca@lumbrico
Autoantibodies - Glomerular basement membrane | autoa@gbm
Brain Injury | inju.brai
deficiency@proteinS@acquired | defi@prots@acqu
biopsyCervix | surg.cerv@biop
infection<arenaVirus | infe<arenv
Autoantibodies - Gastric parietal cell | autoa@gpc
cervix@dilatation | cerv@dila
uterineHypoplasia | hypoplas.uter
Brain cyst | cyst.brai
dysplasticBarrettsEsophagus | infl.esop@barr@dysplast
infection<arboVirus | infe<arbov
Autoantibodies - Extractable nuclear antigen | autoa@ena
uterineFibromyoma | fibroid
[email protected]@=@tietd | tietd
X-ray - Discography | xr@discogra
xxxKaryotype | xxxk
Brain CT | ct.brai
Table 2: A sample of the “Terms All” data supplied
4.2 Reason for encounter
The Reason for Encounter (RFE) subset file was a subset of the Terms-All data. It
contained the “Terms” and docles embedded in reporting text (“seek view” and
“&ctx@seek@view[...]”).
There were 18,920 RFE records. Some 116 “new entries” were not included in the
“Terms-All” data file - they were added during preparation of the data.
An example of the RFE subset is shown below…
Name/Primary Key | Docle/Secondary Key
seek view possibleDiabetesMellitus | &ctx@seek@view[&ctx@?,diabm]
seek view suspectDiabetesMellitus | &ctx@seek@view[&ctx@?,diabm]
seek view possibleHIVCarrier | &ctx@seek@view[&ctx@?,infe<hiv]
seek view possibleHumanImmunodeficiencyVirusInfection | &ctx@seek@view[&ctx@?,infe<hiv]
seek view queryHIVInfection | &ctx@seek@view[&ctx@?,infe<hiv]
seek view diabetesMellitusControlled | &ctx@seek@view[&ctx@eval[diabm],outx[,good,cont]]
seek view hyperlipidemiaControlled | &ctx@seek@view[&ctx@eval[hype@lipi],outx[,good,cont]]
seek view hyperTensionControlled | &ctx@seek@view[&ctx@eval[hypet],outx[,good,cont]]
seek view helicobacterPyloriEradication | &ctx@seek@view[&ctx@eval[infe<helib@pylori],outx[,cure]]
seek view cardioversionFailed | &ctx@seek@view[&ctx@eval[ppoc@cardiove],outx[,fail]]
seek view familyHistoryBowelCancer | &ctx@seek@view[&ctx@fh[carc.bowe@larg]]
seek view familyHistoryCarcinomaBowel | &ctx@seek@view[&ctx@fh[carc.bowe@larg]]
seek view systemReviewCardiovascular | &ctx@seek@view[&ctx@hx[hx@cvs]]
seek view diabetesMellitusNilFamilyHistory | &ctx@seek@view[&ctx@no,fh@diabm]
seek view noFamilyHistoryDiabetesMellitus | &ctx@seek@view[&ctx@no,fh@diabm]
&ctx@seek@view[abdomen@indigestion] | &ctx@seek@view[abdo@indi]
seek view Abdomen - Indigestion | &ctx@seek@view[abdo@indi]
seek view dyspepsia | &ctx@seek@view[abdo@indi]
seek view Epigastric discomfort | &ctx@seek@view[abdo@indi]
seek view epigastricDiscomfort | &ctx@seek@view[abdo@indi]
seek view indigestion | &ctx@seek@view[abdo@indi]
&ctx@seek@view[abdomen@murmur] | &ctx@seek@view[abdo@murm]
seek view Abdomen - Murmur | &ctx@seek@view[abdo@murm]
seek view abdomenBruit | &ctx@seek@view[abdo@murm]
seek view Abdominal bruit | &ctx@seek@view[abdo@murm]
seek view Bruit - abdomen | &ctx@seek@view[abdo@murm]
seek view bruitAbdomen | &ctx@seek@view[abdo@murm]
&ctx@seek@view[abdomen@pain&ctx@ill,undiagnosed] | &ctx@seek@view[abdo@pain&ctx@ill,undi]
seek view undiagnosedAbdomenPain | &ctx@seek@view[abdo@pain&ctx@ill,undi]
&ctx@seek@view[[email protected]rium] | &ctx@seek@view[[email protected]]
seek view Abdominal pain - Epigastric | &ctx@seek@view[[email protected]]
seek view Epigastric pain | &ctx@seek@view[[email protected]]
seek view epigastriumPain | &ctx@seek@view[[email protected]]
seek view Pain - epigastrium | &ctx@seek@view[[email protected]]
seek view painEpigastrium | &ctx@seek@view[[email protected]]
Table 3: A sample of the “Reason For Encounter” data supplied
4.3 Primary Key - Secondary Key
The “Primary Key - Secondary Key” table was supplied at the request of the author. It
was needed to enable “preferred terms” to be identified. The table contained two
fields. The “primary-key” field contained the “long hand” version of the docle, while
the “secondary-key” field contained the shorter docle.
There were 5,961 records.
4.3.1 Data sample
A sample of the data is shown below…
Primary_Key | Docle
mri.bone | mri.bone
skin@bruising@umbilicus | skin@brui@umbi
surgery.jejunum | surg.jeju
cerebralPalsy | cerep
immunisation@japaneseEncephalitis | immu@japae
hemoGlobin@h@incl-usions | hemog@h@inclusio
feeling= | feel=
infection<herpesSimplexVirus | infe<hsv
ct.spine@neck | ct.spin@neck
edema.lung | edem.lung
immunisation@infl-uenzae | immu@influenz
coma@hypo@glucose@pre | coma@hypo@gluc@pre
aneurysm.aorta@thorax | aneu.aort@thor
inflammation.larynx | infl.lary
osteoArthritis.ankle | ostea.ankl
lesion.subthalamicNucleus | lesi.subtn
injury&ctx@ill,pres-ent | inju&ctx@ill,present
lesion.substantiaNigra | lesi.subsn
osteoArthritis | ostea
preventiveCare@osteoPorosis | prevc@ostep
preventiveCare@lung | prevc@lung
preventiveCare@hyperTension | prevc@hypet
edema | edem
dislocation.hip@congenital | disl.hip@cong
coma@hypo@glucose | coma@hypo@gluc
feeling*l | feel*l
birthControl | birtc
cerebralEdema | ceree
eczema@vein | ecze@vein
autoHemolysis@test | autoh@test
fracture.vertebra@t12@crus-h | frac.vert@t12@crush
injury&ctx@ill,powe-red@non | inju&ctx@ill,powered@non
inflammation.lacrimalDuct | infl.lacrd
Table 4: A sample of the “Primary key - Secondary key” data supplied
Notes:
o Case is used to delimit words in phrases eg. “preventiveCare” and
“birthControl”.
o 101 docle Secondary Keys were not unique, i.e. the table contained aliases. A
few examples are shown below…
Primary key | Secondary key
Mittleschmerz | mitt
mittelschmerz | mitt
nsu | nsu
nonSpecificUrethritis | nsu
OCD | ocd
obsessiveCompulsiveDisorder | ocd
Pneumonitis | pneu
pneumonia | pneu
tremor | trem
trembly | trem
Table 5: A few examples of the 101 non-unique secondary-keys
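A check for non-unique secondary keys of the kind reported above could look like the following Python sketch. It is an assumption about how such a check might be written, not the method used in the project. The function name and the in-memory list of (primary key, secondary key) pairs are hypothetical. The sample rows are taken from Tables 4 and 5.

```python
from collections import defaultdict


def non_unique_secondary_keys(rows):
    # Group primary keys by secondary key, then keep only the secondary keys
    # that are shared by more than one distinct primary key.
    by_secondary = defaultdict(set)
    for primary, secondary in rows:
        by_secondary[secondary].add(primary)
    return {s: sorted(p) for s, p in by_secondary.items() if len(p) > 1}


rows = [
    ("Mittleschmerz", "mitt"),
    ("mittelschmerz", "mitt"),
    ("nsu", "nsu"),
    ("nonSpecificUrethritis", "nsu"),
    ("edema", "edem"),
]
aliases = non_unique_secondary_keys(rows)
print(aliases)
```

Here “mitt” and “nsu” are reported because each is shared by two primary keys, while “edem” is not.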
4.4 Genus to Species
The “Genus to Species” table contained 851 records of a single variable length text
field containing tab-delimited docles. The first data item of each line was the
“Genus”. Subsequent items in each line were the “Species” associated with each
“Genus”. The “^” character was used to denote a genus docle.
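A record in this layout could be read with a Python sketch such as the one below. It assumes only what is described above: tab-delimited docles, with the first item being the genus and carrying a trailing “^”. The function name is hypothetical.

```python
def parse_genus_line(line):
    # Split one tab-delimited record into its genus and its list of species.
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if not fields or not fields[0].endswith("^"):
        raise ValueError("expected the first item to be a genus docle ending in '^'")
    genus = fields[0][:-1]  # drop the trailing "^" marker
    species = fields[1:]
    return genus, species


line = "acne^\tacne&ctx@sequ<,drugr\tacne\tacne&ctx@ill,wors"
genus, species = parse_genus_line(line)
print(genus, species)
```

Note that the genus docle may also appear among its own species, as “acne” does in this record.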
4.4.1 Data Sample
A sample of the data is shown below…
&ctx@hx[hx@cvs]^ &ctx@hx[hx@cvs]
abdo@pain^ [email protected] abdo@pain@chro abdo@pain@rebo [email protected]
abdo@pain@guar [email protected] abdo@pain@touc abdo@rigi abdo@pain [email protected]@colic
[email protected] [email protected] [email protected] abdo@pain&ctx@ill,undi [email protected]
[email protected] abdo
abdo@swel^ [email protected] [email protected] [email protected] [email protected] [email protected]
abdo@swel@puls abdo@swel@mass [email protected] [email protected] [email protected]
abdo@swel [email protected]
abdo@upse^ [email protected] abdo@pain@chro abdo@pain@rebo [email protected]
abdo@pain@guar [email protected] abdo@pain@touc abdo@rigi abdo@upse abdo@pain
[email protected] abdo@rumb [email protected] [email protected] abdo@pain&ctx@ill,undi
[email protected] [email protected] abdo
abor^ abor@thre abor@part abor@miss abor@recu abor abor@drug abor@seps abor@inev
absc.brai^ absc.brai<enta@histolyt absc.subd absc.brai
absc.live^ absc.live<enta@histolyt absc.live
absc.lung^ absc.lung<enta@histolyt absc.lung
absc.skin^ absc.skin furu absc.skin@mult
absc^ absc.brai absc.gallb absc.epid absc.lung absc.bartg absc.live absc.pleu absc.subd
absc@perin absc.panc absc.brai<enta@histolyt absc.live<enta@histolyt absc absc@extrd
absc.lung<enta@histolyt absc@peria absc@subp absc@ischr absc.pelv absc.kidn absc.br
abus@alco^ encep@wern infl.stom@alco withdraw@alco abus@alco hepa@alco
abus@laxa^ melanosi.colo abus@laxa
abus@subs^ abus@drug@iv abus@volas abus@opia withdraw@drug prob@alco abus@coca
abus@laxa abus@benzd withdraw@benzd withdraw@opia withdraw@synd
impa@visi&ctx@sequ<,abus@toba abus@toba abus@amph abus@halla abus@subs
abus@gambling abus@alco abus@lysea abus@dr
abus@toba^ abus@toba impa@visi&ctx@sequ<,abus@toba
accu@drug^ accu@drug overdose@drug
acidosis^ acidosis@renat acidosis@resp acidosis acidosis@diabm acidosis@metab
acidosis@metab@aniog@norm acidosis@metab@aniog@incr
acne^ acne&ctx@sequ<,drugr acne acne&ctx@ill,wors
adhe.fallt^ adhe.fallt
adhe.labi^ adhe.labi fusi.labi
adjud^ adjud
Table 6: A sample of the “Genus to Species” data supplied
4.5 Species to Genus
The “Species to Genus” table contained 3126 records of a single variable length text
field containing tab-delimited docles. The first data item of each line was the
“Species”. Subsequent items in each line were the “genera” associated with the
“Species”. The “^” character was used to denote the genus docles.
4.5.1 Data Sample
A sample of the data is shown below…
abdo@indi systr@gasti^ abdo@upse^
abdo@pain systr@gasti^ abdo@pain^ pain^ abdo@upse^
abdo@pain&ctx@ill,undi abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected] abdo@pain^ pain^ abdo@upse^
[email protected]@colic abdo@pain^ pain^
[email protected] abdo@pain^ pain^ abdo@upse^
abdo@pain@acut abdo@pain^ pain^ abdo@upse^
abdo@pain@chro abdo@pain^ pain^ abdo@upse^
abdo@pain@guar abdo@pain^ pain^ abdo@upse^
abdo@pain@rebo abdo@pain^ pain^ abdo@upse^
abdo@pain@touc abdo@pain^ pain^ abdo@upse^
abdo@rigi abdo@pain^ pain^ abdo@upse^
abdo@rumb abdo@upse^
abdo@swel abdo@swel^
[email protected] abdo@swel^
[email protected] abdo@swel^
[email protected] abdo@swel^
[email protected] abdo@swel^
Table 7: A sample of the “Species to Genus” data supplied
5 Preparation
The supplied data files were loaded into a database environment, then processed,
converted and manipulated so that the form of their data suited the structure of the
poly-browser into which they were to be imported. The broad stages are outlined below…
5.1 Expand the “Terms-All” data
The “Terms-All” data was imported. The original record count was 26,704. About
116 additional terms were added from the “Primary Key - Secondary Key” and
“Reason for Encounter” data. Atomic Concepts that were implied but not specified
were also added. After merging some 2,776 of these, the Terms-All list contained
29,596 records.
5.2 Identification of “Atomic Concepts”
“Atomic Concepts” were extracted from “docles” and their “primary keys”. The total
number was 3,828. Those that had not been explicitly stated in the supplied files
numbered 2,776. They were created and added to the above Terms-All data.
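Extracting atomic docles from a compound docle might be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used. It assumes that every run of letters and digits is an atomic docle and that everything else is an operator. Following the liberty described in section 2, the “?” character is mapped to a concept named “possible”. The function name is hypothetical.

```python
import re


def atomic_docles(docle):
    # Return the distinct atomic docles of a docle, in order of first appearance.
    atoms = []
    for token in re.findall(r"[A-Za-z0-9]+|\?", docle):
        # Assumption from section 2: "?" stands for the concept "possible".
        atom = "possible" if token == "?" else token
        if atom not in atoms:
            atoms.append(atom)
    return atoms


print(atomic_docles("ches@pain"))
print(atomic_docles("&ctx@seek@view[&ctx@?,diabm]"))
```

Taking the union of the results over all docles and primary keys, and removing those already present in the supplied files, would give the implied atomic concepts that had to be created.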
5.3 Creation of a “Unique Docle List”
From the expanded Terms-All data, a “Unique Docle List” was created. 8,640 unique
docles were identified. Flags indicating “Atomic Concepts” were added.
5.4 Creation of “Preferred-Terms”
The arbitrary “preferred term” for each docle was selected for each record in the
above “Unique Docle List”. This was done by:
1) Deriving the name from the structure of the “primary key”
…or failing that…
2) The longest name that was otherwise extracted or derived.
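The two-step selection above can be sketched in Python as below. How a name was derived from the structure of a primary key is not described in this report, so `name_from_primary_key` is a guess: it replaces operators with spaces, separates words at changes of case, and capitalises the first letter. Both function names are hypothetical.

```python
import re


def name_from_primary_key(primary_key):
    # Assumed derivation: operators become spaces, case changes split words.
    spaced = re.sub(r"[^A-Za-z0-9]+", " ", primary_key)
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", spaced).strip()
    return spaced[:1].upper() + spaced[1:]


def preferred_term(primary_key, other_names):
    # Step 1: derive the name from the primary key when one is available.
    if primary_key:
        derived = name_from_primary_key(primary_key)
        if derived:
            return derived
    # Step 2: otherwise fall back to the longest available name.
    return max(other_names, key=len) if other_names else None


print(preferred_term("withdrawal@drug", []))
print(preferred_term(None, ["dyspepsia", "Epigastric discomfort", "indigestion"]))
```

The first call prints “Withdrawal drug” and the second prints “Epigastric discomfort”, the longest of the three names.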
5.5 Creation of Concept and Operator usage
The “top-2-preceding” and “top-2-following” operators used with each unique atomic
docle were recorded. This was done more out of interest than necessity. The most
used atomic concepts (in the Docle System) and their top-two-adjacent operators are
shown in the table below…
Derived_Name | Total_Use_In_Compounds
Surgery | 753
Ctx | 405
Infection | 361
Ill | 355
Injury | 323
Ppoc | 178
S | 136
Pain | 133
Fracture | 108
Prescription | 98
Skin | 94
Extirpation | 92
Urine | 89
Xr | 86
X Ray | 86
Neoplasm | 81
Lesion | 70
Abdomen | 65
Inflammation | 63
Mc | 58
Carcinoma | 57
Breast | 50
Spine | 49
Eye | 47
Class | 46
Knee | 46
Abscess | 45
Bladder | 44
Chest | 44
Serology | 44
Hyper | 43
Swelling | 43
Allergy | 42
Problem | 42
Immunisation | 40
Ear | 37
Vagina | 37
Sh | 35
Uterus | 35
Deficiency | 34
Neck | 34
Hypo | 33
Lung | 33
Vein | 33
Kidney | 32
Us | 32
Biopsy | 31
Feeling | 31
Hip | 30
Joint | 30
[The remaining columns of this table (PreOperator_TopType, PreOperator_TopPerCent, PreOperator_NextType, PreOperator_NextPerCent, PostOperator_TopType, PostOperator_TopPerCent, PostOperator_NextType, PostOperator_NextPerCent) are not legible in this transcription.]
Table 8: A sample of the most used atomic concepts (in the Docle System) and their top-two-adjacent
operators
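Counts of the kind summarised in Table 8 could be produced with a Python sketch like the one below. It is an illustration only, under these assumptions: an atomic docle is any run of letters and digits, whatever lies between two atomic docles is treated as a single operator, and an atomic docle at the start or end of a docle is recorded as “Starting” or “Ending” (the labels used in Table 8). The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def operator_usage(docles):
    # Count, for every atomic docle, the operators that precede and follow it.
    pre = defaultdict(Counter)
    post = defaultdict(Counter)
    for docle in docles:
        # The capturing group makes the list alternate:
        # operator, atom, operator, atom, ..., operator (possibly empty).
        parts = re.split(r"([A-Za-z0-9]+)", docle)
        for i in range(1, len(parts), 2):
            atom = parts[i]
            pre[atom][parts[i - 1] or "Starting"] += 1
            post[atom][parts[i + 1] or "Ending"] += 1
    return pre, post


def top_two(counter):
    # The two most frequent operators with their rounded percentage of use.
    total = sum(counter.values())
    return [(op, round(100 * n / total)) for op, n in counter.most_common(2)]


pre, post = operator_usage(["ches@pain", "abdo@pain", "pain", "surg.ches"])
print(top_two(pre["pain"]))   # [('@', 67), ('Starting', 33)]
print(top_two(post["pain"]))  # [('Ending', 100)]
```

In this small example “pain” is preceded by “@” in two of its three uses and always ends the docle.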
5.6 Creation of other “Terms” data
The various terms that describe each concept were extracted from the “Terms-All”
table and placed in a “Term” table. These comprised the “preferred terms” and their
“synonyms” – i.e. Docle System “aliases”.
5.7 Creation of CODE data
The ICD9 and ICD10 codes were extracted from the “Terms-All” data table. From
these a “code-table” was created. To the code-table were added the Docles. Thus, the
ICD codes and docles were retained and were related to their original concept.
There were 1,550 ICD9 codes and 1,505 ICD10 codes.
5.8 Creation of LIST data
To preserve derived data of potential interest, two “Lists” of concepts were created.
They were: (1) List of Atomic Concepts, and (2) List of Reason for Encounter (RFE)
concepts.
5.9 Creation of the Hierarchy
Before describing the creation of the “hierarchy table” used by the poly-browser, a
brief discussion about the Docle System hierarchies may be helpful.
5.9.1 Discussion
The Docle System was described as having three levels of classification: “Phylum”,
“Genus” and “Species”. These are discussed below…
5.9.1.1 Level-0 - the “Phylum”
Unfortunately the first level of the Docle System has not been created “because the
system has been constructed from the bottom-up rather than from the top-down” (Dr
Kuang Oon).
Documentation¹ provided with the Docle System suggested that the entries might
be…
a) Medical Administration
b) Symptoms Signs
c) Diagnostic Non Imaging
d) Diagnostic Imaging
e) Procedures Process of Care
f) Therapeutics
g) TAMTAP- (Thinking About Medical Thinking and Practice)
h) Reason for encounter
i) Clinical Domains
j) Context
¹ “Unitary Health Language – DocleScript” by Dr Yeong Kuang Oon of Docle Systems P/L, 29 Darryl
St, Scoresby, Vic 3179 Australia 03-97638935
The absence of this level meant that the top level of the Docle system was an
unorganised list of more than 700 concepts.
5.9.1.2 Level-1 - the “Genus”
The first level of the Docle System hierarchy was called the “Genus”. It was an
unorganised list of 719 unique docles. A Genus concept had the “^” character added
to its docle in the “Genus to Species” and “Species to Genus” tables provided. Each
genus had one or more children (or “Species”).
5.9.1.3 Level-2 – the “Species”
The second level of the system was the “Species”. Each species had one (or more)
genus as its parent. The species numbered 2,588 unique concepts.
5.9.1.4 Docle Orphan Concepts
There were many concepts that had not been classified. They were neither Genus nor
Species. In the poly-browser they were allocated the parent “Docle Orphan Concept”.
They numbered 5,587.
5.9.1.5 Cyclic relationships in the Genus-to-Species table
If a level-1 (“Genus”) concept is also in its level-2 (“Species”), then the hierarchy is
illogical and “cyclic”. In other words, a “parent” must not be its own child.
In the Genus-to-Species table supplied there were 798 “apparent cyclic relationships”.
Note: The “apparent cyclic relationships” occurred because the parent genus is a
“metaclass” in the Docle System, where a child (species) may be its own parent metaclass.
24 Species Docles were absent and 13 Genus Docles were absent from the supplied
“Terms-All” table. They thus had no “term”.
In the Species-to-Genus table supplied there were 811 “apparent cyclic relationships”. All
Species Docles occurred in the supplied “Terms-All” table; however, 13 genus Docles
were absent and therefore had no “term”.
The expanded data contained in the Genus-to-Species and Species-to-Genus tables
were expected to be the same, but it was not so. Some 415 relationships had to be
added to the Genus-to-Species table from the Species-to-Genus table. The resulting
merged table contained 3,566 relationships, of which 24 Species Docles were absent
and 13 Genus Docles were absent from the supplied “Terms-All” table.
All “apparent cyclic relationships” were avoided.
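Merging the two tables while avoiding the “apparent cyclic relationships” might be sketched in Python as follows. This is an assumption about the general approach, not the database procedure actually used. Both tables are taken to be already parsed into dictionaries (genus to list of species, and species to list of genera, with the “^” removed). The function name is hypothetical.

```python
def merge_relationships(genus_to_species, species_to_genus):
    # Collect every (genus, species) pair stated in either table.
    pairs = set()
    for genus, species_list in genus_to_species.items():
        pairs.update((genus, species) for species in species_list)
    for species, genus_list in species_to_genus.items():
        pairs.update((genus, species) for genus in genus_list)
    # Drop pairs in which a concept is listed as its own parent.
    return {(genus, species) for genus, species in pairs if genus != species}


genus_to_species = {"acne": ["acne&ctx@sequ<,drugr", "acne", "acne&ctx@ill,wors"]}
species_to_genus = {"abdo@rumb": ["abdo@upse"], "acne": ["acne"]}
merged = merge_relationships(genus_to_species, species_to_genus)
print(sorted(merged))
```

Using a set means that a relationship stated in both tables is kept only once.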
5.9.1.6 Poly-hierarchies
227 level-1 (Genus) concepts were also children (Species) of other level-1 concepts.
279 level-2 (Species) concepts were also parents (Genera) of other level-2 concepts.
These relationships form a poly-hierarchy (i.e. concepts within multiple hierarchies).
A poly-hierarchy best reflects medical concepts in a computer based terminology. It is
used by systems such as SNOMED and MeSH. On the other hand, single hierarchies
tend to be used by classification systems.
ICD9 and ICD10 are somewhat hybrid. They allocate different codes for the same concept in
different hierarchical contexts (e.g. “dagger codes”).
Note: A “Linnean hierarchy” is used by the creator of the Docle system to depict its
structure. However, Linnaeus described and used a classification hierarchy in which
items occurred only once.
5.9.1.7 Cyclic relationships in a poly-hierarchy
A poly-hierarchy has the potential to be cyclic. This occurs when an ancestor or
parent is also a child or descendent. An example and explanation are shown below…
Generation   Under A               Under B     Under C
1            A                     B           C
2            B D C                 E A F       G B H
3            E [A] F    G B H      [B] D C     E A F
Diagram 1: An example of “cyclic poly-hierarchies”
An example of cyclic poly-hierarchies is shown in the above diagram. Consider
three first-generation concepts called A, B, and C. Let us suppose their children (the
second generation) are “B, D, C”, “E, A, F” and “G, B, H” respectively. In this
example, the first generation also exists in the second. Consequently each of these
has its own descendents, forming a third generation. Those shown in square brackets
are cyclic, as they are their own ancestor.
Thirty poly-hierarchy cyclic relationships were identified in the docle data supplied.
The offending relationships were deleted. The ten concepts involved were…
Hypo gluco Carticoid
In Sufficiency adrenal Gland
Pneumoconiosis
S creatinine Urea Electrolytes
Withdrawal drug
Hypo mineralo corticoid
Occupational Lung Disease
Pulmonary Fibrosis
S electrolyres
Withdrawal syndrome
Table 9: The concepts that had cyclic poly-hierarchy relationships
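A test for cyclic poly-hierarchy relationships can be sketched as a graph search. The following is a minimal illustration, assuming the hierarchy is held as a mapping from each parent to its children; the function name `find_cyclic_concepts` is hypothetical, and the data is the A, B, C example used in Diagram 1, not the Docle data.

```python
def find_cyclic_concepts(children):
    """Return the concepts that can be reached again from themselves.

    `children` maps a parent concept to a list of its child concepts.
    """
    cyclic = set()
    for start in children:
        stack = list(children.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                # The start concept is its own descendent.
                cyclic.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
    return cyclic


# The example from Diagram 1.
children = {
    "A": ["B", "D", "C"],
    "B": ["E", "A", "F"],
    "C": ["G", "B", "H"],
}
cyclic_concepts = find_cyclic_concepts(children)
print(sorted(cyclic_concepts))
```

For this data the search reports A, B and C. C is included because it reappears one generation further down than the diagram shows (C → B → A → C).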
5.9.1.8 The “Type” of a Hierarchical relationship
Hierarchical relationships are of the “type” of relationship called “is-a”. A child “is a
….” of its parent. The characteristics of an ancestor in such a hierarchy also apply to
its descendents – i.e. they may be “inherited”.
5.9.2 Creation of the Hierarchy Table
Hierarchical relations (“is-a” type relationships) define parents and children in the
poly-browser. They are often described as “vertical relationships”. They were implied
in the supplied Docle System tables called “Genus to Species” and “Species to Genus”.
These were used to derive the hierarchical relationships of the Docle System in the
poly-browser.
The hierarchical relationships are stored in a “Hierarchical Table” in the
poly-browser. It has two essential fields – (1) the Concept, and (2) its Parent. In computing
terms it is an “acyclic directed graph”. An overview of the hierarchical relationships is
shown below…
Top Polybrowser Concept
    Other Systems…
    DOCLE System (3)
        DOCLE Operator Concept (8)
            Operator Concept 1
            Operator Concept 2
            Operator Concept 3…
        DOCLE Classified Concept (719)
            Genus Concepts 1
                Species Concept 1
                Species Concept 2
                Species Concept 3…
            Genus Concepts 2
                Species Concept 4
                Species Concept 5
                Species Concept 6…
            Genus Concepts 3…
        DOCLE Unclassified Orphan Concept (5587)
            Orphan Concept 1
            Orphan Concept 2
            Orphan Concept 3…
Diagram 2: Overview of the hierarchical relationships at the top of the Docle tree
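A two-field table of this kind can be queried for children and for all descendents with very little code. The sketch below is illustrative only: the rows are a small hypothetical extract modelled on Diagram 2, and the function names `children_index` and `descendants` are assumptions, not part of the poly-browser.

```python
from collections import defaultdict

# Each row is (concept, parent), mirroring the two essential fields.
hierarchy_table = [
    ("DOCLE System", "Top Polybrowser Concept"),
    ("DOCLE Operator Concept", "DOCLE System"),
    ("DOCLE Classified Concept", "DOCLE System"),
    ("DOCLE Unclassified Orphan Concept", "DOCLE System"),
    ("Genus Concept 1", "DOCLE Classified Concept"),
    ("Species Concept 1", "Genus Concept 1"),
    ("Species Concept 2", "Genus Concept 1"),
]


def children_index(rows):
    """Build a parent -> set-of-children lookup from (concept, parent) rows."""
    index = defaultdict(set)
    for concept, parent in rows:
        index[parent].add(concept)
    return index


def descendants(index, concept):
    """Collect every concept below `concept`; assumes the table is acyclic."""
    found = set()
    stack = [concept]
    while stack:
        for child in index.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


index = children_index(hierarchy_table)
print(len(index["DOCLE System"]))
print(sorted(descendants(index, "DOCLE Classified Concept")))
```

Because a concept may have several parents, it simply appears in several rows; the `found` set ensures it is reported once.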
5.10 Creation of Non-hierarchical Relationships
5.10.1 Relationship-types in SNOMED
A concept may be related to other concepts with “types” of relationships other than an
“is-a” type. These form “non-hierarchical relationships”. An example of the
relationship types used by SNOMED is shown below…
Undefined
Approach
Associated morphology
Direct device
Episodicity
Has definitional manifestation
Interprets
Onset
Priority
Severity
Access instrument
Has specimen
Recipient category
Specimen source morphology
Subject of information
Is a (i.e. hierarchical type)
Associated etiologic finding
Causative agent
Direct morphology
Finding site
Has focus
Method
Part of
Procedure site
Temporally follows
Component
Indirect morphology
Specimen procedure
Specimen source topography
Access
Associated finding
Course
Direct substance
Has active ingredient
Has intent
Occurrence
Pathological process
Revision status
Using
Has interpretation
Laterality
Specimen source identity
Specimen substance
Table 10: An example of the relationship types used by SNOMED
5.10.2 Creation of the Non-hierarchical relationships
As explained above, non-hierarchical relationships occur in a system when a concept
is related to another with a type of relationship that is other than “is-a”. They are often
referred to as “lateral relationships”.
5.10.2.1 The Docle Operators
The Docle System contained compound-docles that were constructed from
atomic-docles that were joined by “operators”. The operators, their meaning and their
explanation are shown below…
Operator  Meaning               Example of Use
@         APROPOS               detachment@retina is read as detachment apropos / associated with retina.
.         LOCATED AT            tuberculosis.kidney (tube.kidn) is read as tuberculosis located at kidney.
>         LEADING TO            back@pain>buttock (back@pain@butt) is read as backpain radiating to buttock
<         DUE TO                pneumonia<virus (pneu<viru)
:         DESCRIBED AS          pain:dull is read as pain described as dull.
%         QUANTIFICATION        breast@lump,%2cm means lump at breast 2 cm in size.
/         INCREASED             chest@pain/swallow (ches@pain/swal) reads as chest pain increases with swallowing.
\         DECREASED             chest@pain\food (ches@pain\food) reads as chest pain decreased with food.
=         NORMAL                wcc= reads as white cell count is normal
*         ABNORMAL              fbe*
*l        ABNORMAL low          wcc*l means whiteCellCount abnormal low
*h        ABNORMAL High         wcc*h means whiteCellCount abnormal high
!         History of            Surg! means Surgery history (of)
&         Code Shear character  &ctx@ill is the illness contextual organizer that can be sheared off
,         contextual modifier   ,%2cm is value 2 cm
Table 11: The Docle System operators, their meaning and their explanation
It was noted that despite the above approach, some use was made of docles (i.e.
concepts) that appeared to have the same function as the above operators. They
included…
Docle   Name
abno    abnormal
decr    decreased
incr    increased
norm    normal
Table 12: Docle System concepts that appeared to
have the same function as “Docle Operators”
5.10.2.2 Docle Non-Hierarchical Relations
Non-hierarchical relationships were derived from the content and syntax of the docle.
To enable the Docle System to co-exist with other systems in the poly-browser some
modifications were made. These involved treating some docle-operators as concepts.
They were thus created as “DOCLE operator concepts” and are shown in the table
below…
Docle Operator   Concept Name
/                Increased (Docle operator)
\                Decreased (Docle operator)
=                Normal (Docle operator)
*                Abnormal (Docle operator)
*l               Abnormal low (Docle operator)
*h               Abnormal high (Docle operator)
?                Possible
!                History of
Table 13: “Docle Operators” that were converted to “Concepts”
When these were used the “Relation Characteristic” in the poly-browser became
“Qualifier”. (The “Relation Characteristic” is a refinement incorporated into the
poly-browser and used by SNOMED.)
When loaded into the poly-browser, the non-hierarchical “relationship-types” used by
the Docle-System (and derived from docle-operators), were…
Relationship-Type            Docle Operator Prefix
*Docle Starting              `
Docle Apropos                @
Docle Located at             .
Docle Leading to             >
Docle Due to                 <
Docle Described as           :
Docle Quantification         %
Docle Context Modification   ,
* The first relationship-type was created as a generic solution to
undefined “starting atomic concepts”.
Table 14: The non-hierarchical “relationship-types” used by the Docle-System
(and derived from docle-operators)
Docle expressions following the “&” (ampersand) or “,” (comma) operator were
allocated the “Relation Characteristic” of “Additional” indicating contextual
information.
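How relationship-types might be derived from the syntax of a docle can be sketched as below. This is a hedged illustration, not the parser used for the poly-browser: the function name `split_docle` is hypothetical, only the operators listed in Table 14 are handled, and it is assumed that the operator immediately preceding an atom determines its relationship-type, with the first atom receiving “Docle Starting”.

```python
import re

# Operator prefix -> relationship-type, as listed in Table 14.
RELATIONSHIP_TYPES = {
    "@": "Docle Apropos",
    ".": "Docle Located at",
    ">": "Docle Leading to",
    "<": "Docle Due to",
    ":": "Docle Described as",
    "%": "Docle Quantification",
    ",": "Docle Context Modification",
}


def split_docle(docle):
    """Return (relationship_type, atom) pairs for a compound docle."""
    pairs = []
    operator = None  # None marks the starting atom
    # The capturing group keeps the operators in the split result.
    for token in re.split(r"([@.><:%,])", docle):
        if token in RELATIONSHIP_TYPES:
            operator = token
        elif token:
            relation = RELATIONSHIP_TYPES.get(operator, "Docle Starting")
            pairs.append((relation, token))
    return pairs


print(split_docle("pneumonia<virus"))
print(split_docle("breast@lump,%2cm"))
```

Where two operators are adjacent, as in “,%2cm”, this sketch keeps only the last one. It would also split a decimal value such as “2.5cm” at the full stop, so a real parser would need additional rules.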
5.11 Allocation of “Unique Identifiers”
Docle “Concepts” and “Terms” were given unique identifiers to be used by the
poly-browser. Numbers started at 6,020,001.
6 Loading the Docle System into the Poly-Browser
The prepared data described above was imported into the poly-browser. When loaded,
the poly-browser ran the following tests and operations:
o Test for cyclic hierarchical relationships
o Test for redundant siblings
o Count of children
o Build enhanced keywords
o Rebuild Poly-browser lexicon
6.1 Docle’s Appearance in the Poly-Browser
The appearance of the Docle System in the Poly-browser is shown in the following
pictures. A few SNOMED examples are included for comparison…
Picture 1: A list of all the Docle concepts
Picture 2: The top of the Docle hierarchy
Picture 3: The Docle “Operator” concept hierarchy
Picture 4: The hierarchy for “Abscess”
Picture 5: The first hierarchy for “Abscess brain” - it has three parents.
Picture 6: The second hierarchy for “Abscess brain” - it has three parents.
Picture 7: The third hierarchy for “Abscess brain” - it has three parents.
Picture 8: The terms for “Abortion Missed”
Picture 9: The relationships and codes for “Abortion Missed”
(Note: Because the poly-browser uses the “@” character as its “wild-card” when searching data, for
technical reasons all Docles appear with their “@” characters changed to the “~” character.)
Picture 10: The terms for “Missed abortion” in the SNOMED system
Picture 11: The relationships and codes for “Missed abortion” in the SNOMED system
Picture 12: The relationships and codes for “Chest pain” in the Docle system
(Note: Because the poly-browser uses the “@” character as its “wild-card” when searching data, for
technical reasons all Docles appear with their “@” characters changed to the “~” character.)
Picture 13: The relationships and codes for “Chest pain” in the SNOMED system
7 Further Reading
The following references were supplied by Dr Kuang Oon:
Oon, Y. K. ‘Docle - the coding scheme which comes with a free medical belief
system’. HIC-APAMI 1997. Conference proceedings of the Health Informatics Society
of Australia. Sydney, Australia. McGhee S et al (ed), 1997.
Oon, Y. K. ‘The Gelati Syndrome’. HIC 2000. Conference proceedings of
the Health Informatics Society of Australia. Adelaide, Australia. Pradhan M et al
(ed), 2000.
Oon, Y. K. ‘DocleScript-unitary health language’. Conference proceedings of
the NCCH 7th Biennial Conference 2001. Potts Point, Sydney, 2001.
Oon, Y. K. ‘A unified theory of medical informatics based on multi and
distinct semiotic forms’. Proceedings of the NCCH 8th Biennial Conference 2003.
Melbourne, 2003.
Oon, Y. K. ‘The emoticon charged Docletalk interface language’. Proceedings
RACGP/HIC conference 2003. Sydney, 2003.
Oon, Y. K. ‘Docletalk/DocleScript- the language and its implementation’.
Docle Systems, 29 Darryl St, Scoresby, Vic 3179, Australia. Book to be released late
2003.
8 Analysis
An analysis of the DOCLE System revealed the following…
8.1 Supplied data
Tables: The supplied tables have been described in detail earlier. They contained a
somewhat bizarre mixture of data that was not easy to understand.
Documentation: Documentation of the data structure was absent.
Processing required: Extensive processing was required to load the supplied data into
the poly-browser. Major parsing and re-building of the supplied data tables was
required before they were able to be managed.
8.2 Content
Size: The size of the files was small.
Scope: The scope of the data was limited but presumably it is appropriate for general
practice.
Detail: Detail was limited as would be expected in a small system designed for and
used by general practitioners. However, presumably it is adequate as it is widely used.
Complexity: Because the system is a poly-hierarchy with semantic interrelationships
it is more complex than most.
Pre-coordination: The level of pre-coordination in DOCLE was low. In the author’s
opinion, the number of words in the concept descriptions offers a comparative
measure of pre-coordination. Table 15 below shows the average word count of the
descriptions in each quartile, from the quartile with the most words to that with the
least, for the terms that describe the DOCLE concepts. Other systems have been
included for comparison.
Average word count in descriptions

Description Type                     Most words 1/4   2nd 1/4   3rd 1/4   Least words 1/4
CATCH Terms                          5.6              3.3       2.3       1.3
DOCLE Terms                          4                3.6       2         1.4
ICD-10-AM Terms                      10.4             6.2       4.4       2.8
ICPC-2 Rubric                        6.3              4.29      3.36      2.12
ICPC-2 Plus Interface-term-concept   4.8              3.05      2.14      1.7
SNOMED-CT Terms                      8                4.7       3.4       2.0
Table 15: The average words used in Terms might give a measure of pre-coordination
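One way such quartile averages could be computed is sketched below. The exact method used for Table 15 is not described in this report, so the approach (sort the terms by word count, cut the list into four equal groups, average each group) and the function name `quartile_word_averages` are assumptions, and the sample terms are for illustration only.

```python
def quartile_word_averages(terms):
    """Average word count per quartile, from most words to least words."""
    counts = sorted((len(term.split()) for term in terms), reverse=True)
    size = len(counts) // 4
    if size == 0:
        raise ValueError("at least four terms are needed")
    # The last quartile absorbs any remainder when the list does not divide by four.
    quartiles = [counts[i * size:(i + 1) * size] for i in range(3)]
    quartiles.append(counts[3 * size:])
    return [round(sum(q) / len(q), 1) for q in quartiles]


# Hypothetical sample of term descriptions.
sample_terms = [
    "chest pain increased with swallowing",
    "white cell count normal",
    "pain in the ankle",
    "missed abortion",
    "back pain",
    "abscess brain",
    "pneumonia",
    "pain",
]
averages = quartile_word_averages(sample_terms)
print(averages)
```

A system whose upper quartile averages many words per description packs more meaning into single concepts, which is the sense in which the word count is offered as a measure of pre-coordination.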
Readiness for use: The descriptions and DOCLE identifiers are used widely in a
current GP system. The system is not ready for general use, particularly regarding its
hierarchies and mappings to the ICD10AM classification system. It would require
additional work before the potential of the system could be realised.
Concept identifiers: Docles are the concept identifiers. They have meaning and
contain the semantic interrelationship data. In a compound concept, they therefore
identify the contained atomic concepts and their relationships.
Concept descriptions: Main descriptions (preferred terms) were supplied only after a
specific request. Synonyms seemed plentiful.
Hierarchies: The system has a poly-hierarchy. It is largely undeveloped. It seems to
contain semantic “isa” relationships. These are of the usual type designed for
navigation, where a child of a concept may be its parent plus an additional
attribute, e.g. “Back pain” is a child of “pain”.
Interrelationships: The Docle identifiers define the semantic interrelationships
between concepts. All compound (pre-coordinated) concepts are related to their
atomic components.
Associated data: There was no associated data supplied.
Mappings: A limited and simple “mapping” to ICD9 and ICD10AM is provided.
Supporting features: “Docle script” was described in documents for further reading.
Its usefulness, functions, and robustness were not able to be estimated.
System maintenance: There are several aspects to system maintenance. They
include…
(a) Organisation: The organisation supporting the system is one person.
(b) People: Dr Oon alone maintains the system, including the creation of new terms.
(c) Customers: The system has only one significant customer - Health Communication
Network (HCN), which is the vendor of the “Medical Director” software. HCN in
turn supports many end-user general practitioners.
(d) Content: GPs ask HCN to include various concepts. Dr Oon creates new concepts
and adds terms according to requests received from HCN.
(e) Computer tools: Dr Oon has developed his own computer tools to help in the
creation and maintenance of the system’s contents. Details of these tools are not
known.
(f) Validation methods: Presumably some validation software is used. Finding only a
few cyclic relationships in the hierarchy would suggest this. The fact that any
were found perhaps suggests that the tools could be improved.
(g) Updates: The system appears to be updated regularly in accordance with the needs
of HCN and its users.
8.3 Machine located contents
8.3.1 Finding GP Vocabulary terms
The report “Target System Analysis Using Term Matching Techniques” contains the
results of looking for the contents of the GP Vocabulary terms in DOCLE.
The pie-graph in Figure 1 below compares the number of GP-Vocabulary terms that
were found to be “similar” in the five systems examined.
CATCH: 961 (8%)
ICPC: 1540 (13%)
SNOMED-CT: 4915 (39%)
ICD10AM: 1657 (14%)
DOCLE: 3186 (26%)
Figure 1: Pie-graph of the number of terms that were “similar” to those of the
GP-Vocabulary test terms, for each target system.
The GP Vocabulary terms were machine matched to the terms in the target systems.
The results were then analysed according to the frequency of use of the GP terms in
the field. Matches were either “similar to” (blue-lower), “contained the GP term”
(red-middle), or they were “unmatched” (yellow-upper). (See the graphs in Figure 2
below).
[Figure 2: three charts, each with a vertical axis from 0% to 100% and one bar per
target system (CATCH, DOCLE, ICD10AM, ICPC2Plus, SNOMEDCT). Panels: “The most used
400 GP Vocabulary terms”; “The most used 3000 GP Vocabulary terms”; “All 11,583
valid GP Vocabulary terms”.]
Figure 2: Matches according to the source-terms frequency of use in the field. Matches that
were “similar to” are in blue at the bottom, “contained the GP term” are red in the middle
and “non-matching” are yellow at the top.
8.3.2 Terms in SNOMED CT that were also in other systems
An analysis was done of the terms that were the same (or “similar”) in SNOMED CT
and CATCH, DOCLE, and ICPC2Plus.
Matching ICD10AM-NCCH terms in SNOMED CT has been omitted because:
1. ICD10AM-NCCH terms were very numerous (335,451), being lexical variants and computer
constructs from the index of the book. A very large number did not match. The results were
deceptively bad at the term level, and less so at the concept level.2 This project was focussed
on linking “Terms”, not “Concepts”.
2. SNOMED CT was building mapping tables and rules to enable automatic coding of its
interface concepts into ICD10. This work is complex and in progress.
Matching was on the same basis as that described in the report titled: “Target System
Analysis using Term Matching Techniques”. The terms that were considered
“similar” were those that had any of the following matching characteristics…
• Exact Match between terms
• All the Same-words matched
• All Words less ''Stop-words'' matched
• Equivalent Term matched
• Same-words less ''Exclude-words'' matched
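Three of the matching characteristics listed above can be sketched in a few lines. This is not the matching software described in the cited report: the stop-word list, the case-insensitive comparison and the function name `match_type` are assumptions made for illustration, and the “Equivalent Term” and “Exclude-words” tests are omitted because their lookup tables are not given here.

```python
# Assumed stop-word list; the list actually used is not given in this report.
STOP_WORDS = frozenset({"of", "the", "in", "and", "with"})


def match_type(source, target):
    """Classify how a source term matches a target term, or return None."""
    source_words = source.lower().split()
    target_words = target.lower().split()
    if source_words == target_words:
        return "exact"
    if sorted(source_words) == sorted(target_words):
        return "same-words"
    source_core = set(source_words) - STOP_WORDS
    target_core = set(target_words) - STOP_WORDS
    if source_core and source_core == target_core:
        return "same-words-less-stop-words"
    return None


print(match_type("Missed abortion", "missed abortion"))
print(match_type("Abortion missed", "Missed abortion"))
print(match_type("Pain in the ankle", "ankle pain"))
print(match_type("Chest pain", "Back pain"))
```

The tests are applied from strictest to loosest, so each pair of terms is reported under the strongest characteristic it satisfies.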
The result of the analysis is shown in Table 16 below. DOCLE terms were divided
into “atomic” and “other terms”, while ICPC2-Plus terms were divided into their
“natural language terms” and “other terms”.

Columns:
(A) Total TERM count
(B) Total CONCEPT count
(C) "Similar" TERMS in SNOMED
(D) % of "Similar" TERMS
(E) CONCEPTS of the "Similar" terms
(F) % of CONCEPTS that were "Similar"
(G) "Non-similar" TERMS to be mapped or added
(H) "Non-similar" CONCEPTS to be mapped or added

Systems whose terms were
sought in SNOMED CT terms   (A)     (B)    (C)    (D)    (E)    (F)    (G)     (H)
CATCH - All                 10605   6249   2375   22.4   1356   21.7   8230    4893
DOCLE - All                 23352   8652   6111   26.2   4340   50.2   17241   4312
DOCLE - Atoms               5233    3736   2803   53.6   2321   62.1   2430    1415
DOCLE - Others              18119   4916   3308   18.3   2019   41.1   14811   2897
ICPC2Plus - All             12689   7161   3695   29.1   2603   36.3   8994    4558
ICPC2Plus – Nat. Lang.      5528    5528   1482   26.8   1482   26.8   4046    4046
ICPC2Plus - Other           7161    7161   2213   30.9   2213   30.9   4948    4948
Totals                                                                 34465   13763
The Totals row of the source table also carries the figures 21987 and 16334; their
columns cannot be identified from the extracted layout.
Table 16: Analysis of the Terms and Concepts from CATCH, DOCLE and ICPC2Plus that are the
same (or “similar”) in SNOMED CT
8.4 Manually located contents
Table 17 below shows the total number of manually created links made between the
GP Vocabulary term subsets and the terms of the target systems. The percentage of
the total number of attempted links is also shown.
Footnote 2: Using a 23% sample of the 335,451 ICD10AM terms, 5.9% were found to be "similar"
to TERMS in SNOMED. When the CONCEPTS belonging to these terms were considered, the match was 20%.
                620 chronic problems     500 random stratified terms   1120 combined terms
Target System   Total Links   % of 620   Total Links   % of 500        Total Links   % of 1120
CATCH           213           34         153           31              366           33
DOCLE           354           57         274           55              628           56
ICD10AM         190           31         236           47              426           38
ICPC            189           30         210           42              399           36
SNOMEDCT        540           87         430           86              970           87
Table 17: The results of manually-linking 1,120 GP terms to the terms of the target systems
8.4.1 Comparing machine matches and manual linking
Figure 3 below shows the results of (a) machine-matching and (b) manually-linking
1,120 GP terms to the terms of the target systems.
[Figure 3: graph comparing “Machine Match” and “Manual Link” counts (axis 0 to 1000)
for SNOMED-CT, ICPC, ICD10AM, DOCLE and CATCH.]
Figure 3: Graph of the machine-matched and manually-linked terms (from the combined “620 Chronic
Problems” and “500 Stratified Random” subset) that were “similar” for each of the target systems
8.5 Relevance
To general practitioners: The relevance of the Docle System to general practitioners
has several aspects:
(a) Current use: Current use by GPs is very large, as it is the terminology of the most
used medical record system in Australia.
(b) Content: The content is presumably relevant as it has been created for GPs by
GPs.
(c) Decision support: There has been some use of the system in decision support
software. The Medical Director (MD) software has some drug-disease alerts. The
management of diabetes and asthma via the MD interface are current projects
sponsored by GPCG and DHAC. At this stage, and for the above projects, it might
be assumed the system is adequate.
(d) Research: The Docle System has been “tried in the field” by HCN, GPs, and
Divisions of GP. However, it is believed there has been no research that
specifically tested and compared the Docle System for research use.
To Healthcare generally: The relevance of the Docle System to wider healthcare
domains has several aspects:
(a) Current use: Current use is limited largely to general practice.
(b) Content: Its contents would probably not suit some healthcare fields.
8.6 Ease of use and functionality:
The ease of use and functionality of a system includes the following aspects…
(a) Programmers: The ease of use and functionality of the system for programmers are
a function of…
o Supplied data formats: The bizarre data formats, and their need to be parsed
and processed, could make the use of the system problematic - unless a simple
list of terms with associated Docles was all that was utilised.
o System maintenance and support: The system seems well supported.
(a) Decision support: The ease of use and functionality of the system for builders of
decision support subsystems are a function of…
o Content: The required concepts must be in the system. This depends on system
support. The system seems well supported.
o Relationships: Compound concepts need to be related to their atomic
components. The system is one of the very few that have this feature.
o Hierarchies: The Docle System hierarchies are very deficient. Those that exist
are designed for navigation or data aggregation. However decision support
often refers to a concept and all its descendents. This can be problematic if
descendents have been created by the inclusion of post-coordinated concepts
comprising the parent (or ancestor) plus additional attributes (eg. “Pain in the
ankle” being a child of “Pain”).
The system rates well except for its hierarchies.
(b) Primary users: The ease of use and functionality of the system for its primary users
(namely general practitioners) are a function of…
o Adequate concept scope, detail and maintenance
o Plentiful synonyms acronyms and abbreviations
o Few inappropriate terms
o The relationships between atomic-concepts and their compound-concept
o Adequate and appropriate hierarchies
o Good interface design
The system rates well for the first four. The hierarchies could be improved. The
interface design is not a direct issue for the terminology.
(c) Secondary users: The ease of use and functionality of the system for secondary
users (e.g. statisticians and researchers) are a function of…
o Adequate content detail
o The existence of atomic relationships on compound concepts
o Mappings to required classification systems
o Appropriate hierarchies
The system provides all the above, however mappings and hierarchies may be
deficient.
8.7 Cost and availability
The cost and availability are unknown. However it is imagined that the system is of
low cost and is readily available.
8.8 Strengths and implications
The DOCLE System has the following strengths and implications…
Strength: Because of its inherent design, atomic parts (concepts) are always created
for all compound concepts (i.e. pre-coordinated concepts).
Implication: Utilisation by decision support and ad hoc searching is made easier. Pre-
and post-coordinated concepts can be better equated.

Strength: A poly-hierarchy is provided.
Implication: Concepts exist in several positions in hierarchies as they do in reality.
The location and management of concepts and data is made easier.

Strength: Its content is apparently suited to GPs.
Implication: The system is suitable for general practice medical records.

Strength: It is in wide use by GPs in the “Medical Director” software application.
Implication: The system has been widely “field tested” over a significant time, so it
fulfils current needs.

Strength: It is used, to a limited extent, in some decision support applications.
Implication: The system seems suited to real decision support software applications.

Strength: The cost is presumed to be low and its availability high.
Implication: There should be low market resistance based on cost.
8.9 Weaknesses, consequences and solutions
The following table summarises the system’s weaknesses, points out their
consequences, and offers possible solutions…
Weakness: Content is limited to the domain of General Practice.
Consequence: It may not suit the wider health domains.
Possible solution: The content of the system could be increased to cover other
domains. This is a matter of funding.

Weakness: Support is limited to one individual.
Consequence: Support resources and reliability are limited.
Possible solution: Increase funding to expand support.

Weakness: Internationally it is a non-standard system.
Consequence: International decision support, reporting, application and research
software would tend to be incompatible with the Docle System.
Possible solution: Build and maintain translation tables from the Docle System to an
internationally recognised system, or build and promote the Docle system so it is
adopted internationally.

Weakness: Delivered tables are poorly organised and deficient in content.
Consequence: The system is difficult to utilise fully in application programs.
Possible solution: Reorganise the data into tidy and predictable formats.

Weakness: Many atomic concepts are not specified.
Consequence: There will be problems for decision support applications and the
interchangeability of pre- and post-coordinated concepts.
Possible solution: Create and maintain atomic concepts. This was done when preparing
Docle to be loaded into the poly-browser.

Weakness: Concept identifiers have meaning.
Consequence: Dr Oon believes this is a “strength”. However it contravenes terminology
desiderata standards. The result is that medical thinking and terms are frozen.
Possible solution: Concepts could be identified by numbers. They need not be seen.
Docles could exist if required, but rather than being the identifiers they would be
abstracted by one level. They could then be changed. These things were done prior to
loading into the poly-browser.

Weakness: Relationship identifiers are limited to a few specific punctuations.
Consequence: There are potentially many more types of relationships than the
punctuation characters that are available.
Possible solution: Create a file of “relationship types” and give each relationship
type a unique numeric identifier. Authoring software would be required (or need to be
updated) to manage the additional table of relationships. These things were done prior
to loading into the poly-browser.

Weakness: Relationships are a mixture of semantic types and “statuses”.
Consequence: Unusual relationships are developed.
Possible solution: Convert the unusual relationships to those that are more
conventional. This was done when preparing Docle to be loaded into the poly-browser.

Weakness: Hierarchies are few.
Consequence: There will be problems with data aggregation, decision support, reporting
and research.
Possible solution: Fund the rebuilding and expansion of the hierarchies.

Weakness: Hierarchies allow children to result from the introduction of additional
attributes. They are therefore designed for navigation and data aggregation.
Consequence: Navigation hierarchies can cause problems for the interchangeability
between pre- and post-coordinated concepts and for decision support and data matching
rules that refer to “a concept and all its descendents”.
Possible solution: Rebuild the hierarchies using appropriate rules.

Weakness: Cyclic relationships occurred.
Consequence: Cyclic relationships are illegal and illogical. They lead to endless
loops in software.
Possible solution: Improve the authoring software.
Improve the authoring software.
9 Concluding remarks
The Docle System is a small unconventional system that is widely used by general
practitioners. It is the creation of one person who alone continues to support it. Its
contents have been created by GPs for GPs and so are likely to be adequate and
appropriate for that domain.
The system is one of the very few to be created from atomic concepts. It therefore
lends itself to decision support, flexible data collection and research.
It has many weaknesses; all but one could be overcome. The exception, being a
non-standard system from an international perspective, may not be solvable.
10 Creator’s Comments
The creator of the DOCLE System, Dr Kuang Oon, was invited to comment on this
report. His comments were as follows:
Paragraph 4.3 titled “Primary Key - Secondary Key” on page 10
Regarding issue of the 101 erroneous primary keys - this has been subsequently determined to be
spurious and attributed to a faulty export from the Docle database. A subsequent file release from
Docle systems had the problems eliminated. Kindly amend polybrowser to reflect the latest Docle
release I sent you.
In the above faulty examples, the spurious tertiary keys are allocated primary key status they do not
deserve. The HCN -MD Docle files do not have these 101 erroneous codes. The corrected
primary-secondary Docle file release I sent you reflects this aspect. Kindly amend table above/add new table to
reflect the latest and corrected Docle release I sent you.
Peter McIsaac, in his letter (2003), has indicated that this is no problem, and this episode has previously
evoked some correspondence amongst us. As this is a serious matter that has arisen from a careless
"clerical error", I hope you will elect not to make too much an issue of it. Besides incorporation of
corrected data will only enhance use of the polybrowser.
Paragraph 8.2 titled “Content” on page 30
The word bizarre is pejorative ...perhaps better stated as "innovative" or at worst "idiosyncratic" which
"on initial inspection evokes mental resistance".
Complexity: Because the system is a poly-hierarchy with semantic interrelationships it is more
complex than most. The embodiments of the codes are human readable and in that sense it is an easier
to use coding system from the programmers viewpoint.
Paragraph 8.6 titled “Ease of use and functionality:” on page 36
(a) Decision support: The ease of use and functionality of the system for builders of decision support
subsystems are a function of
• Content: The required concepts must be in the system. This depends on system support. The
system seems well supported.
• Relationships: Compound concepts need to be related to their atomic components. The system
is one of the very few that have this feature.
• Hierarchies: The Docle System hierarchies are very deficient. Those that exist are designed for
navigation or data aggregation. However decision support often refers to a concept and all its
descendents. This can be problematic if descendents have been created by the inclusion of
additional attributes,
The system rates well except for its hierarchies….
Docle is committed to populate every Docle concept in a Linnean framework with Object medica as the
root object and with levels of phylum, class, order, family, genus and species. It is a work in progress.
Paragraph 9 titled “Concluding remarks” on page 39
If the spirit moves...please append at the end of conclusion: Medical coding is undergoing rapid
evolution.
For decision support, detailed and specific information about the patient is required. For example we
may need to code for a patient with rheumatoid arthritis treated with gold injections and need to
retrieve this information outside the medical record context.
For this type of coding - we will have combinatorial explosion using static reference coding by
multiplying the number of syndromes versus the number of treatment modalities.
We also need to code for medical heuristics. The solution to this problem is a compositional or
propositional coding system. This is where Docle scales particularly well to a health language. Let's
hope for progress in 2004 and best wishes to you.
Kuang Oon