No. 1, 1995
APZ 212 20 - The New High-end Processor
Measuring Quality of Service in Public Telecommunications Networks
AXE 10 System Processing Capacity
Using Predictions to Improve Software Reliability
Test Marketing of Mobile Intelligent Network Services
AXE 10 Dependability
CONTENTS
No. 1 1995 • Vol. 72
APZ 212 20 - The New High-end Processor
Measuring Quality of Service in Public Telecommunications Networks
AXE 10 System Processing Capacity
Using Predictions to Improve Software Reliability
Test Marketing of Mobile Intelligent Network Services
AXE 10 Dependability
Cover: To meet the great demand for
processing capacity in telecommunications
systems, Ericsson manufactures
proprietary ASICs in VLSI technology. The
photo shows one of the ASICs for the latest
version of the AXE 10 central processor
PHOTOS BY
Labe Allwin
Karl-Evert Eklund
Nina Reistad
Malcolm Brow
Olav Hammero
CONTENTS
Previous issues 1994
No. 1
No. 2
A New Standard for North American
Digital Cellular
RBS 884 A New Generation Radio
Base Station for the American
Standard
A 10 Gbit/s Demonstrator
A Prototype Demonstrating User
Mobility and Flexible Service
Profiles
New-generation True Pocket Phones
The exchange Manager - An
Operation & Maintenance System
for the Switched Network
Operations Support System for
CME20
Implementation of UPT - Universal
Personal Telecommunication
An Integrated TMN Solution - Eripax
and TMOS
DCT 1800 - A DECT Solution for
Radio Access Application
Fibre to the Home Field Trial in
Ballerup, Denmark
No. 3
An Information-Based Approach to
Engineering Telecommunication
Networks
Integrated Photonics for Optical
Networks
Ericsson's Turnkey System for
GSM 900 and DCS 1800
Networks - CME 20
An Optical Transport Network Layer
- Concept and Demonstrator
In-House Plant for Submicron
Intelligent Network Architecture in
the Japanese Digital Cellular
Standard - PDC
No. 4
ATM Traffic Management at the
Initial Deployment of B-ISDN
Re-defining Management Systems - The TMOS Architecture Evolution
Trends in Wide Area Paging
Network Traffic Management (NTM)
Using AXE and TMOS Systems
MOMS - An Operations System for
MINI-LINK
Ericsson Review © Telefonaktiebolaget L M Ericsson • Stockholm 1995 • Publisher Hakan Jansson • Editor Editorial Board
• Editorial staff Eva Karlstein • Address Telefonaktiebolaget L M Ericsson S-126 25 Stockholm, Sweden
• Fax +46 8 681 2710 • Published in English and Spanish with four issues per year.
Ericsson Review No. 1, 1995
CONTRIBUTORS
in this issue
Terje Egeland, Technical Coordinator and Chief Designer at Ericsson Telecom's core unit Basic Systems. He received an MSc in
Applied Physics from the Royal Institute of Technology, in Stockholm, 1981.
Ragnar Huslende, Product Manager at Ericsson AS' Network Management Systems department (in Oslo); as a senior engineer
engaged in QoS measurements, and object-oriented modelling for
the Telecommunications Management Network (TMN). He also participates in standardisation work in ETSI and ITU. Ragnar Huslende
received an MSc in 1974 and his doctor's degree in 1983, from
the Telecommunications Engineering Department of the Norwegian Institute of Technology. In 1978 and 1979 he was a visiting
scholar at the University of California, Los Angeles.
Leif Hakansson, Product Manager at Ericsson Telecom's core unit
Basic Systems; responsible for the AXE 10 control system (regional and central processors). He has been appointed Senior Expert
in control system capacity. In 1970, Leif Hakansson received his
MSc in Electrical Engineering from the Royal Institute of Technology, Stockholm.
Bjorn Kihlblom, Section Manager and responsible for Traffic and
Systems Dimensioning at Ericsson Telecom's Systems Management department. He holds an MSc in Applied Mathematics,
awarded by the Royal Institute of Technology, Stockholm, in 1988.
Hans Lundberg, Program Manager for the APZ System at Ericsson
Telecom's Program Management department. Hans Lundberg
received his MSc in Electrical Engineering from the Royal Institute
of Technology, Stockholm, in 1983.
Camilla Nord, member of the Software Reliability Team and editor of the Software Reliability network newsletter at Ericsson
Telecom's core unit Basic Systems. She has been engaged in research on methods for fault prediction in SW and in the definition,
verification and prediction of SW reliability. In 1993, Camilla Nord
received an MSc in Industrial Engineering and Management from
the Linkoping Institute of Technology.
Ojvind Johansson, Systems Engineer, member of the Software
Reliability Team at Ericsson Telecom's core unit Basic Systems.
He has been engaged in research on methods for fault prediction
in SW and on software reliability, as related to release criteria.
Ojvind Johansson holds an MSc in Engineering Physics, awarded
by the Royal Institute of Technology, Stockholm, in 1993.
Rima Qureshi is Technical Project Manager at the Systems Design
department of Ericsson Research Canada. She holds Bachelor's
degrees in Management and Computer Science and has completed 75% of the courses required for an MSc in Business Administration.
Stephen Crombie, Marketing Manager at the Cellular Systems
department of Ericsson Communications Ltd, New Zealand, is
responsible for the marketing and sales of cellular infrastructure
and services to Telecom Mobile. Stephen Crombie holds an MSc
in Technology Management, awarded by the University of Sussex,
England.
Tina Sutton, Senior Product Manager at Telecom Mobile Communications Ltd, New Zealand, is responsible for the development,
launch and management of value-added services in Telecom
Mobile's cellular network. Tina Sutton has an MA from Massey
University in New Zealand and an MA in Library and Information
Science from the University of Hawaii.
Karl-Axel Englund, Senior Specialist in dependability engineering
work at Ericsson Telecom's Network and Systems Characteristics
department, currently engaged in reliability and maintainability
analysis work in the APZ 212 20 project. In 1969, Karl-Axel Englund graduated from Eskilstuna Upper Secondary Technical
School, specialising in control systems engineering. Additional
courses include mathematical statistics and reliability engineering.
APZ 212 20 - The New High-end Processor
for AXE 10
Terje Egeland
The demand for processing capacity in telephone systems doubles every
four years, due to increased use of existing services, new service offerings
and a rising demand for operations support. In AXE 10, this demand is met
by introducing dedicated regional processors, optimising compiler and
instruction primitives, enhancing software and increasing capacity in the
central computer system.
The author describes how this is achieved in the latest version of the central processor in AXE 10, through new and faster technology, dedicated
hardware logic, and enhanced architectural features.
APZ 212 20 is the latest AXE 10 control
system in the high-performance APZ 212
series. It includes a new central processor
(CP) with substantially increased capacity.
The previous CP generations for APZ 212
were released in 1985 (APZ 212 02), and
in 1990 (APZ 212 10).
Fig. 1
The IPU board with the ASICs CPS and UMC plus
the PSCM and CM-E memories
The low and medium capacity ranges for
AXE 10 are served by the APZ 211 control
system. From an application point of view,
all variants of the APZ control system are
highly compatible. The only fundamental
difference is in the implementation of the
central processor subsystem. Thus, both
APZ 212 11 and APZ 211 11 can be
replaced by APZ 212 20.
APZ consists of a number of different processors, such as the regional processor,
RP, RPD, EMRPD, I/O processors, and the
central processor, CP. The central processor is part of a subsystem, CPS, which
forms part of the APZ control system. The
main focus of this article is on the CP hardware. A general description of the APZ
system was given in Ericsson Review,
No. 3, 1990.
OBJECTIVE
The objective of the new CP was to
increase the capacity of AXE 10 to meet
the expanding needs of the market, primarily caused by rapidly increasing
demands for new and enhanced services.
The improvement, it was deemed, must at
least quadruple the capacity of APZ 212 10.
Another requirement was full backward
compatibility with other APZ versions,
thereby allowing existing program code to
run without having to recompile it. The
changeover to the new APZ processor
must also be possible with a minimum of
disturbance to the traffic, and without
taking the system out of operation. As in
previous versions of APZ, it must also be
possible to upgrade installed software
while the APZ is in full operation.
Fig. 2
The central processing system consists of two
identical processors CP-A and CP-B, each with its
own instruction processor unit, IPU, signal
processor unit, SPU, regional processor bus
handler, RPH, program store, PS, and data store,
DRS. The system's parallel-synchronous operation is
supervised by the maintenance unit, MAU
Originally, the central processor, the chosen instruction set and the language - PLEX - in AXE 10 were designed together,
to provide telecom applications with
maximum processor performance. Since
then, the language, the compiler, the
instruction set and the hardware have
been developed further, with an aim to
optimise performance. The use of a
language, hardware and instruction set
that have been optimised to work together
yields superior capacity, compared with
general-purpose computer systems.
Because Ericsson and its customers have
made large investments in AXE 10
application software, they want to be able
to run existing software on new
processors.
CONCEPT
The methods used to increase capacity
are:
- to take advantage of faster components
- to achieve higher speed through integration
- to provide more extensive HW support
- to increase clock frequency.
One advantage of a proprietary processor
is that it can be customised to perform
specific tasks. In such cases it is necessary to know exactly what is executed in
the machine. To determine this, the executed instructions in different AXE 10
applications were recorded in exactly the
same order and frequency as they appear
in calls.
A model of the architecture was then built,
using a simulation language. In this model it was possible to change parameters,
such as queue length, cache size, memory access principles, etc.
By applying instructions from the recorded calls to the model, it was possible to
optimise the processor architecture and
provide necessary hardware support to
achieve the highest possible capacity for
switching applications.
ARCHITECTURE
The new architecture is based on proven
features from APZ 212 10 and APZ 211,
to which some new concepts have been
added. At the top level, APZ is made up of
two equivalent CP sides supervised and
controlled by a maintenance processor,
MAU, Fig. 2. The structure of each CP side
in APZ 212 is built up by three different
modules: the regional processor bus handler, RPH, the signal processor unit, SPU,
and the instruction processor unit, IPU.
Fig. 3
Hardware structure of the IPU and the SPU in
APZ 212 20
ACC    Address calculation circuit
ALU    Arithmetic and logic unit
BAH    BAS address handler
BAS    Base address store
CMAI   CPC maintenance interface
CM-E   Control memory - external
CM-I   Control memory - internal
CONM   Control memory
CPC    Central processor circuit
DRS    Data and reference store
EXDB   External data bus
IPI    IPU interface
IPU    Instruction processor unit
JAM    Jump address memory
JBU    Job buffer unit
LMU    Load and measurement unit
MAI    Maintenance interface
MAU    Maintenance unit
MCC    Memory control circuit
MCU    Microprogram control unit
OPAB   Operand A bus
OPBB   Operand B bus
PCU    Priority control unit
PS     Program store
PSCM   Program store cache memory
PSH    Program store handler
RESB   Result bus
RPB    Regional processor bus
RPH    Regional processor handler
RPHI   RPH interface
SPU    Signal processor unit
TCU    Timer and counter unit
TRU    Trace unit
UMB    Update and match bus
UMU    Update and match unit
VDSH   Variable and data store handler
One RPH is used for each regional processor bus, instead of a single handler for all
buses, as in APZ 212 10. The RPHs are
connected to the SPU by an RPH bus,
RPHB. Today, the maximum number of
RPHBs in APZ 212 20 is thirty-two, but this
number can be increased and the associated hardware expanded if the need
arises.
The SPU transmits and receives signals
from the RPHs, analyses and assigns priority to incoming signals and prepares
them for execution in the IPU.
In the IPU, the basic structure used in
APZ 212 10 is retained, with two operand
buses (OPAB and OPBB) and one result
bus (RESB); a fourth bus, EXDB, is added, Fig. 3. Each bus carries 32 bits plus
parity. The EXDB connects a number of different ASICs, while OPAB, OPBB and RESB
are internal buses contained within an
ASIC. These buses are visible when data
is being processed in the machine. However, to obtain an efficient flow of instructions, there are also separate address and
data buses to the dedicated program
store, PS, and two similar buses to the
data and reference store, DRS.
The IPU is built up of eleven different logical modules. Seven of them are placed in
one ASIC, called CPC. Nine different memories are used to ensure high execution
speed.
Fig. 4
Instruction and data flow for an SCC instruction
(subtract character constant from register) (left), and an RS
instruction (read from store) (right)
ALU    Arithmetic unit
BAH    Base address handler
BAS    Base address store
CM-I   Control memory - internal
DRS    Data and reference store
PSCM   Program store cache memory
PSH    Program store handler
RMU    Register memory unit
MCU    Microprogram control unit
VDSH   Variable and data store handler
Another way to describe the structure is
to sketch the flow of an instruction as it
is processed in the machine, Fig. 4. The
flow of a simple instruction, like SCC (subtract character constant from register),
starts by calculating the address of the
program memory (PSCM or PS). This step
takes one cycle, followed by one cycle to
access the PSCM; the instruction is then
prepared and decoded in the program
store handler, PSH, which requires two
cycles. In the actual execution cycle, a
read is made in the register memory, RM;
the constant is subtracted in the arithmetic unit, ALU, and data is written back
to the RM, all in one cycle. In this case,
the instruction flow takes five cycles. However, due to the prefetch mechanism
described below, and the pipelining of the
flow, the effective load to execute the SCC
instruction lasts only one cycle.
Box A
Characteristics of CP hardware
Call handling capacity is four times that of
APZ 212 10, eight times that of APZ 211 10
Data store          max 4 Gword of 16 bit
Program store       max 256 Mword of 16 bit
Max number of RPs   256,000
An efficient pipeline becomes even more
important for instructions that require variable access, like RS (read from store).
Cycles one to three, for this type of instruction, are identical with those just
described. In cycle four, the address of the
base address store, BAS, is calculated in
the base address handler, BAH. In cycle
five, access is gained to BAS. In cycle six,
the address of the data and reference
store, DRS, is calculated. Access to DRS
is then gained in cycles seven to nine;
error correction and variable extraction are
executed in cycle ten, and, finally, the data
is moved to the register memory, RM, in
cycle eleven. If the prefetch mechanism
has filled the pipeline, only cycle eleven
will load the system.
For the prefetch mechanism to be efficient, all logic and bus resources must be
independent of each other. In APZ 212 20,
the pipelining permits up to eleven different instructions at a time.
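The difference between the length of the instruction flow (five cycles for SCC, eleven for RS) and the effective load of one cycle can be illustrated with a small calculation. The sketch below is a simplified model and not a description of the APZ hardware: it assumes an ideal pipeline that never stalls, and the function name total_cycles is chosen here purely for illustration.

```python
def total_cycles(n_instructions: int, pipeline_depth: int) -> int:
    """Cycles needed to complete n instructions in an ideal, stall-free pipeline."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs the full depth of the pipeline; every
    # following instruction completes one cycle after its predecessor.
    return pipeline_depth + n_instructions - 1


SCC_DEPTH = 5    # length of the SCC flow quoted in the text
RS_DEPTH = 11    # length of the RS flow quoted in the text

for name, depth in (("SCC", SCC_DEPTH), ("RS", RS_DEPTH)):
    n = 1000
    cycles = total_cycles(n, depth)
    print(f"{name}: {n} instructions in {cycles} cycles "
          f"({cycles / n:.3f} cycles per instruction)")
```

In this model the cost per instruction approaches one cycle as the sequence grows, which is why the cycles needed to fill the pipeline matter most for short sequences.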
Description of hardware
As shown in Fig. 2, each CP side is divided into three hardware blocks: RPH, SPU
and IPU. Each of these blocks can be
described as an independent processor.
The RPH, which sends and receives signals on the RPB and to and from the SPU,
has its program coded in hardware. The
SPU analyses and prepares incoming signals, and assigns priority to these signals
before they are sent to the IPU. The task
to be performed by the SPU is more complex, and it is therefore controlled by a
microprogram in a random access memory. The purpose of the RPH and SPU is to
decrease the load on the IPU, which executes the application software. The IPU is
thus the bottleneck in the processing
system, which explains why the greatest
emphasis in the design of APZ 212 20 was
placed on that hardware block.
Fig. 5
Logical description of the instruction handler in the
program store handler. Data from the program store
is sorted in the input buffers, decoded in the
instruction decoder and then stored in the SQ
(sequential queue) or BTQ (branch target queue),
depending on which order is given by the queue
manager
Prefetch of variables
The data store, DRS, must be capable of
storing a large amount of data. Due to the
size and cost of the memory, DRAMs are
used. These are not as fast as other logic,
which makes DRS relatively slow. To
reduce the effect of a slow data store,
access to the DRS must be gained early,
by fetching the instructions from the
program store, PS, well in advance of
execution.
Instructions are read from the program
store, PS, continuously and more or less
independently of how fast they are
executed. An instruction that is fetched from the PS before
it is needed is stored in a six-position FIFO queue, called the sequential
queue, SQ, Fig. 5. The instruction is
decoded before it is stored in the SQ. If it
uses variables, the names of these
variables (the a-parameter) are extracted
and sent to BAH for addressing base
address table BAT in BAS. The information
from BAT is then loaded into a new queue,
called the BAS data queue, BDQ, Fig. 6.
The BDQ is explained in greater detail in
the section 'Pipeline interrupts'.
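The behaviour of the sequential queue can be sketched as a bounded FIFO. The following is a minimal software model under stated assumptions, not the hardware design: only the six positions come from the text, while the class name SequentialQueue and its methods are hypothetical, and decoding is represented simply by whatever object is passed in.

```python
from collections import deque

SQ_POSITIONS = 6  # number of positions in the SQ, as stated in the text


class SequentialQueue:
    """Bounded FIFO holding decoded instructions that await execution."""

    def __init__(self, positions: int = SQ_POSITIONS) -> None:
        self._slots = deque()
        self._positions = positions

    def __len__(self) -> int:
        return len(self._slots)

    def prefetch(self, decoded_instruction) -> bool:
        """Store an instruction; return False if all positions are occupied."""
        if len(self._slots) >= self._positions:
            return False  # prefetching has to wait until execution frees a slot
        self._slots.append(decoded_instruction)
        return True

    def next_for_execution(self):
        """Hand the oldest instruction to execution, or None if the queue is empty."""
        return self._slots.popleft() if self._slots else None


sq = SequentialQueue()
accepted = [sq.prefetch(f"instr{i}") for i in range(8)]
print(accepted)                  # the first six are accepted, the last two are not
print(sq.next_for_execution())   # oldest instruction leaves first
```

Once one instruction has been executed, a position is free again and prefetching can continue.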
Fig. 6
Data from BAS is temporarily stored in the BAS
data queue, BDQ. The address is calculated
together with the index register slave, IRS, and the
pointer register slave, PRS.
Variable information to be used in the variable and
data store handler, VDSH, is stored in the variable
control queue, VCQ. The address itself is stored in
the variable address queue, VAQ; at the same time
it is decoded from a logical into a physical address.
A check is made to ensure that there is no collision
with ongoing accesses, whose addresses are stored
in the DRS access control queue, DACQ.
DPBAC and DTYPR contain the number of boards
and type of memory device that are used in DRS.
The addresses in VAC are used to check whether
there has been an earlier write access to the same
address. If so, the new access will not start until
the writing has been performed.
The MCU access queue, MACQ, is used to execute
write instructions efficiently and to support
accesses ordered by the MIP through the DRS read
address register, DRSRAR, and the DRS write
address register, DRSWAR
BDQ      BAS data queue
DACQ     DRS access control queue
DPBAC    DRS PBA counter (number of PBAs in DRS)
DRSRAR   DRS read address register
DRSWAR   DRS write address register
DTYPR    DRS type register (type of memory device in DRS)
IRS      Index register slave
PRS      Pointer register slave
MACQ     MCU access queue
VAQ      Variable access queue
VCQ      Variable control queue
VDSH     Variable and data store handler
The BDQ is located in the address
calculating circuit, ACC, which calculates
the real address in the DRS, together with
an index and a pointer. This calculated
address is stored in the variable address
queue, VAQ, until previously ordered
memory access or refresh instructions
have been executed. Data retrieved from
the DRS is checked and corrected for
errors and then stored in the variable read
data queue, VARRDQ, in the variable and
data store handler, VDSH, Fig. 7. Since all
prefetched instructions may relate to
variables, the VARRDQ has the same
number of positions as the sequential
queue, SQ. Data in the DRS has a length
of 32 bits, and the variables range from
one to 128 bits. Special hardware in VDSH
is used to extract variables that are
shorter than 32 bits.
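Extracting a variable that is shorter than the 32-bit data word amounts to a shift followed by a mask. The sketch below shows the operation in software, assuming a hypothetical layout in which the bit offset is counted from the least significant bit; the actual layout of variables in the DRS is not given in the text, and the function name is illustrative only.

```python
WORD_BITS = 32  # length of a DRS data word, as stated in the text


def extract_variable(word: int, bit_offset: int, width: int) -> int:
    """Return the `width`-bit field that starts `bit_offset` bits above bit 0."""
    if not (0 < width <= WORD_BITS and 0 <= bit_offset <= WORD_BITS - width):
        raise ValueError("field does not fit in a 32-bit word")
    mask = (1 << width) - 1          # `width` ones in the low-order bits
    return (word >> bit_offset) & mask


word = 0xABCD1234
print(hex(extract_variable(word, 8, 8)))    # 0x12
print(hex(extract_variable(word, 16, 16)))  # 0xabcd
```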
The variable is now ready for use when the
instruction is to be executed. The most
common instruction for variables is read
from store, RS, which moves the variable
to the register memory in the register
memory unit, RMU.
Fig. 7
Data from the DRS is first checked for errors in the
ECC and then corrected, if needed. If it is a variable
that has been requested, the data is stored in the
variable read data queue, VARRDQ. The variable is
then extracted from the data by control of signals
from the variable control queue, VCQ. Accesses
ordered by microprograms are stored in the
DRSRDQ, if they arrive before they are needed
DRSRDQ   DRS Read Data Queue
ECC      Error Correction and Check
For a write instruction, the complete word
is first read as described above. The
variable, if less than 32 bits in length, is
then inserted into the data word and
written back to the DRS. Thus, the steps
are the same as for a read instruction.
To execute a write instruction efficiently,
the address resulting from the address
calculation, Fig. 6, is also saved in the
MCU access queue, MACQ, for use when
the instruction is executed. After
execution, the instruction is written in the
background, to the DRS.
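The write sequence described above is a read-modify-write of the complete data word. A minimal sketch follows, in which the DRS is represented by an ordinary Python list of 32-bit words and the bit offset is counted from the least significant bit; both are assumptions made for illustration, as are the function names.

```python
WORD_MASK = 0xFFFFFFFF  # a DRS data word is 32 bits long


def insert_variable(word: int, value: int, bit_offset: int, width: int) -> int:
    """Return `word` with the `width`-bit field at `bit_offset` replaced by `value`."""
    if not (0 < width <= 32 and 0 <= bit_offset <= 32 - width):
        raise ValueError("field does not fit in a 32-bit word")
    field_mask = ((1 << width) - 1) << bit_offset
    cleared = word & ~field_mask & WORD_MASK        # clear the old field
    return cleared | ((value << bit_offset) & field_mask)


def write_variable(store: list, address: int, value: int,
                   bit_offset: int, width: int) -> None:
    """Read the complete word, insert the variable, and write the word back."""
    word = store[address]                                              # read
    store[address] = insert_variable(word, value, bit_offset, width)  # modify, write


store = [0xABCD1234, 0, 0, 0]
write_variable(store, 0, 0xFF, 8, 8)
print(hex(store[0]))  # 0xabcdff34
```

The neighbouring bits of the word are left untouched, which is why the complete word has to be read before a short variable can be written.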
Pipeline interrupts
Great efforts have been made to handle
instructions for variables as effectively as
possible. When the pipeline is working,
only one cycle is used to execute a normal instruction, but building up the pipeline requires many cycles. Depending on
what interrupts the pipeline - internal
jump, jump to another block, pointer or
index changes - the number of cycles
required to restore the pipeline will vary.
If the pointer or index is changed after the
address has been calculated, a new
address must be calculated before the
instruction can be executed - but only if
any of the instructions in the queue use
pointer or index. Data for calculating the
address is fetched from the BDQ, Fig. 6.
The separate BAS data queue (BDQ)
makes it possible to recalculate the
address without having to regain access
to the BAT.
Jump support
The pipeline is interrupted by jumps. To
minimise the effects of local unconditional jumps, these jumps are exclusively handled by the PSH and transparent to the
rest of the system.
For a conditional jump loaded in the SQ it
is impossible to know where the execution
will continue. In this case, the first instruction (in the branch at which the jump is
targeted) is read from the PS and stored
in the branch target queue, BTQ, Fig. 5.
When this is done, the program store handler, PSH, continues to fill the SQ with
instructions. An instruction stored in the
BTQ is decoded in the same way as an
instruction retrieved from the SQ, but not all
actions are performed to completion.
If the instruction in the BTQ relates to a
variable, access is gained to the BAT, but
the address to the DRS is not calculated.
If the instruction specifies another jump,
or if there are more conditional jumps in
the SQ, then no action is taken.
When a conditional jump instruction is executed, the execution may continue with
either the instructions in the SQ or - if the
jump is effected - in the BTQ. If the jump
is not effected, and there are more conditional jump instructions in the SQ, the target instruction for the next conditional
jump is read and stored in the BTQ. An
effected jump can save three to five
cycles, compared with a situation without
the BTQ.
Interleaving in DRAM memories
Prefetch logic is used to compensate for
the relatively slow operation of the DRAM
memories in the DRS, compared with the
logic. Normal memory access to the DRS
takes five cycles: one cycle to calculate
and transport the address to the memory, three cycles for the actual access, and
one cycle to transport, check for errors,
and to correct read-out data. This means
that one memory access can be made every sixth cycle.
Fig. 8
The CPC ASIC with the memories JAM, CM-I and
RM plus 50k gates to implement the modules TRU,
CMAI, RMU, ALU, PSH, MCU and BAA
To speed up access handling, the memory is split into banks which can be
accessed independently. Each memory
board contains eight banks. If consecutive
accesses address different banks,
access can be gained to as many as five
memory banks at the same time, and new
data can be delivered every cycle. The risk
of collision diminishes with the number of
banks available. The maximum number of
boards in the DRS is six, which makes 48
independently addressable banks. To
ensure that this interleaving capability is
efficiently used, an address scrambler is
employed to distribute the accesses evenly among the banks.
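The effect of interleaving, and of spreading the addresses over the banks, can be illustrated with a small timing model. The sketch below rests on several assumptions: a bank is taken to be busy for the full five cycles of an access, at most one new access is started per cycle, and the function skewed_bank is a hypothetical stand-in for the address scrambler, whose actual algorithm is not described in the text.

```python
ACCESS_CYCLES = 5      # cycles for one DRS access, as stated in the text
BANKS_PER_BOARD = 8    # as stated in the text
MAX_BOARDS = 6         # as stated in the text
N_BANKS = BANKS_PER_BOARD * MAX_BOARDS  # 48 independently addressable banks


def plain_bank(address: int, n_banks: int) -> int:
    """Bank selection without scrambling: the low part of the address."""
    return address % n_banks


def skewed_bank(address: int, n_banks: int) -> int:
    """Hypothetical scrambler: shift the bank index by the row number."""
    return (address + address // n_banks) % n_banks


def completion_cycle(addresses, bank_of, n_banks: int = N_BANKS) -> int:
    """Cycle at which the last access has finished."""
    free_at = [0] * n_banks   # cycle at which each bank becomes free again
    cycle = 0
    for address in addresses:
        bank = bank_of(address, n_banks)
        start = max(cycle, free_at[bank])   # wait if the bank is still busy
        free_at[bank] = start + ACCESS_CYCLES
        cycle = start + 1                   # one new access per cycle at most
    return max(free_at)


sequential = list(range(100))
strided = [i * N_BANKS for i in range(100)]   # every access hits bank 0 unscrambled

print(completion_cycle(sequential, plain_bank))   # 104
print(completion_cycle(strided, plain_bank))      # 500
print(completion_cycle(strided, skewed_bank))     # 104
```

In this model a hundred accesses to a single bank take 500 cycles, whereas the same number of accesses spread over different banks finish in 104 cycles, that is, roughly one per cycle.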
Cache for Program Store
Studies have shown that the average
number of instructions in a call-handling
code sequence without jumps is eight.
This means that the five cycles needed for
the first access are a very long time. To build
up the prefetch queue, instructions from
the PS must come in faster than they are
executed. This is solved by using static
random access memory (SRAM)
components for a cache memory for the
capacity-critical blocks. SRAM is smaller
and faster than DRAM, with an access
time of 10 ns, compared with DRAM's
60 ns. The SRAM is located on the same board
as the PSH, which means that the time
required for complete access is reduced
to one cycle instead of five.
The program store cache, PSC, is located
in a memory called PSCM, Fig. 3. The
PSCM also holds tables used for signal-sending instructions.
The PSC holds the complete program for
capacity-critical blocks. The content of the
PSC is updated once every day. To decide
which block should be held in the PSC, the
time each block is used is measured.
These measurements are made only at
traffic level and during the most loaded 3-hour period every day. Both the PS and the
PSC are 32 bits wide. The instruction
length in APZ varies from 16 to 64 bits,
but the most common instructions are 16
or 32 bits long. This means that, on the
average, more than one instruction is read
from the PS each cycle. However, because
some instructions take more than one
cycle to execute, and because of the
waiting time that arises whenever the
pipeline is interrupted, it is possible to fill
the prefetch queues.
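How the measurements might be turned into a choice of blocks for the PSC can be sketched as follows. The text states only that the time each block is used is measured and that complete blocks are held in the cache; the greedy selection rule, the block names, sizes and figures below are all assumptions made for illustration.

```python
from typing import NamedTuple


class BlockStats(NamedTuple):
    name: str          # hypothetical block name
    size_words: int    # program size in 32-bit words
    busy_time: float   # measured execution time during the busy period


def select_for_cache(blocks, capacity_words: int) -> list:
    """Pick complete blocks for the cache, most heavily used first."""
    chosen = []
    free = capacity_words
    for block in sorted(blocks, key=lambda b: b.busy_time, reverse=True):
        if block.size_words <= free:   # only complete blocks are cached
            chosen.append(block.name)
            free -= block.size_words
    return chosen


measured = [
    BlockStats("BLOCK_A", size_words=60, busy_time=10.0),
    BlockStats("BLOCK_B", size_words=50, busy_time=8.0),
    BlockStats("BLOCK_C", size_words=30, busy_time=9.0),
]
print(select_for_cache(measured, capacity_words=100))  # ['BLOCK_A', 'BLOCK_C']
```

Repeating such a selection once a day with fresh measurements would match the daily update of the PSC mentioned above.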
Cache for Base Address Table
Address calculation is part of the activities
associated with the prefetch of variables.
This calculation must be made in parallel
with the execution of other instructions.
The information needed to calculate the
address of a variable is stored in the base
address table, BAT. To do this in parallel,
information from the BAT, on the most frequently used variables, is placed in a
cache memory called BAS (base address
store). This memory is dimensioned to
handle information on all variables, even
for very large applications.
Support for signal-sending instructions
The signal-sending instructions are optimised for the PLEX language and supported by hardware to be executed as efficiently as possible. They are complex and take
more than one cycle to execute.
Box B
Component technology
ASIC   Feature size   Technology
ACC    0.7 µm         GaAs
CPC    0.7 µm         BiCMOS
MCC    0.8 µm         BiCMOS
UMC    0.7 µm         BiCMOS
SPC    0.7 µm         BiCMOS
RAC    1.0 µm         CMOS
Gate counts and on-chip memory: 90,000 random gates; 50,000 random gates; 4,000 random gates; 20,000 random gates; 96k bit SRAM; 3.5k bit SRAM; 30k bit SRAM
Memories:
Type   Organisation   Access time   Used for
DRAM   4M * 4         60 ns         PS and DRS
SRAM   256k * 4       10 ns         BAS and PSCM
SRAM   32k * 8        8 ns          CONM and CM-E
When a signal-sending instruction (in this
example SSN) is detected by the decoder, Fig. 5, and targeted at the SQ, special
hardware masks out the signal-sending
pointer, SSP, from the instruction. The
SSP is used together with the program
start address, PSA, to calculate the
address of the global signal number, GSN,
in the signal-sending table, SST. The GSN
is read, to be used on order by the microprogram. All this preparation is made by
the hardware before the instruction is executed.
When the instruction is executed, special
hardware works in parallel with the microprogram, MIP, to speed up execution.
When ordered by the MIP, this hardware
uses the GSN to calculate the address and
to access the global signal distribution
table, GSDT; it reads the four 32-bit words
from the reference table, RT, accesses the
local signal distribution table, SDT, calculates the address in the new block where
execution should begin, and starts prefetching instructions.
To reduce execution time as much as possible, all accesses to tables must be fast.
Both the GSDT and - for each block - four
words from RT are therefore placed in the
fast SRAM on the same board as the PSH.
This same memory is used for the PSC,
Fig. 3. In the PSCM a 16k32 area is
reserved for the RTC, which is sufficient
for 4k software blocks. A 128k32 area is
used for the GSDT, and the rest for the
PSC. In the first version of APZ 212 20,
the PSCM will use 256k32, which leaves
112k32 (covering eight to twelve blocks)
for the PSC. The PSCM will be upgraded
to 1M32 when the next generation of
SRAM components becomes available in
the market.
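The partitioning of the PSCM can be checked with simple arithmetic. In the sketch below, a figure such as 16k32 is read as 16k words of 32 bits, and k is taken to be 1024; the last line additionally assumes that the RTC and GSDT areas keep their size after the upgrade to 1M32, which the text does not state.

```python
K = 1024                 # assumption: "k" denotes 1024 words

PSCM_WORDS = 256 * K     # first version of the PSCM (256k32)
RTC_WORDS = 16 * K       # area reserved for the RTC (16k32)
GSDT_WORDS = 128 * K     # area used for the GSDT (128k32)
RT_WORDS_PER_BLOCK = 4   # four words from RT are held per block

psc_words = PSCM_WORDS - RTC_WORDS - GSDT_WORDS
blocks_covered = RTC_WORDS // RT_WORDS_PER_BLOCK

print(psc_words // K)        # 112 -> 112k32 left for the PSC
print(blocks_covered // K)   # 4   -> room for 4k software blocks

# Assumption: the same reservations apply to the 1M32 version.
print((1024 * K - RTC_WORDS - GSDT_WORDS) // K)  # 880
```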
PHYSICAL IMPLEMENTATION
The IPU is implemented on two K3 boards
(344x178 mm) containing three proprietary ASICs, SRAM chips and buffers. The
IPU hardware block also includes the PS
(built on one K3 board) and the DRS (built
on one to six K3 boards). The PS and DRS
use the same type of board, built with
DRAM circuits and one ASIC. Each memory board has a capacity of 64M16 words.
This capacity can be increased when
enhanced memory components become
available.
The SPU is implemented on one K3 board
with one ASIC, SRAM and buffer components. The SPU and the IPU are located in
the same magazine. All functions in the
RPH are implemented in one ASIC, called
RAC. Moreover, all components needed to
handle the electrical transmission and
reception of signals on the RPB are placed
on the same board (RPIRS). The board size
is K2 (222x178 mm). One magazine
accommodates eight RPIRS, power supply, and an interface board to the SPU.
Using two K3 size boards, the MAU is built
around a standard microprocessor and
programmable logic device circuits.
CONCLUSION
New technology and a new architecture
make it possible to meet the demand for
increased processing capacity. The new
central processor for AXE 10 is the result
of continuous developments on the
APZ 212 concept. By investing in the development of customised circuits, faster and
more complex functions can be implemented in smaller hardware units, optimised for the specific task of controlling telecom exchanges. The same strategy will be
used to keep new generations of APZ CPs
compatible with those in service today,
thereby allowing existing software to be
used in the future as well.
Measuring Quality of Service in Public
Telecommunications Networks
Ragnar Huslende
For many users of telecommunication facilities, the quality of service is an
important factor in their choice of service provider. To measure this quality,
Ericsson has developed a system called NEAT. It covers both fixed and cellular networks and can be applied to a wide range of services including
basic telephony, virtual private network services, intelligent network services and international network services.
The author describes the NEAT system, shows how this system can be used
to measure quality of service in various networks, and comments on some
results from the use of NEAT.
Rapidly changing technologies and the
advent of new services have led to very
complex telecommunication networks.
Many different types - and even different
generations - of equipment are interconnected and must cooperate properly in
order to carry the services to the users on
an end-to-end basis. Both transmission
and switching equipment are involved,
which means that - although the physical
size of individual hardware components
has been reduced - t h e overall functional
complexity of a telecommunications network is increasing. This trend, reflected in
large software systems, sophisticated signalling procedures and various specialised service nodes, makes it extremely difficult
to predict the quality of service (QoS)
by analytical methods. The best possible
measuring principle must therefore be
chosen.
Keeping in mind ITU's definition of quality
of service in Recommendation E.800 [3]:
"The collective effect of service performances which determine the degree of
satisfaction of a user of the service",
it seems that at least three basic requirements should be fulfilled:
- Quality of service is defined from the
user's point of view and, therefore,
Fig. 1
A variety of reports on Quality-of-Service parameters
and the different fault types are available to the
NEAT user
Box A
Abbreviations
ISDN    Integrated services digital network
NCU     NEAT communication unit
NEAT    Network evaluation and test system
NTC     NEAT test centre
NTU     NEAT test unit
PDH     Plesiochronous digital hierarchy
PSTN    Public switched telephone network
SDH     Synchronous digital hierarchy
QoS     Quality of service
should be measured accordingly. This
means that end-to-end measurements
will give the most accurate results
- According to a recognised principle in
the field of quality assurance, quality of
service should be measured by means
of equipment that is independent of the
traffic-carrying elements of the network
- The method should be universal in the
sense that it must produce comparable
results for all parts and regions of a network.
All these requirements can be fulfilled if a
system for automatic generation of test
traffic is used - a system that complies
with the principles recommended by ITU
in Rec. E.434 [2]. Typically, such a system
consists of one test centre and a number
of small test units as shown in Fig. 2. The
test units communicate via ordinary customer access interfaces; for example,
subscriber lines in the local exchanges or
the air interface of the radio base stations
in cellular networks.
Fig. 2
Scenario for QoS measurements by means of automatic generation of test calls
The test centre is in charge of the preparation and scheduling of tests. Post-processing, presentation and distribution
of test results are also carried out by the
test centre. The actual measurements
and observations are made during test
calls between the test units.
This approach to measuring quality of service has several important and unique
advantages. The system can measure
end-to-end transmission quality at the service level. The measured end-to-end connection may include a number of different
transit nodes and both PDH and SDH sections in the transmission network. During
test calls, the receiving level of defined
test tones, signal/noise ratio, idle channel noise, bit error ratio, etc. can be measured. A system based on test calls can
also measure the initial network response
delay (e.g. dial tone delay) quite accurately, and provides independent, regular monitoring of the call metering equipment.
Unlike a live subscriber, a test unit will
never be busy or absent when called: it
will answer by returning its unique identity
code. Thus, it can be positively verified
that the correct B-number has been
reached and that the call has been successful. Conversely, an unsuccessful call
will always be due to a network problem;
never to subscriber behaviour.
The system used to generate test calls
can serve as an independent, "neutral"
observer with overall responsibility for QoS
measurements in the network. The subscriber access interfaces represent a
stable standard valid throughout the network for long periods. Thus, one type of
system can cover the entire network
although the switching and transmission
equipment may differ from one region to
another, and although technological
changes may occur in the network from
time to time. The user of the system can
define a standard set of parameters and
report formats for service statistics and
fault reports. Measurements from various
parts of the network are immediately comparable, and quality trends can be
observed in an objective manner as the
network evolves.
The test units are connected to the network like ordinary telephone sets, which
makes installation very flexible and
straightforward. Since the measurements
are made from a subscriber's point of
view, the results can be intuitively well
understood, even by personnel without in-depth technical knowledge of the network.
The statistical material supplied must be
comprehensive enough to reflect the various quality parameters with sufficient
confidence. Detailed mathematical elaboration and discussion of this kind of
material was presented at the 5th Nordic
Teletraffic Seminar [6]. The results of the
discussions indicate that 500-1000 test
calls per traffic route will identify, with statistical significance, those parts of the network which may cause quality problems.
It is also shown that the higher the probability of faults (or lost calls), the fewer
test calls are required. This is a very
favourable effect. It means that serious
bottle-necks or trouble spots in the network, requiring quick corrective actions,
are detected after short periods of testing.
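The link between fault probability and the number of test calls needed can be illustrated with a simple binomial argument. The sketch below is not the analysis presented at the seminar; it assumes independent test calls with a constant fault probability, and the function name `calls_needed` is hypothetical. It computes the smallest number of calls for which at least one faulty call is observed with a chosen confidence.

```python
import math


def calls_needed(fault_probability: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence.

    Assumes independent test calls, each failing with probability p.
    """
    if not 0.0 < fault_probability < 1.0:
        raise ValueError("fault_probability must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    n = math.log(1.0 - confidence) / math.log(1.0 - fault_probability)
    return math.ceil(n)


# A route with a high fault rate needs far fewer calls than a good one.
bad_route = calls_needed(0.05)     # 5% of calls fail
good_route = calls_needed(0.005)   # 0.5% of calls fail
```

Under these assumptions, a route with a fault rate of a few percent shows up after some tens of calls, while a rate of half a percent requires several hundred calls, which agrees with the order of magnitude quoted above.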
NEAT - FOR MEASURING QUALITY OF
SERVICE
Ericsson is marketing a modern implementation of a test call generating system
called NEAT. The system, which works as
illustrated in Fig. 2, can perform both routine measurements and on-demand measurements. A possible configuration of a
NEAT test centre, NTC, is shown in Fig. 3.
The NTC application has been designed to
be largely independent of the computer
platform. The current version runs on a
UNIX server supplied by Sun Microsystems with a Sybase relational database management system, but alternative
platforms can be used. The user interface
is based on Open Look or Motif. Standard
functions for window handling (move,
resize, iconise, quit, etc.), pull-down
menus and drag-and-drop functions are
used to create a user-friendly environment
for the NEAT operator.
Fig. 3
Example of configuration of a NEAT Test Centre
NTC is a multi-user system; various user
categories with different authorisation levels can be defined. It is equipped with one
or more communication units, NCUs, with
dial-up modems to communicate with test
units, the NTUs. The NTC application is
coded in C++ using recognised methods
for object-oriented design.
According to the ITU-T Recommendation
E.434, a test unit is a combined transponder/responder. The NEAT test units
are very compact microprocessor
systems, in which advanced digital signal
processing techniques have been used to
implement the measurement functions.
Each NTU can be individually configured
with the capacity needed at the actual site.
Some key figures, specifying NEAT capacity, are given in Table 1.
Configuring the system
Various data must be entered into the NTC
before tests are run. These data include
a network model with groups/subgroups
of exchanges, NTU telephone numbers,
definition of signalling tones, tariff zones
and rates, scaling factor for the test traffic, defined test types, etc.
Table 1
Number of lines for test calls per NTU       2-48
Number of simultaneous test calls per NTU    2-8
Max. number of NTUs per NTC                  1000
Different types of tests can be defined:
- Quality tests. These tests produce QoS
and fault statistics for the various
groups/subgroups of exchanges.
- Metering tests. Test calls are generated within and between the various tariff
zones. Various schemes for generating
call metering pulses on the different
wires of the subscriber interface can be
checked. Multi-metering checks may
cover all the actual tariff zones, metering rates and switch-over times.
- Toll billing tests. The expected charge
for all calls generated from a given
A-number is accumulated. This accumulated sum can then be compared with
the amount actually specified on the bill
as produced by the service provider.
Thus, the entire billing chain can be monitored, including both the metering
system in the exchanges and the various post-processing systems involved
in producing the bills that are sent to the
subscribers.
- Fault trace tests and on-demand tests.
Used for special investigations,
typically initiated when the above tests
have revealed certain network problems that require more detailed
follow-up.
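The toll billing test described in the list above amounts to a simple reconciliation. The following is a minimal sketch, assuming that each test call is logged as a pair of A-number and expected charge; the function names and the record layout are hypothetical and not part of NEAT.

```python
from collections import defaultdict


def expected_charges(test_calls):
    """Accumulate the expected charge per A-number.

    test_calls: iterable of (a_number, expected_charge) pairs.
    """
    totals = defaultdict(float)
    for a_number, charge in test_calls:
        totals[a_number] += charge
    return dict(totals)


def billing_deviation_percent(expected: float, billed: float) -> float:
    """Positive result means overcharging, negative means undercharging."""
    if expected <= 0:
        raise ValueError("expected charge must be positive")
    return 100.0 * (billed - expected) / expected


calls = [("A1", 2.0), ("A1", 3.0), ("A2", 1.0)]
totals = expected_charges(calls)
deviation = billing_deviation_percent(totals["A1"], billed=5.5)
```

The billed amount is taken from the bill actually issued for the test line, so the comparison covers both the metering in the exchange and the later post-processing.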
When these basic configuration data have
been entered, the user of the system may
compose a test schedule that exactly
matches the needs for QoS measurements in the actual network. He can
choose among the defined test types,
define start time and duration, and schedule tests for weekly repetition, if desired.
Performing tests
Once the test schedule has been entered,
all the remaining tasks that are required
to execute the tests and report the results
are performed automatically, on time, by
the NEAT system. Each individual test is
carried out as an independent test
sequence.
During test execution, all NTUs work in parallel, synchronised by time slots. The algorithm for call-pattern generation is
designed to randomise test traffic to/from
each group. Weight factors per exchange
are used to create the desired test-traffic
profile. For instance, one objective may be
to generate test traffic with a distribution
similar to real subscriber traffic.
The algorithm for call pattern generation
is also designed to avoid call collision.
This ensures that no more than one test
call is generated to any one test number
in any time slot, which means that the
"busy B-number" case is avoided.
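The collision-avoidance idea can be sketched as follows. This is an illustration only, not the NEAT algorithm: it assumes one positive weight per test number (the article uses weights per exchange), and it is slightly stricter than required because a number is used at most once per time slot, whether as caller or as called party. The function name `generate_slot` is hypothetical.

```python
import random


def generate_slot(test_numbers, weights, calls_per_slot, rng):
    """Return (caller, callee) pairs for one time slot.

    Each test number appears at most once in the slot, so no test call
    can meet a busy B-number. Weights must be positive.
    """
    free = list(test_numbers)
    pairs = []
    while len(pairs) < calls_per_slot and len(free) >= 2:
        # Weighted random choice of the calling number.
        caller = rng.choices(free, weights=[weights[n] for n in free])[0]
        free.remove(caller)
        # Weighted random choice of the called number among those left.
        callee = rng.choices(free, weights=[weights[n] for n in free])[0]
        free.remove(callee)
        pairs.append((caller, callee))
    return pairs


numbers = ["n1", "n2", "n3", "n4", "n5"]
weights = {n: 1.0 for n in numbers}
slot = generate_slot(numbers, weights, calls_per_slot=2, rng=random.Random(1))
```

Raising the weight of a number makes it more likely to be drawn, which is one way of approximating a desired test-traffic profile.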
In the signalling phase of a test call, frequency, cadence and signal level of the
various signalling tones are checked
against specified tolerances. A number of
events may be detected and reported by
various event codes; for example:
- missing, delayed, illegal, unexpected or
out-of-sequence signalling tones
- congestion tone
- wrong B-number reached. Detected by
missing response (transmission test
tone) from the called party or by
To comply with ITU-T Recommendation
E.434, additional measurements are
implemented, e.g. round-trip propagation
delay, clipping, echo and impulsive noise.
"Call continuity" can also be tested by
detecting short signal interruptions during
the test call.
Fig. 4a
Testing the national network
receiving an incorrect identity code from
the called NTU.
- missing or incorrectly timed metering
pulse at B-answer
- metering pulse interval differing from
the expected, nominal value
- violation of transmission quality threshold.
During a test call, values of a number of
parameters are measured and recorded:
- dial tone delay
- ringing tone (post-dialling) delay
- level of dial tone and other detected
tones
- transmission quality parameters, including:
  a fixed test tone and three user-selectable speechband tones
  idle channel noise
  signal/noise ratio
  all measurements are made end-to-end, both by the calling NTU and the
  called NTU.
- metering pulse interval.
Fig. 4b
Testing regional networks (see explanatory text in
Fig. 4a)
The NTUs can also be instructed to supervise network elements. If a certain number of consecutive unsuccessful calls for
specified combinations of A-numbers/B-numbers is registered, an alarm is emitted. This is also the case if the dial tone
is missing for a certain number of consecutive call attempts. Dial-tone supervision
can be performed even in periods when
no ordinary test calls are being made. As
a background test, the NTUs may then regularly check that dial tone is detected on
all test lines. This has proved to be a safeguard in cases where the exchange has
died a "silent death", which means that it
can neither perform normal call processing nor generate an alarm about the situation.
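The supervision rule based on consecutive unsuccessful calls can be sketched as a small counter per A-number/B-number combination. The class name and the threshold value are assumptions made for illustration; the article does not state how the NTUs implement the rule.

```python
class ConsecutiveFailureAlarm:
    """Raise an alarm after a given number of consecutive failed calls."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.failures = {}  # (a_number, b_number) -> current run of failures

    def record(self, a_number: str, b_number: str, success: bool) -> bool:
        """Record one call result; return True if an alarm should be emitted."""
        key = (a_number, b_number)
        if success:
            # A successful call ends the run of failures.
            self.failures[key] = 0
            return False
        self.failures[key] = self.failures.get(key, 0) + 1
        return self.failures[key] >= self.threshold


alarm = ConsecutiveFailureAlarm(threshold=3)
outcomes = [False, False, True, False, False, False]
alarms = [alarm.record("A1", "B1", ok) for ok in outcomes]
```

In this example the alarm is raised only on the last call, because the successful third call resets the count. The same counter can be applied to missing dial tone on a test line.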
Using the test results
All the above data are post-processed in
NTC and formatted into different reports
with various degrees of detail. Some
reports can support the daily activities of
the operations and maintenance staff.
After each test sequence, a summary
report is created in NTC showing a "snapshot" of network status. Based on this picture, the system user can ask for full
measurement details of the interesting
test calls. Special fault statistics are also
accumulated during the month, to support
problem diagnosis.
Some reports are created to support the
strategic planning of maintenance
resources and investments in new network equipment. These reports are based
on statistics accumulated over one or
more statistical periods. They may show
total values of various QoS parameters for
the entire network or for the individual
administrative regions. Trends in these
parameters over the past months and
years can be shown. Some examples of
reports are:
- lost calls and ringing tone delay for the
various traffic routes during busy hour
- dial tone delay for the various exchanges during busy hour
- transmission quality figures for the various routes
Fig. 4c
Testing local networks (see explanatory text in
Fig. 4a)
test traffic will support QoS measurements in the regional level subnetworks.
Fig. 4c illustrates a test type for the lowest level in this example. Test traffic is
generated only between the exchanges in
each local area, and internally in each
exchange.
- undercharging/overcharging percentages for the various network groups
- various trend statistics covering the previous/current month/year.
NETWORK APPLICATIONS
Thanks to the general network access
interface used by the NTUs and the flexible grouping feature in the NTC, the
system can be applied in several different
ways. Some examples are given in the following.
National PSTN/ISDN networks
Different types of test can be defined in
NTC to test the various levels of a national telecommunications network. Nested
groups/subgroups can be defined to support testing at an arbitrary number of network levels. Typically (but not necessarily), the groups can reflect the various
geographical regions of the network.
Another possibility is that of defining
groups according to the type of network
equipment. For example, all digital
exchanges from supplier X may share one
subgroup.
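Nested groups of this kind can be pictured as a tree in which the leaves are exchanges. The sketch below is a hypothetical representation, not the NTC data model: groups are dictionaries, the lowest level is a list of exchange names, and a test type for a given level generates routes only between the direct subgroups of the chosen group.

```python
from itertools import combinations


def leaves(group):
    """All exchanges below a group (a dict of subgroups or a list of exchanges)."""
    if isinstance(group, list):
        return list(group)
    return [x for sub in group.values() for x in leaves(sub)]


def routes_between_subgroups(group):
    """Exchange pairs whose endpoints lie in different direct subgroups."""
    if isinstance(group, list):
        return []
    routes = []
    for a, b in combinations(group.values(), 2):
        routes += [(x, y) for x in leaves(a) for y in leaves(b)]
    return routes


# Hypothetical network: two regions, each divided into local areas.
network = {
    "North": {"N1": ["n1a", "n1b"], "N2": ["n2a"]},
    "South": {"S1": ["s1a"]},
}
national_routes = routes_between_subgroups(network)           # region to region
regional_routes = routes_between_subgroups(network["North"])  # within one region
```

Grouping by equipment type instead of geography only changes the keys of the tree; the route generation stays the same.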
International networks
NEAT test configurations have been
installed to perform end-to-end tests of
international services. Test traffic is generated through international switching
centres in the participating countries. This
is useful, not least because it supports
the procedures recommended by ITU in
Rec. E.424. For example, regular tests are
being performed by a dedicated NEAT
system with test units located in Denmark,
Finland, Iceland, the Netherlands, Norway
and Sweden.
Mobile networks
Mobile services represent an area of very
rapid growth. Old and new service providers are competing for customers. QoS is
a crucial aspect for many reasons, e.g.:
- radio coverage differs from network to
network, and from one region to another
within the same network
- traffic is rapidly growing
- subscriber mobility may produce unexpected effects on traffic load distribution.
NEAT is used to generate test traffic
to/from various parts of the fixed network,
as well as mobile-to-mobile traffic.
Figs. 4a, b and c can be interpreted as a
network with three main regions. Each
region is further divided into a number of
local areas served by a few exchanges.
Each exchange has a number of dedicated test lines that are connected to NTUs.
Delays, transmission quality, call metering and various signalling events on the
air interface can be recorded. The following test configuration may be used:
- one NTU per cell installed at a fixed location to give a stable reference for measurements
- a number of additional NTUs installed in
vehicles to monitor service characteristics related to vehicle movement within
a cell, handover between cells, and
roaming.
A test type for the national level of the network will only generate test calls between
the main geographical regions (Fig. 4a).
Test traffic is long-distance traffic routed
via the national level trunk network. A test
type defined for the next lower level will
only generate test calls between the subgroups within each region (Fig. 4b). The
Corporate telecommunications
networks
Many business customers, by virtue of
their size, may require guaranteed minimum figures for QoS to be stated in contracts with the service provider. The possibility for a service provider to monitor
and report on the quality of the delivered
Fig. 5
QoS measurements by means of NEAT in a virtual
private network
- Temporary supervision: A number of
transportable NEAT test units are moved
between the PABX/centrex sites. Thus,
the various corporate networks can be
tested for limited periods whenever necessary. Such tests may be performed
after major reconfigurations of a network, or triggered by customer complaints.
services, in order to verify that the agreed
quality levels are being maintained, will
provide a competitive advantage and
sometimes even be a necessity from a
legal point of view.
Corporate telecommunications networks
can be built in several different ways. Typically, a combination of public and private
network resources is configured to constitute a virtual private network as shown
in Fig. 5.
Both PABXs and exchanges providing centrex services may be included and
equipped with NEAT test units. QoS measurements can then be made for internal
traffic in the corporate network and for traffic to/from the public network. Depending
on customer requirements, NEAT can be
used in various ways, e.g.:
- Continuous supervision: The test units
are permanently connected to the
PABX/centrex lines that belong to a specific customer. QoS reports, including
trend reports, are produced on a regular basis
Intelligent networks
Intelligent network, IN, services are now
offered in many countries worldwide.
Examples of IN services are: Calls charged
to the called party (often referred to as
Green number or Freephone service), routing to time/day dependent B-number, private numbering plan, etc. Quality monitoring of IN services is very important, for
several reasons, e.g.:
- complex signalling procedures during
call set-up and release. IN may be integrated into both fixed networks and
mobile networks. Complex interworking
situations may occur, possibly with
unexpected effects on quality. This
means that end-to-end measurements
are needed
- dynamic service environment (new services, distribution of service logic, etc.)
- differentiated charging for the various
services
- demanding customers; an example is a
private numbering plan service used as
the basis for a corporate network,
Fig. 5.
A scenario for testing IN services is shown
in Fig. 6. NEAT test units are defined as
IN subscribers in the IN service control
point, SCP. When test calls are made
towards these IN numbers, normal call
processing takes place in the SCP. Problems such as congestion, delay, charging
faults and calls lost due to various technical faults can then be monitored by
NEAT.
STRATEGIES FOR THE USE OF TEST
RESULTS
Fig. 6
IN QoS measurement scenario
SCP    Service control point
SSP    Service switching point
NEAT ties in very well with current philosophies in network O&M and planning. One
interesting possibility is a management-by-objectives approach to selected QoS
issues in the network, Fig. 7. When NEAT
measurements are introduced, the user
organisation sets future goals for the quality norms and stipulates when these
norms should be fulfilled. The goals can
be broken down into the various quality
Fig. 7
NEAT supports a management-by-objectives
approach to pursuing future quality objectives
tive approach. This is a strategy based on
continuous supervision of the network's
quality parameters. As long as the specified service norms are met, no action is
taken. Only when these norms are violated are corrective measures introduced,
Fig. 8. In other words, a certain number of
faults and problems are allowed in the network at any time.
Controlled corrective maintenance and
management by objectives are related
approaches which both require some efficient means for QoS measurements and
problem identification. NEAT has proved
to be a valuable tool in both cases.
parameters and the various parts of the
network. Since all test traffic is generated
on an end-to-end basis, the local area network is always included. This is true even
of tests at the highest levels of the network. Therefore, it may be an advisable
strategy to start a QoS improvement programme by first focusing on the lower levels of the network. When any problems
found there have been analysed and
solved, the system user can successively go on to test the higher levels.
When an approach like this one has been
used systematically for some time, it is
likely that the given quality objectives have
been achieved. Then, the next challenge
is to maintain the desired minimum quality level. In the past, preventive maintenance was widely used in telecommunications networks. But as networks grew in
size and complexity, this became very
resource-demanding, and new technology
also required other methods. Today, so-called controlled corrective maintenance,
CCM, is often considered as a more attrac-
Fig. 8
NEAT supports a controlled corrective maintenance
strategy
SOME EXPERIENCES
ITU's general definition of quality of service was stated in the introduction. To support quantitative measurements in the
network, a more explicit and constrained
definition has been worked out by
Norwegian Telecom, a long-time user of
NEAT:
"The percentage of calls that - with a
defined transmission quality in both directions - within a defined time reaches the
correct B-number and is correctly
charged".
In order to evaluate the quality of service
according to this definition, a number of
parameters must be measured. Assuming
that the telecommunications services are
the service provider's end product, NEAT
tests can be compared to the final quality
control inspection in a manufacturing
plant.
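The definition above translates directly into a count of test calls that meet all four criteria. The sketch below assumes that each test call has already been reduced to four pass/fail flags; the record type `CallResult` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    transmission_ok: bool    # defined transmission quality met in both directions
    set_up_in_time: bool     # correct B-number reached within the defined time
    correct_b_number: bool   # identity code of the called test unit verified
    correctly_charged: bool  # metering or billing as expected


def qos_percentage(results) -> float:
    """Percentage of calls that satisfy every criterion of the definition."""
    if not results:
        raise ValueError("no test calls recorded")
    good = sum(
        1
        for r in results
        if r.transmission_ok
        and r.set_up_in_time
        and r.correct_b_number
        and r.correctly_charged
    )
    return 100.0 * good / len(results)


results = [
    CallResult(True, True, True, True),
    CallResult(True, True, True, True),
    CallResult(True, False, True, True),   # reached B-number too late
    CallResult(True, True, True, True),
]
qos = qos_percentage(results)
```

Because a call counts as successful only if every criterion is met, a single figure of this kind is always lower than or equal to each of the individual success rates.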
Another point emphasised by the operator is the value of NEAT tests from a marketing point of view. Reduced quality problems and more efficient handling of
complaints result in a more positive public image. In fact, quality norms for the
national telephone service, as verified by
NEAT, are now published in the telephone
directories in Norway and distributed to all
subscribers.
When NEAT was introduced, it produced
some quite unexpected results. In many
cases, NEAT indicated a significantly lower
QoS than previous measurement techniques, such as manual test calls. This
difference has mainly been ascribed to the
more objective and comprehensive tests
performed by NEAT. The number of faults
Examples of technical problems
uncovered by NEAT.
- Oscillator fault in radio link station 1800 channels (noise)
- Ice problems on radio link antenna
- Antenna coverage (radome) fault, 140 Mbit/s
system, bit fault
- Noise on 60-groups in a 300-group
- High room temperature in a radio link station
- Fading on radio links (over lakes in special
weather conditions)
- Bad connectors in transmission equipment
- Bit faults on 2, 8 and 34 Mbit/s systems
- Faults in hybrids
- Faults in coaxial cables
- Impedance mismatch
- No voice transmitted in a time-slot in a 2 Mbit/s
system
- Coil loading (pupinisation)
- Humid and wet cables in the ground
- Bad/missing solderings
- Crosstalk (noise) caused by bad insulation in
jumper fields and distribution frames
- Attenuation/frequency distortion (highest frequency in the speech band is too low)
- Noise on some coaxial cables caused by
tiers
in a supposedly high-quality network was
surprisingly large. Many different types of
faults that it would otherwise have been
difficult (and expensive) to find, have been
corrected by detailed follow-up of NEAT
test results. Some published examples [4]
are shown in Box B.
Reduction of non-paying call attempts
It is commonly assumed that a certain percentage of unsuccessful calls in a network
Fig. 9
Busy hour traffic loss percentages. National and
regional network level. Measured from the beginning of regular NEAT testing, 2nd Quarter 1987, to
1st Quarter 1993
- Missing dial tone (fault in registers or other
equipment in local exchange)
- Unknown tone (incorrect frequency or other
tones on the line)
Metering faults:
- No B-answer signal on certain lines
- Metering pulse before B-answer signal
- Too long/short metering interval
- No metering for a new area code
- No metering for certain area codes due to software fault in digital exchange
Congestion and miscellaneous:
- Faults in relays, selectors and MFC equipment
- Faults in digital trunk modules
- Faults in wiring and cable connections
- No alternative route for certain area codes
- Misrouting
- Sticking (remanence) of relays and selectors
- Faults on SS#7 signalling route
- Vibration in building caused by street traffic
(interrupting calls)
- Bundles too small during busy hour
- Silicone (insulation from coils) on relay contacts
- 2 Mbit/s system out of service
- 2 Mbit/s system in loop
Wrong B-number reached:
- Misrouting in selectors
- Register faults (counting chain/code receivers)
- Faults in MFC equipment
represents lost traffic and, hence, lost revenues. If we assume that the success rate
is increased by two percentage points,
and that 20% of unsuccessful calls can be
considered as lost traffic, then the volume
of revenue-generating traffic is increased
by 0.4%. In the Norwegian network, the
percentage of successful calls has been
increased, as reported by Norwegian Telecom [5]. Some of these results are shown in
Fig. 9. In the measurement period, the
objective for the service norm has been
redefined in several steps from 5% to 2%
unsuccessful calls. Especially during the
first two years of testing, a very high
improvement rate was achieved. The quality is now being kept relatively constant,
in a controlled manner.
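The revenue estimate given earlier in this section is a product of two assumed figures. As a worked illustration (the function name is hypothetical, and the result is expressed in percent of all call attempts):

```python
def revenue_traffic_gain(success_gain_points: float, lost_traffic_share: float) -> float:
    """Gain in revenue-generating traffic, in percent of all call attempts.

    success_gain_points: increase in success rate, in percentage points.
    lost_traffic_share: fraction of unsuccessful calls that are truly lost,
    i.e. not recovered by a repeated attempt.
    """
    return success_gain_points * lost_traffic_share


# Figures assumed in the text: 2 percentage points, 20% lost traffic.
gain = revenue_traffic_gain(2.0, 0.20)  # 0.4
```

The remaining 80% of the avoided failures would have been recovered by repeated call attempts, so they do not add revenue, although they do reduce the load of non-paying attempts on the network.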
Increased utilisation of the network
Continuous supervision and fault detection by NEAT makes for more efficient utilisation of the network. After some time of
active use of the test results in the daily
O&M activities, the capacity available for
revenue-generating subscriber traffic can
be increased, Fig. 10; for example, by
- eliminating latent faults in the network
- removing bottle-necks by more optimal
traffic routing
- reducing the volume of repeated call
attempts.
Fig. 10
Utilisation of network capacity: The effect of using
test traffic measurements
Of course, this must be weighed against
the capacity required by the test traffic
itself. Experience has shown that a net
capacity gain of 1-2% can easily be
achieved, Fig. 10. Thus, more traffic can
be handled - resulting in increased revenues without the need for immediate
investments in network equipment.
CONCLUSION
Systems that automatically generate test
calls - like NEAT - offer an effective way
of measuring quality of service in telecommunications networks. Since observations
are made at the subscriber access
interface, the measurement concept is
simple and general in nature. A variety of
services in different networks (fixed and
mobile) can be monitored, including services that involve interworking between
several networks. End-to-end measurements, as recommended by ITU, imply that
the quality is measured from a
subscriber's point of view. Experience has
shown that charging complaints are
reduced and that a number of economic
benefits can be derived from the information provided by a system like NEAT.
Increased revenues are obtained by
increasing the probability of successful
service completion.
References
1. Test Calls. ITU-T Recommendation E.424, 1992.
2. Subscriber-to-Subscriber Measurement of Public Switched Telephone Network. ITU-T Recommendation E.434, 1992.
3. Quality of Service and Dependability Vocabulary. ITU-T Recommendation E.800, 1993.
4. Network failures in the Norwegian telecommunications network detected by the 'NEAT' system. ISCC, Norwegian Telecom, Sept. 91.
5. Loss in busy hours, trunk network, May '87 - May '94. ISCC, Norwegian Telecom, 1994.
6. Huslende, R.: Traffic Route Test System as a Tool in Network Operations, Maintenance and Planning. 5th Nordic Teletraffic Seminar, Trondheim, Norway, June 5-7, 1984.
Leif Hakansson, Bjorn Kihlblom and Hans Lundberg
The world is rapidly changing into an information society based on electronic communication. Telecom networks are necessary and strategic
assets in this development, which requires telecom equipment to evolve in
order to support continuously expanding traffic, as well as new and more
complex functions.
The authors describe the driving factors behind this development, and its
influence on the continuing enhancement of the AXE 10 control system.
The real-time capacity of a computer in the
switching network is often measured by the
number of instructions it can execute per
second (millions of instructions per second, MIPS), or specified by the clock frequency of the processor. In truth, however,
such figures give very little information
about the processor's ability to execute different tasks. For a telecom switching
system, the number of call attempts that
can be handled per second, or per hour
(BHCA), is a more important characteristic.
Fig. 1
The increased use of specified billing and powerful
inter-exchange signalling has swelled the number of
instructions required to handle even ordinary calls.
The figure depicts the evolution in a metropolitan
exchange, during a 20-year period, for local (black)
and transit (red) calls. 1990 = 100%
Historically, the most demanding steps of
development with respect to capacity have
included the replacement of analog technology with digital, processor-controlled
functions, and the transition from decadic
and multi-frequency signalling systems to
packet mode digital signalling. At the same
time, protocols for inter-exchange communication have become much more powerful and complex, from MFC via TUP to ISUP,
together with MAP and TCAP, Box A. Similarly, the trend to change charging methods
from pulse metering to specified billing (TT) and the introduction of ISDN subscribers in the public network markedly
raised the requirement for processing
capacity. The increasing number of instructions required to execute a typical call,
Fig. 1, is a measure of how complex functions and systems have become.
The amount of memory per subscriber line
or trunk line is another measurement of
complexity in telecom applications. As
Fig. 2 shows, the trend is the same in this
area, although the demand for memory is
seldom a limiting factor for system capacity, except when multiple storage of program and data is needed. The rapid growth
in memory demand is more accentuated
for transit exchanges than for local
exchanges, mainly because the introduction of ISUP trunk signalling is faster than
the introduction of ISDN subscribers.
A third measurement of a switching
system's capacity is the volume of data
transferred from a switching node to
other nodes, such as other switching nodes,
billing centres, network databases, or
nodes for centralised operation or postprocessing of statistics. The number of
data transfers increased slowly until the
middle of the 1980s. Since then the
demand for signalling link capacity has
been steadily rising, due to the introduction of large signal transfer points, STPs,
for System 7 signalling. The demand for
higher data transfer capacity from exchanges to billing centres has also grown,
because of the increased use of detailed
billing, and because a higher rate of transfer is needed to reload the increased
amount of data when major failures have
occurred in exchanges.
Multi-purpose, open-interface protocols
have several obvious advantages, but also
create an overhead that loads the processing system substantially. The most impor-
tional local call between two non-ISDN
subscribers.
Fig. 2
The requirement to store information related to subscribers has increased in the same manner as the
number of instructions. The black curve represents
local exchanges; the red curve represents transit
exchanges. 1989 = 100%
tant examples are ISUP, TCAP, MAP, SCCP
and MTP, Box A.
Protocol handling requires considerable
capacity for mapping, screening, and syntax checking, as illustrated by the following three examples of powerful protocols.
Example 1. The number of instructions
required for a typical transit call that uses
an ISUP trunk is around 2.4 times as great
as for the same call via a TUP trunk.
Example 2. Atypical local call between two
subscribers in ISDN requires nearly three
times as many instructions as a conven-
Box A
The relationship between different protocols is
shown in two examples. In both cases, the message transfer part, MTP, and the signalling connection control part, SCCP, form the basis of other
protocols. In communication from a subscriber in
ISDN to a mobile subscriber, the ISDN user part,
ISUP, is used at the originating side, and the
mobile application part, MAP, and the transaction capabilities application part, TCAP, are used at the terminating side. In communication between two subscribers who use an intelligent network, IN (for
example, a virtual private network, VPN, service),
the intelligent network application part, INAP, is
used together with the TCAP.
Example 3. A simple intelligent network
(IN) service that uses the TCAP protocol
between the service control point, SCP,
and the service switching point, SSP,
requires about 1.8 times as many instructions as when the same IN service is
implemented in a combined SSP/SCP
node, whose SSP and SCP software communicates via an internal protocol. In this
example, the execution of the protocol - in both the SSP and the SCP - has been
included. The complexity of the IN protocols is also illustrated by the fact that MTP,
SCCP, TCAP and INAP together represent
40% of the total number of instructions
executed in the SSP for this call.
The intelligence in the emerging network
is mainly allocated to centralised databases for subscriber, terminal and service
data. Because these databases communicate with the traffic handling exchanges
- using powerful protocols, such as TCAP
- they require substantial capacity at both
ends. A growing portion of calls will use at
least one such database, which considerably increases the processing capacity
required for an average call.
In mobile telephony, functions for roaming, paging, activation and deactivation,
handover, and communication with the
home location register, HLR, have added
to the demands for processing capacity
when compared with conventional calls in
PSTN. The node with the highest capacity
requirements in mobile telephony networks is the mobile switching centre,
MSC.
In terms of capacity, the major difference
between digital GSM technology and analog mobile telephony is that GSM uses
more complex protocols for communication, which means that more syntactic and
semantic checks are required. The
demand for, and the use of, subscriber
services is also greater for GSM subscribers than for subscribers to analog mobile
telephony services or non-mobile services, which also makes normal GSM calls
more complex. For example, the number
of instructions used in a GSM mobile
switching centre for a call from a mobile
subscriber towards an ordinary subscriber is 1.5 to 2.0 times greater than what
is required for a corresponding call in a
switching centre in the analog TACS
system.
The PSTN and ISDN subscribers are dependent on network databases for their mobility. The disadvantage of using centrally
stored data is increased signalling in the
network, as well as increased processor
load, both for handling protocols for communication between the local exchange
and the database and for transferring subscriber data between local exchanges, as
subscribers move from one exchange area
to another.
The widespread use of detailed billing
greatly influences the demand for processing power, since each call requires that
more data be recorded and transferred to
the billing centre.
In GSM, more than one detailed bill may
be made for each call. For detailed billing
in general, the amount of data per charging record is increasing dramatically, from
30 bytes in the early 1980s to an expected 500 bytes for GSM calls in 1998.
Example 1. The number of instructions
needed to provide detailed billing is sometimes as large as ten times the number of
instructions required for pulse metering.
Example 2. By 1998, it is expected that
the amount of data that must be transferred from an exchange to a billing centre will be between 0.1 and 0.2 Mbyte/s.
Fig. 3
Logical model of the application modularity
concept. The system consists of
- one system platform
- one application platform, which provides access
to shared system resources
- one or more application modules, which consist of
software only and implement most of the
functions of telecommunications applications.
Fig. 4a
Application programs register the details of a call in
software records. Each function block reports the
specific individual used for the call to the forlopp
manager. This information is stored in a software
record identified by the forlopp identity, FID.
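The forlopp mechanism outlined in Fig. 4a and Fig. 4b can be illustrated with a minimal Python sketch. It is only an assumption-laden model, not the AXE 10 implementation: the class names FunctionBlock and ForloppManager, their methods, and the example block names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class FunctionBlock:
    """Hypothetical function block that owns its own data individuals."""

    def __init__(self, name):
        self.name = name
        self.busy = set()  # individuals currently seized by this block

    def seize(self, individual, fid, manager):
        # Seize a data individual and report it to the forlopp manager.
        self.busy.add(individual)
        manager.report(fid, self, individual)

    def release(self, individual):
        # Release one individual; harmless if it is already idle.
        self.busy.discard(individual)


class ForloppManager:
    """Hypothetical manager keeping one software record per forlopp identity."""

    def __init__(self):
        self.records = defaultdict(list)  # fid -> [(block, individual), ...]
        self.saved_for_analysis = {}

    def report(self, fid, block, individual):
        self.records[fid].append((block, individual))

    def forlopp_release(self, fid):
        # Save the record for later analysis, then ask every block that took
        # part in the forlopp to release the individuals tied to this FID.
        record = self.records.pop(fid, [])
        self.saved_for_analysis[fid] = [(b.name, i) for b, i in record]
        for block, individual in record:
            block.release(individual)
        return len(record)


manager = ForloppManager()
subscriber_block = FunctionBlock("subscriber")
trunk_block = FunctionBlock("trunk")

# Two calls in progress, each with its own forlopp identity.
subscriber_block.seize(17, fid=1, manager=manager)
trunk_block.seize(3, fid=1, manager=manager)
subscriber_block.seize(42, fid=2, manager=manager)

# A fatal error in call 1 releases only the data tied to FID 1.
released = manager.forlopp_release(1)
print(released, sorted(subscriber_block.busy), sorted(trunk_block.busy))
```

In this model the call with FID 2 is untouched by the release, which is the point of the concept: the scope of a recovery action is limited to the process in which the fault occurred. The extra reporting on every seizure is also where the additional processor load comes from.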
The output of statistical counters for
measurement and supervision increases
the processor load only slightly. However,
if a large number of counters are used,
they will substantially increase the volume
of data transferred from the switch to the
statistical post-processing centres. By
1998, 500,000 to 1 million counters can
be expected in large transit exchanges.
The deployment of processor-controlled
switches in the commercially most interesting parts of the networks has increased
the flexibility of telecom services. The rapid introduction of new and enhanced services has become an important factor
when competing for subscribers and traffic. Another trend is intensified efforts to
reduce operational costs by keeping down
the demand for labour and increasing the
size of switching nodes, to reduce their
total number.
Both these factors imply an increasing
need for processing power. One effect of
new services is that the requirements for
detailed billing have been accentuated,
not only to specify the duration and destination of calls but also to specify the types
of service invoked during each call.
The architecture of the AXE 10 software
has been enhanced to meet the demands
for shorter time-to-market, improved in-service performance and reduced cost of
operation. Two major developments in this
software architecture that will have an
impact on the system's processing capacity are better support for application modularity and the introduction of the "Forlopp" concept.
The introduction of application modules,
AM, which are used to model applications
on a common system resource - the application platform - has significantly
enhanced the support of modular applications in AXE 10. Communication between
AMs is strictly limited to protocols that are
similar to network protocols, Fig. 3. This
will increase the signalling within the
system, both between AMs and to common system resources, but it will also permit a more flexible mixture of applications
in the same switch. Increased signalling,
of course, means increased internal
demands for processing capacity.
The Forlopp concept allows the software
to tie data items that take part in the same
process to a unique identifier, the forlopp
identity, Fig. 4a. This feature is used for
fault recovery actions. When the operating system or the application program
detects a fatal error in a process that uses
a forlopp identity, a forlopp release is
initiated, Fig. 4b. Through an interface
dedicated to the forlopp identity, the operating system requests all blocks that take
part in the forlopp to initiate release of all
data items connected to the forlopp. The
forlopp concept adds work to each process - for example a call or call attempt
- and thereby increases the demands for
system processing capacity.
Fig. 4b
If a fault occurs, the operating system can release
the system resources affected by the fault and save
the relevant information for analysis. The release is
initiated by the forlopp manager, which instructs
the affected function blocks to use normal discon-
IMPACT ON THE AXE 10 CONTROL
SYSTEM
The increasing demands for rapid introduction of new features, larger nodes and
improved billing facilities all put greater
demands on the AXE control system. The
most important requirement is for
increased real-time capacity. From experience, it seems fair to assume that in the
foreseeable future the requirements for
real-time capacity will double every two
years. This assumption is valid for the
entire system, and will thus influence every part of the AXE 10 control system. The
central processing subsystem, the regional processing subsystem and the I/O handling parts will all be subject to demands
for upgraded capacity.
Due to the increased complexity of the
switching nodes, more program code must
be stored in each node, and this software
will be subject to more frequent upgrades.
The introduction of new features does not
necessarily entail loading new program
code, but the logic of the switch will have
to be changed in a flexible and efficient
manner.
The switches are also increasing in size
which means that data for an increasing
number of subscribers and traffic-carrying
devices must be stored. Since this data
must be readily available during call setup, the use of anything but random access
memories, RAM, is excluded.
Storage of charging data, on the other
hand, is affected by other requirements.
Even a major power failure must not be
allowed to destroy this information until it
has been safely stored in a post-processing centre. Larger switching nodes and
more detailed charging requirements
imply that even secondary storage on
semi-permanent media, such as hard
disks, will be subject to capacity
upgrades.
The increasing volumes of data needed for
charging and statistics, and the necessary
ability to recover from lengthy outages of
data communication facilities, put great
demands on data transfer capacity. This
means that not only faster communication
links will be required, but also that the
number of data links connected to the
switching node will increase.
Reducing the number of switching nodes
means that the same number of subscribers will be distributed over fewer nodes.
The addressing capabilities of the present
AXE 10 control system allow over one million subscribers to be handled in an efficient way. It is likely that, in the future,
this limit must be extended, especially in
mobile applications.
Table 1
Historical evolution of APZ system capacity
Characteristic        APZ 210 04    APZ 212 02    APZ 212 11
Primary memory (*)    12 MW16b**    48 MW16b**    225 MW16b**
Relative capacity     0.12          0.5           1.0
Release date          1977          1984          1989
* in the latest version of the hardware platform
** 1 MW16b = 1048576 16-bit words
Please observe that the relative capacity figures are only valid within the table in which they appear and not
between tables.
The cost of the access network - that is,
the connection of subscribers to the
nodes - must not rise, even though the
size of the switch increases. The remote
subscriber switch (RSS) concept used in
AXE 10, to concentrate traffic close to the
subscriber, will be the solution to this
problem. Each RSS can connect up to
2048 subscribers. The present limitation
of 256 remote units per exchange will be
extended as the need arises.
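The two limits quoted above can be combined in a short calculation. The sketch below assumes that each of the 256 remote units is one fully equipped RSS and that the "over one million" addressing capability can be rounded to one million; both are simplifying assumptions for illustration, not statements from the system documentation.

```python
import math

SUBSCRIBERS_PER_RSS = 2048        # from the text
REMOTE_UNITS_PER_EXCHANGE = 256   # present limit, from the text
ADDRESSABLE_SUBSCRIBERS = 1_000_000  # assumption: "over one million", rounded

# Largest number of subscribers that can be connected via remote units today.
max_remote_subscribers = SUBSCRIBERS_PER_RSS * REMOTE_UNITS_PER_EXCHANGE

# Remote units needed if all addressable subscribers were connected via RSS.
units_needed = math.ceil(ADDRESSABLE_SUBSCRIBERS / SUBSCRIBERS_PER_RSS)

print(max_remote_subscribers, units_needed)
```

Under these assumptions, the present limit covers roughly half of the addressable subscribers, which is consistent with the remark that the limit will be extended as nodes grow.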
With larger nodes, the telecom network
becomes vulnerable to faults, in the sense
that a single fault may cause disruption of
service to more subscribers than in
present networks. Availability requirements must therefore be more stringent
for larger nodes.
Today, recovery from hardware faults is
almost transparent to users. The duplication of all AXE 10 control system equipment at all levels means that every single
fault is completely masked to the application. In the rare event of multiple faults,
these must affect the same part of the
system to have an impact on the service.
However, recovery from software faults
can be improved further. Faults can be isolated to affect only parts of the switch,
thereby reducing the scope of the recovery action.¹
Other disturbances that occur during
extensions or functional upgrades of the
switch will also be reduced. The long-term
goal is to perfect the system to the point
that hardware or software faults will not
cause any service disruption, and to
achieve this without a major cost penalty.
Table 2
Evolution of RP characteristics
RP release           1976              1984                  1994
Memory               20 kW16b, PROM    256 kW16b, loadable   256 kW16b, loadable
Relative capacity    1                 2                     2
Language             Assembly          Assembly              Assembly
Size (*)             19 boards         3 boards              1 board
Processor            Proprietary       Proprietary           Proprietary
Internal bus         12 bits           16 bits               16 bits
Clock frequency      5 MHz             5 MHz                 5 MHz
* excluding regional processor bus interface PCB
Please observe that the relative capacity figures are only valid within the table in which they appear and not
between tables.
THE CURRENT AXE 10 CONTROL
SYSTEM
The development of processing capacity
in the central processor, CP, and the available CP memory is illustrated in Table 1.
Due to the large number of regional processors (RP), the design of earlier versions aimed primarily at reducing cost and
size. Capacity requirements caused no
problems, since the tasks to be performed
by the processors were fairly simple. However, the advent of ISDN subscribers and
complex protocol handling has changed
this situation. The processing and the
storage capacities of the regional processors have evolved rapidly, and it is now
feasible to move complex functions from
the CP level to the regional processor level, which requires simple handling of program and data and the use of high-level
program languages in these processors.
Table 3
Evolution of RPD characteristics
RPD release          1991          1995
Memory               4 MW8b        16 MW8b
Relative capacity    1             12
Language             C++/C         C++/C
Size (*)             1 board       1 board
Processor            Commercial    Commercial
Internal bus         32 bits       32 bits
Clock frequency      25 MHz        >50 MHz
* excluding regional processor bus interface PCB
Please observe that the relative capacity figures are only valid within the table in which they appear and not
between tables.
Table 4
Evolution of EMRPD characteristics
EMRPD release        1991          1995
Memory               5 MW8b        16 MW8b
Relative capacity    1             1.5
Language             C++/C         C++/C
Size                 1 board       1 board
Processor            Commercial    Commercial
Internal bus         32 bits       32 bits
Clock frequency      20 MHz        >33 MHz
Please observe that the relative capacity figures
are only valid within the table in which they appear
and not between tables.
The 1976 version of the regional processor had memory in PROM. In 1984, RAM
was introduced, loadable from the CP.
High-level programs were introduced in a
new regional processor, the EMRP (extension module regional processor), in 1980,
and in two other versions, the RPD (regional processor, device-specific), and the
EMRPD (EMRP, device-specific), in 1991.
These latter processors in particular have
required greater capacity, especially when
used as signalling terminal controllers in
an application.
The evolution of regional processors is
shown in Tables 3 to 5.
When AXE 10 was introduced in 1976, the
input/output (I/O) system mainly had two
functions: to provide a man-machine interface and to store the system's backup
copy. Consequently, by present standards, a rather primitive system was sufficient. The RP was used for control, and
a non-volatile storage medium was provided by magnetic tape cassettes. Capacity
requirements were typically much less
than one transaction per second and, for
throughput capacity, far below one kilobyte per second.
With increasing requirements for itemised
billing, statistics and centralised operation, the I/O system was developed into
a proprietary and dedicated processing
platform, based on the so-called support
processor, SP. The SP provides an I/O
system with improved capabilities and
availability and is only loosely coupled to
the switching functions. Moreover, a
sophisticated file management system
that handles hard disks instead of magnetic tape cassettes has been added to
the SP functions.
The evolution of the support processor is
shown in Table 6.
Table 5
Evolution of EMRP characteristics
EMRP release         1981          1984          1986          1989
Memory               128 kW8b      128 kW8b      256 kW8b      256 kW8b
Relative capacity    1             1             1.3           1.3
Language             Plex-M        Plex-M        Plex-M        Plex-M
Size                 5 boards      4 boards      2 boards      1 board
Processor            Commercial    Commercial    Commercial    Commercial
Internal bus         8 bits        8 bits        8 bits        8 bits
Clock frequency      1.5 MHz       1.5 MHz       2 MHz         2 MHz
Please observe that the relative capacity figures are only valid within the table in which they appear and not
between tables.
Table 6
Evolution of I/O characteristics
I/O release          1978           1987           1990           1994           1997
Secondary storage    100 MB         268 MB         1.2 GB         2.1 GB         >4 GB
Access               Sequential     Direct         Direct         Direct         Direct
Throughput           N/A            3 kbit/s       6 kbit/s       20 kbit/s      150-200 kbit/s
Language             Assembly       Eripascal      Eripascal      Eripascal      Eripascal
Processor            RP             Commercial     Commercial     Commercial     Commercial
Redundancy           Hot standby    Hot standby    Hot standby    Hot standby    Hot standby
CONTINUED DEVELOPMENT OF THE
AXE 10 CONTROL SYSTEM
With respect to central and regional processors, there are basically two ways to
improve the processing capacity of AXE 10:
to apply technological improvements, and
to develop a more advanced architecture.
The open-endedness of the system also
makes it possible to distribute functions
to other platforms. The AXE concept of
functional modularity - based on the function block, where each block contains both
its own programs and its own data - allows
functions to be allocated to different processors, or even to different systems.
As in all processors, the processing capacity of the central processor is dependent
on the clock frequency and the memory
access time. For AXE 10, the relative processing capacity of each consecutive generation of processors has nearly quadrupled. Approximately half of each increase
stems from doubling the clock frequency,
and the other half comes from architectural developments aimed at optimising the
use of memory. Table 7 shows the current
and projected growth in processing capacity, as it relates to clock frequency and
memory components.
The architectural developments referred
to above have been achieved through the
introduction of more parallelism and pipelining into the current architecture, which
creates opportunities for the compiler to
optimise memory accesses.
Throughout this path of evolution, a constant evaluation of other architectures has
been made. This is exemplified by the
introduction of common microprocessor
families and multi-processing concepts.
To date, however, microprocessor components have only been introduced at the
regional processor level. Multi-processing
has always been considered as an alternative in the evolution of the AXE 10 control system. In fact, even though it was
never used in the first generation of processors, the APZ 210 was designed to permit multi-processing.
Table 7
Evolution of APZ central processor capacity
Characteristic       APZ 212 11               APZ 212 20               Next generation
Clock frequency      10 MHz                   40 MHz                   >80 MHz
Memory components    4 Mbit DRAM,             16 Mbit DRAM,            64 Mbit DRAM,
                     256 kbit SRAM            1 Mbit SRAM              16 Mbit SRAM
Architecture         no pipelining and        use of pipelining        extended use of
                     no cache; microprogram   and cache; program       pipelining and cache;
                     pre-fetch                instruction pre-fetch    parallel handling of
                                                                       instructions and
                                                                       signalling data
ASIC technology      1 µ CMOS                 0.7 µ BiCMOS/CMOS        0.5 µ CMOS
Relative capacity    1                        4                        16
Release date         1989                     1995                     1998
Please observe that the relative capacity figures are only valid within the table in which they appear and not
between tables.
So far, proprietary development of single
processors (although internally duplicated
to improve dependability) has outperformed all alternative evolution paths,
including those mentioned above. This
might change in the future. New architectures are continuously being evaluated, to
ensure the required processing capacity
in AXE 10.
The evolution of capacity at the regional
processor level is linked to the evolution
of microprocessors in general, see Tables
3 to 5.
The main task of the RPs today is to control the application hardware, which may
be very simple in some cases and very
complex in others. The distribution of functions between the CP and the RP is determined by factors such as reliability, capacity and cost.
One way to increase the processing capacity of the system is to move as much of
the device control as possible to the RP.
Another way is to move entire functions
from the central processor to the regional processor level. However, the functions
residing in RPs must ensure high reliability and data consistency. In the CP, the
duplicated hardware ensures that hardware faults are transparent to the application, whereas the reliability of RPs is in
part achieved through software. Currently, several possible ways of migrating functions from the CP to RPs are being evaluated.
The open-endedness of a system can
improve the system's processing capacity by allowing functions to be migrated to
other platforms. Recently, a new subsystem - the open communication subsystem, OCS - has been introduced in AXE
10. The OCS provides data communication, according to common standards,
between applications in AXE 10 and external computer systems and supports the
Internet protocols TCP/IP and Ethernet
links.
One of the first examples of using the OCS
connection for data communication to
external computer systems is the adjunct
processor, AP. The AP is based on a fault-tolerant, high-availability, commercial
UNIX computer platform, to which - initially - parts of the AXE 10 charging functions
will be migrated. The AP, which is also an
extension to the I/O system, offers very
high transaction capability, typically more
than 100 transactions per second, and
throughput capacity that exceeds 100 kilobytes per second.
CONCLUSION
The evolution of AXE 10 processor hardware is characterised by developments to
minimise costs and risk, while complying
with new demands for functions and
capacity.
The AXE 10 system architecture - while
retaining the original, underlying principles
- has been developed to meet new
demands. It has not only been possible,
but feasible to continue the development
of the system to keep it modern and based
on state-of-the-art technology.
The future will no doubt bring new requirements for further developments of AXE 10,
but even today a number of solutions are
feasible. As always, the objective will be
to select a path that minimises technological risks and at the same time ensures
that capacity is available when Ericsson's
customers need it.
References
1. Englund, T.: AXE 10 Dependability.
Ericsson Review 72 (1995):1,
pp. 42-50.
2. Johansson, O., Nord, C.: Using
Predictions to Improve Software
Reliability. Ericsson Review 72
(1995):1, pp. 30-35.
Using Predictions to
Improve Software Reliability
Ojvind Johansson and Camilla Nord
High software reliability is becoming increasingly important. To meet the
demands for rapid development of reliable software, Ericsson is now
increasing the use of prediction methods in the software design process.
When a software system is designed, the majority of faults are often found
in a few of its modules. By early identification of these modules, software
design management - and thereby software reliability - can be considerably
improved.
The authors describe predictors capable of identifying fault-prone modules,
how these predictors can be evaluated, and how software projects can benefit from predictions.
Fig. 1
Typical breakdown of software costs. The cost of
testing equals the cost of design and coding, and
maintenance accounts for 50 percent of life time
costs
As hardware is becoming cheaper and
more reliable, the importance of software
reliability is increasing. Software applications are more complex than ever and this
leads to stricter demands on the software
development process. Fig. 1 shows the
distribution of costs during a typical software life cycle. Maintenance costs - that
is, post-installation costs of error correction and minor changes - account for
approximately 50 percent. It is therefore
important to improve the software design
process by providing the designers with
means of avoiding faults, and by discovering faults as early in the design process
as possible. Fig. 2 provides a structure of
how this can be obtained.
Early fault prevention and detection is
more easily achieved if designers and project management know where faults are
likely to occur. In software systems, the
majority of faults are often found in a few
modules. Fig. 3 illustrates, with figures
from a recent project, that the most fault-dense modules - which together made up
20 percent of the total code - caused 55
percent of the function test trouble
reports. Here, a predictor capable of pointing out modules likely to contain many
faults would have been a valuable tool - the earlier in the design process, the more
valuable.
Fig. 2
An important goal is to improve software reliability
through fault prevention; that is, to improve the way
the design process handles real and potential
faults. Fault tolerance, on the other hand, is related
to how the system handles run-time failures
Fig. 3
The new and modified modules in a project have
been arranged in a descending order by the ratio of
function test trouble reports to code statements.
This shows that 80 percent of the faults were found
in modules containing only 40 percent of the total
amount of code
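The kind of analysis behind Fig. 3 can be sketched in a few lines of Python: modules are sorted by fault density (trouble reports per statement), and the shares of code and trouble reports are accumulated. The module names and figures below are hypothetical; they were chosen only so that the first points of the curve match the percentages quoted in the text, and they are not data from the project.

```python
def cumulative_shares(modules):
    """Accumulate code and trouble-report shares, densest modules first.

    modules: list of (name, statements, trouble_reports) tuples.
    Returns a list of (name, cumulative_code_share, cumulative_report_share).
    """
    total_code = sum(statements for _, statements, _ in modules)
    total_reports = sum(reports for _, _, reports in modules)
    # Sort by fault density: trouble reports per code statement, descending.
    ordered = sorted(modules, key=lambda m: m[2] / m[1], reverse=True)
    rows, code, reports = [], 0, 0
    for name, statements, module_reports in ordered:
        code += statements
        reports += module_reports
        rows.append((name, code / total_code, reports / total_reports))
    return rows


# Hypothetical project: five modules of equal size, 40 trouble reports.
modules = [
    ("M1", 1000, 22),
    ("M2", 1000, 1),
    ("M3", 1000, 10),
    ("M4", 1000, 5),
    ("M5", 1000, 2),
]

rows = cumulative_shares(modules)
for name, code_share, report_share in rows:
    print(name, round(100 * code_share), round(100 * report_share, 1))
```

With these invented figures, the densest 20 percent of the code accounts for 55 percent of the trouble reports and the densest 40 percent for 80 percent, mirroring the shape of the curve described for the real project.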
The search for predictors will also provide
more profound knowledge of the development process and code characteristics,
which is useful when improving and standardising the design process through
enhanced design rules, training, etc.
Finding predictors of fault-prone
modules
The objective is to find methods of pointing out the individual modules where
faults are likely to be introduced during
design and coding. It must be possible to
use these methods in a design project
based on an existing system where some
software modules are kept as they are,
some are modified, and new modules are
added.
Two studies³,⁵, made in order to find suitable prediction methods, have focused on
finding good predictors for the number of
trouble reports related to each module in
the function test. The reason for this
approach is that the function test is performed in a standardised manner, is rigorously reported, and reveals a large proportion of existing faults. The studies
cover several design projects and hundreds of new and modified software modules.
The results are interesting. One would
expect there to be a close relationship
between the number of modified or new
statements (M) and the number of trouble
reports. But many of the modules with high
values of M had few faults, Fig. 4a. The
same is true of the total number of statements (S); many large modules were
almost free from faults, Fig. 4b. Both the
number of implementation proposals (IP)
referring to a module and the number of
new or modified signals (SigFF) affecting
a module showed a closer correspondence with the number of trouble reports.
(An implementation proposal is a document that describes how a requirement
influences a source system. Signals are
the primary means of ordering execution
of code. They may contain data and can
be sent between modules or within them.)
Fig. 4a
Plot of the number of function test trouble reports
versus the number of new or modified source code
statements. The plot indicates no close relationship
between the number of new code statements and
the number of trouble reports generated. The modules are identical with those shown in Fig. 3
Fig. 4b
The relationship between module size, measured as
the total number of source code statements, and
the number of trouble reports for the same modules
as those shown in Fig. 4a
Fig. 4c
Function test trouble reports versus predictor
S*SigFF. A high predictor value points out modules
that are likely to contain many faults
The products S*IP and S*SigFF turned out
to be even more useful. For example, modules with high values of S*SigFF were
shown to be likely sources of many trouble reports in the function test, Fig. 4c.
This is exactly the kind of result the studies were aiming at - a tool that enables
project management to effectively allocate design, inspection, and test
resources.
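How such a predictor might be used to prioritise modules can be sketched as follows. The function name, the module names and all values of S and SigFF are hypothetical; the sketch only shows the ranking step, not how the studies collected or validated their data.

```python
def rank_by_predictor(modules):
    """Rank modules by the predictor S*SigFF, highest value first.

    modules: dict mapping module name -> (S, SigFF), where S is the total
    number of statements and SigFF the number of new or modified signals.
    """
    scores = {name: s * sigff for name, (s, sigff) in modules.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical modules, not taken from the studies.
modules = {
    "A": (1200, 3),
    "B": (400, 15),
    "C": (2500, 0),
    "D": (800, 6),
}

ranking = rank_by_predictor(modules)
for name, score in ranking:
    print(name, score)
```

In this invented example the largest module, C, ends up last because none of its signals are new or modified, while the small but heavily affected module B is ranked first. This is the behaviour that distinguishes the product from module size alone.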
When predictions are made at an early
stage, the exact number of statements (S)
that a module will have when it is coded
cannot possibly be known. Estimates have
to be used. But when the modules' flow
charts have been designed, data obtained
from these flow charts can be used
instead of estimates of S. It has been
found that in predictor S*SigFF, S can be
replaced by one of several measures on
the flow chart, for example McCabe's
Cyclomatic Complexity Measure (see
Box A), the number of decision points, or
the total number of decision alternatives.
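As an illustration of the first of these flow-chart measures, the cyclomatic complexity defined in Box A (edges minus nodes plus twice the number of connected components) can be computed directly from a list of edges. The graph below is a hypothetical flow chart with 10 edges and 8 nodes, the same counts as the example in Box A but not necessarily the same graph; the function name is likewise an assumption.

```python
def cyclomatic_complexity(edges, components=1):
    """Return e - n + 2p for a flow graph given as (from, to) edge pairs.

    The number of connected components p is passed in by the caller; for a
    program without subroutines it is 1.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical flow chart: two decisions and one loop back from node 7 to 4.
flow_chart = [
    (1, 2), (1, 3), (2, 4), (3, 4), (4, 5),
    (4, 6), (5, 7), (6, 7), (7, 4), (7, 8),
]

complexity = cyclomatic_complexity(flow_chart)
print(complexity)
```

For a program with subroutines, the measure would be obtained by applying the function to each component separately and adding the results, as described in Box A.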
Interpretation of predictors requires great
care, since variables may be correlated even if they do not affect each other
directly. However, for the above-mentioned predictors there are some interesting possible explanations. In the case of
predictor S*IP, it seems reasonable that
the likelihood of faults is more or less proportional to the amount of new requirements for a module (measured by the number of implementation proposals), and
also increasing with the amount of code
(S) that may be affected by these requirements. Predictor S*SigFF is similar to
S*IP. Signals make up the interfaces
between modules, and it is not surprising
that SigFF, the number of new or changed
signals, correlates with IP, the number of
implementation proposals. A paper presented at the 11th Nordic Teletraffic Seminar, 1993,¹ gives some more specific
explanations of the relationship between
a high occurrence of faults and high values of SigFF. For example, the presence
of a large number of signals indicates high
coupling (the module interacts with its
environment in many ways), which means
that the system is difficult to understand.
A higher value of SigFF indicates that the
module designer may have to communicate more frequently with other designers,
possibly in other countries, which involves
a greater risk of misunderstanding. Also,
careless mistakes are easily made with
the signals themselves.
Box A
McCabe's Cyclomatic Complexity
Measure
For a program consisting of one component only
(a program without subroutines), the McCabe
cyclomatic complexity measure is the number of
basic paths through the program.
For example, the graph in Fig. A represents a one-component program with cyclomatic complexity 4.
For a program consisting of several components,
such as a main component and subroutines,
McCabe defined the cyclomatic measure as the
sum of measures for each of the individual components.
The cyclomatic complexity V(S) is given by the formula:
$V(S) = e - n + 2p$
where:
e = number of edges
n = number of nodes
p = number of connected components
In the example shown in the figure, $V(S) = 10 - 8 + 2 = 4$
Fig. A
The figure depicts the different paths through a program. In the graph, each circle is a statement, or a
number of statements that are always executed
Table 1
Rank    Actual order         Pred1 order          Pred2 order
        Module   FTTR        Module   FTTR        Module   FTTR
1       A        50 (50)     B        (30)        D        (9)
2       B        30 (80)     A        (80)        C        (20)
3       C        11 (91)     C        (91)        B        (50)
4       D        9 (100)     D        (100)       A        (100)
For each module in the example, the number of
trouble reports is shown without parentheses.
Numbers within parentheses are accumulated
trouble reports when modules are sorted in an
actual or predicted descending order
Fig. 5
Example from a study⁵ in which the accuracy of two
predictors was compared by means of an Alberg diagram. The accumulated percentage of trouble
reports is plotted starting with the module that the
prediction points out as the most fault-prone. For
the curve FTTR, the modules are in the order given
by the actual number of trouble reports
Evaluating predictors
The Alberg diagram can be used to show
and evaluate the usefulness of various
predictors. The data need not have a normal distribution, which is favourable
because software measurement data is
very often found to be highly skewed and
to contain many outliers. This method is
easy to use and does not require any complicated mathematical modelling.
The easiest way of explaining the Alberg
diagram is through an example, taken
from one of the studies [5]:
Assume that a design project contains
four modules - A, B, C and D - listed
in descending order by the number of
function test trouble reports. Table 1
shows the individual and accumulated
number of trouble reports (FTTR) for these
modules.
Two predictors, Pred1 and Pred2, are to
be evaluated. Assume that Pred1 points
out module B as the most fault-prone module, followed by A, C and D. The actual (not
predicted) values of FTTR and the module
order predicted by Pred1 are used to calculate the accumulated FTTR values for
Pred1. Then assume that Pred2 indicates
module D as the most risky, and calculate
the accumulated FTTR values for this predictor, too. The result is shown in Table 1.
An Alberg diagram, Fig. 5, is created by
using the y axis for the accumulated percentage of trouble reports, and the x axis
for the accumulated percentage of modules. The accumulated percentage of trouble reports is plotted, both when the modules are sorted according to the actual
outcome and when in the order given by
the predictors. Thus, the Alberg diagram
shows the ability of the predictors to rank
the modules in much the same order as
that of the actual outcome. For a predictor to be considered good, its curve should
lie close to the uppermost curve, which
represents the actual outcome.
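The construction described above can be sketched in a few lines. The trouble report counts below are invented for illustration and are not the values of Table 1; for Pred2, only the fact that module D comes first is taken from the text, and the rest of its order is assumed.

```python
def alberg_curve(actual, order):
    """Return (accumulated % of modules, accumulated % of trouble reports)
    when the modules are taken in the given order. The actual counts are
    always used for the vertical axis, whatever the order."""
    total = sum(actual.values())
    curve, running = [], 0
    for i, name in enumerate(order, start=1):
        running += actual[name]
        curve.append((100 * i / len(order), 100 * running / total))
    return curve


# Invented FTTR counts per module.
fttr = {"A": 50, "B": 30, "C": 15, "D": 5}
actual_order = sorted(fttr, key=fttr.get, reverse=True)  # A, B, C, D
pred1_order = ["B", "A", "C", "D"]
pred2_order = ["D", "C", "A", "B"]  # assumed, apart from D being first

for label, order in [("actual", actual_order),
                     ("Pred1", pred1_order),
                     ("Pred2", pred2_order)]:
    print(label, alberg_curve(fttr, order))
```

With these figures the Pred1 curve meets the actual curve from the second module onwards, whereas the Pred2 curve stays well below it, which is how a good and a poor predictor would show up in the diagram.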
Fig. 6
An Alberg diagram for predictor S*SigFF. The modules are sorted in a descending order by the number
of actual (black curve) or predicted (red curve) trouble reports per source code statement. In both cases, the actual number of trouble reports is used for
the accumulated percentage on the vertical axis
The Alberg diagram in Fig. 6 shows the
usefulness of the predictor S*SigFF. In
this case, the x-axis does not show the
accumulated percentage of modules but
the accumulated percentage of code.
Therefore, the modules have not been
sorted as described above, but in a
descending order by the number of trouble reports (actual and predicted) per code
statement. The diagram shows that the
predictor makes it possible to point out,
at an early stage, a 20-percent portion of
the final code which is expected to generate around 40 percent of the failures.
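The variant with accumulated code on the horizontal axis can be sketched as follows. The module sizes and report counts are invented and merely chosen so that the first 20 percent of the code carries 40 percent of the trouble reports; they are not data from the study.

```python
def alberg_by_code(modules, use_prediction):
    """modules: (statements, actual reports, predicted reports) tuples.
    Sort by reports per statement (actual or predicted), then accumulate
    % of code against % of actual trouble reports."""
    idx = 2 if use_prediction else 1
    ranked = sorted(modules, key=lambda m: m[idx] / m[0], reverse=True)
    total_code = sum(m[0] for m in modules)
    total_reports = sum(m[1] for m in modules)
    points, code, reports = [], 0, 0
    for size, actual, _predicted in ranked:
        code += size
        reports += actual
        points.append((100 * code / total_code, 100 * reports / total_reports))
    return points


# Invented data: (statements, actual reports, predicted reports).
modules = [(800, 40, 30), (1200, 30, 10), (2000, 30, 25)]
print(alberg_by_code(modules, use_prediction=False))
print(alberg_by_code(modules, use_prediction=True))
```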
Using predictors in software projects
Fault avoidance and early fault detection
are of the utmost importance for software
productivity and quality. This means that
good predictors are of great value in project planning, where time and resources
are critical factors. Any project will benefit from early fault detection, simply
because late changes are more expensive
than those made early. If the most fault-prone modules can be predicted at the
beginning of a software development project, closer attention can be paid to these
modules in the early phases of design and
testing, which will reduce the number of
remaining faults.
In general, the total software life cycle can
be divided into five phases, or steps: definition of requirements, system and software design, implementation and unit
testing, integration and system testing
and, finally, operation and maintenance,
Fig. 7. Those phases in Ericsson's software development process which correspond to steps two, three and four of the
life cycle are shown in a simplified way.
Fig. 7 shows where in the software development process predictions can be made,
and on which parameters these predictions are based.
Fig. 7
General software life cycle model related to
Ericsson's software development process. After the
system analysis phase, the S*IP predictor can be
calculated. The number of implementation proposals related to each module can be identified by
reading the different implementation proposals.
Estimates of the size of new and modified modules
are also available at this stage
After the function design phase, the S*SigFF predictor can be calculated by searching in the signal
coordination register for the number of new and
modified signals. Estimates of the size of new or
modified modules are also required
After block design, flowcharts and source code metrics can be used to derive other predictors
Fig. 8
The development process. Experiences from each
step in the cycle are used for continuous improvement and standardisation of all parts of the process
After system study and analysis, a first
prediction of the most fault-prone blocks
can be made using the S*IP predictor. A
block is a software unit that can be
designed and maintained by one designer. Hence, prediction-based planning can
be made at a detailed, individual level. The
project manager can use this prediction
when planning designer and tester training, assigning the most fault-prone blocks
to the most experienced designers, and
when requesting a group of experts to
inspect these blocks. Also refer to the
means shown in Fig. 2.
All possible test cases for complex
systems cannot be performed or even
specified. But by predicting which blocks
contain the largest number of faults, extra
resources for the testing of these blocks
can be allocated. In this way, overall reliability can be significantly improved within a given margin of expenditure.
The S*SigFF predictor can be seen as a
complement to the S*IP predictor and
used to improve project planning. Other
possible predictions could be based on
flowcharts or on the source code itself.
Software development is a continuous
process; new experience is gained from
each new iteration, Fig. 8. Trouble report
analysis is one means of acquiring such
knowledge, which can be used to improve
and standardise the different steps in the
design process.
When the design process is modified, the
prediction formulas may have to be modified too. Minor changes can be compensated through model parameter correction, but when the design process is considerably improved or changed, a thorough
model revision may have to be made.
A positive side effect is that the use of
prediction stresses the importance of
thorough and accurate measurements,
which are essential in order to monitor
progress and to verify the effect of process improvements.
CONCLUSIONS
Generally, the majority of faults in a software system are found in only a few of its
modules. The use of predictors makes it
possible to point out which modules are
likely to be the most fault-prone. For
instance, predictors S*IP and S*SigFF
can be used for predicting the number of
trouble reports written for each module in
the function test. These predictors are
useful, because they enable predictions
early in the design process.
An Alberg diagram is an efficient tool when
evaluating and comparing different predictors, especially for data that does not have
a normal distribution - a phenomenon of
frequent occurrence in software projects.
Predictions can be used for planning the
software design process such that extra
resources are allocated for design and
testing of the most fault-prone modules.
In this way, software reliability can be
improved, both through fault avoidance
and by fault detection.
References
1. Alberg, H., Johansson, 6. and
Ohlsson, N.: Predicting Error-prone Software Modules, 11th Nordic Teletraffic
Seminar (1993).
2. Ericsson Telecom AB, Software Reliability Handbook, EN/LZG 205 603 R2,
(1993).
3. Johansson, 6.: Software Reliability, TRITA/MAT-93/0015, Royal Institute of
Technology, Stockholm (1993).
4. Myers, G. J.: Software Reliability - Principles & Practices, John Wiley & Sons,
New York (1976).
5. Ohlsson, N.: Predicting Error-prone Software Modules in Telephone Switches,
Industriserien, LiTH-IDA-Ex-9346,
Linköping University.
6. Shepperd, M.: A Critique of Cyclomatic
Complexity as a Software Metric, Software Engineering Journal, March 1988,
pp 30-36.
Test Marketing of
Mobile Intelligent Network Services
Tina Sutton, Stephen Crombie and Rima Qureshi
The competitive nature of the telecommunications industry makes it
increasingly necessary for network operators and equipment suppliers to
work as strategic partners. Often this means that the equipment supplier
must be involved with end users very early in the product cycle, even at the
conceptual stage.
The authors describe how services derived from mobile IN technology are
being tested in the New Zealand market and how these tests are used to
support further development of Ericsson's and Telecom Mobile's marketing
and product strategies.
Telecom Mobile Communications Ltd, a
cellular network operator who employs
Ericsson's CMS 88 D-AMPS system in
New Zealand, is cooperating with Ericsson
on a joint project. The objective of this project is to provide a greater understanding
Fig. 1
Mobile IN is providing advanced end user services
to subscribers in New Zealand
of end-user and implementation aspects
of intelligent network (IN) services in
mobile cellular networks. Increasing competition and demands for advanced, customised end-user services mean that differentiation and diversification of end-user
service offerings is becoming critically
important. The test marketing project
launched by Telecom Mobile and Ericsson
offers the advantage of early market entry
with new services, and both companies
gain a considerable amount of knowledge
about the marketing and deployment of
Ericsson Mobile IN services. The project
also assists Ericsson in the development
of its Mobile IN concept for the CMS 88
Cellular System. Telecom Mobile and
Ericsson both have their own service providers which furnish a wealth of information about cellular user behaviour.
APPLICATION OF IN TECHNOLOGY IN
D-AMPS NETWORKS
Intelligent network technology promises to
deliver end-user services that allow a
higher degree of:
- service diversity
- customisation at the user or business
group level
- rapid design and deployment of new services
- service provider control.
The rapid development and deployment of
cellular networks offers new opportunities
for utilising IN technology to derive benefits similar to those already seen in the
wired telecommunications environment.
Catering for terminal mobility adds complexity to the implementation of IN technology in cellular networks. The interaction between existing end-user services
and new IN-derived services must also be
considered. In the CMS 88 mobile IN platform, these issues are dealt with through
the use of a combined Home Location Register and IN Service Control Point,
HLR/SCP, Fig. 2. During call processing,
the mobile switching centre, MSC, queries the HLR for information about the cellular subscriber, such as location, end-user services and other supplementary
information in order for the call to be progressed. In the case of a call that requires
an IN service to be invoked, the HLR
detects this and passes the request on to
the SCP. The SCP responds accordingly,
depending on what IN services are
invoked, and forwards this information to
the HLR which, in turn, forwards it to the
MSC.
Fig. 2
The mobile intelligent network architecture consists
of a collocated HLR/SCP, an MSC and SMAS.
SMAS provides a graphical user interface for IN service script development. Service scripts are translated by SMAS into man-machine language (MML)
commands that are sent to the SCP via an X.25
link.
The MSC communicates with the HLR/SCP using
MTUP or IS-41+. Communication between the
HLR and the SCP is according to a TCAP-based protocol
similar to IS-41. IN subscribers are assigned IN categories in the HLR, depending on the type of service
(originating, terminating or transfer) that they subscribe to
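The query flow between MSC, HLR and SCP can be illustrated with a small model. This is a sketch only: the class names, the dictionary-based responses and the example service script are hypothetical and do not represent the IS-41+ messages or the proprietary HLR-SCP interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Subscriber:
    location: str                      # current serving area
    in_category: Optional[str] = None  # e.g. "terminating"; None = no IN service


class SCP:
    """Holds service scripts keyed by IN category (hypothetical structure)."""

    def __init__(self, scripts: Dict[str, Callable[[str, str], dict]]):
        self.scripts = scripts

    def invoke(self, category: str, a_number: str, b_number: str) -> dict:
        return self.scripts[category](a_number, b_number)


class HLR:
    def __init__(self, subscribers: Dict[str, Subscriber], scp: SCP):
        self.subscribers = subscribers
        self.scp = scp

    def query(self, a_number: str, b_number: str) -> dict:
        """Answer an MSC query for a call from a_number to b_number."""
        sub = self.subscribers[b_number]
        answer = {"location": sub.location, "action": "connect"}
        if sub.in_category is not None:
            # The HLR detects the IN category, passes the request to the SCP
            # and relays the SCP's response to the MSC.
            answer.update(self.scp.invoke(sub.in_category, a_number, b_number))
        return answer


def accept_one_caller(a_number: str, b_number: str) -> dict:
    # Hypothetical terminating service script: only one caller is accepted.
    return {"action": "connect" if a_number == "021000001" else "reject"}


hlr = HLR(
    {"025000001": Subscriber("Auckland"),
     "025000002": Subscriber("Wellington", in_category="terminating")},
    SCP({"terminating": accept_one_caller}),
)
print(hlr.query("021999999", "025000001"))  # ordinary subscriber
print(hlr.query("021999999", "025000002"))  # IN subscriber, SCP consulted
```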
Private numbering plan
The private numbering plan provides an
abbreviated-dialling facility for cellular
phone users.
THE NETWORK STRUCTURE OF THE
CMS 88 MOBILE IN PLATFORM
In the current implementation of CMS 88
mobile IN, service information is passed
between the HLR/SCP and the MSC using
IS-41+ signalling - a signalling protocol
used in the American standard cellular
system. The interface between HLR and
SCP is a proprietary one, but it resides on
a standard signalling system 7 platform to
allow physical separation of HLR and SCP,
if required. IN call triggers within the existing CMS 88 IN call model are limited to originating calls, terminating calls and call
transfer. In later product releases, this will
be extended as the service switching function, SSF, in the MSC and MSC to HLR/SCP
protocols are enhanced.
Selective call rejection
Selective call rejection - based on a subscriber-specific A-number restriction list —
prevents certain calls from being forwarded to a mobile.
Telecom Mobile has implemented an
HLR/SCP in Auckland, to support its entire
New Zealand network. The HLR functions
are managed by means of the existing service provisioning and management
system developed by Telecom Mobile. The
IN services are provisioned and managed
with Ericsson's SMAS - the service management application system for IN - and
the subscriber management application,
SMA. The SMA will provide the "front end"
to Telecom Mobile's customer services
personnel to allow them to provision and
administer IN services in an efficient manner. The SMA is implemented through windows-based extensions to Telecom
Mobile's provisioning and management
system.
INITIAL MOBILE IN SERVICES
Ericsson provided seven initial services on
the IN platform at the time of installation.
These services would form the basis for
the first phase of the test marketing project. The initial mobile IN services are:
Timed call diversion
Timed call diversion - based on time of
day, day of week or special day - allows
calls to be redirected to other numbers.
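A diversion profile of this kind can be pictured as an ordered rule table. The sketch below is an assumption about how such rules might be evaluated (special days first, then the first matching weekday and time window); the rule format, targets and dates are invented and are not the format of the IN service scripts.

```python
from datetime import date, datetime, time

# (weekdays, start, end, target); Monday is 0. All values are invented.
RULES = [
    ({0, 1, 2, 3, 4}, time(8, 0), time(17, 0), "office"),
    ({0, 1, 2, 3, 4, 5, 6}, time(0, 0), time(23, 59, 59), "voice-mail"),
]
SPECIAL_DAYS = {date(1995, 12, 25): "home"}


def diversion_target(when, rules=RULES, special_days=SPECIAL_DAYS):
    """Return the number (here a label) to which a call at `when` is diverted."""
    if when.date() in special_days:
        return special_days[when.date()]
    for weekdays, start, end, target in rules:
        if when.weekday() in weekdays and start <= when.time() <= end:
            return target
    return None  # no diversion applies


print(diversion_target(datetime(1995, 3, 6, 9, 0)))     # a Monday morning
print(diversion_target(datetime(1995, 3, 6, 20, 0)))    # the same evening
print(diversion_target(datetime(1995, 12, 25, 10, 0)))  # a special day
```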
Outgoing call restriction
Outgoing call restriction - based on a subscriber-specific restriction list - prevents
certain numbers from being called from a
mobile phone.
Selective call acceptance
Selective call acceptance - based on a
subscriber-specific A-number allowance
list - allows only certain calls to be forwarded to a mobile.
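Both screening services reduce to a look-up in a subscriber-specific list. The following sketch assumes that outgoing restrictions are matched on number prefixes and that the allowance list holds complete A-numbers; the matching rules and all numbers are illustrative assumptions.

```python
def outgoing_allowed(dialled, restricted_prefixes):
    """Outgoing call restriction: block numbers starting with a listed prefix."""
    return not any(dialled.startswith(p) for p in restricted_prefixes)


def incoming_accepted(a_number, allowance_list):
    """Selective call acceptance: only listed A-numbers reach the mobile."""
    return a_number in allowance_list


restricted = ["0900", "00"]              # invented restriction list
allowed_callers = {"093334444", "025111222"}  # invented allowance list

print(outgoing_allowed("0900123456", restricted))       # False
print(outgoing_allowed("093334444", restricted))        # True
print(incoming_accepted("025111222", allowed_callers))  # True
print(incoming_accepted("049998888", allowed_callers))  # False
```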
Cellular business group
The cellular business group service comprises a set of services which can be made
available to mobile phone users in the
same organisation. Normally a unique
short number, similar to an extension
number on a PBX, would be assigned each
mobile phone in the cellular business
group. Users belonging to the same group
can call each other by dialling the short
number. Users belonging to a cellular business group can have access to:
- private numbering plan numbers
- selective call acceptance service
- selective call rejection service
- outgoing call restriction service.
The above-mentioned services are common to every user in the cellular business
group. In addition, each user in the group
can have individualised timed call diversion service.
0800-type service
This is a network-oriented service - based
on time of day, day of week or special day
- which allows calls to be redirected to
other numbers.
THE TEST MARKETING PROCESS
Telecom Mobile and Ericsson have
defined IN service test marketing as: "A
systematic approach to obtaining empirical data about cellular subscriber behaviour and implementation issues through
deployment of mobile IN services in the
marketplace prior to commercial product
introduction", Fig. 3.
Fig. 3
Test marketing can be seen as a separate information gathering process, working in parallel with the
overall service creation process. In test marketing,
information is collected primarily from the test service deployment process
The test marketing process enables Telecom Mobile and Ericsson to:
- gain competitive advantage by providing
advanced end-user services
- optimise service offerings so that they
meet proven and well-defined end-user
requirements
- improve time to market by optimising
the integration and deployment processes
- improve customer support by optimising
the provisioning and administration processes prior to commercial product
launch
- gain information about appropriate target markets and on how to optimise pricing,
market positioning and promotion of the
services.
The test marketing project is being undertaken in a number of phases as new
mobile IN services are developed.
The test service deployment process,
which is designed to expose the service
to actual end users, has two phases:
Alpha Test. An in-house test of the service
with 10-20 internal users.
Beta Test. An external test of the service
using 50 to 500 real end users.
Information from the test marketing process is used to facilitate service launching and then provide feedback to service
definition and development processes.
Fig. 4
In this application, cellular business group services
are used to integrate the numbering plans of a PBX
network and cellular phones and to provide access
to mobile data facilities
SERVICE APPLICATIONS USED FOR
TEST MARKETING
Telecom Mobile has identified a number
of service applications which are used as
a basis for test marketing research. These
applications are:
Field force
Telecom New Zealand's national fault and
work management project has provided
the opportunity to test-deploy the cellular
business group service. The overall objective of the project was to improve customer service by equipping Telecom's fault
and maintenance personnel with a wireless handheld data terminal that allows
field staff to receive information about
customer faults. The cellular network is
used to transport data to the mobile data
terminal and provide integrated communications with Telecom's PBX network, Fig. 4.
The integration of PBX and cellular communications is achieved through the use
of a linked numbering plan for cellular
phones and PBX extensions. A cellular
phone user only needs to dial a five-digit
extension number to reach a PBX extension or a cellular phone. Similarly, a PBX
user can reach any one of the field force
cellular phone users by dialling a five-digit number. Calls from the public are always
routed through the PBX before they reach
the cellular phone user.
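The linked numbering plan amounts to a translation table shared by PBX extensions and cellular phones. The sketch below is a hypothetical illustration: the five-digit numbers, the network numbers and the table layout are invented.

```python
# Invented linked numbering plan: five-digit number -> (network, routing number).
PLAN = {
    "41001": ("pbx", "41001"),
    "41002": ("pbx", "41002"),
    "52001": ("cellular", "025550101"),
}


def resolve(dialled, plan=PLAN):
    """Translate a five-digit group number; anything else is routed unchanged."""
    if len(dialled) == 5 and dialled in plan:
        return plan[dialled]
    return ("public", dialled)


print(resolve("52001"))      # ('cellular', '025550101')
print(resolve("41002"))      # ('pbx', '41002')
print(resolve("093334444"))  # ('public', '093334444')
```

The same table serves a cellular phone user dialling a PBX extension and a PBX user dialling a field force phone, which is what makes the plan "linked".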
Information about this application of IN
was gathered through interviews with the
implementation team (Telecom Mobile
and Ericsson) and through interviews with
users and their managers. The outcome
of the initial research was that
- number translation services could be
enhanced to make the service application more flexible and easy to administer
- enhancements of the billing systems
were required, to enable the services to
be billed individually
- improvements in graphical user interfaces were required, for service provisioning and administration
- the impact on existing billing, provisioning and management systems needs to
be taken into consideration
- the processes in the service supply
chain need adapting to customised service applications
- the integration of PBX and cellular phone
services has wide application for businesses to make communications easier for the user.
This project is continuing to provide information on the application of cellular business group services.
Personal communications service
Telecom New Zealand's PCS (personal
communications service) trial began early 1995 in Auckland and will continue for
12 months. It is a market-oriented rather
than a technological trial, seeking to
establish approximate demand, pricing,
migration from other services, provisioning across fixed and mobile (cellular and
paging) networks, and cost and revenue
attributes of PCS. The attempt to integrate
some aspects of the existing mobile and
fixed networks has proved a significant
challenge. Three market segments have
been identified for testing PCS services:
residential users, small business users,
and a large corporate customer.
Each PCS customer will be provided with
a single PCS number. By utilising IN functionality, users can be contacted, whether they are on their mobile phone, at work
or at home. The standard PCS package
consists of a mobile phone, common voice
mailbox, messaging alert, timed call diversions, call diversion override, divert on
busy, home cell calling and twelve-hour
help desk support.
Large corporate trial
Although the PCS service is targeted at
the mass market, the inclusion of a large
corporation in the trial makes it possible
to investigate end-user aspects of providing PBX-like features to corporate users.
An important technology related to the provision of PCS is microcell/picocell radio
access. Through the use of microcell
technology and IN services, Telecom and
Ericsson have been able to provide PBX-like features on a cellular phone with location-dependent tariffs.
The objective, from the large corporate's
point of view, is to improve call completion in a cost-effective manner by using
microcell technology and location-based
tariffs - calls within the microcell being
tariffed relative to PBX costs, Fig. 5. The
IN features being used (cellular business
group and timed call diversion) allow for
integration of the PBX and the cellular
phone services including extension dialling from PBX to PCS mobiles and vice versa. Users within the large corporation are
taking part in telephone interviews, in-depth interviews and focus groups to
obtain a user perspective on PCS as
applied to a business environment.
Fig. 5
The application of mobile IN services in combination
with microcell is tested with a large corporate user.
A microcell has been placed on the customer's
premises, and the cellular business group feature
provides "PBX-like" services
Further research
Focus groups demonstrating IN features
were held, and this has meant that a number of other service applications have been
identified. Specific, favoured applications
are based on the cellular business group
and outgoing call restriction services.
INITIAL RESULTS
Although the test marketing process has
only been in place since mid-1994, a considerable amount of new information has
been gained. This information falls into
four main categories: new service requirements, impact on organisation, impact on
processes, and marketing data.
New service requirements
The following new requirements have been
identified and will appear as new IN services in the near future:
- enhanced number translation functions
- enhanced screening services, i.e. selective call diversion
- selective forwarding of calls
- location-dependent call forwarding
- improvements to voice announcements
- services to support fixed and mobile
integration, i.e. personal communication service, PCS, and universal personal telecommunication, UPT
- enhanced billing
Fig. 6
The service creation process starts with market
research to identify end user requirements. These
requirements are specified by the network operator
and subsequently developed by Ericsson. Services
are delivered as AXE software or IN service scripts
to the operator who provisions the service to end
users via service providers and dealers
The significance of this list lies in its having been derived from and tested against
real end-user requirements.
Impact on organisation and processes
When attempting to increase flexibility and
reduce time-to-market of new end-user
services, the organisational and process
issues become critically important
throughout the service creation process,
Fig. 6. By undertaking the test marketing
process, Telecom Mobile and Ericsson
identified these issues at an early stage.
In summary, the areas of prime importance were:
- an increase in operator control over new
service development, allowing the market to be the driver rather than waiting
for new services to be provided in a generic software package
- a requirement for closer operational and
strategic relationships between the end
user, the network operator and the service developer
- a requirement for improvement of service
creation processes to ensure that time-to-market goals are achieved
- significant changes to implementation
and post-implementation support processes, to take advantage of the possibility of customised service applications
- the emergence of more highly-trained distribution channels to market and sell new
services, particularly customised service
applications.
Along with the development of new mobile
IN services and of the platform itself, there
is a manifest requirement for change and
development of the organisation and its
processes. This is essential for the flexibility and time-to-market goals of mobile
IN to be attained.
Marketing aspects
Information from end users has provided
the most valuable feedback regarding the
marketing of mobile IN features. What has
been gathered to date will be validated
through further research as the test marketing project continues. Some observations follow.
As the mobile market moves from business to consumer, the ability to retain the
relatively high-revenue business customer will become increasingly important to
network operators. The customisation of
services (to meet the requirements of a
specific user) which IN facilitates, will
become a key competitive differentiator
and a means of protecting that business
revenue stream.
Pricing
Business customers are seeking
increased functionality; at the same time
they expect telecommunications costs to
fall and are looking for features that will
cut their cellular service costs. The end
user is not necessarily willing to pay for
the extra functionality IN services provide.
After all, a feature such as PBX-extension
dialling from a cellular phone - which
makes the cellular phone operate in a
more "PBX-like" fashion - will cause a customer to consider similarities to
PBX/PSTN/VPN pricing rather than to
mobile pricing. In this scenario, customers are not only unwilling to pay more for
the functionality of a cellular business
group; they often expect to pay less.
The corporate user
The large corporate or national customer
is an obvious target for the features of a
cellular business group. Clearly, there are
certain features that the customer would
like all the members of the group to have
(such as extension dialling), but other features may need to be tailored to the specific individual (timed call diversion and
outgoing call restriction by time of day, for
example). A truck driver might only be
allowed to make calls from his cellular
phone during working-hours, while senior
executives should be able to use their cellular phones for outgoing calls at any time of
the day. By controlling who can make calls
to which destinations, the company will
also control its costs. IN functionality must
therefore cater for both the individual and
generic requirements.
Providing microcell/picocell technology in
a corporate office environment may primarily be a question of relieving network
congestion. However, the creation of a
specific system area gives the network
operator an opportunity to develop special
tariff options for "internal" cellular calls.
The "home cell" tariffing of the PCS environment may effectively be transported to
a cellular business environment to benefit both the network operator and the customer.
The small business
Advanced call forwarding services are particularly attractive to small business users
who want to be in contact with their customers at any time of the day; for instance,
calls during the day being forwarded to
voice-mail if the customer is unavailable,
and calls after working-hours being
forwarded to their home number. The
small business user wants a standard call
forwarding profile for most of the year, but
must have control of how this profile is set
and the ability to change it at will while he
is on vacation or when special circumstances arise. The optimal situation is
that the customer has control, but it must
be made simple for him. This is especially the case as the market moves into the
consumer area and its less sophisticated
telecommunications user.
Positioning and promotion
Customers were interested in understanding how IN services could provide them
with competitive advantage. In attempting
to market IN services, network operators
must position them in a way that shows
benefits to the customers in terms of their
own business situation. Customisation
and control are key components here along with flexibility and ease of use. This
should be kept in mind as IN services are
further developed.
SUMMARY
The test marketing project has completed
its first phase; investigation of new IN services and marketing issues is now being
planned. The project is seen as an ongoing and evolutionary process, designed to
gain a better understanding of end users
and their requirements.
Telecom Mobile and Ericsson have
acquired a considerable amount of information about the implementation and
marketing aspects of mobile IN services.
This information will be a great help in the
further development of services, because
it shows the way organisations and processes have to change to meet the
increasing demands of end users.
AXE 10 Dependability
Karl-Axel Englund
Because telecommunication exchanges are expected to work 24 hours a
day, 365 days a year, with negligible down time, they must have an
extremely high level of dependability.
The author outlines the fault tolerance, maintainability functions and the different tasks and activities that are emphasised during the design and development of the AXE 10 system, to ensure high reliability and maintainability.
Today's users of telecommunication networks and services expect superior quality of the services offered. They require
high performance of the network during
the establishment and retention of calls,
and they expect network operators to provide good service support in terms of short
installation time, quality of user instructions and charging integrity. Moreover,
services have to be provided at a reasonable cost for the users and with a reasonable profit for the operator.
Today, when transmission costs are
decreasing, thanks in part to increased
use of optical fibres in the network,
requests to reduce costs further will have
to focus on the switching elements. This
requires exchanges that are very reliable
and easy to maintain.
Fig. 1
The Quality of Service concept and its relation to
dependability according to ITU-T recommendation
E.800 (revised edition)
AXE 10 has been designed with stringent
requirements for reliability, maintainability and quality of service. These requirements apply to software, hardware, documentation and the man-machine
interfaces. The proven low failure rates
and the philosophy of integrated operation
and maintenance, which incorporates
scheduled corrective maintenance, contribute to the comparatively low life-cycle
cost of AXE 10.
Predictions and estimations of reliability
and maintainability performance and quality of service are important parts of assurance activities. Fig. 1 shows the quality-of-service concept and its related
performances. Estimates based on field
observations provide necessary feedback
for creation and update of databases with
the information needed for future, more
comprehensive and accurate predictions.
These predictions also provide operators
with the input they need to plan maintenance support. The technological
enhancements - continuously introduced
into the AXE 10 system, thanks to its
unique modularity - can easily be covered
by updated predictions based on comprehensive field studies.
DEPENDABILITY OBJECTIVES
In recent years, the in-service performance of telecom exchanges has
improved. Demands for higher reliability
have also increased, as has the need for
larger switching nodes capable of providing more functionality to a growing number of subscribers. In terms of reliability
and unavailability, the mean accumulated
down-time objective for an exchange is
approaching a few minutes per exchange
and year. The past few years have seen a
rapid growth of interest in network survivability, especially in the US. In the standardisation work, this has been accompanied by more stringent requirements for
node switches.
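To put the down-time objective in perspective, the corresponding availability can be worked out directly. The figure of three minutes per year used below is an assumed stand-in for "a few minutes" and is not a stated requirement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(downtime_minutes_per_year):
    """Fraction of the year during which the exchange is in service."""
    return 1 - downtime_minutes_per_year / MINUTES_PER_YEAR


# Assumed example: 3 minutes of accumulated down time per exchange and year.
print(f"{availability(3):.4%}")
```

Under this assumption the availability is above 99.999 percent, which shows how little margin such an objective leaves for restarts and repairs.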
Fault tolerance
The architecture chosen heavily influences the reliability of the system. The
exchanges in the telecom network must
have adequate redundancy, to limit the
effects on the service if major faults occur.
A primary characteristic of AXE 10 is that
hardware redundancy is provided for all
equipment handling more than
- 128 lines (POTS)
- 32 trunks (one PCM system)
- 64 ISDN basic access lines (2B+D)
- 4 ISDN primary access lines (30B+D)
The central processor (CP) is duplicated.
Its two sides work in parallel-synchronous
mode and continuously compare the program execution in both sides, which
ensures high system hardware availability. Error-correcting codes in program and
data store units ensure that single-bit
errors do not cause an outage of any of
the CP sides.
Redundancy is provided for the decentralised regional processors, the bus systems
between the central processor and the
regional processors, and between the
regional processors and their subunits.
The reliability performance of the digital
group switch is ensured by the duplicated
switching network and the triplicated clock
modules for synchronisation.
Fig. 2
When a hardware fault occurs, the supervisory functions will initiate a series of actions that lead to the
removal of the faulty printed board assembly.
Recovery in the case of software errors is accomplished through the most appropriate action with
respect to the fault situation. The range of the automatic recovery actions depends on the application,
from low level recovery by forlopp release to a major
restart with reload. The faulty function block is corrected after analysis and testing
HARDWARE RELIABILITY
Reliable equipment performance necessitates components that conform to defined
quality and performance requirements
and meet the demands for long-term reliability.
To ensure that only high-quality and reliable components are used, a comprehensive component reliability assurance program, consisting of the following
elements, is essential:
- lot-to-lot control
- vendor qualification
- process qualification
- family qualification
- type approval
- standardisation
- feedback and corrective actions.
The power supply system is sectioned in
such a way that converters and the distribution follow the redundancy principles
and structure of the powered units. The
configuration of the I/O subsystems is
flexible and can be adapted to the performance requirements that apply to the network.
MAINTAINABILITY
Extensive maintenance functions are built
into the system to supervise hardware and
software functions and to support maintenance activities. These functions automatically supervise connections through the
exchange and check that the quality-of-service performance values remain within prescribed limits. Real traffic is supervised.
Fig. 2 shows a flow diagram of the maintenance actions when faults occur.
Software supervision
The methods used to detect software
errors in AXE 10 range from direct supervision, such as plausibility checks, to
indirect supervision of time, pointers, signals, etc.
- consistency check of reference store
and data store.
Fig. 3
Typical recovery from a hardware fault in the central
processor. In normal operation, the central processors work in parallel: the two CP sides (CP-A and
CP-B) execute the same program at the microprogram level. However, only one side - CP-A - is executive and handles traffic. If a hardware failure
occurs in one CP side, a mismatch between CP-A
and CP-B is detected. The faulty CP side is either
directly pointed out or identified by a side-determining test program. If the failure occurs in the executive side, a switchover will immediately take place.
The faulty CP side is then stopped. Traffic handling
is not disturbed by this procedure
Full-coverage supervisory circuits in the
central processor check that program handling is started from a certain point, at regular intervals. Plausibility checks of microprograms are made to discover program
faults before they disperse incorrect data.
Each time data is written in - or read from
- the data store, the index and pointer values used to calculate the absolute store
address are checked, to ensure that they
will not cause over-addressing. Automatic
actions are initiated when errors are
detected.
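The address check described above can be sketched as follows. This is a minimal illustration, not the APZ implementation: the store layout (a base address plus fixed-size records) and the names StoreArea, absolute_address and OverAddressingError are assumptions made for the example.

```python
from dataclasses import dataclass


class OverAddressingError(Exception):
    """Raised when a pointer or index would address outside its store area."""


@dataclass(frozen=True)
class StoreArea:
    base: int         # absolute address of the first word (assumed layout)
    records: int      # number of records in the area
    record_size: int  # words per record


def absolute_address(area: StoreArea, pointer: int, index: int) -> int:
    """Check pointer and index before computing the absolute store address."""
    if not 0 <= pointer < area.records:
        raise OverAddressingError(f"pointer {pointer} outside 0..{area.records - 1}")
    if not 0 <= index < area.record_size:
        raise OverAddressingError(f"index {index} outside 0..{area.record_size - 1}")
    return area.base + pointer * area.record_size + index


area = StoreArea(base=0x4000, records=128, record_size=8)
print(hex(absolute_address(area, pointer=3, index=2)))
try:
    absolute_address(area, pointer=128, index=0)
except OverAddressingError as err:
    # In the real system an automatic recovery action would be initiated here.
    print("error detected:", err)
```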
When a program error has been traced to
a function block, relevant data about the
cause of the error is stored in the memory, for subsequent printout. If serious
errors in the execution can be traced to
central software units, the system is
restarted. If the fault is contained within
a single process - for example, a call process adapted to the forlopp handling feature - then a low-level fault recovery mechanism called forlopp release will clear the
faulty process without affecting any other
process in the system. The word 'forlopp'
is used to denote a sequence of events.
Audit functions
Audit functions are implemented in APZ in
order to detect latent errors; for example,
those caused by software errors (bugs).
When an audit function detects an error,
an alarm is issued to inform operators
about errors that require manual intervention. The alarm condition ceases after correction of the error. Currently, the following audit functions are available:
- detection and correction of processor
store errors
- check summing of program store
- detection and reporting of data errors
Hardware supervision
Some of the characteristics of hardware
maintainability are:
- automatic supervision of each connection through the exchange, combined
with checks to ensure that the serveability remains within prescribed limits
- automatic reconfiguration in the case of
hardware failures
- automatic fault localisation
- advanced built-in functions for diagnostics and repair.
Supervision of the switching system is in
the form of supervision of the real traffic
it handles. Traffic circuits, such as trunk
lines, are supervised by means of statistical methods; an alarm is issued when
abnormal behaviour is detected. The processor hardware is checked by means of
various supervisory functions, such as:
- CP-side matching
- parity checks
- check summing of memories
- ECC (error correcting code)
- micro-programmed addressing checks
- watchdog function
- routine tests.
Other fault-detection mechanisms, essentially used to activate infrequently operating circuits and functions, operate on a
scheduled basis. The APZ fault-detection
functions are implemented in hardware,
software and microprograms. The supervisory functions detect both permanent
and transient faults. The time to fault
detection is short, and a clear indication
of the faulty unit is given in a diagnostic
printout.
Error-correcting code
The need for large memory volumes in APZ
has affected the hardware failure intensity, which is closely related to the number of memory circuits used in an application, and to their failure rate. For this
reason, all APZ processors are provided
with error-correcting codes, on a per word
basis, in the CPS store unit, to deal with
both permanent and transient failures.
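The principle of per-word error correction can be illustrated with the textbook Hamming(7,4) code, which corrects any single-bit error in a 7-bit codeword. The article does not state which code is used in the APZ store units; the sketch below is an assumption chosen for brevity, and a real store would protect much wider words.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Return (data_bits, error_position); position 0 means no error found."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome gives the faulty bit position
    if position:
        w[position - 1] ^= 1         # correct the single-bit error
    return [w[2], w[4], w[5], w[6]], position


stored = encode([1, 0, 1, 1])
stored[4] ^= 1                       # simulate a single-bit memory fault
data, position = decode(stored)
print("recovered data:", data, "corrected position:", position)
```

Because the word is corrected on every read, a permanently faulty memory cell behaves like a repeated single-bit error and does not by itself force the store out of service.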
The CPS has been designed to be tolerant of single memory faults; it can continue to handle traffic without taking the
faulty CP side out of service. Replacement
of faulty memory boards can be postponed
until other repair activities in the affected
side have been performed, which extends
the time between repairs.
Forlopp release
The forlopp handling mechanism provides
correction at the call process level. This
means that higher-order recovery procedures, such as restarts, can be avoided
for most faults.
Fig. 4
The AXE system is provided with various protective
mechanisms to handle software-related errors, in
order to keep traffic handling intact to the greatest
possible extent. Depending on the fault code and
recovery implementation, faulty program execution
may affect only a single call attempt or all call processing. The system has built-in facilities for escalation from low-level recovery, e.g. a forlopp release,
to higher-level recovery, e.g. a system restart, if
necessary
Hardware reconfiguration
For traffic handling not to be affected, the
hardware must be reconfigured quickly.
The reconfiguration of hardware is implemented at different levels of the AXE
system. In the RPs that control the traffic
devices, redundancy is implemented
through a pair of RPs working in load-sharing mode, whereas switching is controlled
by RPs that work in active/standby mode.
A transparent recovery mechanism - that
is, transparent to the handling of traffic - is used in the event of a hardware failure
in the central processors, Fig. 3. The faulty
CP side is taken out of service without
affecting the traffic handling process; the
fault-free CP side continues to work. Once
the faulty side is repaired, the two sides
are brought to work in parallel again.
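The recovery sequence for the duplicated central processor can be summarised as a small state model. This is a simplified sketch based only on the description above; the class and method names are hypothetical, and the side-determining test is assumed to have already identified the faulty side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CentralProcessorPair:
    executive: str = "CP-A"              # side currently handling traffic
    working_parallel: bool = True        # both sides executing in step
    stopped_side: Optional[str] = None   # side taken out of service, if any

    def standby(self) -> str:
        return "CP-B" if self.executive == "CP-A" else "CP-A"

    def on_mismatch(self, faulty_side: str) -> None:
        """Handle a detected mismatch once the faulty side is known."""
        if faulty_side not in ("CP-A", "CP-B"):
            raise ValueError(f"unknown CP side: {faulty_side}")
        if faulty_side == self.executive:
            # Switchover: the fault-free side takes over as executive.
            self.executive = self.standby()
        self.stopped_side = faulty_side
        self.working_parallel = False

    def on_repair(self) -> None:
        """After repair, the two sides are brought to work in parallel again."""
        self.stopped_side = None
        self.working_parallel = True


cp = CentralProcessorPair()
cp.on_mismatch("CP-A")
print(cp.executive, cp.stopped_side, cp.working_parallel)
cp.on_repair()
print(cp.executive, cp.stopped_side, cp.working_parallel)
```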
Software recovery
Large software systems are never
completely free of faults. They must therefore have protective mechanisms that
eliminate or limit the effects of faults in
the handling of traffic. Fig. 4 illustrates the
system recovery mechanisms used in
AXE 10.
Automatic system restarts have long been
the only mechanism to recover from
severe software errors, restoring the
system to a predefined stable state, from
which the execution of programs can be
continued. Although this is a simple and
robust recovery mechanism, a more
sophisticated and selective one
has become necessary. Today, the
system is gradually being upgraded with
new, low-level fault recovery mechanisms. Depending on the location and
severity of the error, the control system
can use the protective mechanisms
described in the following.
The design of forlopp handling, to enable
forlopp release in an application system,
is supported by the APZ system. The forlopp handling mechanism can also be
used as a test tool that can manage hanging devices (hung lines or trunks) and
report abnormalities in the call process.
When neither forlopp release nor selective
restart can handle a fault (for example, job
buffer congestion), an unconditional
system restart is performed.
However, the forlopp release adequately
handles the most frequent faults and
substantially improves in-service performance.
Selective restart
A selective restart functions as follows: If
a software error is detected in a block that
is less important to the traffic process, the
system restart can be either terminated
or delayed until traffic is low. The recovery
action is determined by
- the fault code
- the block category
- time of day.
A selective restart can be activated or
deactivated by command. It is controlled
by the block category, which is also set by
command or contained in the exchange
data. Each time an error interrupt does not
restart the system, an error intensity counter is incremented, until a maximum limit
is reached. This maximum limit can be set
by command; when it is reached, the
system is restarted. The error information
is preserved, and an alarm is initiated for
each recovery.
The error intensity counter is automatically decremented at pre-defined intervals.
The available block categories and their
actions are:
Category 0: Error is ignored
Category 1: Delayed system restart
Category 2: Immediate system restart
Category 3: Delayed reload with major restart
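The interplay between block category and error intensity counter can be sketched as below. This is a simplified model under stated assumptions: the fault code and time of day are left out, the counter is assumed to be cleared when a restart is ordered, and all names are hypothetical rather than taken from the AXE command set.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "error ignored"
    DELAYED_RESTART = "delayed system restart"
    IMMEDIATE_RESTART = "immediate system restart"
    DELAYED_RELOAD = "delayed reload with major restart"


# Block categories 0..3 as listed above.
CATEGORY_ACTION = {
    0: Action.IGNORE,
    1: Action.DELAYED_RESTART,
    2: Action.IMMEDIATE_RESTART,
    3: Action.DELAYED_RELOAD,
}


class SelectiveRestart:
    def __init__(self, max_intensity: int):
        self.max_intensity = max_intensity  # limit, set by command
        self.intensity = 0                  # error intensity counter

    def on_error(self, block_category: int) -> Action:
        action = CATEGORY_ACTION[block_category]
        if action is Action.IMMEDIATE_RESTART:
            self.intensity = 0              # assumption: restart clears the counter
            return action
        # The error interrupt did not restart the system: count it.
        self.intensity += 1
        if self.intensity >= self.max_intensity:
            self.intensity = 0
            return Action.IMMEDIATE_RESTART
        return action

    def on_timer(self) -> None:
        """Called at pre-defined intervals to decrement the counter."""
        self.intensity = max(0, self.intensity - 1)


recovery = SelectiveRestart(max_intensity=3)
for category in (0, 1, 0):
    print(category, "->", recovery.on_error(category).value)
```

In this run the third error reaches the limit, so the system is restarted even though the block category alone would have allowed the error to be ignored.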
In cases where not all processes in the
system are adapted to the forlopp handling mechanism, the selective restart
complements the forlopp release. The
capability to assign block categories for
restart purposes greatly reduces the
impact of minor software errors on the AXE
in-service performance.
detailed information about the contents of
the registers at the time the error occurred
can be printed out by command.
Fig. 5
Calculation tools have been integrated into a
system that contains both databases and programs.
The tools are used for predictions in the design
phase, dimensioning of spare parts inventory, and
estimates based on in-service performance statistics. The system is called DEPEND (short for
"dependability")
RELIABILITY AND MAINTAINABILITY
ASSURANCE DURING DESIGN
Minimal restart
If the control system can determine by the
fault code that the regional software in
RPs and EMRPs is not involved in the faulty
process, these units will not be updated
when a restart is initiated. This kind of
restart is called minimal restart and is
shorter than ordinary minor restarts.
Restart escalation
Previously, when the time interval between
two consecutive program errors was less
than a fixed ten-minute interval, the restart
level was automatically escalated to a
higher level. The ten-minute restart window has been improved in later APZ versions, and it can now be changed by command. The number of consecutive restarts
before escalation to a higher restart level
can also be stated. The possibility of
changing the restart window reduces the
risk of cyclic restarts before escalation.
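The escalation rule can be sketched as follows. The restart window and the number of restarts before escalation are the two parameters described above; the level names, the reset to the lowest level after a quiet period, and the class itself are assumptions made for the illustration.

```python
# Illustrative restart levels, lowest first (names are assumptions).
RESTART_LEVELS = ["minor restart", "major restart", "major restart with reload"]


class RestartEscalation:
    def __init__(self, window_s: float = 600.0, restarts_before_escalation: int = 1):
        self.window_s = window_s  # restart window, changeable by command
        self.restarts_before_escalation = restarts_before_escalation
        self.level = 0
        self.count_at_level = 0
        self.last_error_time = None

    def on_program_error(self, now_s: float) -> str:
        """Return the restart level to use for an error occurring at now_s."""
        within_window = (
            self.last_error_time is not None
            and now_s - self.last_error_time < self.window_s
        )
        self.last_error_time = now_s
        if not within_window:
            # Errors far apart: start again from the lowest level.
            self.level = 0
            self.count_at_level = 1
        elif self.count_at_level >= self.restarts_before_escalation:
            # Enough consecutive restarts at this level: escalate.
            self.level = min(self.level + 1, len(RESTART_LEVELS) - 1)
            self.count_at_level = 1
        else:
            self.count_at_level += 1
        return RESTART_LEVELS[self.level]


escalation = RestartEscalation(window_s=600.0, restarts_before_escalation=1)
for t in (0.0, 100.0, 200.0, 2000.0):
    print(t, "->", escalation.on_program_error(t))
```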
Box A
Quality of Service
"The collective effect of service performances
which determine the degree of satisfaction of a
user of the service.
Note 1. - The quality of service is characterised
by the combined aspects of service support performance, service operability performance, serveability performance, service integrity and other factors specific to each service.
Note 2. - The term "quality of service" is not used
to express a degree of excellence in a comparative sense, nor is it used in a quantitative sense
for technical evaluations. In these cases a qualifying adjective (modifier) should be used."
From ITU-T Recommendation E.800
Repair facilities
The repair function includes alarm handling and support for repairs. An alarm is
issued when a permanent hardware fault
has been found and localised, or when the
number of transient errors has reached
the specified alarm level. Support functions locate the faulty printed board
assembly. The system also guides operators through the actions they need to take
when intervening manually. A repair check
is always made after a board is replaced,
and a printout acknowledgement states
that the alarm has been removed and that
the system is restored to the normal working mode.
In rare situations, when the normal procedures and diagnostics are insufficient,
expert support may be necessary. To
assist consulting experts, an error log with
Dependability plan
Reliability and maintainability performance analyses are integral parts of the
design process. The specified reliability
requirements are analysed, and low-level
requirements are set for each subunit. For
example, in a feasibility study, various
alternatives of the product structure are
studied and the reliability and maintainability performances of the different solutions are predicted.
When a new hardware project is started,
a dependability plan describes the activities and resources that will be included
during the design process to ensure reliability and maintainability performance.
Reliability and maintainability performance are predicted to check compliance
with requirements. The results of these
predictions, including necessary analyses
(stress analyses, fault modes and effects
analyses, FMEA), are presented in a formal document according to recommended standards.
Within the project, review meetings are
held to evaluate the results and to compare them with the stated requirements,
for the purpose of providing input to the
continued design. Predictions often comprise a large amount of data. The results
of the complex calculations must be available in time to support the ongoing design
activities. A set of computerised tools has
therefore been created.
Component reliability predictions
At an early stage of the design process,
failure rates are predicted for components
and printed board assemblies, to provide
project management with the information
they need to ensure that reliability and
maintainability performance meet the
specified requirements. All predictions
are based on Ericsson's database for
component failure rates, TILDA. A component reliability prediction model, based on
findings from extensive field studies, is
also used. Ericsson Telecom has gathered
this kind of data for several years and sees
a consistent and ongoing downward trend
in the failure rates for microcircuits.
The designer can choose the optimal design by selecting appropriate components and acting on available information from production, testing and
operation.
In particular, vital components with high
integration, such as ASICs, are followed
up with reliability tests before they are
released. Manufacturers' life-cycle test
results are thoroughly analysed and
reviewed. Information on reliability is further used by allocating failure rate data to
the component type concerned.
To select new components, the design
centres are supported by the material
analysis laboratory. Components based
on new technology, such as BICMOS, are
studied carefully in order to achieve good
reliability assessments. Potential fault
modes are assessed, and the resulting
data is used in subsequent fault modes
and effects analyses of the boards.
If necessary, the design and layout of printed board assemblies are analysed using
temperature simulations. Reliability versus ambient temperature is calculated, to
provide a basis for improved performance.
Weak components and components that
severely affect reliability performance will
be identified.
Initial predictions are achieved by roughly
calculating the overall failure rates for the
different boards, using the parts-count
method; that is, by adding the failure rates
of all components that are used. The
results can verify that an appropriate reliability structure is feasible. As the design
progresses, the predictions are refined to
provide more adequate reliability assessments.
Fig. 6
Failure rate prediction for a printed board assembly
using one of the SYPREX programs. In this case,
the components are listed in the order of their contributions to the total failure rate. A default template with parameter values for a typical environment and application is provided. For temperature
sensitivity analysis, the parameters can be changed
by the user. Follow-up of "old" prediction data for a
certain unit (magazine or printed board assembly) is
possible by changing the cut-over date (cut date).
The position and failure rate of each individual component on the board can also be presented. Reports
can be recorded on files for use by other tools
Fault modes and effects analysis,
FMEA
FMEA is a method used during the design
and development phase to analyse reliability by identifying failures that significantly affect system performance. How
the analysis is made depends on the specific purposes for which its results are
needed. These results are used to produce reliability block diagrams and state
diagrams.
Prediction tools and databases
Most of the tools that are used for ordinary reliability work are integrated into the
data application system available on host
processors in Ericsson's corporate network. These tools have been collected into
a system called DEPEND, Fig. 5, which
contains databases and calculation programs. They are accessible on-line from
workstations and closely connected to
other Ericsson databases used for design,
ordering and manufacture.
Prediction tools for complex reliability
models and other work based on statistical theory are accessible as separate programs on the workstations.
TILDA
Prediction of hardware failure rates used
in the reliability models is based on the
component failure rates accessible in the
TILDA database. This database also contains information on temperature range
and stress level for each component type.
All data in TILDA is continuously updated
by a committee, whose job it is to ensure
that the information in the database
reflects the most recent development and
knowledge of the components.
SYPREX
A set of programs called SYPREX is used
by hardware designers to predict the reliability performance of components, printed board assemblies, magazines and
other units containing components. Predictions can be based on either results of
detailed stress analyses and fault modes
and effects analyses, or on typical stress
data stored in TILDA (tentative predictions). Fig. 6 shows an example of a circuit reliability prediction for a printed
board assembly. Prediction results for
hardware units are automatically stored in
FELIX (see below) for later retrieval.
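A tentative, parts-count style prediction for a printed board assembly can be sketched as follows. The component names, quantities and failure rates are invented for the illustration and do not come from TILDA; the function is hypothetical and only shows the principle of summing component failure rates and ranking the contributions, as in Fig. 6.

```python
FIT = 1e-9  # 1 FIT = one failure per 10**9 component-hours


def parts_count(bill_of_materials, failure_rates_fit):
    """Sum component failure rates; return the total and ranked contributions."""
    contributions = {
        part: quantity * failure_rates_fit[part]
        for part, quantity in bill_of_materials.items()
    }
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return total, ranked


# Hypothetical board and failure rates (in FIT), for illustration only.
rates = {"ASIC": 50.0, "DRAM": 20.0, "resistor": 0.5, "capacitor": 1.0}
board = {"ASIC": 2, "DRAM": 16, "resistor": 120, "capacitor": 60}

total_fit, ranked = parts_count(board, rates)
print(f"board failure rate: {total_fit:.0f} FIT")
print(f"MTBF: {1.0 / (total_fit * FIT):.3g} hours")
for part, fit in ranked:
    print(f"  {part:10s} {fit:7.1f} FIT  ({fit / total_fit:.0%})")
```

Listing the contributions in descending order makes it easy to see which component types dominate the board failure rate and are worth a more detailed stress analysis.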
FELIX
A database called FELIX contains predicted failure rates of printed board assemblies and other items. These values can
be retrieved for use in more complexly connected items, for calculations of spare
parts stores, or for comparison with field
data.
RESEX
A set of programs called RESEX is used in
the support process to calculate repair
costs and to dimension spare parts inventories (components, printed board assemblies, and other replacement units) for
single- and multi-level spare parts store
organisations.
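The dimensioning of a single-level spare parts store can be illustrated with a common textbook model in which the demand for spares during the repair turnaround time is Poisson distributed. The article does not describe the model used in RESEX, so the approach, the function name and the example figures below are assumptions.

```python
import math


def spares_needed(failure_rate_per_hour, units_in_service, turnaround_hours,
                  confidence=0.95):
    """Smallest stock s such that P(demand <= s) >= confidence (Poisson demand)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    mean_demand = failure_rate_per_hour * units_in_service * turnaround_hours
    term = math.exp(-mean_demand)  # P(demand = 0)
    cumulative = term
    stock = 0
    while cumulative < confidence:
        stock += 1
        term *= mean_demand / stock  # P(demand = stock) from the previous term
        cumulative += term
    return stock


# Assumed example: 200 boards of 540 FIT each, 30-day repair turnaround.
print(spares_needed(540e-9, units_in_service=200, turnaround_hours=30 * 24))
```

A multi-level store organisation would need a more elaborate model, since spares at a central store also cover replenishment delays to the local stores.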
VERA
To follow up the in-service performance of
hardware, a set of programs - VERA - is
used to estimate the failure rates of components, printed board assemblies, and
other replaceable units. Statistical methods are used to compare the predictions
with the observed data, adding precision to
the estimates.
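A simple way to compare a predicted failure rate with field observations is sketched below. It assumes a constant failure rate, so that the number of failures in a given number of unit-hours is Poisson distributed; the article does not state which statistical methods VERA applies, and the function names and figures are hypothetical.

```python
import math


def estimated_failure_rate(failures, unit_hours):
    """Point estimate of the failure rate from field data (failures per hour)."""
    return failures / unit_hours


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    below = 0.0
    for k in range(observed):
        below += term           # accumulate P(X = k)
        term *= expected / (k + 1)
    return 1.0 - below


# Assumed example: 500 boards observed for one year, 6 failures reported,
# against a predicted failure rate of 540 FIT per board.
unit_hours = 500 * 8760
predicted_rate = 540e-9
observed_failures = 6

expected_failures = predicted_rate * unit_hours
print(f"estimated rate: {estimated_failure_rate(observed_failures, unit_hours):.3g} per hour")
print(f"expected failures under the prediction: {expected_failures:.2f}")
print(f"P(at least {observed_failures} failures): "
      f"{poisson_tail(observed_failures, expected_failures):.3f}")
```

A small tail probability suggests that the observed failure rate is higher than predicted and that the prediction data should be reviewed.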
FIDEX
FIDEX is a collective term for the databases that contain information on failed and
replaced hardware equipment reported to
a repair centre, and information gathered
from analyses of faults and details.
Dependability modelling
Various types of dependability modelling
techniques are used, depending on the
subsystem, type of measure to be predicted, and availability of data.
Dependability models are based on the
initial prediction during design; they are
further updated as the design progresses
and when the predictions are updated during later life-cycle phases. A large number
of calculations are made on the basis of
state-transition diagrams and Markovian
modelling techniques, Fig. 7, using the
SYPREX calculation programs.
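A Markovian calculation of this kind can be sketched for a duplicated unit with three states: both sides working, one side failed, and both sides failed. The failure and repair rates, the single-repair assumption and the solver below are illustrative only; they are not taken from the SYPREX programs or from any AXE 10 model.

```python
def steady_state(rates, n):
    """Steady-state probabilities of an n-state continuous-time Markov chain.

    rates maps (i, j) to the transition rate from state i to state j.
    Solves pi * Q = 0 together with sum(pi) = 1.
    """
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        q[i][j] += rate
        q[i][i] -= rate
    # Balance equations (transpose of Q); the last one is replaced by sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(a[row][k] * pi[k] for k in range(row + 1, n))
        pi[row] = (b[row] - known) / a[row][row]
    return pi


lam = 1e-5  # assumed failure rate per side, per hour
mu = 0.25   # assumed repair rate, per hour (mean repair time 4 hours)
# State 0: both sides working, 1: one side failed, 2: both failed (system down).
model = {(0, 1): 2 * lam, (1, 0): mu, (1, 2): lam, (2, 1): mu}

pi = steady_state(model, 3)
print("steady-state probabilities:", pi)
print(f"mean down-time: {pi[2] * 365.25 * 24 * 60:.4f} minutes per year")
```

The probability of the system fault state multiplied by the length of a year gives the mean accumulated down-time that this simple model contributes.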
Maintainability analysis
In a very early phase of the design, when
the layout of the HW and SW for maintenance functions has been outlined, a
maintainability analysis is made. The
focus of this analysis is on the system's
capability for failure detection, fault localisation and repair. The analysis enables
management to assess, for example, how
well the system meets the diagnostic
accuracy requirement, and whether
improvements in the design are necessary. The precision of the analysis is
dependent on the quality of component
failure rate data and the depth of the analysis. The FMEA may provide a basis for an
analysis of this kind, if the relevant information is available.
Fig. 7
An example of a dependability model using Markovian modelling technique. The tool is used for
designing and examining Markov chains in an interactive way. The diagram, which is drawn directly in
the window, shows the expected states of a central
processor configuration with redundancy, and how
the system is handled in fault situations. An arc
between states represents a transition as a result of
a failure, logistic delay, repair, or recovery. Each
transition has a parameter value. The result of a calculation contains steady-state probabilities indicated for each state, mean recurrence time and the
mean time spent by the system in each state, or in
a combined state, e.g. a system fault state
Fig. 8
Evaluation of maintainability and reliability performances through fault simulation. The basic data
that has to be prepared before the actual fault simulation starts at a test site includes a list of faults
randomly selected according to a specification.
Test probes are connected to circuit pin positions
on the circuit boards when a fault is simulated.
Fault simulation is performed by fault simulator
equipment controlled by PC programs that enable
triggering of single or multiple test points to correspond to fault modes. The behaviour of the system
when a failure occurs is recorded. The observed
data are evaluated against requirements, and
improvements are introduced, if needed. The results
are also used for prediction of reliability and maintainability measures
Maintainability and reliability system
architecture testing
The test used to verify that maintainability performance complies with requirements must be performed in conditions
as similar to real operation as possible.
Therefore, hardware faults are simulated
at a test site to study how the system
behaves when the hardware fails. The purpose of the test is to evaluate the maintainability performance and the reliability
structure of the system, Fig. 8. This is
done by simulating both single and multiple fault situations. The hardware failures
are chosen at random, in accordance with
their expected relative occurrence in normal operation. All data is based on the
dependability predicted in the design
phase.
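Selecting faults at random in proportion to their expected occurrence can be sketched as a weighted draw, with the predicted failure rates as weights. The unit names and rates below are invented for the illustration, and the function is hypothetical; it is not the tool used to prepare the fault lists.

```python
import random


def select_faults(predicted_rates, n_faults, seed=None):
    """Draw fault locations with probability proportional to predicted failure rate."""
    rng = random.Random(seed)  # a fixed seed makes the fault list reproducible
    units = list(predicted_rates)
    weights = [predicted_rates[unit] for unit in units]
    return rng.choices(units, weights=weights, k=n_faults)


# Hypothetical predicted failure rates (in FIT) for units on one board.
predicted = {"memory circuit": 320.0, "ASIC": 100.0,
             "bus driver": 60.0, "connector": 20.0}

fault_list = select_faults(predicted, n_faults=10, seed=1995)
for number, unit in enumerate(fault_list, start=1):
    print(number, unit)
```

Units with a higher predicted failure rate appear more often in the fault list, so the simulated fault mix resembles what is expected in normal operation.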
The system's reactions to the simulated
hardware faults are observed and classified according to specified rules. Main performance parameters are evaluated statistically, to provide the characteristic,
expected in-service performance, and to
verify that the system reacts properly and
can be maintained according to the
Telefonaktiebolaget L M Ericsson
S-126 25 Stockholm, Sweden
Phone: +46 8 7190000
Fax: +46 8 6812710
ISSN 0014-0171
Ljungforetagen, Orebro 1995
