Commissione di studio sul trattamento dei dati ai fini dell'analisi congiunturale
(Study Commission on data treatment for short-term analysis)
Appointed to formulate proposals on the strategies to be used for temporal disaggregation in the quarterly national accounts

Final report
October 2005

Istituto nazionale di statistica (ISTAT)
Contents of the Commission's papers (preliminary version)

Il lavoro svolto e i risultati ottenuti
by Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)

Procedure di Disaggregazione Temporale utilizzate dall'ISTAT per la Stima dei Conti Economici Trimestrali
by Roberto Astolfi and Marco Marini (ISTAT)

Beyond Chow-Lin. A Review and some Technical Remarks
by Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)

Temporal Disaggregation by State Space Methods: Dynamic Regression Methods Revisited
by Tommaso Proietti (Dipartimento di Scienze Statistiche, Università di Udine)

Temporal Disaggregation Techniques of Time Series by Related Series: a Comparison by a Monte Carlo experiment
by Anna Ciammola, Francesca Di Palma and Marco Marini (ISTAT)

Temporal Disaggregation Procedures: a Forecast-based Evaluation Experiment on some Istat series
by Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova), Miguel Jerez (Universidad Complutense de Madrid) and Filippo Moauro (ISTAT)

New Features for Time Series Temporal Disaggregation in the Modeleasy+ Environment
by Giuseppe Bruno and Giancarlo Marra (Banca d'Italia)

The Starting Conditions in Fernández and Litterman Models of Temporal Disaggregation by Related Series
by Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)

Temporal Disaggregation and Seasonal Adjustment
by Tommaso Proietti (Dipartimento di Scienze Statistiche, Università di Udine) and Filippo Moauro (ISTAT)

Appendice. Le Banche Dati utilizzate dalla Commissione
by Francesca Di Palma and Marco Marini (ISTAT)
Il lavoro svolto e i risultati ottenuti
by Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)

Abstract

This note reports on the work carried out and the results obtained by the Study Commission on data treatment for short-term analysis (Commissione di studio sul trattamento dei dati ai fini dell'analisi congiunturale), appointed by ISTAT to formulate proposals on the strategies to be used for temporal disaggregation in the quarterly national accounts.
1  Introduction

In December 2003 the Italian National Statistical Institute (ISTAT) appointed a 'Study Commission on data treatment for short-term analysis', with the task of formulating proposals on the strategies to be used for temporal disaggregation in the quarterly national accounts.[1]

[1] The Commission comprised university scholars (T. Di Fonzo, chairman of the Commission, and T. Proietti), representatives of institutions producing and using short-term statistics (R. Barcellan, Eurostat; Giuseppe Bruno, Banca d'Italia; Giancarlo Bruno, Isae), and ISTAT, represented by a sizeable group of researchers and technical staff essentially engaged in the current production of the annual and quarterly national accounts aggregates.

The reasons behind ISTAT's decision are essentially two:

1. the need for a critical assessment of the performance of the temporal disaggregation method currently used by ISTAT to produce the quarterly national accounts, compared with more recent techniques;

2. the requirement that the temporal disaggregation procedure rest on a statistical methodology that is up to date and consistent with the aims of a public research institution.

The latter point is further reinforced by the specification that the indirect estimation procedure for the quarterly national accounts aggregates must 'ensure the stability of the results and capture and reproduce the cyclical, stochastic and seasonal characteristics of the reference series'.

Throughout its activity the Commission fostered exchanges of information and discussion with those members of the scientific community who, in recent years, have animated the debate and contributed to the literature on statistical techniques for the temporal disaggregation of economic series. This choice allowed the Commission to work in a particularly up-to-date context, both as regards theoretical developments and as regards the software implementation of the most recent techniques proposed in the literature. In this respect, the continuous and fruitful contacts with Eurostat deserve mention: over the last decade Eurostat has undoubtedly been the statistical agency most actively engaged in methodological reflection, in the introduction of innovations and in their software implementation in the field of temporal disaggregation and benchmarking techniques.[2]

[2] Eurostat distributes Ecotrim, a temporal disaggregation program used by many countries. In addition, Dynchow, a routine written in Gauss, was developed on behalf of Eurostat; it contains innovative features (a dynamic extension of the disaggregation model, the possibility of working with models in logarithms) that the Commission examined in depth. The availability of this routine, for which we are grateful to Eurostat, considerably facilitated our work.

In line with this approach, open to scientific exchange, some of the documents produced within the Commission's work have been presented at seminars and conferences, including international ones. Worth mentioning in particular is the Workshop on frontiers in benchmarking techniques and their application to official statistics, organised by Eurostat and held in Luxembourg in April of this year, where three of the papers included in the material produced by the Commission (Ciammola et al., 2005, Proietti, 2005, and Proietti and Moauro, 2005) were presented and discussed before an audience of scholars and practitioners from all over the world.

This report reviews all the issues examined in the contributions produced by the members of the Commission, with the aim of stimulating a wide-ranging reflection on the current situation and on the prospects of the statistical techniques used to estimate the quarterly national accounts. Analyses and reflections have then been translated into substantive assessments, operational suggestions and proposals for change, which the Commission offers to ISTAT as the final outcome of its activity.

The paper is organised as follows. Section 2 sets out the reasons why a critical re-examination of the quarterly disaggregation method currently in use at ISTAT is appropriate, and outlines the current framework of the quarterly national accounts, strongly conditioned by the degree of detail and the timeliness required by users and by Eurostat. Section 3, after defining the objectives of the Commission, essentially measured against the benefits ISTAT expects in terms of quality and reliability of the estimates, briefly reviews the most recent literature on temporal disaggregation techniques using related indicators. Section 4 introduces the contributions produced by the members of the Commission. The treatment is as discursive as possible, referring to the individual documents contained in the final report for technical details and further clarifications. Finally, Section 5 presents the summary considerations and the operational suggestions that the Commission puts forward to ISTAT in the light of the findings that emerged during its activity.
2  The quarterly national accounts in a historical perspective

The system currently followed in compiling the quarterly national accounts is the result of a long process of improvement of sources, definitions and calculation methods which, while largely originating in the improvements made to the annual accounts, has in part had a life of its own, benefiting from estimation procedures that are more accurate and closer to the annual ones, from more robust statistical techniques and from more reliable related indicators.

Since 1985, the year in which the Italian quarterly accounts system took on features comparable to the present ones - not so much in the level of detail as in the estimation methodology - there have been eleven extraordinary revisions of the quarterly national accounts (Table 1).
Table 1: Historical revisions of the annual and quarterly national accounts. In chronological order, the revisions concerned, among other things:
- the new estimation procedure for the quarterly national accounts (ISTAT, 1985);
- the completion of the 1985 revision (ISTAT, 1987);
- the use of census data, new estimates of employment and of the underground economy, a change of base year, new constant-price estimation procedures, and the 1982 input-output table;
- the surveys on small enterprises for 1985-1986;
- the retrospective annual and quarterly estimates from 1970 onwards (ISTAT, 1992);
- the 1985 and 1988 input-output tables, the small-enterprise surveys for several years, and a change of base year;
- the revision of the estimation procedures for the quarterly accounts and the completion of the 1991 revision;
- new balance of payments data, the estimation of imputed rents, the balancing of the annual accounts, and a change of base year;
- the introduction of ESA95 (ISTAT, 1999);
- the retrospective annual and quarterly estimates from 1982 onwards;
- the retrospective annual and quarterly estimates from 1970 onwards;
- the introduction of the working-day adjustment (ISTAT, 2003).
Some of these revisions are mainly attributable to the need to complete the retrospective evaluations of the accounts, while others entailed major changes in the production process of the estimates and in the data sources.

The June 1999 revision, linked to the introduction of ESA95, marked an epochal change in the quarterly national accounts system: for the first time, a methodological and accounting reference framework and a Community regulation guided and governed the construction of the quarterly national accounts. Indeed, while the current architecture is strictly governed by the Community regulation, which imposes the transmission of the quarterly aggregates according to defined contents, procedures and deadlines, for the methodological reference framework there are at most indications, suggestions or 'ranges' within which the statistical institutes of the Union may orient their own choices (Eurostat, 1999).

On the whole, however, the revisions of the quarterly series have never been aimed at changing the disaggregation techniques; their objective has rather been to enlarge the quarterly information base and to improve the set of related indicators used. In essence, the effort made over the last twenty years has consisted in harmonising as far as possible the sources and the accounting frameworks of the quarterly accounts with the annual ones, and in adapting the estimation procedures to the requirements put forward on various fronts by international bodies and organisations.

To give an idea of the size of the problem, it should be recalled that in Italy the estimates are produced at a high level of detail, namely 32 branches of economic activity for value added, employment and incomes, 24 consumption functions and 14 investment sectors. To produce these estimates the quarterly accounts system uses about 1,000 series of related indicators (monthly or quarterly). Moreover, as required by the Community regulation, the estimates are produced at current and at constant prices, raw and seasonally adjusted, both adjusted and unadjusted for the number of working days.

In this context, almost twenty years after the first publication of the quarterly national accounts and in view of the work planned for the 2006 benchmark, ISTAT considers it important to re-examine the estimation procedures of the quarterly national accounts in order to assess possible advances and refinements.
2.1  The estimation procedures currently used for the quarterly national accounts

The statistical methods used to compile the quarterly accounts can ideally be classified into two broad categories: approaches that use methodologies and sources analogous to those used to estimate the annual data (direct methods) and statistical-mathematical approaches in which the annual data are temporally disaggregated (indirect methods).

The decision to use the direct approach essentially depends on the availability, at quarterly frequency, of statistical sources that are complete and comparable to those used at annual frequency. Very often, in fact, the scarcity of direct infra-annual information on the national accounts aggregates imposes the use of mathematical or statistical methods for the temporal disaggregation of the annual data. For this reason, ISTAT has always used an indirect approach, in particular the temporal disaggregation method derived from the work of Chow and Lin (1971).

The choice of the Chow and Lin estimation method rests on the methodological review carried out by a joint commission with the Banca d'Italia, whose findings were published in 1985 (ISTAT, 1985).[3] This method, adapted to the treatment of quarterly data by Barbone et al. (1981), is based on techniques that model the behaviour of the disaggregated quarterly series on the econometrically estimated relationship between a specific indicator and the corresponding annual figure.

[3] Earlier work in Italy had already led to the production of quarterly accounts (Cipolletta, 1968, Antonello, 1979) based on the method proposed by Bassie (1958). In 1985, on the occasion of the publication of the new quarterly series, a critical review of the various disaggregation procedures then available was carried out (ISTAT, 1985, pp. 13-19), and it was precisely on the basis of that review that the methodological choice was motivated.

It is a method that is particularly widespread among statistical institutes all over the world, and not only for the estimation of quarterly accounts, owing to its ease of use and its robustness in different applied contexts.[4] It should nevertheless be noted that developments in the statistical techniques for modelling economic series, and an increased capacity for critical evaluation of the hypotheses on which this technique rests, have favoured the emergence of more flexible and/or alternative approaches. In some cases these approaches have shown greater stability of the results (smaller revisions). To date, however, many new temporal disaggregation procedures - particularly interesting as regards the theoretical premises on which they are based - have been little (or not at all) tested in an operational context such as the one of interest to ISTAT, characterised by the massive and intensive production of time series.

[4] Note, however, that at the level of international institutions the indirect approach à la Chow and Lin does not meet with unanimous approval. While Eurostat (1999) endorses it and encourages its diffusion, a quite different view is expressed by the International Monetary Fund (Bloem et al., 2001).

The quality of the method followed by ISTAT essentially depends on a good fit, at annual frequency, between the related indicator and the national accounts figure to be disaggregated. This fit is improved through appropriate specifications of the regression equations, so as to obtain the best results in extrapolation, that is, for those quarters of the current year for which the annual benchmark is not yet available.

Before proceeding with the examination of the current situation, it is worth dwelling briefly on some issues arising from the relationship between the annual and the quarterly compilation methodologies.

At the beginning of each year, between February and March, ISTAT estimates the national accounts aggregates for the year just ended and revises the figures for some previous years. The techniques followed are based on the aggregation of balance-sheet data drawn from census and sample surveys and from specific sources. As far as the quarterly accounts are concerned, however, the most important aspect is that the information set used to determine the latest year is much narrower than the one on which the whole time series is based. During the year, moreover, ISTAT compiles the estimates for the first three quarters without yet having the annual benchmark; the fourth quarter, on the other hand, is computed after the annual data and is therefore consistent with them by construction.

Summing up, during the year three quarters are published which are considered 'provisional' until the latest year - which establishes the benchmark to be respected - is computed, independently of the quarterly procedure. Hence, the main problem for the quarterly accounts compiler is to forecast the future annual figure as accurately as possible. In this respect, it is common practice, when the disaggregation equations are specified, to carry out a dedicated study of the forecast errors with four quarters of extrapolation.

It follows that the problem most keenly felt by ISTAT with regard to the quarterly accounts is that of improving the ability to predict the annual figure when estimating the three 'provisional' quarters. This circumstance has some important operational consequences. According to Eurostat (1999), in fact, since the quarterly accounts are mainly used for short-term economic analysis, the choice among the different procedures should be based above all on the minimisation of the forecast error for the current year.
3  The objectives of the Commission

The main task the Commission set itself was to assess the performance of the temporal disaggregation method currently used by ISTAT against the most recent proposals in the literature, with the aim of providing suggestions for its application in the current estimation of the quarterly national accounts. The benefits expected for ISTAT stem from the fact that the production process of the quarterly accounts should be able to draw on the Commission's proposals, with an improvement in the quality and reliability of the estimates as well as in the scientific standing of the Institute.

Schematically, the choice of a temporal disaggregation procedure based on related indicators should be made according to a few criteria of fundamental importance in the context of the quarterly accounts:

1. stability of the results (minimisation of revisions);

2. behaviour of the method in extrapolation (accuracy of the within-year forecasts);

3. ability to capture and reproduce the cyclical, stochastic and seasonal characteristics of the reference series;

4. ease of use (ease of implementation, speed of the computational procedures, etc.).

In particular, in line with the last of these points, given the size of the quarterly accounts system and hence the number of series to be processed within very tight deadlines, the Commission decided to deal not only with strictly methodological questions but also with more operational issues, concerning the software implementation of the current and of the new estimation procedures under consideration. In general, preference was given to the development of procedures in Modeleasy+, the programming and computing environment currently used by ISTAT's research units, but in some cases procedures written in other languages, such as Gauss, Ox and Matlab, were evaluated and/or developed.
3.1  Different approaches to the indirect estimation of quarterly series

The methods for the indirect estimation of quarterly series from known annual data are a particular application of more general methods, developed to derive, from series observed at a 'low' frequency, time series with a frequency s times higher that are consistent with the original observations.

In the methods that make use of related indicators, the quarterly profile of a series available only at annual frequency is reconstructed with the help of partial and indirect quarterly information provided by economic series that are currently available and logically linked to the series of interest.
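In compact notation (a standard way of stating the problem for flow variables, given here only as a reading aid and not drawn from the papers in the report), let $y_t$, $t = 1, \dots, sN$, denote the unobserved quarterly series and $Y_T$, $T = 1, \dots, N$, the observed annual totals. The temporal aggregation constraint is

$$Y_T = \sum_{j=1}^{s} y_{s(T-1)+j}, \qquad s = 4,$$

or, in matrix form, $Y = Cy$ with $C = I_N \otimes \mathbf{1}_s'$, where $\mathbf{1}_s$ is an $s$-vector of ones. The disaggregation problem is to estimate the $sN$ quarterly values (and possibly to extrapolate beyond the last benchmarked year) subject to $Cy = Y$.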
The hypotheses on which these methods rest are quite stringent. The indirect and partial information available is not always adequate for the purpose. Moreover, the known annual series may itself provide a poorly reliable measurement of the phenomenon, thus nullifying even a possibly high representativeness of the indicator. These are obviously two crucial aspects for the success of any disaggregation method, a circumstance confirmed by the numerous revisions of the annual and quarterly information on which the Italian quarterly accounts are based, discussed in Section 2.

The opposite situation may, however, also occur, namely that, with adequate basic information, poorly reliable methods are used. This has led many authors to look for the most appropriate ways of using the indirect information provided by the indicators when breaking annual data down to quarterly frequency.

Historically, an important strand is that of the methods producing estimates consistent with the annual data within a two-step approach (Di Fonzo, 1987), consisting of (i) a preliminary estimation step, in which the resulting quarterly series, while summarising the information of the indicator, does not satisfy the constraint with the annual series, and (ii) an adjustment step, in which the preliminary estimates obtained in the first step are suitably corrected so as to satisfy the aggregation constraint (Bassie, 1958, Vangrevelinghe, 1966, Denton, 1971, Ginsburgh, 1973, Guerrero, 1990).

Methods of this kind are still used today, for instance in France (INSEE, 2005), and were also used for some time in Italy. They are generally characterised by the assumption that the link between the variable of interest and the related indicator can be expressed through a linear regression model in which the annual series to be disaggregated is treated as the dependent variable and the indicators, aggregated to annual frequency, as the regressors. The parameter estimates of the model are then used to compute the preliminary series, which is subsequently adjusted using ad hoc techniques.
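As a sketch, in the notation introduced above and with a single indicator $x_t$, the two steps can be written as

$$Y_T = \beta X_T + u_T, \quad X_T = \sum_{j=1}^{s} x_{s(T-1)+j} \;\; \text{(annual regression)}, \qquad p_t = \hat\beta\, x_t \;\; \text{(preliminary series)},$$

followed, in the simplest variant and for strictly positive series, by a pro-rata adjustment

$$\hat y_t = p_t \, \frac{Y_T}{\sum_{j=1}^{s} p_{s(T-1)+j}}, \qquad t \in \text{year } T,$$

while Denton-type adjustments instead distribute the annual discrepancies $Y_T - \sum_j p_{s(T-1)+j}$ by minimising a smoothness criterion on the corrections. The methods cited above differ precisely in how this second step is carried out; the formulas are given here only as a reading aid.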
The use of a regression model also underlies the second, more widespread strand of disaggregation methods based on related indicators, stemming from the optimal temporal disaggregation technique for a time series developed by Chow and Lin (1971). The strength of this approach is that the link between the series to be estimated and the related indicators is formulated as a regression model at the quarterly level. The annual regression model is then obtained by aggregation, and from it the desired quarterly estimates can be derived in the form of minimum-variance predictions.
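In the same notation, the Chow-Lin setup can be summarised as follows (a standard textbook statement, added here only as a reading aid):

$$y = X\beta + u, \qquad \operatorname{E}(uu') = V, \qquad Y = Cy = CX\beta + Cu,$$

$$\hat\beta = \bigl[X'C'(CVC')^{-1}CX\bigr]^{-1} X'C'(CVC')^{-1}Y, \qquad \hat y = X\hat\beta + VC'(CVC')^{-1}\bigl(Y - CX\hat\beta\bigr),$$

where $\hat y$ is the best linear unbiased (BLU) estimate of the quarterly series given the annual totals and the indicators; in Chow and Lin (1971) the quarterly disturbance $u$ follows a stationary AR(1) process, so that $V$ depends on a single autoregressive parameter.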
It is well known, however, that the Chow and Lin (1971) result holds under the assumption that the covariance matrix of the quarterly disturbances of the regression model is known. When this is not the case - which happens in the vast majority of real situations - two closely interrelated problems arise that are not easy to solve:

1. How can the generating process of the quarterly disturbances be identified, given that only the disturbances estimated with the annual regression model are available?

2. How can the parameters characterising the generating process of the quarterly disturbances be estimated consistently from the annually aggregated process?

Ideally, the estimated version of the disaggregated model should be coherent with the specification suggested by the available (aggregated) data, according to a principle of empirical consistency (Casals et al., 2005) between the disaggregated (high-frequency) model and the aggregated (low-frequency) model. For most techniques, and in particular for the regression-based ones, the process is instead exactly the opposite: the disaggregated model is imposed on the data, the aggregated model that allows the parameters to be estimated follows mechanically from temporal aggregation, and the estimates have to be accepted as they are, even when the estimated aggregated model (in the form imposed by the postulated underlying model) shows clear problems of fit.

Another element of weakness is the assumption of exogeneity of the indicator on which this approach rests. It is thanks to this assumption, in fact, that no particular problems arise in 'transferring' the short-term dynamics of the indicator to the target variable. In many concrete situations, however, target variable and indicator are strongly interrelated, characterised if anything by bidirectional causal links or, more simply, subject to the same influences of the economic environment by which they are generated. In such cases, the use of regression models, whether simple or with dynamic components of some kind, risks being a forced and, in the limit, arbitrary operation.

Clearly, these are very delicate issues which, in the case of the Chow and Lin procedure, and more generally of the techniques directly descending from it, are mostly dealt with through simplifications motivated by practical considerations and through heuristic arguments.

Over time the Chow and Lin method has given rise to a number of developments along the lines originally traced by the authors (Bournay and Laroque, 1979, Fernández, 1981, Litterman, 1983, Wei and Stram, 1990), to critical reflections on the econometric rationale of the procedure (Lupi and Parigi, 1996) and to extensions capable of handling simple dynamics and non-linear transformations (Tserkezos, 1991, Pinheiro and Coimbra, 1993, Gregoir, 1995, Salazar et al., 1997, Santos Silva and Cardoso, 2001, Di Fonzo, 2003, Mitchell et al., 2005).

Other temporal disaggregation procedures have also been developed (Harvey and Pierse, 1984, Al-Osh, 1989, Guerrero and Martinez, 1995, Gudmundsson, 1999, Hotta and Vasconcellos, 1999, Proietti, 1999, Cuche and Hess, 2000, Harvey and Chung, 2000, Liu and Hall, 2001, Angelini et al., 2003, Proietti and Moauro, 2003, Moauro and Savio, 2005) which start from approaches that differ, often quite markedly, from that of Chow and Lin; their interesting methodological premises deserve a closer examination and their performance needs to be adequately explored.
4  The contributions of the Commission

The papers that the Commission has included in this final report cover practically all the topics addressed in the course of its meetings. Each contribution is self-contained, so that some repetition may be found here and there; moreover, the notation used in the various papers may not be homogeneous. Nevertheless, the Commission believes that these papers are connected by a solid and clearly visible thread and that, taken as a whole, they can provide a basis for reflections, substantive assessments, conclusions and operational suggestions.

The presentation follows a logical order which starts, naturally, from a detailed description of the temporal disaggregation procedures used by ISTAT to estimate the quarterly national accounts (Astolfi and Marini, 2005). This paper also describes TRIME, the procedure currently used by ISTAT for the quarterly disaggregation of the annual accounts aggregates. Being an excerpt from a larger document used internally by ISTAT for training courses, the paper by Astolfi and Marini offers an extremely realistic picture of the production process of the quarterly aggregates currently in use at ISTAT. Thanks to it, the Commission was able to carry out some 'ordinary maintenance' of the Chow and Lin procedure, detecting (and correcting) anomalies in the formula for the estimation of the within-year values and in the expression of the covariance matrix of the AR(1) process used in the computer program.
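For reference, the quantities involved in that check can be written down compactly. The sketch below uses the textbook expression for the stationary AR(1) covariance matrix and the annual aggregation matrix; it only illustrates the standard formulas and is not the corrected ISTAT code itself.

    import numpy as np

    def ar1_covariance(n, rho, sigma2=1.0):
        """Covariance matrix of a stationary AR(1) disturbance:
        V[i, j] = sigma2 * rho**|i-j| / (1 - rho**2)."""
        idx = np.arange(n)
        return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

    def annual_aggregation(n_years, s=4):
        """Aggregation matrix C such that Y = C @ y sums each block of s quarters."""
        return np.kron(np.eye(n_years), np.ones(s))

    # Covariance of the annually aggregated disturbances, as it enters the
    # (annual) objective function of a Chow-Lin-type procedure:
    n_years, rho = 10, 0.5
    V = ar1_covariance(4 * n_years, rho)
    C = annual_aggregation(n_years)
    V_annual = C @ V @ C.T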
The next paper (Di Fonzo, 2005a) presents some extensions of the Chow and Lin technique. The first part analyses the method of Fernández (1981) and, after describing a procedure that makes it possible to model the logarithms of the series within temporal disaggregation models, provides a simple economic interpretation of it. The second part of the paper contains a detailed analysis of the dynamic temporal disaggregation model proposed by Santos Silva and Cardoso (2001) and of its connections with analogous proposals made by other authors (Gregoir, 1995, Salazar et al., 1997, Mitchell et al., 2005). Note that the paper contains all the technical specifications implemented by Bruno and Marra (2005) in the temporal disaggregation routines for Modeleasy+ that will be discussed shortly.
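As a reading aid, and without reproducing the technical content of the paper, the two specifications just mentioned can be summarised in the notation used earlier. Fernández (1981) keeps the static regression but assumes random-walk disturbances, while Santos Silva and Cardoso (2001) introduce the lagged dependent variable:

$$\text{Fernández:}\quad y_t = x_t'\beta + u_t, \;\; u_t = u_{t-1} + \varepsilon_t; \qquad \text{Santos Silva-Cardoso:}\quad y_t = \phi\, y_{t-1} + x_t'\beta + \varepsilon_t, \;\; |\phi| < 1,$$

with $\varepsilon_t$ white noise. In the Fernández case the covariance matrix $V$ of the BLU formulas no longer depends on unknown parameters (apart from a scale factor), which is one reason for the operational simplicity of the method; the dynamic specification of Santos Silva and Cardoso can be rewritten as a static regression on transformed variables and handled with the same machinery.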
The paper by Proietti (2005) takes up and extends the analysis of the regression-based disaggregation techniques seen above, 're-reading' them in a state-space formulation. Attention thus remains focused on models whose dynamics depend on a single autoregressive parameter: for these models the role of the initial conditions is discussed[5] and the properties of the maximum likelihood estimates of the parameters are studied; moreover, a representation is provided that (i) is able to 'nest' the classical temporal disaggregation models as well (including that of Chow and Lin) and, thanks to this, (ii) makes it possible to derive stringent indications on the specification to adopt through standard inferential procedures. Further results of practical interest are an additional procedure for modelling in logarithms and the definition of a set of real-time diagnostics on the quality of the output of the disaggregation techniques and on the revisions the data undergo as the availability period of the annual data changes. The paper also stands out for an articulate and well-documented critical discussion of the properties of the Litterman method, whose critical points and weaknesses as regards the estimation of the autoregressive parameter governing its dynamics are highlighted.

[5] On this specific point, see also Di Fonzo (2005b).
The next two papers (Ciammola et al., 2005, and Di Fonzo et al., 2005) offer empirical evidence on the within-year forecasting performance of various methods - taking as the term of comparison the annual figure obtained as the sum of the four extrapolated values - based on Monte Carlo simulations in the first case and on real series provided by ISTAT (Di Palma and Marini, 2005) in the second.

The Monte Carlo experiments were carried out under almost 150 different scenarios, in order to bring out the conditions under which the various regression-based techniques obtain reliable estimates of the parameters of the model used and of the series of interest. In this context some critical aspects emerged in the feasible (estimated) generalised least squares estimation of the parameters of the Chow and Lin model, as well as a certain weakness of the maximum likelihood estimates too, testifying to the heavy effect that temporal aggregation has on the ability to estimate the parameters of the disaggregated model from the available observations.

The applications to real series, drawn from two databases specially prepared for the Commission (Di Palma and Marini, 2005), covered a wider range of disaggregation procedures, including the extensions of the dynamic model proposed by Proietti (2005) and the multivariate technique of Moauro and Savio (2005). As often happens in experiments of this kind, the results are not clear-cut, that is, they do not point unambiguously towards one particular procedure. Nevertheless, 'regularities' of behaviour emerge which, for the series analysed, make it possible at least to (i) express a not entirely negative judgement on the performance offered by the Chow and Lin technique, (ii) record the good results of the Fernández technique and of the SUTSE models of Moauro and Savio (2005) and, finally, (iii) confirm also empirically the weaknesses of the Litterman (1983) technique.

The paper by Bruno and Marra (2005) presents a 'package' of innovations introduced into the DISAGGR command, available in the Modeleasy+ software, which performs temporal disaggregation by related indicators according to the BLU approach of Chow and Lin. Compared with the previous one, the current version of the command is considerably transformed and enlarged. It incorporates all the changes to the Chow and Lin procedure reported above, and enriches the repertoire of procedures available to the user by adding those of Fernández (1981), Litterman (1983) and Santos Silva and Cardoso (2001), the latter in the matrix formulation proposed by Di Fonzo (2005a). The option of modelling in logarithms, in the formulation of Aadland (2000; see also Di Fonzo, 2005a), is also made available. A novelty in the landscape of the disaggregation software available at the international level[6] is the implementation in DISAGGR of the temporal disaggregation technique of Guerrero (1990), integrated with the model selection and seasonal adjustment procedures offered by the TRAMO and SEATS programs, which can be used for the treatment of the indicator series.

[6] See also Quilis (2004a, 2004b).

The last paper (Proietti and Moauro, 2005) presents a procedure, based on a bivariate structural model (Harvey, 1989, Harvey and Chung, 2000) formulated at the higher frequency, which, by explicitly accounting for the seasonal and calendar components, makes it possible to carry out the temporal disaggregation and the seasonal adjustment of an economic series simultaneously. The 'philosophy' underlying the method differs from the one that informs ISTAT's current practice for estimating seasonally adjusted series, which produces them by disaggregating the annual figure by means of a previously seasonally adjusted related indicator.[7] In this case too the question is rather delicate, because of the operational and organisational implications of possible changes of practice in the processing of large amounts of data. However, the results obtained suggest that, at least in the medium term, ISTAT should reconsider its data production and dissemination arrangements, with particular reference to the treatment of the seasonal and calendar components.

[7] On the reasons that at the time led to this operational arrangement, see ISTAT (1985, p. 21).
5  The conclusions of the Commission

This section summarises the main results that the Commission believes it has achieved and then presents the conclusions drawn from them and the ensuing indications and suggestions.

1. The Commission concentrated its attention essentially on the statistical issues, and on the related software implementation aspects, of the techniques for temporal disaggregation by related indicators to be used in the indirect estimation of the quarterly national accounts. That is to say, neither the issues connected with the possibility of adopting a direct approach to the estimation of the quarterly series nor other questions concerning the choice of the related indicators, the sources and their treatment were ever discussed. In accordance with its mandate, the Commission took ISTAT's choices on these two aspects as given. Moreover, given the time available and ISTAT's priorities, the Commission decided not to deal with multivariate benchmarking, by which is meant the set of statistical techniques aimed at disaggregating several time series simultaneously linked by temporal and contemporaneous aggregation constraints, a usual situation for variables belonging to a system of accounts.[8]

[8] On this last point, the attention devoted by Eurostat should be noted: it has promoted the development of multivariate benchmarking techniques and of the related computer programs, and hosted several contributions on the subject at the Workshop on frontiers in benchmarking techniques and their application to official statistics organised last April, including one by two members of the Commission (Di Fonzo and Marini, 2005).

2. The examination of the techniques currently used by ISTAT led to an 'ordinary maintenance' of the existing set-up, which produced the correction of the formula for estimating the quarterly values within the current year and the correct expression of the aggregated covariance matrix of the annual regression model used, within the Chow and Lin procedure, for the optimisation of the objective function in the parameter estimation step.

3. Again with reference to ISTAT's current practice, an extensive set of Monte Carlo simulation experiments brought out some critical aspects of the feasible (estimated) generalised least squares estimation procedure (a systematic upward bias of the estimates of the autoregressive parameter, ρ, which characterises the Chow and Lin method). This had not emerged in past experience because of the incorrect expression of the covariance matrix mentioned in the previous point. It should also be noted that with the Chow and Lin method problems of reliability of the estimates of ρ remain even when maximum likelihood estimation is adopted.

4. ISTAT's current practice makes extensive use of dummy variables and, more generally, of intervention variables aimed at capturing the structural breaks of the Chow-Lin model occurring over time. This makes the use of the procedure and the results of the disaggregation less transparent, and should rather be interpreted as a symptom of inadequacy of the model, which should prompt a search for its causes. Deterministic regression terms should in fact be used sparingly, and with good reason, and their role should always be assessed carefully (consider, for example, the possibility of including polynomials in time up to the second order in the case of the disaggregation of aggregates expressed at current prices).

5. The Commission's work (Proietti, 2005) shows the importance of systematically assessing the goodness of fit of the model underlying the disaggregation, using the usual time series diagnostics, in particular the innovations. Also particularly relevant is the real-time monitoring of the out-of-sample forecasting ability, assessed with reference to the ability to predict the total value of the annual aggregate.
6. The review of the literature, which focused at the outset on the 'regression-based' disaggregation techniques within an optimal (Best Linear Unbiased, BLU) approach - to which the Chow and Lin procedure belongs - led to a closer examination of the characteristics of two classical (or at least well known to experts in the field) temporal disaggregation procedures, due to Fernández (1981) and Litterman (1983). In the first case, thanks also to an obvious reinterpretation of the statistical model on which the procedure is based, it was possible to appreciate the conceptual simplicity of the economic logic underlying the model, which gains further strength from the possibility - studied, developed and implemented by the Commission - of modelling the logarithms. In the second case, on the other hand, it was possible to highlight the logical and statistical limitations of the model, which was the subject of theoretical analyses and simulation experiments from which non-negligible weaknesses emerged, such as to advise against its use in the current production of the quarterly national accounts series. These findings were further confirmed by the within-year forecasting experiments carried out on various Italian quarterly national accounts series made available by ISTAT. Finally, it should be recalled that the Commission clarified some detailed aspects of the treatment of the initial conditions, which made it possible to interpret and, where appropriate, go beyond the restrictive assumptions originally made by the authors of these methods.

7. A first, simple dynamic extension of the model adopted for the quarterly disaggregation, proposed by Santos Silva and Cardoso (2001), was then examined in depth from several angles, both in terms of the classical matrix formulation of the regression model and in the light of the possibilities offered by the state-space formulation of the problem of temporal disaggregation by related indicators. In the first case, it was possible to highlight the analogies, at least formal, between this disaggregation procedure and the standard formulation of the BLU procedures, while stressing the smaller number of critical econometric issues that characterise it. In the second case, it was possible to highlight the flexibility of the procedure, its ability to 'nest' the classical models (including that of Chow and Lin) and, thanks to this, to derive stringent indications on the specification to adopt through standard inferential procedures. Both formulations of the dynamic extension of the classical temporal disaggregation model have been implemented in software: in the first case in the Modeleasy+ environment, immediately usable by ISTAT for the current production of the quarterly series, while in the second case the programs were written in the Ox language and are therefore, with respect to ISTAT's needs, less readily usable for current activity. The Commission therefore suggests that ISTAT consider the possibility of developing routines able to handle the state-space approach within its own programming and computing environment.[9]

[9] Very interesting in this respect are the computational routines recently developed by Palate (2005).

8. The Commission also had the opportunity to evaluate two further approaches to temporal disaggregation by related indicators, developed by Guerrero (1990) and by Moauro and Savio (2005) respectively.

8.1 The first is a 'data-based benchmarking' procedure, that is, a two-step estimation procedure in which a preliminary estimate of the quarterly series is adjusted so as to be in line with the known annual values, on the basis of a covariance matrix of the errors derived from the model identified and estimated for the related indicator. This procedure has been implemented in the Modeleasy+ environment and integrated with the TRAMO and SEATS programs for the automatic modelling of the quarterly indicator. While interesting for its underlying 'logic' - it is the quarterly dynamics of the related indicator that guides the user in distributing the discrepancies between the preliminary quarterly estimates and the annual figure - this procedure nevertheless shows some weakness in cases where the automatic modelling of the indicator turns out to be inadequate.

8.2 Quite different from the techniques considered so far, all essentially based on the linear regression model, is the technique proposed by Moauro and Savio (2005). Here the theoretical frame of reference is the class of multivariate structural models (Seemingly Unrelated Time Series Equations, SUTSE; see Harvey, 1989), whose essential merit is that it does not assume, as all the techniques seen so far do, an asymmetric relationship between target variable and indicator, but rather operates within a framework of interrelations in which, rather than of target and indicator, one should speak of series that share a common framework but are available at different frequencies, an aspect that must be taken into account in the multivariate modelling.[10] For this method, computational routines written in the Ox language are available, and the Commission expresses the hope that ISTAT will take steps to port them to its own programming and computing environment. It should also be added that the Commission, while appreciating the qualities of the technique in question, believes (i) that the skills needed to use it on a routine basis, for the current production of a large number of time series, are both complex and demanding, and (ii) that the necessary training effort is too onerous for this technique - unlike the others mentioned so far - to be considered in the immediate future. In a medium-term innovation perspective, however, it is desirable that ISTAT foster reflection on these techniques, examining their methodological and applied aspects in greater depth, as well as, obviously, the foreseeable implications for current production activity.

[10] The flexibility of this class of models is also shown by the opportunities it offers for the simultaneous treatment of temporal disaggregation and seasonal adjustment. In the final part of its work the Commission began to reflect on this question (Proietti and Moauro, 2005), which is very intriguing from the methodological point of view and raises problems that are not easy to tackle on the logical level, questioning the 'philosophy' of estimation of the raw and seasonally adjusted accounts currently adopted by ISTAT. In the light of the first results obtained, the Commission suggests that, in the medium term, ISTAT reconsider its data production and dissemination arrangements, with particular reference to the treatment of the seasonal and calendar components.

It should also be noted that a conceptually similar approach (multivariate modelling of series with different observation frequencies), developed by Casals et al. (2005), was examined in depth by the Commission. In this case the user works within the class of VARIMA models with variables observed at different frequencies. The regression-based temporal disaggregation techniques are included as a particular case of this more general formulation, provided the class of models is extended to include exogenous variables (VARIMAX). This approach, which is also noteworthy for the availability of dedicated computational routines developed in the Matlab language (Terceiro et al., 2000), makes it possible to analyse in detail the links between the model postulated at quarterly frequency and the corresponding (observable) annual model, and suggested the development of an extension of the Fernández method characterised by a more flexible forecast function than that of the original model.

9. Consistently with users' needs, with the orientations expressed at the European level (Eurostat, 1999) and with the indications coming from the specialised literature, the Commission carried out a number of experiments, both by simulation and on real series provided by ISTAT, to evaluate the forecasting performance (i.e. with regard to the within-year estimates) of the methods considered. Here too, findings of some interest emerged.
9.1 The quarterly estimates obtained with the Chow and Lin method rank in an intermediate position in the various rankings drawn up on the basis of indices of forecasting performance such as the Akaike Information Criterion (AIC) and the Root Mean Squared Percentage Error (RMSPE). It should also be noted that in the simulation exercise the results offered by the Chow and Lin method were far from negligible.

9.2 Overall, the best performance is offered by the estimates obtained with the SUTSE models and with the Fernández procedure, the latter possibly modelling the logarithms.

9.3 The dynamic distributed-lag models, both in the regression-based formulation of Santos Silva and Cardoso (2001) and in the extended version based on a state-space representation (Proietti, 2005), are characterised by rather polarised outcomes: their results fall, that is, in the extreme bands of the rankings. It should however be stressed that, for lack of time, no strategy for searching for the most suitable specification was adopted in the experiment, and this circumstance may therefore have penalised the final results.

9.4 The performance recorded by the Litterman method is not particularly remarkable, especially considering the number of cases in which the estimation procedure produced particularly dubious estimates of the autoregressive parameter.

In the light of what has been set out so far, the Commission believes it can suggest to ISTAT the following:

(i) to adopt the corrections to the extrapolation formula and to the covariance matrix of the Chow and Lin procedure mentioned in point 2;

(ii) to equip itself with a wider repertoire of statistical temporal disaggregation techniques than the current one, first of all by acquiring the routines developed by the Commission itself in the Modeleasy+ environment;

(iii) to flank the 'historical' Chow and Lin procedure at least with the Fernández procedure, in both cases with the possibility of modelling the logarithms of the series. Specifically, it seems reasonable, and relatively inexpensive given the current availability of software and the experience accumulated in the course of the Commission's work, that for each time series the choice of the temporal disaggregation method to be applied be made to depend on a comparative, historical evaluation of the quality of the within-year estimates recorded by these two procedures;

(iv) for a selected group of variables, characterised by 'stable' situations as regards the availability, quality and relevance of the indicators (consider, for example, the articulated but by now well-consolidated process for estimating value added in industry), the Commission suggests extending the experiments to other temporal disaggregation procedures as well, chosen from among those made available in the Modeleasy+ environment and, subject to the availability of the appropriate computer programs, also to those - more convincing in terms of the statistical-econometric logic underlying them - based on multivariate models that do not postulate asymmetric causal links.

In addition to these observations, and in the light of the experience gained, the Commission also hopes that ISTAT will continue and strengthen, extending it to all the series subject to quarterly disaggregation, the commendable practice of archiving the input data and the results of the statistical estimation procedures. This information is in fact crucial for real-time analyses, useful for monitoring the quality, and the possible deterioration over time, of the information production processes. In particular, the goal to aim for is a computerised system allowing rapid access to all the versions of the annual series, of the related indicators and of the disaggregation equations used over time in the production process of the quarterly accounts series.
References
Aadland D.M. (2000), Distribution and interpolation using transformed data, Journal of Applied Statistics, 27: 141-156.
Al-Osh M. (1989), A dynamic linear model approach for disaggregating time series data, Journal of
Forecasting, 8: 85-96.
Angelini E., J. Henry e M. Marcellino (2003), Interpolation and backdating with a large information set,
European Central Bank, Working Paper No. 252.
Antonello P. (1979), La costruzione dei conti economici trimestrali dal 1954 al 1971, in D. Da Empoli, V.
Siesto e P. Antonello (a cura di), Finanza pubblica e contabilità nazionale su base trimestrale, Padova,
Cedam.
Barbone L., G. Bodo e I. Visco (1981), Costi e profitti in senso stretto: un’analisi su serie trimestrali,
1970-1980, Bollettino della Banca d’Italia, 36, numero unico.
Bassie V.L. (1958), Economic forecasting, New York, Mc Graw-Hill.
Bloem A., R.J. Dippelsman e N.Ø. Mæhle (2001), Quarterly National Accounts Manual. Concepts, data
sources, and compilation, Washington DC, International Monetary Fund.
Bournay J. e G. Laroque (1979), Réflexions sur la méthode d’élaboration des comptes trimestriels,
Annales de l’INSEE, 36: 3-30.
Bruno G. e G. Marra (2005), New features for time series temporal disaggregation in the Modeleasy+
environment, Rapporto finale della Commissione di studio sul trattamento dei dati ai fini dell’analisi
congiunturale, ISTAT.
Casals J., M. Jerez e S. Sotoca (2005), Empirical modeling of time series sampled at different frequencies,
comunicazione presentata al 'Workshop on frontiers in benchmarking techniques and their application
to official statistics', Lussemburgo, 7-8 Aprile 2005.
Chow G. e A.L. Lin (1971), Best linear unbiased interpolation, distribution and extrapolation of time
series by related series, The Review of Economics and Statistics, 53: 372-375.
Ciammola A., F. Di Palma e M. Marini (2005), Temporal disaggregation techniques of time series by
related series: a comparison by a Monte Carlo experiment, Rapporto finale della Commissione di studio
sul trattamento dei dati ai fini dell’analisi congiunturale, ISTAT.
Cipolletta I. (1968) Indicatori di tendenza delle poste della contabilità nazionale a cadenza trimestrale,
ISCO, Rassegna dei lavori interni dell’Istituto, 14.
Cuche N.A. e M.K. Hess (2000), Estimating monthly GDP in a general Kalman filter framework: evidence
from Switzerland, Economic & Financial Modelling, Winter 2000: 153-193.
Denton F.T. (1971), Adjustment of monthly or quarterly series to annual totals: An approach based on
quadratic minimization, Journal of the American Statistical Association, 66: 99-102.
Di Fonzo T. (1987), La stima indiretta di serie economiche trimestrali, Padova, Cleup.
Di Fonzo T. (2003), Temporal disaggregation using related series: log-transformation and dynamic extensions, Rivista Internazionale di Scienze Economiche e Commerciali, 50, 3: 371-400.
Di Fonzo T. (2005a), Beyond Chow-Lin. A review and some technical remarks, Rapporto finale della
Commissione di studio sul trattamento dei dati ai fini dell’analisi congiunturale, ISTAT.
Di Fonzo T. (2005b), The starting conditions in Fernández and Litterman models of temporal disaggregation by related series, Rapporto finale della Commissione di studio sul trattamento dei dati ai fini
dell’analisi congiunturale, ISTAT.
Di Fonzo T., M. Jerez e F. Moauro (2005), Tecniche di disaggregazione temporale mediante indicatori
di riferimento: un confronto della capacità previsiva su alcune serie dell’ISTAT, Rapporto finale della
Commissione di studio sul trattamento dei dati ai fini dell’analisi congiunturale, ISTAT.
Di Fonzo T. e M. Marini (2005), Benchmarking a system of time series: Denton’s movement preservation
principle vs. a data based procedure, comunicazione presentata al ‘Workshop on frontiers in benchmarking
techniques and their application to official statistics’, Lussemburgo, 7-8 Aprile 2005.
Di Palma F. e M. Marini (2005), Appendice. Le banche dati usate dalla commissione, Rapporto finale
della Commissione di studio sul trattamento dei dati ai fini dell’analisi congiunturale, ISTAT.
Eurostat (1999), Handbook of quarterly national accounts, Luxembourg, European Commission.
Fernàndez R.B. (1981), A methodological note on the estimation of time series, The Review of Economics
and Statistics, 63: 471-478.
Ginsburgh V.A. (1973), A further note on the derivation of quarterly figures consistent with annual data,
Applied Statistics, 22: 368-374.
Gregoir S. (1995), Propositions pour une désagrégation temporelle basée sur des modèles dynamiques
simples, INSEE (mimeo).
Gudmundsson G. (1999), Disaggregation of annual flow data with multiplicative trends, Journal of
Forecasting, 18: 33-37.
Guerrero V.M. (1990), Temporal disaggregation of time series: an ARIMA-based approach, International
Statistical Review, 58: 29-46.
Guerrero V.M. e J. Martinez (1995), A recursive ARIMA-based procedure for disaggregating a time
series variable using concurrent data, TEST, 2: 359-376.
Harvey A.C. (1989), Forecasting, structural time series models and the Kalman filter, Cambridge, Cambridge University Press.
Harvey A. e C.H. Chung (2000), Estimating the underlying change in unemployment in the UK, Journal
of the Royal Statistical Society, A, 163: 303-328.
Harvey A.C. e R.G. Pierse (1984), Estimating missing observations in economic time series, Journal of
the American Statistical Association, 79: 125-131.
Hotta L.K. e K.L. Vasconcellos (1999), Aggregation and disaggregation of structural time series models,
Journal of Time Series Analysis, 20: 155-171.
INSEE (2005), Méthodologie des comptes trimestriels, INSEE Méthodes, 108.
ISTAT (1985), I conti economici trimestrali, anni 1970-1984, Supplemento al Bollettino Mensile di Statistica, 12.
ISTAT (1987), Miglioramenti apportati ai conti economici trimestrali. Serie con base dei prezzi 1970,
Collana d’Informazione, 4.
ISTAT (1992), I conti economici trimestrali con base 1980, Note e relazioni, 1.
ISTAT (1999), Revisione dei conti nazionali e adozione del SEC95, Note rapide, 4.
ISTAT (2003), Principali caratteristiche della correzione per i giorni lavorativi dei conti economici trimestrali (mimeo).
Litterman R.B. (1983), A random walk, Markov model for the distribution of time series, Journal of
Business and Economic Statistics, 1: 169-173.
Liu H. e S.G. Hall (2001), Creating high-frequency National Accounts with state-space modelling: a
Monte Carlo experiment, Journal of Forecasting, 20: 441-449.
Lupi C. e G. Parigi (1996), La disaggregazione temporale di serie economiche: un approccio econometrico, ISTAT, Quaderni di Ricerca, 3.
Mitchell J., R.J. Smith, M.R. Weale, S. Wright e E.L. Salazar (2005), An Indicator of Monthly GDP
and an Early Estimate of Quarterly GDP Growth, The Economic Journal, 115: F108-F129.
Moauro F. e G. Savio (2005), Temporal disaggregation using multivariate structural time series models,
Econometrics Journal, 8: 214-234.
Palate J. (2005), Reusable components for benchmarking using Kalman filters, comunicazione presentata al ‘Workshop on frontiers in benchmarking techniques and their application to official statistics’,
Lussemburgo, 7-8 Aprile 2005.
Pinheiro M. e C. Coimbra (1993), Distribution and Extrapolation of Time Series by Related Series Using
Logarithms and Smoothing Penalties, Economia, 17: 359-374.
Proietti T. (1999), Distribution and interpolation revisited: a structural approach, Statistica, 58: 411-432.
Proietti T. (2005), Temporal disaggregation by state space methods: dynamic regression methods revisited,
Rapporto finale della Commissione di studio sul trattamento dei dati ai fini dell’analisi congiunturale,
ISTAT.
Proietti T. e F. Moauro (2003), Dynamic factor analysis with nonlinear temporal aggregation constraints
(mimeo).
Proietti T. e F. Moauro (2005), Temporal disaggregation and seasonal adjustment, Rapporto finale della
Commissione di studio sul trattamento dei dati ai fini dell’analisi congiunturale, ISTAT.
Quilis E.M. (2004a), Sobre el metodo de desagregacion temporal de Guerrero, Madrid, INE.
Quilis E.M. (2004b), A Matlab library of temporal disaggregation methods: summary, Madrid, INE.
Salazar E.L., R.J. Smith e M. Weale (1997), Interpolation using a Dynamic Regression Model: Specification and Monte Carlo Properties, NIESR Discussion Paper n. 126.
Santos Silva J.M.C. e F.N. Cardoso (2001), The Chow-Lin method using dynamic models, Economic
Modelling, 18: 269-280.
Terceiro J., J.M. Casals, M. Jerez, G. Serrano e S. Sotoca (2000), Time series analysis using MATLAB,
including a complete MATLAB Toolbox (mimeo).
Tserkezos D.E. (1991), A distributed lag model for quarterly disaggregation of the annual personal
disposable income of the Greek economy, Economic Modelling, 8: 528-536.
Vangrevelinghe G. (1966), L'évolution à court terme de la consommation des ménages: connaissance,
analyse et prévision, Études et Conjoncture, 9: 54-102.
Wei W.W.S. e D.O. Stram (1990), Disaggregation of time series models, Journal of the Royal Statistical
Society, B, 52: 453-467.
Procedure di Disaggregazione Temporale utilizzate dall’ISTAT per la Stima
dei Conti Economici Trimestrali
di Roberto Astolfi e Marco Marini (ISTAT)
Abstract
In questo lavoro vengono presentate le tecniche statistiche di disaggregazione temporale impiegate nel processo di produzione dei conti economici trimestrali dall’ISTAT. Viene inoltre
descritta la procedura informatica TRIME, correntemente utilizzata per la trimestralizzazione
degli aggregati annuali di contabilità nazionale.
6 Introduzione
6.1 Il processo di produzione degli aggregati trimestrali di Contabilità Nazionale
Il diagramma di flusso nella Figura 1 descrive il processo di produzione degli aggregati trimestrali attualmente in uso nella Contabilità Nazionale (CN).
Il sistema di calcolo trimestrale degli aggregati di CN è basato su un approccio di tipo indiretto; in mancanza delle medesime fonti utilizzate per il calcolo annuale, la dinamica trimestrale degli aggregati dipende
indirettamente da quella di uno o più indicatori di breve periodo. La trasmissione dell’informazione
dall’indicatore all’aggregato avviene per mezzo delle tecniche di disaggregazione temporale.
Gli input del processo produttivo sono essenzialmente due: l’aggregato di CN, calcolato nel quadro dei
conti economici nazionali a livello annuale, ed il relativo indicatore congiunturale, generalmente osservato
a cadenza mensile o trimestrale. La relazione esistente fra aggregato ed indicatore è generalmente di
tipo diretto; la scelta dell’indicatore è spesso basata sulla vicinanza del concetto economico misurato
dall’indicatore rispetto a quello, di solito più generale, rappresentato dall’aggregato di CN. Conviene
fin da subito rendere chiaro come la scelta dell’indicatore sia l’aspetto più delicato dell’intero processo
produttivo; nel prossimo paragrafo discuteremo in maniera più approfondita quali siano gli aspetti più
importanti che influiscono su tale scelta.
Esistono dei casi in cui vi sono più indicatori di riferimento per uno stesso aggregato. La prassi comunemente seguita è quella di calcolare un solo indicatore ponderato attraverso una combinazione lineare
degli indicatori selezionati. I pesi di tale combinazione sono generalmente determinati in base a criteri
oggettivi (in genere quote rispetto al fenomeno complessivo misurato dall’aggregato in esame); qualora
tali informazioni non esistano o siano difficili da reperire, si ricorre spesso a criteri soggettivi quali il
grado di affidabilità degli indicatori, le revisioni subite in corso d’anno, etc.
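A titolo puramente illustrativo, il calcolo dell'indicatore ponderato può essere schematizzato come segue (bozza in Python; nomi delle variabili e valori sono di fantasia, le quote rappresentano i pesi della combinazione lineare):

import numpy as np

# Tre indicatori trimestrali ipotetici (uno per colonna) e le relative quote
indicatori = np.array([
    [100.0,  98.0, 101.5],
    [102.0,  99.5, 103.0],
    [104.5, 100.2, 104.8],
    [103.0, 101.0, 105.5],
])
quote = np.array([0.5, 0.3, 0.2])   # quote rispetto al fenomeno complessivo

# Indicatore ponderato: combinazione lineare degli indicatori selezionati
indicatore_composito = indicatori @ quote
print(indicatore_composito)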
Figura 1: Il processo di produzione degli aggregati trimestrali di contabilità nazionale
Gli indicatori disponibili sono generalmente espressi secondo varie unità di misura: indici a base fissa,
indici concatenati, valori a prezzi correnti, etc. Non ha importanza quale sia il formato disponibile,
quanto l’effettiva capacità dell’indicatore di riprodurre la dinamica dell’aggregato annuale di interesse.
A livello infra-annuale una grandezza economica presenta tuttavia dei movimenti che disturbano l’analisi
delle tendenze di breve-medio periodo di maggiore interesse. Ci riferiamo a quei movimenti che dipendono
dalle diverse stagioni dell’anno (effetti stagionali) ed a quelli, meno evidenti, legati alla composizione del
calendario (effetti di calendario).
Il regolamento comunitario sui conti economici trimestrali impone l'invio dei dati trimestrali:
• grezzi
• grezzi corretti per i giorni lavorativi
• destagionalizzati corretti per i giorni lavorativi.
È necessario utilizzare appropriate tecniche statistiche in grado di stimare gli effetti di calendario e
stagionali e produrre degli aggregati trimestrali al netto delle componenti stesse.
L’identificazione, la stima e la rimozione delle componenti è eseguita direttamente sugli indicatori di
riferimento. Un approccio alternativo è quello di applicare tali metodi sull’aggregato trimestrale grezzo,
ottenuto mediante disaggregazione dei dati annuali con l’indicatore grezzo. Esistono diverse ragioni di
tipo pratico e teorico per le quali l’approccio seguito risulta preferibile: qui di seguito elenchiamo quelle
che l’ISTAT ritiene più importanti. In primo luogo, se si assume che la variabilità trimestrale di un
aggregato sia completamente affidata alla dinamica dell’indicatore di riferimento, qualsiasi componente
infra-annuale deve essere su quest’ultimo individuata e stimata. Esiste poi il problema della correzione
per i giorni lavorativi che emerge maggiormente su dati mensili che su dati trimestrali. Infine, la tecnica di
destagionalizzazione utilizzata non assicura la consistenza dei dati destagionalizzati con il dato annuale
grezzo; in pratica, la destagionalizzazione diretta dell'aggregato grezzo non garantirebbe il rispetto del
vincolo annuo.
In molti casi uno stesso indicatore è già disponibile in forma destagionalizzata; raramente poi, si dispone
anche di una versione corretta per gli effetti di calendario (indice della produzione industriale, ad esempio). Le procedure statistiche adottate dagli enti che rilasciano tali indicatori possono tuttavia non
essere omogenee. Per tale motivo, è preferibile partire sempre dal dato grezzo, ovvero dal dato così come
viene rilevato, ed applicare uniformemente le medesime tecniche statistiche.
Questa rappresenta la fase del trattamento degli indicatori. Essa è composta dalla correzione per gli
effetti di calendario e dalla destagionalizzazione.
La correzione per il diverso numero di giorni lavorativi è una procedura relativamente nuova per la
contabilità, essendo stata introdotta solamente nel 2003. L’output di questa procedura è rappresentato
da una serie storica al netto dell’effetto dovuto al diverso numero dei giorni lavorativi, dell’effetto Pasqua
e dell’effetto anno bisestile. In effetti, il diverso numero dei giorni lavorativi è solo uno degli aspetti
che compongono il più generale fenomeno degli effetti di calendario. In questa sede basti ricordare
che le festività pasquali, essendo mobili nel tempo, possono generare distorsioni nella valutazione delle
dinamiche congiunturali di alcuni fenomeni economici (un caso evidente sono i consumi delle famiglie
per alberghi e pubblici esercizi).
In questa fase introduttiva ci soffermiamo su due aspetti del metodo di correzione utilizzato. Come
già accennato, l’effetto dovuto al calendario è in genere poco evidente anche ad un occhio attento ed
esperto. Maggiore sarà il livello di aggregazione temporale al quale l’indicatore è disponibile, minore
sarà l’effetto riscontrato. Per questo motivo si preferisce condurre la fase di correzione a livello mensile,
qualora l’informazione sia disponibile. Ciò ha consentito di ottenere dei risultati più coerenti nell’ambito
dei conti trimestrali, soprattutto per le branche dell'industria dove si dispone degli indici mensili della
produzione industriale.
È utile ricordare, inoltre, che l’effetto dovuto ai giorni lavorativi non si esaurisce nell’anno, a differenza
della componente stagionale (che invece è a somma zero, se considerata in senso deterministico). Infatti,
anche gli anni sono composti da un diverso numero di giornate lavorative; ciò può influire, anche se
in misura molto limitata, sui tassi di crescita annuali di una grandezza economica. È per tale motivo
che in fase di correzione si calcola una stima dell’aggregato annuale corretta basata sull’aggiustamento
apportato all’indicatore di riferimento.
Il software statistico impiegato in questa fase è costituito dai programmi TRAMO (Time series Regression with Arima noise, Missing observations and Outliers) e SEATS (Signal Extraction in Arima Time Series), che rappresentano gli strumenti ufficiali utilizzati dall'ISTAT per il trattamento delle serie
storiche congiunturali.
Dalla fase di correzione si ottiene l'indicatore depurato dagli effetti di calendario. Si hanno quindi a
disposizione due indicatori grezzi, uno corretto e l’altro non corretto. L’indicatore corretto rappresenta
l’input della fase di destagionalizzazione. Questo è un elemento innovativo rispetto al passato, poiché la
destagionalizzazione era prima effettuata sull’indicatore grezzo senza tener conto di alcuna correzione.
In effetti, il numero dei giorni lavorativi contiene una componente stagionale. Questa era prima erroneamente inglobata nella componente stocastica stimata nel processo di destagionalizzazione, mentre ora
viene modellata attraverso una serie di regressori deterministici.
Il processo di destagionalizzazione viene quindi effettuato al netto degli effetti di calendario. Per ridurre al
minimo le revisioni rispetto alle serie storiche passate, si è scelto di continuare ad effettuare la destagionalizzazione a livello trimestrale. Qualora mensile, l’indicatore corretto viene aggregato trimestralmente
per poi essere destagionalizzato. L’output principale della fase di destagionalizzazione è la serie destagionalizzata e corretta per i giorni lavorativi. Attraverso una semplice formula di passaggio, si calcola
anche una versione dell’indicatore destagionalizzato che non tiene conto della correzione. Le serie destagionalizzate senza correzione non sono più richieste da Eurostat ma continuano ad essere diffuse per
garantire agli utilizzatori interni una maggiore coerenza con i dati prodotti secondo la vecchia procedura
di destagionalizzazione.
Come già accennato, il software statistico utilizzato è SEATS. Il programma preleva in modo automatico
una serie linearizzata da TRAMO, ovvero depurata da una serie di effetti deterministici (tra i quali gli
effetti di calendario). Tale serie è scomposta in componenti inosservate (trend, ciclo, stagionalità ed
errore), disponibile ciascuna come output del programma. La serie destagionalizzata è la serie cui viene
detratta la sola componente stagionale.
Fin qui abbiamo percorso verticalmente lo schema di Figura 1. L’indicatore selezionato è stato corretto
e destagionalizzato ed è disponibile in quattro versioni. Ciascuna di esse sarà utilizzata nella successiva
fase di trimestralizzazione. Con tale termine si intende il processo di scomposizione di un dato annuale
in serie storica nei quattro trimestri corrispondenti, sotto il vincolo che la somma (o la media,
nel caso di variabili indice) rispetti il dato annuale di partenza. Come vedremo, esistono varie tecniche
statistiche proposte per risolvere il problema in maniera rigorosa. La tecnica attualmente in uso in CN
è quella proposta da Chow e Lin (1971) nella versione modificata di Barbone et al. (1981).
Brevemente, qui conviene ricordare che la scomposizione è basata su un modello di regressione annuale tra
indicato ed indicatore. I parametri della relazione annuale sono utilizzati per calcolare i valori trimestrali
inosservati che, attraverso opportune formulazioni algebriche, hanno la proprietà di rispettare il vincolo
annuale fornito dall’aggregato.
È bene ricordare che la disponibilità di due versioni dei dati annuali di riferimento comporta la necessità
di effettuare trimestralizzazioni separate per i dati corretti (sezione di destra) e per i dati non corretti
(sezione di sinistra). Ciascuna versione trimestrale sarà ovviamente consistente con l'aggregato annuale
di riferimento.
Questo articolato processo produttivo fornisce in output le seguenti serie storiche:
• aggregato trimestrale grezzo
• aggregato trimestrale grezzo corretto
• aggregato trimestrale destagionalizzato corretto
• aggregato trimestrale destagionalizzato non corretto
6.2 La scelta degli indicatori congiunturali
La scelta degli indicatori congiunturali per gli aggregati di contabilità nazionale può essere suddivisa
in due stadi. Nel primo stadio si individuano, nell'insieme delle informazioni congiunturali disponibili, gli indicatori più appropriati sulla base di valutazioni di tipo qualitativo. Nel secondo stadio si
utilizzano strumenti di natura quantitativa per valutare l’efficacia degli indicatori scelti nel prevedere i
movimenti dell’aggregato.
Il concetto economico rilevato dall’indicatore deve essere il più possibile vicino a quello sul quale è basata
la stima dall’aggregato a livello annuale. Gli indici della produzione industriale per branca di attività
economica, ad esempio, rappresentano un ottimo indicatore del concetto economico di produzione nella
CN, pur non cogliendo appieno l’attività produttiva delle piccolissime imprese. Quello che un indicatore
deve cogliere è la dinamica dell’aggregato, non il suo livello assoluto.
Un indicatore deve essere tempestivo, ovvero deve essere disponibile nel più breve tempo possibile dalla
fine del periodo di riferimento. Questa proprietà è divenuta via via sempre più importante, a causa della
continua riduzione dei tempi di rilascio stabilita in ambito comunitario da Eurostat.
Contrapposta alla proprietà di tempestività vi è quella di affidabilità. Di solito, infatti, esiste un tradeoff fra di esse; un’informazione più tempestiva risulta quasi sempre meno affidabile, e viceversa. È
quindi molto importante avere una misura delle revisioni subite dall’indicatore tra release successive e
comprendere le ragioni che le determinano.
Un altro fattore fondamentale nell’approccio di tipo indiretto è la disponibilità di serie storiche piuttosto
lunghe. Le tecniche statistiche utilizzate per la destagionalizzazione e la trimestralizzazione sono in genere
tanto più efficienti quanto maggiore è il numero di osservazioni sul quale vengono basate le stime. Nel
caso di indicatori disponibili per un periodo limitato si ricorre molto spesso a ricostruzioni all’indietro per
mezzo di indicatori correlati: tale procedura non è esente da inconvenienti, in particolare per le difficoltà
di ricostruzione della componente stagionale.
Un’altra proprietà importante è la regolarità nella pubblicazione dell’indicatore. La disponibilità dell’indicatore in corso d’anno deve essere uniforme; nel caso in cui l’informazione non fosse sempre disponibile
si dovrebbe ricorrere maggiormente a tecniche di previsione che aumentano il grado di imprecisione delle
stime di contabilità trimestrale. È importante poi valutare con attenzione l’autorevolezza e l’affidabilità
dell’ente produttore dell’indicatore. Esso deve essere riconosciuto dalla comunità scientifica e deve accompagnare le statistiche prodotte da esaurienti note metodologiche.
A queste considerazioni di carattere qualitativo si affiancano strumenti quantitativi che mirano a misurare
il grado di accuratezza con il quale l’indicatore riproduce la dinamica di un aggregato. Il primo step è
quello di confrontare graficamente l’andamento dell’aggregato e dell’indicatore a livello annuale. A tal
fine sarà utile confrontare sia i livelli sia i tassi di crescita delle serie. Questi ultimi, tuttavia, forniscono
in genere informazioni più accurate sulla relazione dinamica esistente, che è poi quella che maggiormente
interessa gli utilizzatori dei conti economici trimestrali.
Una misura rappresentativa della relazione tra aggregato ed indicatore è il coefficiente di correlazione
lineare tra i tassi di crescita annuali. Questa misura è importante anche perché si presta alla comparazione
dell’accostamento di più indicatori rispetto ad uno stesso aggregato.
È auspicabile che la relazione fra aggregato ed indicatore sia di tipo diretto, ovvero il coefficiente di
correlazione deve risultare maggiore di zero. Le relazioni inverse, pur se stabili ed efficaci, sono sempre
controintuitive e di difficile interpretazione. Sono quindi da evitare, ad esempio, relazioni tra tasso
d’interesse ed investimento o fra tasso di cambio ed esportazioni.
Il concetto statistico di causalità è quello che in maniera migliore esprime il tipo di relazione che dovrebbe
esistere tra aggregato ed indicatore. Una variabile si dice causale nei confronti di un’altra variabile quando
ne migliora la prevedibilità. Il test di non-causalità proposto da Granger rappresenta lo strumento
statistico più idoneo per valutare la relazione di causalità tra aggregato ed indicatore (Hendry, 1995; Hamilton, 1994).
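A titolo illustrativo, il test può essere eseguito, ad esempio, con la funzione grangercausalitytests della libreria Python statsmodels (bozza su dati simulati; i nomi delle variabili sono di fantasia):

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 120
x = np.zeros(n)   # indicatore
y = np.zeros(n)   # aggregato
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)   # x "anticipa" y di un periodo

# La funzione verifica se la SECONDA colonna causa, nel senso di Granger,
# la PRIMA colonna: qui, se l'indicatore x migliora la prevedibilità di y
dati = np.column_stack([y, x])
risultati = grangercausalitytests(dati, maxlag=4)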
Infine, è a volte utile considerare il concetto di cointegrazione. Due variabili non stazionarie si dicono
cointegrate quando esiste una loro combinazione lineare stazionaria. In altri termini, ciò vuol dire che la
relazione di lungo periodo esistente fra di esse è stabile e non è determinata da una comune dipendenza
da un fattore esterno. In presenza di serie abbastanza lunghe un test di cointegrazione à la Johansen
può rafforzare (o indebolire, nel caso venga rifiutata l’ipotesi di cointegrazione) l’accostamento empirico
riscontrato tra aggregato ed indicatore (Hamilton, 1994). I test di cointegrazione sono oramai integrati
in tutti i software econometrici.
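Analogamente, in presenza di serie sufficientemente lunghe, un test di cointegrazione à la Johansen può essere eseguito, ad esempio, con la funzione coint_johansen di statsmodels (anche in questo caso una semplice bozza su dati simulati):

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(1)
n = 120

# Due serie non stazionarie con un trend stocastico comune (quindi cointegrate)
trend_comune = np.cumsum(rng.normal(size=n))
aggregato  = 10.0 + 1.0 * trend_comune + rng.normal(scale=0.3, size=n)
indicatore =  5.0 + 0.8 * trend_comune + rng.normal(scale=0.3, size=n)

dati = np.column_stack([aggregato, indicatore])

# det_order=0: costante nel modello; k_ar_diff=1: un ritardo nelle differenze
ris = coint_johansen(dati, det_order=0, k_ar_diff=1)
print("Statistiche trace:", ris.lr1)            # statistiche del test
print("Valori critici (90/95/99%):", ris.cvt)   # valori critici corrispondenti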
7 Metodi di trimestralizzazione
7.1 La disaggregazione temporale delle serie di contabilità nazionale
La stima degli aggregati annuali di contabilità nazionale è basata su fonti statistiche osservate a frequenze
normalmente pari o superiori all’anno. Indicando con j la frequenza di osservazione per anno (j = 1
nel caso di informazioni annuali, j = 4 nel caso di dati trimestrali, etc), possiamo definire Ωjt come
l’insieme delle informazioni disponibili al tempo t con frequenza j . Una stima di un qualsiasi aggregato
di contabilità nazionale yt per un generico anno t può essere espressa come valore atteso condizionato
$$\hat{y}_t = E(y_t \mid \Omega_t^{j},\; j \geq 1). \qquad (1)$$

L'operatore valore atteso condizionato va qui inteso come l'insieme delle metodologie utilizzate dal ricercatore per ottenere una stima non distorta ed efficiente di yt sulla base di Ωjt per ogni j. Quando Ω1t ≠ ∅ e Ωjt = ∅, j > 1 (ovvero nel caso in cui vi siano informazioni disponibili esclusivamente a livello annuale), la stima ŷt non può essere calcolata a frequenze superiori seguendo le stesse metodologie implicite nella (1).
Vi è quindi l’esigenza di integrare l’informazione mancante attraverso l’utilizzo di appropriate tecniche
statistiche, denominate tecniche di disaggregazione temporale.
In questa nota si farà riferimento esclusivamente alla trimestralizzazione di flussi disponibili a livello annuale. È importante tuttavia puntualizzare che una metodologia di disaggregazione temporale può essere
formulata per varie cadenze temporali (da annuale a mensile, da mensile a giornaliero, da semestrale a
trimestrale). Inoltre, attraverso opportune modifiche delle formule si possono risolvere i casi di interpolazione di variabili stock, oppure di distribuzione di grandezze osservate in un solo periodo dell’anno
(tipo consistenze di inizio o fine periodo).
Una classificazione dei metodi di trimestralizzazione che può tornare utile in fase di descrizione delle loro
caratteristiche essenziali distingue tra (Di Fonzo, 1987)
• metodi di trimestralizzazione senza indicatori di riferimento;
• metodi di aggiustamento (o a 2 stadi);
• metodi di trimestralizzazione BLUE (o ottimali).
Nel paragrafo che segue il problema viene rappresentato in forma matriciale al fine di facilitare la successiva illustrazione degli ultimi due approcci elencati, mentre si rimanda alla letteratura citata per una
trattazione sul primo approccio.
7.2 Rappresentazione matriciale del problema della trimestralizzazione
Definiamo y0t come il valore assunto dall’aggregato y nell’anno t. Gli ignoti valori trimestrali yt,q , con
q = 1, . . . , 4, sono legati tra loro dal vincolo lineare di aggregazione temporale
y0t = yt,1 + yt,2 + yt,3 + yt,4 .
Si ipotizza di avere a disposizione i valori y0t per t = 1, . . . , T e, per comodità di esposizione la serie
trimestrale yt,q sarà indicata con yi , per i = 1, . . . , 4T.
Uno strumento molto utile per lo sviluppo delle espressioni matriciali che ci accingiamo a presentare è
la matrice di aggregazione temporale C:

$$
C = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & \ddots & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 1 & 1 & 1
\end{bmatrix}
$$
La matrice C, di dimensione (T × 4T ), effettua una trasformazione da variabili trimestrali con 4T
osservazioni a variabili annuali con T osservazioni ottenute come somma delle prime. Nel caso in cui vi
siano trimestri da estrapolare la matrice C si modifica semplicemente aggiungendo tante colonne di zeri
quanti sono i trimestri da prevedere. In notazione matriciale il vincolo di aggregazione temporale può
dunque essere espresso come Cy = y0 , dove y è il vettore11 degli ignoti valori trimestrali che si desidera
stimare e y0 è il vettore (T × 1) dei valori annuali.
11 Il vettore y ha un numero di righe pari al numero di colonne della matrice C.
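Per fissare le idee, una possibile costruzione della matrice C e la verifica del vincolo di aggregazione temporale sono schematizzate di seguito (bozza in Python, con T di fantasia):

import numpy as np

def matrice_aggregazione(T, s=4):
    # Matrice C di dimensione (T x sT): somma s valori infra-annuali per ciascun anno
    return np.kron(np.eye(T), np.ones((1, s)))

T = 3
C = matrice_aggregazione(T)     # dimensione (3 x 12)
y = np.arange(1.0, 13.0)        # ipotetica serie trimestrale di 4T valori
y0 = C @ y                      # vincolo di aggregazione temporale: Cy = y0
print(C.shape, y0)              # (3, 12) [10. 26. 42.]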
Le formule di trimestralizzazione che si basano su indicatori di riferimento hanno delle espressioni molto
simili tra loro. Esse possono essere generalizzate nel seguente modo
$$\hat{y} = X\beta + L(y_0 - CX\beta)$$

dove con ŷ si indica il vettore colonna di dimensione 4T contenente la serie trimestrale stimata, X è la
matrice (4T × K) degli indicatori trimestrali e β è un vettore colonna di dimensione K che contiene i
parametri di collegamento fra X e y. La matrice L (4T × T ) è detta matrice di lisciamento degli errori:
da essa dipende il modo con il quale i residui annuali sono ripartiti nei corrispondenti trimestri. Essa, a
sua volta, dipende in modo cruciale dalla matrice di covarianza (V) dei termini di disturbo del modello
di regressione assunto come riferimento per la procedura di disaggregazione temporale. Come vedremo,
i metodi di trimestralizzazione che si basano su indicatori di riferimento differiscono tra loro in base al
diverso modo di determinare/imporre V e, quindi, L.
Metodi di aggiustamento
Il metodo di Denton (1971)
Denton (1971) ha proposto una metodologia per aggiustare serie mensili (o trimestrali) dato un vincolo
a livello annuale. Il metodo prevede l’esistenza di una serie preliminare pi molto vicina all’ignota serie
trimestrale yi , che tuttavia non quadra con i totali annui y0t . L’idea sottostante è quella di distribuire le
differenze rispetto ai dati annuali senza introdurre discontinuità nella serie aggiustata. Una stima delle
yi si ottiene minimizzando la seguente funzione di perdita quadratica

$$(y - p)' M (y - p) \qquad (2)$$
dato il vincolo Cy = y0 . La matrice M ha dimensione (4T × 4T ) e dipende dal tipo di funzione obiettivo
che si intende minimizzare. Se M = I, si ha un problema di minimizzazione della somma dei quadrati
delle distanze tra y e p. In questo caso la soluzione sarebbe banale: le discrepanze annuali
(y0 -p0 ) verrebbero distribuite in parti uguali nei corrispondenti trimestri. Ciò comporterebbe con ogni
probabilità la presenza di salti di serie fra il quarto trimestre di un anno e il primo trimestre di quello
successivo.
Denton fornisce una versione di M definendo una funzione di penalità basata sulle differenze fra le
differenze prime della serie aggiustata y e della serie preliminare p. La funzione considerata si esprime
come
$$\sum_{i=1}^{4T} \left[ (y_i - y_{i-1}) - (p_i - p_{i-1}) \right]^{2} \qquad (3)$$
in cui si assume (y0 − p0 ) = 0.
La formula (3) si può esprimere secondo la (2) mediante la matrice (4T × 4T)

$$
D = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}.
$$
Infatti, considerando l'espressione

$$
D(y - p) = \begin{bmatrix}
y_1 - p_1 \\
(y_2 - y_1) - (p_2 - p_1) \\
\vdots \\
(y_{4T} - y_{4T-1}) - (p_{4T} - p_{4T-1})
\end{bmatrix}
$$

la (2) si può riscrivere come

$$(y - p)' D' D (y - p).$$

In questo caso si avrebbe M = D′D. La formula di disaggregazione che ne risulta è

$$\hat{y} = p + (D'D)^{-1} C' \left[ C (D'D)^{-1} C' \right]^{-1} (y_0 - Cp).$$
Il pregio del metodo di Denton è la sua semplicità computazionale; lo svantaggio è quello di essere un
metodo meccanico che non tiene conto della relazione esistente tra y e p. Si noti come la matrice di
lisciamento L sia basata su C e D, matrici note a priori a prescindere dal problema di aggiustamento
considerato. In generale, il metodo di Denton è applicato in contabilità per ripartire piccole discrepanze
rispetto al dato annuo che possono risultare in alcune fasi di stima dei conti trimestrali (ad esempio la
doppia deflazione per gli aggregati dell’offerta o la destagionalizzazione degli indicatori)12 .
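Uno schizzo della variante alle differenze prime appena descritta, con la notazione del testo e dati di pura fantasia, potrebbe essere il seguente (bozza in Python, non la routine DENTNEW):

import numpy as np

def denton_differenze_prime(p, y0, s=4):
    # Aggiusta la serie preliminare p (lunghezza sT) ai totali annuali y0 (lunghezza T)
    # minimizzando (y - p)' D'D (y - p) sotto il vincolo Cy = y0 (Denton, 1971)
    p = np.asarray(p, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    n, T = p.size, y0.size
    C = np.kron(np.eye(T), np.ones((1, s)))           # matrice di aggregazione (T x sT)
    D = np.eye(n) - np.eye(n, k=-1)                   # matrice delle differenze prime
    M_inv = np.linalg.inv(D.T @ D)
    L = M_inv @ C.T @ np.linalg.inv(C @ M_inv @ C.T)  # matrice di lisciamento
    return p + L @ (y0 - C @ p)

# Esempio: serie preliminare che non quadra con i totali annuali
p  = np.array([10.0, 11.0, 12.0, 13.0, 13.5, 14.0, 14.5, 15.0])
y0 = np.array([50.0, 60.0])
y_hat = denton_differenze_prime(p, y0)
print(y_hat, y_hat[:4].sum(), y_hat[4:].sum())        # i vincoli annuali sono rispettati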
Metodi di trimestralizzazione BLUE
Il metodo di Chow e Lin
Il metodo di disaggregazione temporale proposto da Chow e Lin (1971) è quello sul quale si fonda
la procedura di trimestralizzazione degli aggregati annuali utilizzata in contabilità nazionale. Esso è
stato definito metodo ottimale (Di Fonzo, 1987), nel senso che risolve il problema della disaggregazione
temporale nell’ambito dei modelli di regressione lineare mediante stime BLUE (Best Linear Unbiased
Estimator ) dei coefficienti. Tali metodi si distinguono da quelli di aggiustamento, che prevedono una
stima preliminare ed un aggiustamento successivo della stima al vincolo annuo (metodi two-steps). Chow
e Lin ipotizzano il seguente modello di regressione a livello trimestrale
$$y = X\beta + u, \qquad (4)$$

con y il vettore contenente le osservazioni trimestrali dell'aggregato, di dimensione (4T × 1), X la matrice (4T × K) di K regressori considerati, β il vettore colonna dei K coefficienti di regressione e u il vettore contenente i residui del modello di regressione. La matrice X contiene generalmente l'indicatore (o gli
indicatori) ed il termine costante. Si ipotizza poi una matrice di varianza-covarianza dei disturbi
E(uu′ |X) = V
diversa dalla matrice identità tipicamente utilizzata nel modello di regressione ordinario. Una stima
ottimale dell’equazione (4) potrebbe quindi essere ottenuta attraverso lo stimatore dei minimi quadrati
generalizzati (GLS). Tuttavia, la relazione è impossibile da stimare a livello trimestrale in quanto non si
conoscono i valori della y né la forma della matrice V.
12 La procedura che implementa il metodo di Denton in Modeleasy si chiama DENTNEW.
Si trasforma quindi la relazione (4) ipotizzata a livello trimestrale in una a livello annuale, mediante
l’utilizzo della matrice di aggregazione temporale C. Pre-moltiplicando la (4) per la matrice C si ottiene
$$Cy = CX\beta + Cu$$
$$y_0 = X_0\beta + u_0 \qquad (5)$$

con matrice di varianza-covarianza

$$E(Cuu'C' \mid X) = CVC' = V_0.$$

L'applicazione della matrice C implica una riduzione della dimensionalità dei vettori e delle matrici coinvolte nell'equazione di regressione: da 4T osservazioni trimestrali si hanno ora T osservazioni aggregate a livello annuale. L'equazione (5) contiene adesso elementi noti (y0 e X0), ma rimane ancora indefinita la matrice di varianza-covarianza dei disturbi V. Ipotizzando di conoscerne la forma, la stima trimestrale della serie y si può ottenere mediante l'espressione

$$\hat{y} = X\hat{\beta} + VC'V_0^{-1}(y_0 - X_0\hat{\beta}) \qquad (6)$$

dove

$$\hat{\beta} = (X_0'V_0^{-1}X_0)^{-1}X_0'V_0^{-1}y_0$$

è il vettore (K × 1) contenente i coefficienti di regressione stimati secondo il metodo GLS a livello annuale.

La formula (6) è composta da due addendi. Il primo, Xβ̂, si ottiene applicando i coefficienti stimati della relazione annuale agli indicatori trimestrali. Il secondo, VC′V0⁻¹(y0 − X0β̂), serve a distribuire i residui annuali û0 = (y0 − X0β̂) nei vari trimestri dell'anno secondo la matrice di lisciamento VC′V0⁻¹, in modo che sia rispettato il vincolo annuo Cŷ = y0.
Nel caso di una relazione perfetta (o quasi) fra aggregato ed indicatore, l'importanza del secondo termine della (6) è trascurabile. Maggiori sono i residui stimati û0 a livello annuale, maggiore è l'importanza del secondo termine nella determinazione dei valori ignoti trimestrali y. Questo termine, come detto,
si basa sulla forma assunta dalla matrice di varianza-covarianza dei disturbi a livello trimestrale V. A
tal riguardo sono state proposte diverse soluzioni, tutte valide in linea di principio ma che rispondono a
criteri euristici piuttosto che a ragionamenti teorici sul comportamento del fenomeno a livello trimestrale.
La soluzione proposta da Chow e Lin si basa sull’ipotesi che i residui trimestrali ui seguano un processo
autoregressivo del primo ordine del tipo
$$u_i = \rho u_{i-1} + \varepsilon_i, \qquad \text{per } i = 1, \ldots, 4T \qquad (7)$$
con |ρ| < 1 ed εi un processo white noise con varianza σε2 . Tale ipotesi consente una ripartizione delle
discrepanze annuali più regolare rispetto a quella che si avrebbe ipotizzando i disturbi di tipo white noise
(caso ρ = 0). In sostanza si evita la presenza di uno scalino tra il quarto trimestre di un anno ed il primo
trimestre di quello successivo, quando i residui dei due anni consecutivi sono molto differenti tra loro.
La forma della matrice V per il processo (7) è simmetrica e si scrive come

$$
V = \frac{\sigma^{2}_{\varepsilon}}{1-\rho^{2}}
\begin{bmatrix}
1 & \rho & \rho^{2} & \cdots & \rho^{4T-1} \\
\rho & 1 & \rho & \cdots & \rho^{4T-2} \\
\rho^{2} & \rho & 1 & \cdots & \rho^{4T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{4T-1} & \rho^{4T-2} & \rho^{4T-3} & \cdots & 1
\end{bmatrix}.
$$
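Per un dato valore di ρ la matrice V ha struttura di Toeplitz e può essere costruita, ad esempio, come segue (bozza in Python):

import numpy as np
from scipy.linalg import toeplitz

def matrice_V(rho, n, sigma2_eps=1.0):
    # Matrice di covarianza di un processo AR(1) di lunghezza n (processo (7))
    return (sigma2_eps / (1.0 - rho ** 2)) * toeplitz(rho ** np.arange(n))

V = matrice_V(0.7, 8)   # esempio con 4T = 8 trimestri
print(V[:3, :3])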
Quindi, per ottenere una stima della matrice V si deve disporre di una stima dei parametri ρ e σε2 : a questo
scopo si può utilizzare il metodo della massima verosimiglianza (Bournay e Laroque, 1979, Di Fonzo,
1987). Ricordiamo che, assumendo la normalità dei disturbi trimestrali, la funzione di verosimiglianza
può scriversi come
$$
L(V_0, \beta, \sigma^{2}_{\varepsilon}) = \left(\frac{1}{2\pi\sigma^{2}_{\varepsilon}}\right)^{T/2} |V_0|^{-1/2} \exp\left\{ \frac{-(y_0 - X_0\beta)' V_0^{-1} (y_0 - X_0\beta)}{2\sigma^{2}_{\varepsilon}} \right\} \qquad (8)
$$
I valori dei parametri che massimizzano la (8) possono essere trovati usando opportune procedure numeriche, sulle quali si tornerà tra breve.
Va peraltro ricordato che (i) Chow e Lin (1971) propongono una tecnica di stima diversa, essenzialmente
fondata sul legame funzionale esistente tra ρ e l’autocorrelazione di ordine 1 dei disturbi aggregati
temporalmente13 , e (ii) la tecnica attualmente implementata in contabilità nazionale per la stima dei
conti trimestrali differisce tanto da quella originariamente proposta da Chow-Lin, quanto dalla massima
verosimiglianza. Su questi aspetti ci soffermiamo nel prossimo paragrafo.
8 La procedura di trimestralizzazione impiegata dall'ISTAT
La procedura di trimestralizzazione attualmente utilizzata in contabilità nazionale si basa sulla versione
del metodo Chow-Lin proposta da Barbone et al. (1981). Le differenze introdotte rispetto alla versione
originale sono due e riguardano la stima del parametro autoregressivo ρ e l’estrapolazione dei valori in
corso d’anno.
Per quanto riguarda il primo aspetto, Barbone et al. (1981) propongono di basare la stima dei parametri
del modello di regressione sulla minimizzazione della somma ponderata dei quadrati dei residui (SSR)
$$SSR = (y_0 - X_0\beta)' V_0^{-1} (y_0 - X_0\beta) \qquad (9)$$
che in pratica coincide con l’argomento dell’esponenziale nella (8). A rigore, quindi, la stima dei parametri
non è di massima verosimiglianza, ma piuttosto una stima dei minimi quadrati generalizzati stimati.
Quanto alla procedura numerica di stima, Barbone et al. (1981) propongono di adottare una tecnica di scanning: sotto l'ipotesi di stazionarietà del processo autoregressivo (7), si ipotizza una griglia di valori per il parametro ρ all'interno dell'intervallo [−0.99, +0.99]. Si calcola poi SSR utilizzando le stime delle matrici (V̂0, β̂, σ̂²ε) associate a ciascun valore di ρ considerato. Il valore di ρ per il quale risulta minima la quantità SSR è la stima finale utilizzata per il calcolo della matrice V. Ovviamente la medesima procedura numerica può essere adottata per ottenere le stime di massima verosimiglianza dei parametri di interesse.
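Per chiarire la logica della procedura, riportiamo una bozza molto semplificata in Python che, per ogni ρ della griglia, stima β con i GLS annuali, calcola SSR e sceglie il valore che la minimizza, applicando poi la formula (6); non si tratta della procedura DISAGGR/TRIME descritta nel seguito, ma di una schematizzazione a solo scopo illustrativo, su dati simulati:

import numpy as np
from scipy.linalg import toeplitz

def chow_lin_scanning(y0, X, s=4, griglia=np.linspace(-0.99, 0.99, 199)):
    # Disaggregazione temporale alla Chow-Lin con scelta di rho per scanning su SSR
    y0 = np.asarray(y0, dtype=float)
    X = np.asarray(X, dtype=float)
    T, n = len(y0), len(y0) * s
    C = np.kron(np.eye(T), np.ones((1, s)))     # matrice di aggregazione temporale
    X0 = C @ X                                  # indicatori aggregati a livello annuale

    migliore = None
    for rho in griglia:
        V = toeplitz(rho ** np.arange(n)) / (1.0 - rho ** 2)   # AR(1), a meno di sigma2
        V0_inv = np.linalg.inv(C @ V @ C.T)
        beta = np.linalg.solve(X0.T @ V0_inv @ X0, X0.T @ V0_inv @ y0)   # GLS annuale
        u0 = y0 - X0 @ beta
        ssr = u0 @ V0_inv @ u0                  # somma ponderata dei quadrati dei residui
        if migliore is None or ssr < migliore[0]:
            migliore = (ssr, rho, V, V0_inv, beta, u0)

    _, rho, V, V0_inv, beta, u0 = migliore
    y_hat = X @ beta + V @ C.T @ V0_inv @ u0    # formula (6)
    return y_hat, rho, beta

# Esempio: sei anni di dati, un indicatore trimestrale x più il termine costante
rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0.5, 1.0, 24))
X = np.column_stack([np.ones(24), x])
y0 = np.kron(np.eye(6), np.ones((1, 4))) @ (50 + 2 * x + rng.normal(0, 2, 24))
y_hat, rho, beta = chow_lin_scanning(y0, X)
print(rho, np.allclose(np.kron(np.eye(6), np.ones((1, 4))) @ y_hat, y0))   # vincolo annuo rispettato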
Un problema che può sorgere in fase di stima è rappresentato dalle soluzioni ‘ai limiti’ della regione di
stazionarietà, ossia ρ = −.99 oppure ρ = .99. Il primo caso, e più in generale ogni volta che la stima
di ρ è negativa, può presentare parecchi problemi pratici, in quanto la ripartizione dei residui annuali
può risultare caratterizzata da pronunciate alternanze di segno, con possibili, inopportune fluttuazioni
13 Tale relazione, biunivoca per −1 < ρ < 1 nel caso di aggregazione trimestrale di dati mensili, non lo è più nel caso di
nostro specifico interesse, in cui i disturbi trimestrali vengono aggregati annualmente. In questo caso, infatti, la biunivocità
resta valida solo per valori positivi di ρ, rendendo di fatto problematica l’applicazione della proposta originale di Chow e
Lin. Su questa questione si veda Bournay e Laroque (1979) e Di Fonzo (1987).
dei valori trimestrali stimati. Il secondo caso, invece, pur dando luogo in genere a serie trimestrali ‘ragionevoli’ (quanto meno prive, cioè, di andamenti altamente irregolari e/o di fluttuazioni irrealistiche), va
interpretato come un segnale di difficoltà del modello statistico utilizzato a cogliere appieno le dinamiche
(spesso non stazionarie) delle variabili in gioco.
In base alla nostra esperienza, abbiamo notato come la minimizzazione di SSR sia, in generale, una
procedura più robusta rispetto alla massimizzazione della verosimiglianza. Spesso la stima di massima
verosimiglianza di ρ che si ottiene si trova proprio al limite dell’intervallo di valori prefissato (0.99 o -0.99);
in questi casi è probabile che aumentando oltre l’intervallo [-0.99, 0.99] il valore di ρ la verosimiglianza
cresca ancora, magari indefinitamente.
La seconda differenza dell’approccio di Barbone et al. (1981) consiste, come detto, nella formula di
estrapolazione dei trimestri in corso d'anno. Il problema dell'estrapolazione secondo il metodo Chow-Lin si risolve semplicemente aggiungendo un numero di colonne di zeri alla matrice di aggregazione C pari al numero delle previsioni che si vuole ottenere ed applicando la formula di interpolazione (6). Si può facilmente dimostrare come la previsione dipenda dall'effetto dell'indicatore (Xβ) e da un effetto di correzione basato sull'intero vettore dei residui annuali û0, per mezzo della matrice di lisciamento L. Barbone et al. (1981) derivano la seguente formula di estrapolazione ricorsiva

$$\hat{y}_{4T+j} = x'_{4T+j}\,\hat{\beta} + \frac{\hat{\rho}^{\,j}\,\hat{\rho}^{3}}{1+\hat{\rho}+\hat{\rho}^{2}+\hat{\rho}^{3}}\,\hat{u}_{0,T}, \qquad \text{per } j = 1, 2, \ldots \qquad (10)$$
Come si vede, la formula (10) considera esclusivamente il residuo stimato per l’ultimo anno disponibile;
in tal senso la previsione non è ottimale in quanto non considera in modo completo l’informazione a
disposizione.
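A puro titolo illustrativo, la (10) può essere tradotta in una piccola funzione come la seguente (bozza in Python, caso di un solo indicatore; i nomi sono di fantasia):

def estrapola_trimestri(x_futuri, beta, rho, u0_T):
    # Estrapolazione ricorsiva dei trimestri in corso d'anno secondo la formula (10):
    # x_futuri = valori dell'indicatore per i trimestri 4T+1, 4T+2, ...
    # beta, rho = coefficiente e parametro autoregressivo stimati
    # u0_T = residuo annuale stimato per l'ultimo anno disponibile
    denominatore = 1.0 + rho + rho ** 2 + rho ** 3
    return [x * beta + (rho ** j) * (rho ** 3) * u0_T / denominatore
            for j, x in enumerate(x_futuri, start=1)]

# Esempio: due trimestri in corso d'anno
print(estrapola_trimestri([105.0, 106.2], beta=2.0, rho=0.6, u0_T=1.5))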
La procedura di interpolazione proposta da Barbone et al. (1981) è implementata in Modeleasy attraverso
la subroutine DISAGGR. La subroutine TRIME, invece, richiama al suo interno DISAGGR ed effettua le
seguenti operazioni:
• standardizzazione delle variabili;
• estrapolazione dei valori in corso d’anno;
• perequazione dell’effetto delle dummies;
• stampa degli indicatori di qualità della trimestralizzazione.
La standardizzazione effettuata in TRIME serve a trasformare la variabile indicatore nella dimensione
dell’aggregato. L’effetto di questa operazione è quello di ottenere un coefficiente di regressione standardizzato, che non dipende quindi dalle diverse unità di misura delle due variabili coinvolte. La formula di
standardizzazione utilizzata è la seguente

$$x^{*} = \frac{x - \bar{x}}{\hat{\sigma}_{x}}\,\hat{\sigma}_{y_0} + \bar{y}_0$$

dove con (x̄, σ̂x) e (ȳ0, σ̂y0) indichiamo la media e la deviazione standard campionaria dell'indicatore trimestrale x e dell'aggregato annuale y0.
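A titolo illustrativo, la standardizzazione può essere schematizzata come segue (bozza in Python, con dati di fantasia):

import numpy as np

def standardizza_indicatore(x, y0):
    # Riporta l'indicatore trimestrale x nella "dimensione" dell'aggregato annuale y0
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    return (x - x.mean()) / x.std(ddof=1) * y0.std(ddof=1) + y0.mean()

x  = np.array([98.0, 101.0, 103.5, 102.0, 104.0, 107.0, 108.5, 106.0])
y0 = np.array([410.0, 432.0])
print(standardizza_indicatore(x, y0))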
Abbiamo già illustrato la formula (10) che consente di estrapolare i valori in corso d’anno senza vincoli
temporali. La perequazione dell’effetto delle variabili dummy serve a ridurre la presenza di scalini tra
trimestri contigui di anni differenti dovuti all’utilizzo di step o trend dummy. La serie trimestralizzata
viene passata al programma X11 (subroutine x11arima), dal quale si preleva in output una serie più
liscia ottenuta mediante l’applicazione di un sistema di medie mobili. Questa serie mantiene il segnale di
tendenza di quella originale ma i movimenti con forte erraticità risultano attenuati. Nei punti di ingresso
e di uscita dalle dummy immesse il valore ottenuto in trimestralizzazione viene sostituito da quello della
serie perequata da X11. Chiaramente dopo questa operazione alcuni vincoli temporali non sono più
rispettati, per cui vi è il problema di riquadrare nuovamente la serie risultante con il vincolo annuale.
La riquadratura è effettuata attraverso il metodo di Denton. Nell’appendice verrà fornita una breve
presentazione degli input e degli output della procedura TRIME, assieme ad una illustrazione sommaria
delle procedure informatiche attualmente in uso nei conti trimestrali.
Riferimenti bibliografici
Barbone L., G. Bodo e I. Visco (1981), Costi e profitti in senso stretto: un’analisi su serie trimestrali,
Bollettino della Banca d’Italia, 36: 465-510.
Boot J.C.G., W. Feibes e J.H.C. Lisman (1967), Further methods of derivation of quarterly figures
from annual data, Cahiers Economiques de Bruxelles, 36: 539-546.
Bournay J. e G. Laroque (1979), Réflexions sur la méthode d'élaboration des comptes trimestriels,
Annales de l'INSEE, 36: 3-30.
Chow G. e A.L. Lin (1971), Best linear unbiased interpolation, distribution and extrapolation of time
series by related series, The Review of Economics and Statistics, 53: 372-375.
Denton F.T. (1971), Adjustment of monthly or quarterly series to annual totals: An approach based on
quadratic minimization, Journal of the American Statistical Association, 66: 99-102.
Di Fonzo T. (1987), La stima indiretta di serie economiche trimestrali, Padova, Cleup.
Hamilton J.D. (1994), Time series analysis, Princeton, Princeton University Press.
Hendry D. (1995), Dynamic econometrics, Oxford, Oxford University Press.
Appendice A. L’organizzazione dei programmi deck
L’esperienza maturata nell’ambito dei conti trimestrali suggerisce che organizzare i programmi secondo
uno schema standard offre diversi vantaggi. Tra questi vanno sicuramente menzionati: la possibilità
di essere eseguiti anche da persone diverse dal responsabile; il più facile passaggio di consegne in fase
di avvicendamento del personale; la possibilità di collaborazione tra colleghi; la possibilità di usufruire
dell’esperienza maturata da altri.
Proponiamo pertanto qui di seguito uno schema standard seguito in contabilità trimestrale (vedi figura
2).
Figura 2: Organizzazione dei deck
Tale schema prevede la creazione di un deck principale (solitamente chiamato main) dal quale è possibile richiamare le diverse fasi. Tra queste, si veda ancora la figura 2, devono annoverarsi almeno:
l’aggiornamento dei dati, la correzione per gli effetti di calendario, la destagionalizzazione, la trimestralizzazione, l’analisi dei risultati e il backup dei dati. L’organizzazione del programma secondo un insieme
di fasi parcellizzate permette una gestione flessibile del programma oltre ad un maggior controllo. Poiché
in ciascuna fase sarà necessario richiamare sia programmi sia dati precedentemente salvati, è indispensabile far precedere all’esecuzione del main la definizione delle librerie.
Librerie
Il caricamento delle librerie è un’operazione preliminare all’esecuzione dei programmi. Per “librerie” si
intende qui un deck contenente tutti i riferimenti alle directory che saranno utilizzate. Tra queste vanno
senz’altro incluse:
mykeep: è la directory in cui vengono salvati i file in formato keep, vale a dire tutto ciò che viene
elaborato dal programma. Di solito è la directory dati;
mykeepde: è la directory in cui vengono salvati i file in formato deck, cioè i programmi. Di solito è la
directory prog;
coordtri: è la directory contenente le procedure di trimestralizzazione, correzione dei giorni lavorativi,
e altri utili strumenti preparati dalla contabilità trimestrale;
A queste si aggiungerà ogni altra libreria necessaria all’esecuzione del programma.
Le righe che seguono mostrano un esempio di libreria:
program
library(mykeep "/home/mamarini/qna/dati")
library(mykeepde "/home/mamarini/qna/prog")
library(coordtri "/istat/depretis/dcna/trim/coordtri/prog")
library(datiqna "/istat/depretis/dcna/trim/assembla/corretti/dati")
library(graph "/home/mamarini/speakez/@seats.tmp/graph/series")
library(outtramo "/home/mamarini/speakez/@seats.tmp/output")
end
La prima riga deve sempre riportare la dicitura program, e l’ultima end.
Tutte le righe comprese tra la seconda e la penultima inizieranno con il comando library. Questo si
compone di un primo argomento, ad esempio mykeep, che rappresenta il “nomignolo” della directory
che si intende identificare. Segue prima uno spazio e poi, racchiuso tra virgolette, il percorso completo
che porta a puntare la directory desiderata. Le directory così identificate potranno essere richiamate nei
programmi con il comando lib. Pertanto, volendo richiamare una subroutine presente nella directory
del coordinamento dei trimestrali si scriverà un’istruzione del tipo “getdeck aggrsum lib coordtri”,
dove coordtri è l'etichetta che era stata assegnata alla directory nel comando library.
Le directory identificate con mykeep e mykeepde costituiscono un’eccezione nel senso che vengono utilizzate da Modeleasy senza bisogno di usare il comando lib. Nel momento in cui si salva una matrice di dati
con il comando keep o si carica una matrice con il comando get, Modeleasy utilizza automaticamente
la directory identificata come mykeep. In modo analogo, se si carica un deck, il comando getdeck non
seguito da alcuna specificazione relativa alla directory da utilizzare, sarà gestito da Modeleasy come se
fosse seguito da lib mykeepde.
Main
Il main è il programma principale dal quale è possibile richiamare tutte le fasi del processo di produzione.
Si compone essenzialmente di tre parti:
Aggiornamento date Nella prima parte del main viene di solito incluso l'aggiornamento delle date.
In sostanza si definiscono delle variabili globali necessarie nell’esecuzione delle fasi successive del programma. Risulta utile definire innanzitutto il trimestre che si sta elaborando (ad es. 2004Q1). Sulla base
di tale informazione è possibile calcolare il numero di osservazioni che devono comporre le serie. Se le
elaborazioni partono dal 1970Q1, le matrici devono avere, nel primo trimestre del 2004, 137 osservazioni.
Vedremo nel seguito l’utilità di tale informazione. Per il momento ci preme evidenziare che vengono
adottati due diversi metodi per effettuare l’aggiornamento delle date.
Alcuni preferiscono inserire all’inizio del main delle righe che prevedono una serie di domande. Eccone
un esempio:
ask ("ULTIMO ANNO (AAAA)","DATE1=")
ask ("ULTIMO TRIMESTRE (1-4)","DATE2=")
fine=(DATE1-1970)*4+DATE2
Nella prima riga si domanda quale sia l’anno in corso, ad esempio il 2004. Nella seconda si domanda
quale sia il trimestre, ad esempio 1. Con queste due informazioni, nella terza riga si calcola il numero
delle osservazioni a partire dal primo trimestre del 1970. Nell’esempio si avrà (2004-1970)*4+1=137.
Questo modo di programmare garantisce che le date vengano aggiornate ogni volta che si lancia il main
ma presenta pure l’inconveniente di dover ripetere più volte l’operazione d’aggiornamento.
Un modo alternativo è quello di avere un deck che contiene le informazioni sulle date. Di solito questo
file viene chiamato agdate, e si compone delle seguenti righe:
data
1970 1 2004 1
end
In questo caso il deck contiene dei dati anziché un programma. Nella prima riga si scriverà pertanto
data. Per richiamare tale deck si usa l’istruzione:
getdeck (agdate lib dati);dat=(4:);loaddata(dat,agdate)
ovviamente se il file agdate si trova nella directory dati.
La terza riga del precedente esempio dovrà essere sostituita da:
fine=(dat(3)-dat(1))*4+dat(4).
Nell’uno come nell’altro caso è possibile includere tante informazioni quante se ne ritengano necessarie.
Ad esempio, esistono programmi in cui alcune serie hanno diversa data d’inizio.
Menù delle scelte Dopo avere effettuato l’aggiornamento delle date, il main dovrebbe contenere un
menù delle scelte, nel quale vengono elencate le diverse fasi del processo di produzione dei dati. Nello
schema che qui presentiamo, il main prevede la presentazione delle diverse fasi, lasciando poi all’operatore
la possibilità di scegliere quale di queste eseguire (figura 3). La parte del programma denominata
“chiedi” stampa a schermo l’elenco delle diverse fasi. Segue poi la richiesta “askname("quale fase
vuoi svolgere?","ris=")”. La variabile ris potrà assumere valori compresi tra 1 e 8. Avendo incluso
l’istruzione “if(ris.gt.’8’) goto chiedi” laddove per errore venisse digitato un valore superiore a
8, verrà riproposto il menù delle scelte. Per ris=8 si va direttamente alla fine del programma e dunque
all’uscita. A ciascuno dei restanti valori di ris (da 1 a 6) corrisponde una diversa parte del programma
(f1, f2, ecc) in cui viene richiamato ed eseguito il relativo deck.
Esecuzione dei programmi Subito dopo il menù delle scelte vengono inserite delle righe che associano
alla risposta fornita (la variabile ris) una diversa porzione del main. Ad esempio, se si vuole eseguire
la fase 1, aggiornamento dei dati, si risponderà con 1 alla domanda della fase precedente. La riga
di comando “if(ris.eq.’1’) goto f1” rimanda alla porzione denominata f1. Qui viene caricato ed
eseguito il deck di aggiornamento. Lo stesso vale per tutte le altre fasi.
Esecuzione automatica e manuale Per eseguire l’insieme delle operazioni incluse nel main sono
possibili due diverse opzioni. Da un lato si può procedere manualmente a richiamare le diverse fasi che
verranno pertanto svolte una alla volta. Questo modo di procedere permette di analizzare i risultati di
ciascuna fase prima di procedere alla successiva. Altro modo è invece quello di programmare l’esecuzione
automatica di tutte le fasi. Questo secondo approccio ha il pregio della velocità.
Le due opzioni possono essere considerate complementari anziché alternative. Nelle diverse fasi del
processo di produzione dei dati può risultare a volte utile la prima, in certe altre la seconda.
Figura 3: Main deck: esempio
Nell’esempio riportato nella Figura 3, si mostra come programmare le due opzioni in un unico deck, in
modo da poter scegliere l’una o l’altra a seconda delle esigenze.
Si costruisce una stringa di comodo, qui denominata seq, che assume di default il carattere ‘m’. In questo
modo al termine dell’esecuzione di una qualsiasi fase di quelle elencate nel menù delle scelte si incontrerà
la riga “if(seq.eq.’m’) goto chiedi”. Dato che il valore di seq è pari a ‘m’, il programma ritornerà
al menù delle scelte.
L’opzione 7 permette la modifica di seq. Allorché questa viene richiamata si passa all’esecuzione della
parte di programma denominata f7. Qui si domanda quale opzione si preferisce: “askname("Digita
(m) per manuale oppure (s) per sequenziale","seq=")”. Notare che nella riga successiva è stato
inserito un elemento di controllo in modo che se la risposta fornita non risulta essere né manuale ‘m’ né
sequenziale ‘s’, la fase f7 viene richiamata e la stessa domanda riproposta finché non si digiterà ‘m’ o ‘s’.
Una volta modificata seq, viene riproposto il menù delle scelte.
Se seq è stata posta pari a ‘s’, l’esecuzione di una qualsiasi fase elencata nel menù delle scelte comporta anche l’esecuzione di tutte le fasi successive. Se ad esempio si scegliesse la fase f2, al termine
dell’esecuzione di questa si incontrerebbe la riga di comando “if(seq.eq.’m’) goto chiedi”: essendo
ora seq pari a ‘s’ il “goto chiedi” viene ignorato e la successiva fase 3 eseguita.
Altre osservazioni Il richiamo alle diverse fasi (destagionalizzazione, trimestralizzazione, ecc.) può anch'esso essere organizzato attraverso un suo main. Si userà in quel caso lo stesso tipo di organizzazione
che si è presentato qui: menù delle scelte e successive fasi. Anche in quel caso è bene prevedere la
possibilità di esecuzione in “manuale” e in “sequenziale”.
Nella fase di backup è bene includere anche un’istruzione per la costruzione e la successiva esportazione
delle serie annuali e dei relativi indicatori annualizzati, da utilizzarsi in sede di specificazione della
relazione a bassa frequenza (annuale).
Le matrici degli indicatori e degli indicati dovrebbero essere organizzate in modo uniforme. Se si sceglie
di organizzare le matrici per colonna, in modo che ciascun vettore colonna della matrice rappresenti una
variabile e le righe rappresentino il tempo, sarebbe utile fare in modo che le matrici annuali abbiano
lo stesso numero di colonne e che a ciascuna variabile annuale corrisponda il proprio indicatore nella
matrice degli indicatori. Allo stesso modo, se si costruisce una matrice degli indicatori grezzi ed una
degli indicatori destagionalizzati, sarebbe utile mantenere lo stesso ordine.
La subroutine TRIME e relativi parametri
Il comando TRIME esegue la subroutine preparata per la disaggregazione temporale. La subroutine
è archiviata nella directory coordtri. Per richiamarla sarà allora sempre necessario eseguire prima il
comando “getdeck trime lib coordtri”.
La stringa d’esecuzione di TRIME richiede, come di consueto, di specificarne gli input, gli output e le
opzioni (vedi figura 4). Vediamoli brevemente:
Input:
Y Il primo parametro è il vettore (T × 1) dei dati annuali da disaggregare. Solitamente questo comprende tutte le osservazioni dall’anno iniziale fino all’ultimo anno chiuso.
Figura 4: Comando TRIME
X Il secondo elemento è un’array al cui interno compare come prima colonna l’indicatore trimestrale
di riferimento, e nelle successive le variabili dummy che si ritiene necessario includere nella regressione.
Tale array avrà pertanto un numero di righe pari a quattro volte quelle del vettore Y (quattro trimestri
per anno) più il numero di trimestri dell’anno in corso, ed un numero di colonne pari al numero delle
variabili dummy più 1, (cioè più l’indicatore).
ND Il terzo valore da inserire è il numero delle variabili dummy. Risulta spesso conveniente parametrizzare tale valore utilizzando il comando nocols. Se l’array X è già stata costruita, ND sarà pari a
nocols(X)-1.
FRC FRC è il numero di anni da prevedere. Se il vettore Y è completo, nel senso che comprende
l’ultimo anno chiuso, FRC viene posto pari a zero. Nel caso invece si avesse la necessità di prevedere il
dato annuo mancante, si pone FRC pari al numero di anni da estrapolare.
ANNINIZ È l’anno d’inizio della serie y.
TIT È una stringa di testo, il titolo, che viene stampata a schermo prima della presentazione dei
risultati della disaggregazione temporale.
Output
YT L’unico output è YT cioè il vettore della serie disaggregata. Questo avrà lo stesso numero di righe
di X ed una sola colonna.
Opzioni
OPT Permette di specificare se l'aggregazione dell'indicatore infra-annuale debba essere fatta per somma
o per media. Le due opzioni possono essere specificate con ‘SUM’ ovvero ‘MEAN’, avendo cura di far
precedere e seguire l’opzione dall’apostrofo (’).
US,LS Rappresentano rispettivamente la soglia superiore ed inferiore per la banda di “lisciamento” in
entrata ed uscita dalle dummy. I valori di default, che si consiglia di utilizzare, sono pari rispettivamente a 2.5 e 1.5. All'interno della procedura trime è infatti contenuto un richiamo al programma X11 che, con
il ricorso alle medie mobili, evita che in corrispondenza dell’intorno del dato per il quale è stata inserita
una dummy si abbiano dei salti di serie.
OUTPUT È un flag tale che, se fissato pari ad 1, attiva il comando PRINTER ON.
Dummy in Unix
Il comando DUMMY esegue la subroutine preparata per la costruzione delle variabili dummy. La subroutine è archiviata nella directory coordtri. Per richiamarla sarà allora sempre necessario eseguire prima
il comando “getdeck dummy lib coordtri”.
Figura 5: dummy.eps
La stringa d’esecuzione di DUMMY richiede, come di consueto, di specificarne gli input, gli output e le
opzioni. Vediamoli brevemente:
Figura 6: Diversi tipi di dummy
Input DUMMY(AI,AF,TIPO,ANNINIZ,LEN,FREQ,DM)
AI é l’anno d’inizio della dummy (da zero diventa 1).
AF é l’anno finale della dummy (ultimo anno per il quale vale 1, poi 1 torna a zero).
TIPO: serve a specificare il tipo di dummy che si vuole costruire.
Si possono costruire tre tipi di variabili dummy (vedi figura 6):
’ID’=IMPULSE DUMMY
’SD’=STEP DUMMY
’TD’=TREND DUMMY
ANNINIZ Anno d’inizio della variabile dummy. Coincide con l’anno di inizio dell’indicatore.
LEN Numero di osservazioni della variabile dummy
FREQ Frequenza della dummy
Output DM é la variabile dummy in uscita
Esempio DUMMY(1977,1980,’SD’,1970,NT70,4,DM);X1(,2)=DM
if (ris .eq.
1 .or.
ris .eq.
0) then
TIT="MACCHINARI E ATTREZZATURE"
X1=(NT85,1:)
X1(,1)=vinddnt(ints(61,NT70),1)
DUMMY(1985,1985,’SD’,annogr,NT85,4,DM);X1(,2)=DM
DUMMY(1989,1992,’SD’,annogr,NT85,4,DM);X1(,3)=DM
DUMMY(1993,1993,’SD’,annogr,NT85,4,DM1);
DUMMY(1997,1997,’SD’,annogr,NT85,4,DM);X1(,4)=DM-DM1
a=nocols(x1)-1
TRIME (dispann(,1),X1,a,YT,0,1985,’SUM’,2.5,1.5,TIT,1)
PROQ2(,1)=YT
endif
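For readers working outside the Modeleasy environment, the following sketch shows in Python how dummy variables of this kind could be constructed. It is only an illustration of the logic, not a port of the DUMMY subroutine: the function name make_dummy is hypothetical, and the exact shapes of the three types are those of figure 6 (not reproduced here); the sketch simply assumes an impulse dummy equal to 1 in the single year AI, a step dummy equal to 1 from AI through AF, and a trend dummy increasing linearly between AI and AF.

import numpy as np

def make_dummy(ai, af, tipo, anniniz, length, freq=4):
    # Hypothetical Python illustration of the DUMMY logic (shapes assumed, see figure 6).
    years = anniniz + np.arange(length) // freq      # calendar year of each sub-period
    dm = np.zeros(length)
    if tipo == 'ID':                                 # impulse: 1 only in year ai (assumption)
        dm[years == ai] = 1.0
    elif tipo == 'SD':                               # step: 1 from ai through af, 0 elsewhere
        dm[(years >= ai) & (years <= af)] = 1.0
    elif tipo == 'TD':                               # trend: linear ramp over ai..af (assumption)
        mask = (years >= ai) & (years <= af)
        dm[mask] = np.arange(1, mask.sum() + 1)
    return dm

# e.g. the step dummy of the example above: active over 1977-1980, for a quarterly
# indicator starting in 1970 (136 observations used here in place of NT70)
dm = make_dummy(1977, 1980, 'SD', 1970, length=136, freq=4)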
Appendix B. Some useful Modeleasy commands
Listed below are the commands most frequently used in the estimation process of the quarterly national accounts. Some of them are Modeleasy built-in commands, others call commonly used subroutines written over the years by the researchers who have worked in the quarterly accounts unit.
• AGGRSUM(y1,y2,f1,f2)
functionality
Temporally aggregates by sum the variable y1, observed at frequency f1 (e.g. quarterly), into the variable y2 at the lower frequency f2 (e.g. annual)
input
y1 : one or more series at frequency f1
f1 : higher frequency (quarterly = 4, monthly = 12)
f2 : lower frequency (annual = 1, quarterly = 4)
output
y2 : series y1 aggregated to frequency f2
• AGGRMEAN(y1,y2,f1,f2)
Similar to AGGRSUM, but the temporal aggregation is done by mean (an illustrative sketch of both aggregation functions, together with DELTAP, is given at the end of this list)
• DATEYYPP(start=yyyyqq,end=yyyyqq,freq=f)
functionality
Creates a string vector containing the labels for the periods (months or quarters) between start and end.
examples
mesi=DATEYYPP(start=197001,end=200312,freq=12)
trim=DATEYYPP(start=197001,end=200304,freq=4)
• DELTAP(x,l)
functionality
Computes the percentage changes of the series with respect to a lag of l periods (l=1 gives period-on-period rates and l=4 year-on-year rates for a quarterly variable). The DELTA function works in the same way but returns absolute differences instead of percentage changes (see the sketch at the end of this list)
input
x : vector of data (monthly or quarterly)
l : time lag used to compute the growth rates
• DENTNEW(x,ann,freq,opt)
functionality
Performs a temporal adjustment of the matrix of quarterly (or monthly) data x, where the constraint is given by the annual values ann.
input
x : matrix of sub-annual data to be adjusted
ann : matrix of annual data
freq : frequency of the sub-annual data (4 = quarterly)
opt : type of aggregation, 'm' (mean) or 's' (sum)
output
x : matrix of sub-annual data consistent with the annual ones (in this case the input series coincides with the output series: a back-up copy before running the routine is recommended if one wants to keep the pre-adjustment data)
• TRAMO(y,x:optdeck:a)
functionality
Performs the seasonal adjustment of the input series by means of the TRAMO-SEATS programs
input
y : the raw time series (timeseries format)
a : the option string for TRAMO-SEATS (optional parameter)
output
x : the seasonally adjusted time series (timeseries format)
fore 1 : a (17×3) matrix containing the forecasts for the series y
• TRIME(y,x,nd,yt,frc,ai,opt,us,ls,tit,out)
functionality
Performs the temporal disaggregation of the annual series y using the indicator(s) x according to the Barbone, Bodo and Visco (1981) method
input
y : annual data
x : quarterly data (indicators and dummies)
nd : number of dummies used
frc : number of forecast years (never used, to be set to 0)
ai : initial year (e.g., 1970)
opt : type of aggregation, 'mean' or 'sum'
us : upper limit for the cleaning of the dummies (X11 parameter)
ls : lower limit for the cleaning of the dummies (X11 parameter)
tit : string containing a title for the quarterly disaggregation
out : 1 to print the disaggregation output on screen, 0 otherwise
output
yt : quarterly data consistent with the annual data y
• TDCHECK(y,name,parall,mod,logs,pararima,partd,pareast)
functionality
Tests the statistical significance of calendar effects.
input
y : vector of raw data (timeseries format)
name : name of the variable (e.g.: 'branca alimentari')
output
parall : coefficients and t-statistics of the trading-day regressors for all models (6 x 4)
model with one regressor: columns 1-2 (1 coeff. + t-stat)
model with 6 regressors: columns 3-4 (6 coeff. + t-stat)
mod : flag of the model that minimizes the AIC (the BIC is reported in the printout)
0 : no correction
1 : correction according to the one-regressor model
6 : correction according to the six-regressor model
pararima : ARIMA model (p,d,q)x(bp,bd,bq) (1 x 6)
logs : logarithmic transformation = 0, levels = 1
partd : coefficients and t-statistics of the k trading-day regressors with Easter effect (k x 2)
pareast : coefficient and t-statistic of the Easter effect (1 x 2)
• TDYEAR(y,x,xc,yc)
functionality
Produces the estimate of the annual aggregate corrected for calendar effects on the basis of the correction applied to the indicator.
input
y : annual aggregate (n x 1)
x : quarterly indicator (4n x 1)
xc : corrected quarterly indicator (4n x 1)
output
yc : corrected annual aggregate (n x 1)
• TDADJ(y,opt,mod,ydest,yc,yct,yce)
functionality
Produces the calendar-adjusted estimate for the input series.
input
y : vector of raw data (timeseries format)
opt : TRAMO option string for the working-day correction.
Remember that:
ireg=0 (model without correction)
ireg=1 (model with one regressor)
ireg=6 (model with six regressors)
for the Easter correction: "ieast=1,idur=6"
mod : one-regressor model = 1, six-regressor model = 6
output
ydest : seasonally adjusted and calendar-adjusted data (timeseries format)
yc : data corrected for the overall calendar effects
yct : data corrected for trading days (net of holidays)
yce : data corrected for the Easter effect
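For illustration only, the two short Python functions below mimic the behaviour of AGGRSUM/AGGRMEAN and DELTAP described above; the names and signatures are hypothetical and are not part of the Modeleasy environment.

import numpy as np

def aggregate(y1, f1, f2, how='sum'):
    # Counterpart of AGGRSUM/AGGRMEAN: temporal aggregation of y1 (frequency f1)
    # to the lower frequency f2, by sum or by mean.
    s = f1 // f2                                     # sub-periods per low-frequency period
    y1 = np.asarray(y1, dtype=float)
    blocks = y1[: len(y1) // s * s].reshape(-1, s)   # an incomplete final period is dropped
    return blocks.sum(axis=1) if how == 'sum' else blocks.mean(axis=1)

def deltap(x, l=1):
    # Counterpart of DELTAP: percentage changes with respect to l periods earlier
    # (l=1 period-on-period rates, l=4 year-on-year rates for quarterly data).
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)                    # the first l values are undefined
    out[l:] = (x[l:] / x[:-l] - 1.0) * 100.0
    return out

annual = aggregate(range(1, 13), f1=4, f2=1, how='sum')           # -> [10., 26., 42.]
rates = deltap([100, 102, 104, 103, 105, 107, 109, 108], l=4)     # year-on-year rates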
Beyond Chow-Lin. A Review and some Technical Remarks
di Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)
Abstract
A dynamic extension of the Chow-Lin approach to the temporal disaggregation of a time series, based on a simple though widely used model, is discussed. Furthermore, a procedure to deal with log-transformed data that fulfils the temporal aggregation constraints is developed. Some applications to real-world data are finally presented.
1 Statement of the problem
A traditional problem often faced by National Statistical Institutes (NSIs) and more generally by economic researchers is the interpolation or distribution of economic time series observed at low frequency
into compatible higher frequency data. While interpolation refers to estimation of missing observations
of stock variables, a distribution (or temporal disaggregation) problem occurs for flow aggregates and
time averages of stock variables.
The need for temporal disaggregation can stem from a number of reasons. For example NSIs, due to
the high costs involved in collecting the statistical information needed for estimating national accounts,
could decide to conduct large sample surveys only annually. Consequently, quarterly (or even monthly)
national accounts could be obtained through an indirect approach, that is by using related quarterly
(or monthly) time series as indicators of the short-term dynamics of the annual aggregates. As another
example, econometric modelling often implies the use of a number of time series, some of which could
be available only at lower frequencies, and therefore it could be convenient to disaggregate these data
instead of estimating, with a significant loss of information, the complete model at lower frequency level.
Temporal disaggregation14 has been extensively considered by previous econometric and statistical literature and many different solutions have been proposed so far. Broadly speaking, two alternative
approaches have been followed15 :
1. methods which do not involve the use of related series but rely upon purely mathematical criteria
or time series models to derive a smooth path for the unobserved series;
14 In this paper we deal exclusively with issues of temporal disaggregation of univariate time series. For a discussion on
the multivariate case, see Di Fonzo (1994).
15 Eurostat (1999) contains a survey, taxonomy and description of the main temporal disaggregation methods proposed in the literature, and in Bloem et al. (2001) two chapters analyze the relevant techniques to be used for benchmarking, extrapolation and related problems occurring in the compilation of quarterly national accounts.
2. methods which make use of the information obtained from related indicators observed at the desired
higher frequency.
The first approach comprises purely mathematical methods, as those proposed by Boot et al. (1967) and
Jacobs (1994), and more theoretically founded model-based methods (Wei and Stram, 1990) relying on
the ARIMA representation of the series to be disaggregated. The latter approach, more interesting for
our purposes, includes, amongst others, the adjustment procedure due to Denton (1971) and the related
Ginsburgh’s (1973) approach, the method proposed by Chow and Lin (1971) and further developed by
Bournay and Laroque (1979), Fernández (1981) and Litterman (1983). Moreover, Al-Osh (1989), Wei and
Stram (1990), Guerrero (1990) and Guerrero and Martinez (1995) combine an ARIMA-based approach
with the use of high frequency related series in a regression model to overcome some arbitrariness in the
choice of the stochastic structure of the high frequency disturbances.
However, in recent years a number of other papers on temporal aggregation and disaggregation of time series have appeared, and the resurgence of interest has been so rapid that many of them are still unpublished (sometimes work in progress) or forthcoming in specialized journals.
Though NSIs may sometimes appear rather conservative in adopting new techniques, it should be stressed that introducing them in a routine process (such as the estimation of quarterly national accounts) requires that at least the following requirements be fulfilled:
1. the techniques should be flexible enough to allow for a variety of time series to be treated easily,
rapidly and without too much intervention by the producer;
2. they should be accepted by the international specialized community;
3. the techniques should give reliable and meaningful results;
4. the statistical procedures involved should be implemented in an accessible, well-known, possibly user-friendly and technically sound software program, interfacing with other relevant tools typically used by data producers (e.g. seasonal adjustment, forecasting, identification of regression models, . . .).
The difficulties in achieving these goals have led NSIs using indirect approaches to quarterly disaggregation to rely mainly on techniques developed some thirty years ago (such as Denton, 1971, or Chow and Lin, 1971). For obvious reasons, these techniques do not consider the more recent developments of the econometric literature (typically, the introduction of dynamic specifications, cointegration and common component analyses, and the use of unobserved components models), and therefore they sometimes prove obsolete and unable to meet the increasing demand for more sophisticated and/or theoretically well founded statistical and mathematical methods in the estimation of national accounts figures.
Focusing for the moment on temporal disaggregation, broadly speaking two research lines seem to be
pre-eminent and will be discussed in this survey:
• techniques using dynamic regression models (and possibly transformed data) in the identification
of the relationship linking the series to be estimated and the (set of) related time series;
• techniques using formulations in terms of unobserved component models/structural time series
(either in levels or transformed) and the Kalman filter to get optimal estimates of missing observations by a smoothing algorithm (Gudmundsson, 1999, Hotta and Vasconcellos, 1999, Proietti,
1999, Gómez, 2000).
In this paper we discuss and extend the main methods in the first category, namely those by Gregoir (1995), Salazar et al. (1994, 1997, 1998) and Santos Silva and Cardoso (2001), while as for the techniques in the second category we refer to Moauro and Savio (2001). This choice can be explained by the fact that, at the present state of the art, methods based on a structural time series modelling approach appear very promising, for they bring into the estimation problem a more thorough consideration of the (possibly multivariate, Harvey, 1989, Harvey and Koopman, 1997) data generating process16. On the other hand, from our point of view this approach does not yet fulfil the minimal requirements 1)-4) for NSIs quoted above.
The work is organized as follows: in the next section the terms of the problem are rapidly presented, along with the notation that will be used in the rest of the paper. In section 3 we review the classical approaches to the disaggregation of a single time series, based on both a constrained minimization of a quadratic loss function and a static regression model, and discuss some fundamental identification issues. The links between the two approaches will be made clear, for they will be useful for the results presented in the following sections. In section 4 we deal with the management of the logarithmic transformation and discuss the model in first differences of the variables, thereby giving a first enlargement of the classical temporal disaggregation approach considered so far. The reasons suggesting an enlargement towards a dynamic structure of the underlying regression model on which the temporal disaggregation is based are discussed in section 5. A disaggregation procedure founded on the simple dynamic regression model worked out by Gregoir (1995) and Salazar et al. (1997, 1998) is then presented (section 6), along with a closed-form expression for the estimated high-frequency values. Section 7 is devoted to an almost complete review of the method by Santos Silva and Cardoso (2001), whose proposal makes it possible to deal with a simple dynamic framework without conditioning on the initial observations, thereby allowing for a straightforward formulation of the disaggregation formulae.
2 Notation
According to Salazar et al. (1994), we adopt the following notation convention. Single observations of
low-frequency (LF) data are denoted by a single subscript, i.e. yt , and are observed in T consistently
spaced periods, which is a key assumption for the methods outlined in the following sections.
Our aim is to derive an estimate of the underlying high-frequency (HF) series, whose unknown values
are denoted by a double-subscript, so that yt,u denotes the HF value of Y in sub-period u of period
t = 1, . . . , T , which is assumed to have periodicity s. For example, s = 3 if we require monthly estimates
of a quarterly observed series, s = 4 if we want quarterly estimates for yearly data and s = 12 if monthly
data are required for an annually observed series.
The (T × 1) vector of LF data is denoted by
yl = (y1 , . . . , yt , . . . , yT )′
while the (n × 1) vector of HF data is denoted by yh . Notice that we must have n ≥ sT . If n = sT ,
then yh = (y1,1 , . . . , yT,s )′ and we face a problem of distribution or interpolation. If n > sT , an extrapolation issue also has to be considered, with the difference n − sT being the number of HF sub-periods not subject to temporal aggregation constraints.
16 Another advantage of this approach lies in the fact that both unadjusted and seasonally adjusted series can be simultaneously estimated.
Similarly, any (T × K) matrix of LF data is denoted by Xl and its (n × K) HF counterpart is written
as Xh . The columns of the LF matrix Xl are denoted by xl,k and those in the HF matrix Xh by xh,k ,
where k = 1, . . . , K denotes the relevant variable of the matrices Xl and Xh , respectively. Accordingly,
x_t and x_{t,u} are (K × 1) vectors containing the LF and HF observations on the K related series in period t and in sub-period (t, u), respectively.
We assume, in general, that there exists a (time-invariant) constraint linking the LF and HF data, given by

y_t = \sum_{u=1}^{s} c_u y_{t,u},   t = 1, . . . , T,   (1)

where the weights {c_u}_{u=1}^{s} are known a priori and are typically equal to 1 if the LF data are simple aggregates of the high-frequency data, or to 1/s if the LF data are averages of the HF data. If the LF data have been obtained by systematically sampling the HF ones, then the weights are all zero except either the last one (stock variable observed at the end of the LF period) or the first one (stock variable observed at the beginning of the LF period).
We express (1) in matrix terms by defining a (T × n) matrix C that links the (observed) LF vector yl
to that of the corresponding (unknown) HF series yh . If n = sT the aggregation matrix C has a block-diagonal structure: C = I_T ⊗ c′, where c = (c_1 , . . . , c_u , . . . , c_s )′ and ⊗ denotes the Kronecker product.
Hence, we have
yl = Cyh .
(2)
In the case of a pure distribution problem the (T × sT) aggregation matrix is

C = I_T ⊗ c′,   c = (1, 1, . . . , 1)′,

i.e. each row of C contains a block of s ones in correspondence of the sub-periods of the relevant LF period and zeroes elsewhere. Alternatively, in the case of a pure interpolation situation with end-of-period values, we have

C = I_T ⊗ c′,   c = (0, 0, . . . , 1)′,

so that each row selects only the last sub-period of the relevant LF period. When an extrapolation problem is also present, n − sT columns of zeroes must be added to the previous matrices.
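As a practical illustration of the construction just described, the following sketch (Python/NumPy, purely indicative, with a function name of our own choosing) builds C = I_T ⊗ c′ for the distribution, average and end-of-period interpolation cases, padding with zero columns when extrapolation sub-periods are present.

import numpy as np

def aggregation_matrix(T, s, kind='sum', n_extra=0):
    # Build the (T x (sT + n_extra)) aggregation matrix C = I_T kron c'.
    # kind: 'sum' (flows), 'mean' (index variables) or 'last' (end-of-period stocks).
    if kind == 'sum':
        c = np.ones(s)
    elif kind == 'mean':
        c = np.full(s, 1.0 / s)
    elif kind == 'last':
        c = np.zeros(s); c[-1] = 1.0
    C = np.kron(np.eye(T), c)                        # (T x sT)
    if n_extra > 0:                                  # extrapolation sub-periods: zero columns
        C = np.hstack([C, np.zeros((T, n_extra))])
    return C

C = aggregation_matrix(T=3, s=4, kind='sum')         # annual constraints on quarterly flows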
3 A general formulation of the disaggregation problem
Without any loss of generality assume that we want to estimate a column vector yh of n HF values of a
given variable Y . For this purpose we dispose of a vector yl of T LF observations of the same variable.
The first element of yh will correspond to the first sub-period of the first LF period in yl (e.g., the first
month of the first quarter). Also assume we observe K related HF series whose observations are grouped
in a (n × K) matrix Xh .
Define the estimators of β and yh as the optimal solution of the following problem:

min_{y_h, β} L(y_h, β) = (y_h − X_h β)′ W (y_h − X_h β)   subject to   C y_h = y_l,   (3)

where W is a positive definite metric matrix and β is a (K × 1) column vector of coefficients. L(y_h, β) is a quadratic loss function in the differences between the series to be estimated y_h and a linear combination X_h β of the related HF series.
The optimal solutions ŷ_h and β̂ are easily obtained from (3) (Pinheiro and Coimbra, 1992):

ŷ_h = X_h β̂ + W^{-1} C′ (C W^{-1} C′)^{-1} (y_l − X_l β̂),   (4)

β̂ = [X′_l (C W^{-1} C′)^{-1} X_l]^{-1} X′_l (C W^{-1} C′)^{-1} y_l.   (5)
Solutions (4)-(5), obtained according to a quadratic-linear approach, may be derived in a different way (Chow and Lin, 1971). Assume that we have the HF regression model

y_h = X_h β + u_h,   E(u_h|X_h) = 0,   E(u_h u′_h|X_h) = V_h.   (6)
Pre-multiplying equation (6) by C, we obtain the LF regression
yl = Xl β + ul ,
(7)
with ul = Cuh , E(ul |Xl ) = 0, E(ul u′l |Xl ) = CVh C′ = Vl . It is interesting to notice that β̂ in (5) is
the Generalized Least Squares (GLS) estimator of β in the LF regression (7) if we let Vh = W−1 .
From the Gauss-Markov theorem, it is obvious that (5) is the Best Linear (in y_l) Unbiased Estimator (BLUE) of β, conditional on X_h. The estimator ŷ_h is obtained by correcting the linear combination of the HF series X_h β̂ through the distribution of the estimated LF residuals û_l = y_l − X_l β̂ among the HF sub-periods. The covariance matrix of ŷ_h is given by (Bournay and Laroque, 1979)

E(ŷ_h − y_h)(ŷ_h − y_h)′ = (I_n − LC) V_h + (X_h − L X_l)(X′_l V_l^{-1} X_l)^{-1} (X_h − L X_l)′,   (8)

where L = V_h C′ V_l^{-1}. From (8) the standard errors of the estimated HF series may be calculated and used to produce, for example, reliability indicators of the estimates (van der Ploeg, 1985), such as

ŷ_{t,u} / σ̂_{ŷ_{t,u}},   t = 1, . . . , T,   u = 1, . . . , s,

or, assuming u_{t,u} normal, confidence intervals.
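To fix ideas, the following sketch (Python/NumPy) computes the GLS estimator (5), the disaggregated series (4) and the error covariance (8) for given y_l, X_h, C and a candidate V_h; it is only an illustration of the closed-form solution, not production code, and the function name is ours.

import numpy as np

def chow_lin_disaggregate(yl, Xh, C, Vh):
    # BLU temporal disaggregation for a given HF disturbance covariance Vh,
    # illustrating formulas (4), (5) and (8) above.
    Xl = C @ Xh                                      # LF counterpart of the indicators
    Vl = C @ Vh @ C.T                                # covariance of the aggregated disturbances
    Vl_inv = np.linalg.inv(Vl)
    beta = np.linalg.solve(Xl.T @ Vl_inv @ Xl, Xl.T @ Vl_inv @ yl)    # GLS estimator (5)
    L = Vh @ C.T @ Vl_inv                            # distribution matrix
    yh = Xh @ beta + L @ (yl - Xl @ beta)            # disaggregated series (4)
    M = Xh - L @ Xl                                  # error covariance, formula (8)
    cov = (np.eye(Vh.shape[0]) - L @ C) @ Vh + M @ np.linalg.inv(Xl.T @ Vl_inv @ Xl) @ M.T
    return yh, beta, cov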
3.1 Alternative choices of Vh and the fundamental identification issue
From a practical point of view, when one is interested in applying one of the procedures presented so far, the key issue lies in identifying the covariance matrix V_h from V_l. The latter can be estimated from the data, possibly by imposing an ARIMA structure on the aggregate but, in general, the covariance matrix of the HF disturbances cannot be uniquely identified from the relationship V_l = C V_h C′.
For the case when indicators are available, several restrictions on the DGP of u_{t,u}|X_h have been proposed in order to simplify the problem (Eurostat, 1999):
Chow and Lin (1971): u_{t,u}|X_h ∼ AR(1);
Fernández (1981): u_{t,u}|X_h ∼ random walk;
Litterman (1983): u_{t,u}|X_h ∼ ARIMA(1, 1, 0).
Wei and Stram (1990) consider the general case where u_{t,u}|X_h ∼ ARIMA(p, d, q). They derive sufficient conditions under which the derivation of V_h from V_l is possible for the ARIMA model class when c = i_s; Barcellan and Di Fonzo (1994) provide the extension for a general c.
Although the proposal of Wei and Stram (1990) encompasses the three models above, statistical agencies often limit their attention to the Chow and Lin procedure (or to suitable extensions of it) because of its computational simplicity. As a consequence, little or no attention is paid to the data generating mechanism, and it is often the case that the residuals of the LF regression distributed across the sub-periods are realizations of integrated processes. Also, the procedure is applied twice, using both raw and seasonally adjusted indicators, in order to obtain a raw and a seasonally adjusted disaggregated or interpolated series. Moreover, as noted by the authors themselves in their concluding remarks, “The usefulness of this method in practice (· · ·) surely depends on the validity of the regression model assumed”.
The approach of Wei and Stram has the merit of reducing the arbitrariness of the parametrization of V_h that characterizes the 'classical' solutions. On the other hand, given the (usually) small number of aggregated observations on which the identification procedure must be based, its application is not straightforward. In fact, as Proietti (1999) points out, the Wei and Stram approach relies heavily on model identification for the aggregate series according to the Box-Jenkins strategy, which fundamentally hinges upon the correlograms. These have poor sample properties for the typical sample sizes occurring in economics. Furthermore, due to the increase in the standard errors (Rossana and Seater, 1995), the autocorrelations may become insignificant and low order models are systematically chosen. The Monte Carlo evidence presented in Chan (1993) shows that the approach is likely to perform comparatively badly when T < 40 (which is not an infrequent sample size in economics).
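By way of illustration, the following Python/NumPy sketch builds the HF disturbance covariance matrix V_h implied by the three classical assumptions listed above, up to the innovation variance σε²; any of these matrices can then be passed to a routine such as the one sketched at the end of section 3. The function names are ours.

import numpy as np
from scipy.linalg import toeplitz

def vh_chow_lin(n, rho):
    # AR(1) disturbances (Chow-Lin): Vh[i,j] proportional to rho**|i-j|.
    return toeplitz(rho ** np.arange(n)) / (1.0 - rho ** 2)

def vh_fernandez(n):
    # Random-walk disturbances (Fernandez): Vh proportional to (D'D)^{-1},
    # D being the first-difference matrix.
    D = np.eye(n) - np.eye(n, k=-1)
    return np.linalg.inv(D.T @ D)

def vh_litterman(n, phi):
    # ARIMA(1,1,0) disturbances (Litterman): the first differences follow an AR(1).
    D = np.eye(n) - np.eye(n, k=-1)
    H = np.eye(n) - phi * np.eye(n, k=-1)            # quasi-difference matrix
    return np.linalg.inv((H @ D).T @ (H @ D))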
4 A first enlargement of the classic approach: temporal disaggregation of a log or log-differenced variable

4.1 Preliminary remark. A regression model in first differences: another look at Fernández (1981)
Consider the following representation of the underlying HF data:
△yt,u = △x′t,u β + εt,u
(9)
where △ = 1 − L, L being the lag operator, β is a (K × 1) vector of fixed, unknown parameters and the
process {εt,u } is white noise with zero mean and variance σε2 . Notice that implicit assumptions of model
(9) are that (i) the constant is not part of vector xt,u , (ii) a constant term appears in △xt,u only if xt,u
contains a linear deterministic trend. We will return to these two points later.
Representation (9) implies

y_{t,u} = x′_{t,u} β + u_{t,u},   (10)
where the process {ut,u }, defined by △ut,u = εt,u , follows a random walk without drift. Implicitly
therefore it is assumed that the two series are not cointegrated (or, if they are, that the cointegrating
vector differs from β).
Now, let us consider the (n × n) matrix D such that d_{ii} = 1, d_{i,i−1} = −1 and zero elsewhere:

D = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix},   (11)

whose inverse is the lower triangular matrix

D^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & 1 & \cdots & 1 & 1 \end{bmatrix}.
Model (10) can thus be re-written in matrix form as

y_h = X_h β + ξ_h,   (12)

where ξ_h = D^{-1} ε_h is a zero-mean stochastic vector with covariance matrix given by E(ξ_h ξ′_h) = σ_ε² D^{-1}(D^{-1})′ = σ_ε² (D′D)^{-1}, where

(D′D)^{-1} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & 2 & \cdots & 2 & 2 \\ 1 & 2 & 3 & \cdots & 3 & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 2 & 3 & \cdots & n-1 & n \end{bmatrix}.
If the HF model contains an intercept, that is

△y_{t,u} = α + △x′_{t,u} β + ε_{t,u},   (13)

the process {u_{t,u}} follows a random walk with drift, that is △u_{t,u} = α + ε_{t,u}. Notice that this case is equivalent to augmenting the original (not differenced) indicator matrix by a linear deterministic trend variable, such that

X*_h = [j  X_h],

where, without loss of generality, j is a (n × 1) vector with generic element j_{t,u} = (t − 1)s + u. The HF model becomes

y_h = X*_h β* + ξ_h,   (14)

where β* is a ((K + 1) × 1) vector of parameters whose first element is the drift α.
Table 1: Constant and linear trend in the regression model in first differences
In the light of what we have discussed so far, the user should not include deterministic variables such as a constant or a linear deterministic trend (or both) in the vector x_{t,u} because, as shown in table 1, they are either incompatible with or encompassed by models (9) and (13).
Model (12) has been considered by Fernández (1981) first (see also Di Fonzo, 1987, pp. 51-52) and later
by Pinheiro and Coimbra (1992), Salazar et al. (1994)17 and Eurostat (1999). Following the classical
result by Chow and Lin (1971, 1976), the MMSE estimated HF vector in the case of random walk without
drift is given by:
ŷ_h = X_h β̂ + (D′D)^{-1} C′ [C (D′D)^{-1} C′]^{-1} (y_l − X_l β̂),

where

β̂ = {X′_l [C (D′D)^{-1} C′]^{-1} X_l}^{-1} X′_l [C (D′D)^{-1} C′]^{-1} y_l.
The solution for the case in which a drift is present is simply obtained by substituting β, X h and X l
with β ∗ , X ∗h and X ∗l = CX ∗h , respectively.
As Salazar et al. (1994) note, the GLS procedure above, and in general the estimates obtained according to Chow and Lin's approach, are satisfactory in the case where the temporal aggregation constraint is linear and there are no lagged dependent variables in the regression. We next discuss how the approach can be extended to deal with logarithmic transformations and lagged dependent variables.
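In terms of the quantities already introduced, these formulas are just the general solution (4)-(5) with V_h = W^{-1} = (D′D)^{-1}; a minimal illustration (Python, reusing the hypothetical helper chow_lin_disaggregate sketched in section 3, with yl, Xh and C assumed available) is the following.

import numpy as np

# Fernandez (1981): random-walk disturbances, i.e. Vh proportional to (D'D)^{-1}
n = Xh.shape[0]
D = np.eye(n) - np.eye(n, k=-1)                      # first-difference matrix D of (11)
Vh = np.linalg.inv(D.T @ D)
yh, beta, cov = chow_lin_disaggregate(yl, Xh, C, Vh)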
4.2 Log-transformed variable in a static model
Empirical studies on macroeconomic time series show that in many circumstances it is strongly advisable to use logarithms of the original data in order to achieve a better modelling of the series. In particular, most macroeconomic aggregate time series become stationary after applying first differences to their logarithms. Unfortunately, the logarithmic transformation being not additive, for a distribution problem the standard disaggregation results cannot be directly applied.
17 However, contrary to what Salazar et al. (1994, p. 6) state, this procedure is not an amended version of Ginsburgh's (1973) method, which is a well known two-step adjustment procedure, but rather the optimal (one step) solution to the temporal disaggregation problem given by either model (12) or (14).
The problem of dealing with log-transformed variables in a disaggregation framework has been considered
by Pinheiro and Coimbra (1992), Salazar et al. (1994, 1997), Proietti (1999) and Aadland (2000). The
approach followed here is strictly related to the last reference, which is rather comprehensive on this
issue and whose results regarding the temporal disaggregation of flows variables confirm those found
by Salazar et al. (1994, 1997). Moreover, Salazar et al. (1997) and Proietti (1999) tackle the related
problem of the adjustment of the estimated values to fulfil the temporal aggregation constraints.
Let us first consider the case of the temporal disaggregation of a flow variable Y, whose high-frequency values y_{t,u} are unknown, t = 1, . . . , T being the low-frequency index and u = 1, . . . , s the high-frequency one (s = 3 in the quarterly/monthly case, s = 4 in the annual/quarterly case and s = 12 in the annual/monthly case, respectively).
As for Y, (i) we observe the temporal aggregate y_t = \sum_{u=1}^{s} y_{t,u} and (ii) we suppose that the model at the disaggregated level is

ln y_{t,u} = x′_{t,u} β + u_{t,u},   t = 1, . . . , T;  u = 1, . . . , s,

that is

z_{t,u} = x′_{t,u} β + u_{t,u},   t = 1, . . . , T;  u = 1, . . . , s,   (15)

where z_{t,u} ≡ ln y_{t,u} and u_{t,u} is a zero-mean random disturbance.
Now, let us consider the first order Taylor series expansion of y_{t,u} around the period-level average of the variable to be estimated, \bar{y}_t = (1/s)\sum_{u=1}^{s} y_{t,u} = y_t/s:

ln y_{t,u} = z_{t,u} ≃ ln \bar{y}_t + (1/\bar{y}_t)(y_{t,u} − \bar{y}_t) = ln y_t − ln s + s y_{t,u}/y_t − 1.   (16)

Summing (16) over the high-frequency sub-periods we have

\sum_{u=1}^{s} z_{t,u} ≃ s ln y_t − s ln s + s (\sum_{u=1}^{s} y_{t,u})/y_t − s = s ln y_t − s ln s.
So, discarding the approximation and summing (15) over u, we have the observable aggregated model

z_t = x′_t β + u_t,   t = 1, . . . , T,   (17)

where z_t = s ln y_t − s ln s, x_t = \sum_{u=1}^{s} x_{t,u} and u_t = \sum_{u=1}^{s} u_{t,u}.
Depending on the assumptions made on the disturbance term in model (15), the classic Chow and Lin disaggregation approach, according to the original AR(1) formulation or to one of its well-known variants (Fernández, 1981, Litterman, 1983), can thus be applied to model (17)18.
Let us re-write (15) in matrix form as

z_h = X_h β + u_h,

with E(u_h|X_h) = 0 and E(u_h u′_h|X_h) = V_h, assumed known. The BLU solution to the problem of obtaining estimates of z_h coherent with the aggregate z_l is given by (see section 3):

ẑ_h = X_h β̂ + V_h C′ V_l^{-1} (z_l − X_l β̂),
β̂ = (X′_l V_l^{-1} X_l)^{-1} X′_l V_l^{-1} z_l,

where V_l = C V_h C′.
18 In the next section we shall see that producing log-transformed HF data following the procedure by Fernández (1981) has a straightforward economic interpretation in terms of modelling the rate of change of the variable.
As a consequence, we estimate disaggregated values ẑ_{t,u} such that

\sum_{u=1}^{s} ẑ_{t,u} = z_t = s ln y_t − s ln s,   t = 1, . . . , T.

A natural estimate of y_{t,u} is thus given by

ŷ_{t,u} = exp(ẑ_{t,u}),   t = 1, . . . , T;  u = 1, . . . , s.   (18)
Due to the approximation, the estimated values in (18) will generally violate the aggregation constraints:

\sum_{u=1}^{s} ŷ_{t,u} ≠ y_t   ⇔   y_t − \sum_{u=1}^{s} ŷ_{t,u} = r_t ≠ 0,   t = 1, . . . , T.
As suggested by Proietti (1999), the simplest solution is to adopt the Denton (1971) algorithm so as to distribute the residuals across the HF periods. The vector stacking the final estimates, ỹ_h, is obtained by adding to the preliminary estimates ŷ_h a term resulting from the distribution of the LF residuals stacked in the vector r:

ỹ_h = ŷ_h + (D′D)^{-1} C′ [C (D′D)^{-1} C′]^{-1} r,   (19)

where C = I_T ⊗ i′_s is the aggregation matrix for flows, i_s = [1, 1, . . . , 1]′ is a (s × 1) vector and D is the (n × n) matrix in (11).
When the LF data are the temporal averages of the HF ones (index variable), we have

y_t = (1/s) \sum_{u=1}^{s} y_{t,u},   t = 1, . . . , T.

After some algebra19, we find that

(1/s) \sum_{u=1}^{s} ln y_{t,u} ≃ ln y_t.   (20)

For an index variable the observable aggregated variable z_t to be used in model (17) is thus z_t = ln y_t, the other results remaining unchanged, apart from the definition of the aggregation matrix C in Denton's formula (19), which is C = I_T ⊗ i′_s/s for the problem at hand.
Finally, when the temporal aggregate takes the form of an end-of-period stock, y_t = y_{t,s}, t = 1, . . . , T, the dependent variable in model (17) is simply z_t = ln y_t. A similar conclusion also holds for beginning-of-period stock variables. Obviously, in both cases no further adjustment of the estimates ŷ_{t,u} = exp(ẑ_{t,u}) is needed. A synthesis of the results presented so far is shown in tab. 2.
19 The first order Taylor expansion is now around y_t: ln y_{t,u} = z_{t,u} ≃ ln y_t + (1/y_t)(y_{t,u} − y_t). Summing over u we find expression (20).
Table 2: Approximation and adjustment needed to deal with a logarithmic transformation of the variable to be
disaggregated∗
To conclude on this issue, when dealing with a log-transformed variable in a static regression model, the researcher can use one of the commonly available standard disaggregation procedures, with the only caveat of giving the procedure as input the transformed aggregated variable z_t, whose form depends on the type of variable/aggregation at hand. The estimated values ẑ_{t,u} then have to be transformed as ŷ_{t,u} = exp(ẑ_{t,u}): for stock variables this already produces estimates coherent with the original aggregated series. For both flow and index variables the low-frequency residuals r_t must be evaluated and, if the discrepancies are relatively large, adjusted estimates according to Denton's procedure should be calculated.
4.3 The deltalog model
Many economic models are expressed in terms of the rate of change of the variable of interest, i.e. its logarithmic difference,

△ ln y_{t,u} = ln( y_{t,u} / y_{t,u−1} ) = △ z_{t,u},

where z_{t,u} = ln y_{t,u} and, with obvious notation, y_{t,0} = y_{t−1,s}. In this case the HF regression model in first differences (9) becomes20

△ z_{t,u} = △ x′_{t,u} β + ε_{t,u}.   (21)
Using the results of the previous sections, estimates coherent with the observed aggregates and obtained in the framework of the deltalog model (21) are simply computed using the procedure of Fernández (1981)21:

ŷ_h = exp(ẑ_h),

where

ẑ_h = X_h β̂ + (D′D)^{-1} C′ [C (D′D)^{-1} C′]^{-1} (z_l − X_l β̂),
β̂ = {X′_l [C (D′D)^{-1} C′]^{-1} X_l}^{-1} X′_l [C (D′D)^{-1} C′]^{-1} z_l.

20 In model (21) the related series are not log-transformed in order to save notation. The results of course remain valid in that case too.
21 For either flow or index variables, a further adjustment according to Denton's formula (19) should be performed to exactly fulfil the temporal aggregation constraints.
Now, in order to get an interesting interpretation of the way this disaggregation procedure works, let us consider the case of a temporally aggregated flow variable. Pre-multiplying (21) by the polynomial (1 + L + . . . + L^{s−1}) and using the approximation relationship \sum_{u=1}^{s} ln y_{t,u} ≃ z_t = s ln y_t − s ln s, we find

\sum_{u=1}^{s} ln( y_{t,u} / y_{t−1,u} ) ≃ △ z_t = s △ ln y_t,   (22)

where, in this case, △ = 1 − B, B being the lag operator operating at the low frequency. In other words, the (logarithmic) growth rate of the temporally aggregated variable, △ ln y_t = ln(y_t/y_{t−1}), can be viewed as an approximation of an s-period average of past and current growth rates of y_{t,u} (see Aadland, 2000).
As a matter of fact, ln y_{t,u} contains a unit root (at the higher frequency) and, after application of the transformation polynomials, ln y_t displays a unit root at the lower frequency too. In fact, by temporally aggregating equation (21), and taking into account relationship (22), we obtain a model expressed in terms of the low-frequency rate of change:

\sum_{u=1}^{s} △ ln y_{t,u} = x′_t β + ε_t   ⇔   s △ ln y_t = x′_t β + ε_t.   (23)
Similar results also hold when the other types of aggregation (index and stock variables) are considered, since in these cases the observable aggregated model is simply

△ ln y_t = x′_t β + ε_t.

It is thus possible to say that, by using the deltalog model, the variable of interest is disaggregated in such a way that its estimated HF rates of change are coherent (approximately, for flow and index variables) with their LF counterparts.
4.4 An example: the estimation of monthly Italian Value Added
To get more insight into the effect of working with log-transformations in a disaggregation framework, and for illustrative purposes only22, monthly Italian industrial value added (1970:01-2001:09) has been estimated by temporally disaggregating the available quarterly seasonally adjusted data (y_t, source: Istat) using the monthly seasonally adjusted series of the Italian industrial production index (x_{t,u}, source: Bank of Italy), according to the deltalog model

△ ln y_{t,u} = β_0 + β_1 △ ln x_{t,u} + ε_{t,u},   t = 1970q1, . . . , 2001q3,  u = 1, 2, 3,   (24)

where y_{t,0} = y_{t−1,3} and ε_{t,u} is a white noise.
The estimated parameters are all significant (see tab. 4.3), while the determination coefficient of the auxiliary quarterly GLS regression (0.354) is rather low23.
Figure 1 shows the percentage discrepancies between the original quarterly Italian data, y_t, and the preliminary quarterly sums of the monthly estimates, ŷ_t = \sum_{u=1}^{3} ŷ_{t,u}, obtained according to the deltalog model.
22 A preliminary study of the dynamic properties of the available series is of course a necessary pre-requisite for the application of any model.
23 On the meaning of the determination coefficient in models in differences, and on comparing regression models in levels and first differences, see Harvey (1980) and the discussion in Maddala (2001, pp. 230-234). It should be noted that in this case the auxiliary quarterly regression model does not have spherical disturbances, for it is estimated by GLS, making the comparison even more difficult.
Figure 1: Quarterly percentage discrepancies of the aggregated estimated Italian monthly value added: r_t = (y_t − ŷ_t)/y_t × 100, t = 1970q1-2001q3
It is worth noting that the discrepancies are all negative (the quarterly sums of the estimated monthly values overestimate the 'true' figures), but they are undoubtedly negligible, as they are practically always less than 0.04 percentage points.
To fulfil the temporal aggregation constraints, adjusted estimates according to Denton’s first order procedure have been calculated. As figures 2 and 3 clearly show, the amount of ‘correction’ to the preliminary
estimates is very small and by no way changes the dynamic profile of the estimated series.
Figure 2: Monthly percentage discrepancies of the estimated Italian monthly value added: r_{t,u} = (ŷ_{t,u} − ỹ_{t,u})/ŷ_{t,u} × 100, t = 1970q1-2001q3, u = 1, 2, 3
5 Towards a dynamic framework for disaggregation
One of the major drawbacks of the Chow and Lin approach to the temporal disaggregation of an economic time series is that the HF model on which the method is based is static, raising some doubts about its capability to capture more sophisticated dynamics, such as those usually encountered in applied econometric work. Strictly linked to this issue is the observation that a successful implementation of the HF regression model (6) requires u_{t,u} to be generated by a stationary process. This means that (6) must form a cointegrating regression if y_{t,u} and the related series are integrated.
Figure 3: Preliminary and adjusted estimates of the Italian monthly value added
On the other hand, the classical extensions of the Chow and Lin approach available in the literature (Fernández, 1981, Litterman, 1983) 'expand' the original AR(1) disturbance structure to simple integrated models (a random walk and a random walk-Markov model for Fernández, 1981, and Litterman, 1983, respectively), which implicitly means that y_{t,u} and the related series are not cointegrated and are thus to be modelled in differenced form.
In general, according to the standard Chow and Lin approach, the dynamic path of the unobserved variable is derived only from the information given by the HF related series. In practice, a more reliable description of the system under study may be obtained by simply adding lagged values of the variable of interest to the basic high-frequency regression equation.
Gregoir (1995) and Salazar et al. (1997, 1998) considered a simple dynamic regression model as a basis
to perform the temporal disaggregation of a variable of interest, deriving the estimates in a framework
of constrained optimization of a quadratic loss function (see section 3). In both cases, however, the
algorithms needed to calculate the estimates and their standard errors seem rather complicated and not
straightforward to be implemented in a computer program.
Santos Silva and Cardoso (2001, hereafter SSC) developed an extension of the Chow and Lin temporal disaggregation method based on the same linear dynamic model considered by Gregoir (1995) and Salazar et al. (1997, 1998). By means of a well-known transformation developed to deal with distributed lag models (Klein, 1958, Harvey, 1990), a closed form solution is derived according to the standard BLU approach of Chow and Lin (1971, 1976), along with a straightforward expression for the covariance matrix of the disaggregated series.
These results encourage the use of the method by SSC, both for the reasonableness (and simplicity) of the model, which is able to deal with simple dynamic structures as compared with the essentially static nature of the original Chow and Lin approach, and for the simplicity of the calculations needed to obtain the estimated high frequency values, which is particularly interesting for the subjects (i.e., statistical institutes) that are typically in charge of temporal disaggregation activities as part of their current work of producing high-frequency series from (and coherent with) the available source data.
We present both approaches to temporal disaggregation according to a linear dynamic regression model,
to establish the practical equivalence from a modelling point of view and to point out that the only real
difference between the two approaches is the way the first observation is treated by each of them.
However, it should be noted that Salazar et al. (1997, 1998) formulate the link between the interpoland (the variable to be disaggregated) and the related series in terms of a more general, possibly non-linear, dynamic model, which encompasses the formulations of Gregoir and SSC (see appendix A). From a theoretical point of view this is appealing, but in practice the estimation problems arising when the number of lags of the interpoland variable is greater than one (or, possibly, greater than the seasonal order when dealing with raw series) are magnified by the fact that we have to work with an observable model which is aggregated, with a consequent loss of information. As for non-linearity, in the previous section we discussed how to conveniently transform the data in order to preserve the quadratic-linear approach even in the presence of a logarithmic transformation, which is probably the most widely used non-linear transformation. We have seen that the disaggregated estimates present only negligible discrepancies with respect to the observed aggregated values.
For these reasons, following Gregoir (1995), in what follows the attention is focused on the first order linear dynamic regression model:

(1 − φL) y_{t,u} = x′_{t,u} β + ε_{t,u},   t = 1, . . . , T;  u = 1, . . . , s,   (25)

where |φ| < 1, x_{t,u} is a (K × 1) vector of (weakly) exogenous variables, possibly including lagged independent variables, β is a (K × 1) vector of parameters and ε_{t,u} ∼ WN(0, σ_ε²). If needed, the model can be enlarged to take into account further lags of the dependent variable in a straightforward manner (Salazar et al., 1998, see appendix A).
6 The solution of Salazar et al.
Salazar et al. (1997, 1998) consider a slightly different representation of the underlying high frequency data as compared to model (25), which explicitly takes into account the presence of the lagged dependent variable:

(1 − φL) y_{t,u} = x′_{t,u} β + ε_{t,u},   for t = 1, u = 2, . . . , s  and  t = 2, . . . , T, u = 1, . . . , s.   (26)

Let us pre-multiply model (26) by the polynomial (1 + φL + . . . + φ^{s−1} L^{s−1}). Given that (1 + φL + . . . + φ^{s−1} L^{s−1})(1 − φL) = (1 − φ^s L^s), it is

(1 − φ^s L^s) y_{t,u} = (1 + φL + . . . + φ^{s−1} L^{s−1}) x′_{t,u} β + (1 + φL + . . . + φ^{s−1} L^{s−1}) ε_{t,u}.   (27)

Pre-multiplying (27) by the polynomial (c_1 + c_2 L + . . . + c_s L^{s−1}), where the weights {c_u}_{u=1}^{s} have been previously defined, we obtain the observable aggregated model, that is:

(1 − φ^s L^s) y_t = (1 + φL + . . . + φ^{s−1} L^{s−1}) x′_t β + (1 + φL + . . . + φ^{s−1} L^{s−1}) ε_t,   t = 2, . . . , T,   (28)

where x_t = \sum_{u=1}^{s} c_u x_{t,u} and ε_t = \sum_{u=1}^{s} c_u ε_{t,u}, t = 2, . . . , T.

6.1 The effect of temporal aggregation on a simple first order dynamic model
To make the effects of temporal aggregation on model (26) clear, without loss of generality let us consider the simple dynamic regression model

(1 − φL) y_{t,u} = α + β x_{t,u} + ε_{t,u},   for t = 1, u = 2, . . . , s  and  t = 2, . . . , T, u = 1, . . . , s.
If Y is a flow variable and s = 3, the LF observable quarterly aggregated model can be written as

(1 − φ³B) y_t = 3α(1 + φ + φ²) + β x_t + βφ z_{1,t} + βφ² z_{2,t} + v_t,   t = 2, . . . , T,   (29)

where B is the LF lag operator such that B y_t = y_{t−1} and

z_{1,t} = x_{t−1,3} + x_{t,1} + x_{t,2},
z_{2,t} = x_{t−1,2} + x_{t−1,3} + x_{t,1},
v_t = (1 + φL + φ²L²)(1 + L + L²) ε_{t,3}.

The LF stochastic disturbance follows a zero-mean MA(1) process (Abeysinghe, 2000) with

Var(v_t) = σ_v² = σ_ε² (3 + 4φ + 5φ² + 4φ³ + 3φ⁴),
Corr(v_t, v_{t−1}) = ρ_{v,1} = φ(1 + φ)² / (3 + 4φ + 5φ² + 4φ³ + 3φ⁴).
The graph of ρv,1 is represented in figure 4, which shows that, for φ > 0, the autocorrelation ranges from
0 to about 0.21.
Figure 4: First order autocorrelation of the stochastic disturbance in model (29)
When s = 4 (i.e., annual aggregation of a quarterly flow variable), the LF observable annual aggregated model can be written as

(1 − φ⁴B) y_t = 4α(1 + φ + φ² + φ³) + β x_t + βφ z_{1,t} + βφ² z_{2,t} + βφ³ z_{3,t} + v_t,   t = 2, . . . , T,   (30)

where

z_{1,t} = x_{t−1,4} + x_{t,1} + x_{t,2} + x_{t,3},
z_{2,t} = x_{t−1,3} + x_{t−1,4} + x_{t,1} + x_{t,2},
z_{3,t} = x_{t−1,2} + x_{t−1,3} + x_{t−1,4} + x_{t,1},
v_t = (1 + φL + φ²L² + φ³L³)(1 + L + L² + L³) ε_{t,4}.

The LF stochastic disturbance follows a zero-mean MA(1) process with

σ_v² = σ_ε² (4 + 6φ + 8φ² + 8φ³ + 8φ⁴ + 6φ⁵ + 4φ⁶),
ρ_{v,1} = φ(1 + 2φ + 4φ² + 2φ³ + φ⁴) / (4 + 6φ + 8φ² + 8φ³ + 8φ⁴ + 6φ⁵ + 4φ⁶).
Figure 5: First order autocorrelation of the stochastic disturbance in model (30)
The graph of ρ_{v,1} is represented in figure 5, which shows that, overall, the autocorrelation ranges from −0.5 to about 0.23.
Finally, if a stock variable is involved, the quarterly and annual observable aggregated models are, respectively24,

Quarterly model:
(1 − φ³B) y_t = α(1 + φ + φ²) + β x_{t,3} + βφ x_{t,2} + βφ² x_{t,1} + v_t,   v_t = (1 + φL + φ²L²) ε_{t,3},   t = 1, . . . , T;

Annual model:
(1 − φ⁴B) y_t = α(1 + φ + φ² + φ³) + β x_{t,4} + βφ x_{t,3} + βφ² x_{t,2} + βφ³ x_{t,1} + v_t,   v_t = (1 + φL + φ²L² + φ³L³) ε_{t,4},   t = 1, . . . , T.

It should be noted that in both cases v_t is a white noise with σ_v² = σ_ε² (1 + \sum_{j=1}^{s−1} φ^{2j}).
24 We consider a stock variable observed at the end of the HF period.
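As a quick numerical check of the expressions above, the following short Python sketch evaluates the variance and first-order autocorrelation of the aggregated disturbance of model (29) over a grid of values of φ, reproducing the kind of information summarized in figures 4 and 5.

import numpy as np

def agg_ma1_properties(phi, sigma_eps2=1.0):
    # Variance and first-order autocorrelation of the LF disturbance v_t in
    # model (29) (flow variable, s = 3), as given in the text.
    den = 3 + 4 * phi + 5 * phi**2 + 4 * phi**3 + 3 * phi**4
    return sigma_eps2 * den, phi * (1 + phi) ** 2 / den

for phi in np.linspace(-0.9, 0.9, 7):
    var_v, rho_1 = agg_ma1_properties(phi)
    print(f"phi = {phi:+.2f}   Var(v) = {var_v:7.3f}   rho_1 = {rho_1:+.3f}")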
6.2 GLS estimation of the observable aggregated model
In order to get closed form expressions, suitable to be implemented in a computer program, it is convenient to express model (28) in matrix form. Let us consider the ((n − 1) × n) matrix

D*_φ = \begin{bmatrix} -φ & 1 & 0 & \cdots & 0 & 0 \\ 0 & -φ & 1 & \cdots & 0 & 0 \\ 0 & 0 & -φ & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -φ & 1 \end{bmatrix}.   (31)

Model (26) can thus be written as

D*_φ y_h = X*_h β + ε*_h,   (32)
where X*_h = {x_{t,u}}, for t = 1, u = 2, . . . , s and t = 2, . . . , T, u = 1, . . . , s, is the ((n − 1) × K) matrix of HF related series deprived of x_{1,1}, and ε*_h is a zero-mean stochastic vector with E(ε*_h ε*_h′) = σ_ε² I_{n−1}.
Now, let K_φ be the ((n − s) × (n − 1)) matrix given by

K_φ = \begin{bmatrix} φ^{s−1} & φ^{s−2} & φ^{s−3} & \cdots & 1 & 0 & \cdots & 0 & 0 \\ 0 & φ^{s−1} & φ^{s−2} & \cdots & φ & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & φ & 1 \end{bmatrix},   (33)

and partition the aggregation matrix C previously defined as follows:

C = \begin{bmatrix} C_1 & 0′ \\ 0 & C_2 \end{bmatrix},   (34)
where C_1 = c′ and C_2 = I_{T−1} ⊗ c′. Notice that C_2 K_φ D*_φ = D*_{φ^s} C, where

D*_{φ^s} = \begin{bmatrix} -φ^s & 1 & 0 & \cdots & 0 & 0 \\ 0 & -φ^s & 1 & \cdots & 0 & 0 \\ 0 & 0 & -φ^s & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -φ^s & 1 \end{bmatrix}
is a ((T − 1) × T ) matrix. Thus, pre-multiplying model (32) by C 2 K φ gives the observable aggregated
model (28) expressed in matrix form:
D*_{φ^s} y_l = C_2 K_φ X*_h β + C_2 K_φ ε*_h,

that is

y*_{l,φ} = X*_{l,φ} β + ε*_{l,φ},   (35)

where y*_{l,φ} = D*_{φ^s} y_l, X*_{l,φ} = C_2 K_φ X*_h and ε*_{l,φ} = C_2 K_φ ε*_h, with E(ε*_{l,φ} ε*_{l,φ}′) = σ_ε² (C_2 K_φ K′_φ C′_2) = V*_{l,φ}.
For any fixed φ, the BLU estimator of β in model (35) is obtained through Generalized Least Squares estimation:

β̂_φ = [X*_{l,φ}′ (V*_{l,φ})^{-1} X*_{l,φ}]^{-1} X*_{l,φ}′ (V*_{l,φ})^{-1} y*_{l,φ}.
The parameter φ being generally unknown, β and φ can be estimated either by minimizing the weighted sum of squared residuals,

(y*_{l,φ} − X*_{l,φ} β)′ (V*_{l,φ})^{-1} (y*_{l,φ} − X*_{l,φ} β),

or, assuming the gaussianity of ε_{t,u}, by maximizing the log-likelihood function

l(φ, β) = − \frac{T−1}{2} ln 2π − \frac{1}{2} ln |V*_{l,φ}| − \frac{1}{2} (y*_{l,φ} − X*_{l,φ} β)′ (V*_{l,φ})^{-1} (y*_{l,φ} − X*_{l,φ} β).   (36)

In both cases, given an estimate φ̂, the estimated regression coefficients are calculated as

β̂_{φ̂} = [X*_{l,φ̂}′ (V*_{l,φ̂})^{-1} X*_{l,φ̂}]^{-1} X*_{l,φ̂}′ (V*_{l,φ̂})^{-1} y*_{l,φ̂}.   (37)

6.3 Estimates of y_h as the solution of a constrained optimization problem
The estimates of β and φ obtained so far can be used to get a 'preliminary' estimate of D*_φ y_h, say z*_h = X*_h β̂, to be used in the following constrained optimization problem:

min_{y_h} (D*_φ y_h − z*_h)′ (D*_φ y_h − z*_h)   subject to   y_l = C y_h.   (38)

In order to conveniently deal with the first sub-period value, let us re-write the loss function to be minimized in (38) as follows:

(A_1 y_h^1 + A_2 y_h^2 − z*_h)′ (A_1 y_h^1 + A_2 y_h^2 − z*_h),

where the HF vector y_h has been partitioned as y_h = [y_h^1′ y_h^2′]′, y_h^1 and y_h^2 being (s × 1) and ((n − s) × 1), respectively, and A_1 and A_2 are ((n − 1) × s) and ((n − 1) × (n − s)) matrices, respectively, such that D*_φ = [A_1 A_2]:

A_1 = \begin{bmatrix} -φ & 1 & 0 & \cdots & 0 \\ 0 & -φ & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -φ & 1 \\ 0 & 0 & \cdots & 0 & -φ \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix},   A_2 = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ -φ & 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & -φ & 1 \end{bmatrix}.
Let us consider the lagrangean function

L(y_h^1, y_h^2, λ) = (A_1 y_h^1 + A_2 y_h^2 − z*_h)′ (A_1 y_h^1 + A_2 y_h^2 − z*_h) − 2λ′ (C y_h − y_l),

that is, equivalently,

L(y_h^1, y_h^2, λ_1, λ_2) = (A_1 y_h^1 + A_2 y_h^2 − z*_h)′ (A_1 y_h^1 + A_2 y_h^2 − z*_h) − 2λ_1 (C_1 y_h^1 − y_l^1) − 2λ_2′ (C_2 y_h^2 − y_l^2),   (39)

where the aggregation matrix C has been partitioned as in (34), the aggregated vector as y_l = [y_l^1 y_l^2′]′, y_l^1 being the (scalar) LF value of the first period and y_l^2 having dimension ((T − 1) × 1), and the Lagrange multiplier vector as λ = [λ_1 λ_2′]′.
An estimate of y_h coherent with the temporal aggregate y_l can be obtained by minimizing the lagrangean function (39). Given the first order conditions

∂L/∂y_h^1 = 0,   ∂L/∂y_h^2 = 0,   ∂L/∂λ_1 = 0,   ∂L/∂λ_2 = 0,
the solution for y_h is given by (see appendix B)

\begin{bmatrix} ŷ_h^1 \\ ŷ_h^2 \end{bmatrix} = \begin{bmatrix} A_1′A_1 & T_{12} \\ T_{21} & A_2′A_2 \end{bmatrix}^{-1} \begin{bmatrix} A_1′ z*_h + C_1′ [C_1 (A_1′A_1)^{-1} C_1′]^{-1} (y_l^1 − C_1 (A_1′A_1)^{-1} A_1′ z*_h) \\ A_2′ z*_h + C_2′ [C_2 (A_2′A_2)^{-1} C_2′]^{-1} (y_l^2 − C_2 (A_2′A_2)^{-1} A_2′ z*_h) \end{bmatrix},   (40)

where

T_{12} = A_1′A_2 − C_1′ [C_1 (A_1′A_1)^{-1} C_1′]^{-1} C_1 (A_1′A_1)^{-1} A_1′A_2,
T_{21} = A_2′A_1 − C_2′ [C_2 (A_2′A_2)^{-1} C_2′]^{-1} C_2 (A_2′A_2)^{-1} A_2′A_1.   (41)
7 The method of Santos Silva and Cardoso
For notational convenience, let τ = s(t − 1) + u be the index running over the HF periods, and re-write model (25) as follows:

y_τ = φ y_{τ−1} + x′_τ β + ε_τ,   τ = 1, . . . , n.   (42)
Starting from an idea originally formulated by Tserkezos (1991, 1993), SSC suggest recursively substituting (Klein, 1958) in (42), thus obtaining

y_τ = (\sum_{i=0}^{τ−1} φ^i x′_{τ−i}) β + φ^τ y_0 + \sum_{i=0}^{τ−1} φ^i ε_{τ−i}.   (43)

Now, observe that

y_0 = (\sum_{i=0}^{+∞} φ^i x′_{−i}) β + \sum_{i=0}^{+∞} φ^i ε_{−i},

so that its expected value conditionally on the infinite past is

η = E(y_0 | x_0, x_{−1}, . . .) = (\sum_{i=0}^{+∞} φ^i x′_{−i}) β.

Then model (43) can be written as

y_τ = (\sum_{i=0}^{τ−1} φ^i x′_{τ−i}) β + φ^τ η + \sum_{i=0}^{τ−1} φ^i ε_{τ−i},   τ = 1, . . . , n.   (44)

According to Harvey (1990), we treat the so-called truncation remainder η as a fixed (and unknown) parameter to be estimated. In other words, the HF relationship on which the disaggregation procedure will be based is

y_τ = x′_{φ,τ} β + φ^τ η + u_τ,   u_τ = φ u_{τ−1} + ε_τ,   τ = 1, . . . , n,   (45)

where x_{φ,τ} = \sum_{i=0}^{τ−1} φ^i x_{τ−i} is a (K × 1) vector containing the weighted sums of current and past values of the regressors, and u_τ is an error term generated by a stationary AR(1) process.
Model (45) can be written in matrix form as

y_h = X_{h,φ} β + q_φ η + u_h = Z_{h,φ} γ + u_h,   (46)
where q_φ = (φ, φ², φ³, . . . , φ^n)′ is a (n × 1) vector, Z_{h,φ} = [X_{h,φ} q_φ] is a (n × (K + 1)) matrix, γ = [β′ η]′ is a ((K + 1) × 1) vector of parameters, and E(u_h u′_h) = [σ_ε²/(1 − φ²)] V_h, V_h being the Toeplitz matrix

V_h = \begin{bmatrix} 1 & φ & φ² & \cdots & φ^{n−2} & φ^{n−1} \\ φ & 1 & φ & \cdots & φ^{n−3} & φ^{n−2} \\ φ² & φ & 1 & \cdots & φ^{n−4} & φ^{n−3} \\ \vdots & & & \ddots & & \vdots \\ φ^{n−1} & φ^{n−2} & φ^{n−3} & \cdots & φ & 1 \end{bmatrix}.
Notice that model (46) can be obtained by transforming the matrix counterpart of model (42), that is

D_φ y_h = X_h β + q η + Q^{-1} ε_h = Z_h γ + Q^{-1} ε_h,   (47)

where D_φ and Q are (n × n) matrices given by, respectively,

D_φ = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -φ & 1 & 0 & \cdots & 0 & 0 \\ 0 & -φ & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -φ & 1 \end{bmatrix},   Q = \begin{bmatrix} \sqrt{1 − φ²} & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix},

q = [φ, 0, 0, . . . , 0]′ is a (n × 1) vector whose unique non-zero element (equal to φ) is the first one, and Z_h = [X_h q].
Given that the inverse of D_φ is the lower triangular matrix

D_φ^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ φ & 1 & 0 & \cdots & 0 & 0 \\ φ² & φ & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ φ^{n−1} & φ^{n−2} & φ^{n−3} & \cdots & φ & 1 \end{bmatrix},

model (46) is obtained by pre-multiplying (47) by D_φ^{-1}, that is:

y_h = D_φ^{-1} X_h β + D_φ^{-1} q η + D_φ^{-1} Q^{-1} ε_h = Z_{h,φ} γ + u_h,

where Z_{h,φ} = D_φ^{-1} Z_h = [D_φ^{-1} X_h  D_φ^{-1} q] and u_h = D_φ^{-1} Q^{-1} ε_h. Finally, it can easily be verified that

D_φ^{-1} Q^{-1} (D_φ^{-1} Q^{-1})′ = \frac{1}{1 − φ²} V_h.
The aggregated model is given by

y_l = C Z_{h,φ} γ + C u_h.   (48)

SSC suggest estimating φ by maximizing the log-likelihood function

l(φ, γ) = − \frac{T}{2} ln 2π − \frac{1}{2} ln |V_{l,φ}| − \frac{1}{2} (y_l − Z_{l,φ} γ)′ (V_{l,φ})^{-1} (y_l − Z_{l,φ} γ),   (49)
where Z_{l,φ} = C Z_{h,φ} and V_{l,φ} = C V_{h,φ} C′, via a scanning procedure over the stationarity region (−1, +1) of φ.
With respect to other temporal disaggregation procedures based on dynamic models (Salazar et al., 1998, Gregoir, 1995), the method by SSC has the advantage of avoiding any difficulty in the estimation of the HF values of the first LF period and, most of all, of producing disaggregated estimates which fulfil the temporal aggregation constraints, along with their standard errors, in a straightforward way. In fact, the estimated values can be obtained according to the classical Chow and Lin procedure:

ŷ_h = Z_{h,φ̂} γ̂_{φ̂} + V_{h,φ̂} C′ V_{l,φ̂}^{-1} (y_l − Z_{l,φ̂} γ̂_{φ̂}),
γ̂_{φ̂} = [Z′_{l,φ̂} V_{l,φ̂}^{-1} Z_{l,φ̂}]^{-1} Z′_{l,φ̂} V_{l,φ̂}^{-1} y_l,

while an estimate of the covariance matrix of ŷ_h is readily available as

E(ŷ_h − y_h)(ŷ_h − y_h)′ = (I_n − L_φ̂ C) V_{h,φ̂} + (X_h − L_φ̂ X_l)(X′_l V_{l,φ̂}^{-1} X_l)^{-1}(X_h − L_φ̂ X_l)′,

with L_φ̂ = V_{h,φ̂} C′ V_{l,φ̂}^{-1}.
8 The dynamic Chow and Lin extension at work: two empirical applications
To illustrate the use of the dynamic extension of the classical Chow and Lin approach described so far, in this section we consider two empirical applications. In the first one we replicate (and extend) the application of SSC, that is, we disaggregate annual US personal consumption, obtained by summing the quarterly data from 1954q1 to 1983q4, using US quarterly personal disposable income (seasonally adjusted, 1954q1-1983q4) as the related indicator. The data have been taken from Table 17.5 of Greene's (1997) textbook. In the second example we estimate monthly Italian industrial value added by temporally disaggregating the available quarterly series (1970q1-2001q3, seasonally adjusted, at constant prices, source: Istat) using the monthly Italian industrial production index (1970:01-2001:09, seasonally adjusted, source: Bank of Italy) as the related series.
While in the former case the evaluation of the results can be made with respect to the 'true' HF data, in the latter the results can be judged only in terms of goodness-of-fit and 'economic reasonableness' of the estimated quarterly regression model and, possibly, by comparing different estimates.
In both cases a preliminary analysis of the dynamic properties of the available series is performed, in order to assess the appropriateness of a dynamic model. A number of estimates have then been calculated according to different specifications, both for the original data and in log-transformed form.
8.1 Quarterly disaggregation of US personal consumption
The series of interest (at annual frequency, see fig. 6) show clear upward trends. As regards their dynamic
properties, in short we find (tab. 3 and 4) that the individual series, in both levels and logs, are I(1) but
the residual based ADF tests (tab. 5) do not clearly support the cointegration hypothesis.
For illustrative purposes we computed estimates according to the following variants of a first order dynamic model25 using the method by SSC: variant 1 (model in levels with intercept), variant 2 (model in levels without intercept), variant 3 (model in logarithms with intercept) and variant 4 (model in logarithms without intercept).
25 A feasible model selection strategy could be based on post-sample comparison of forecasts obtained according to
different LF observable models (Abeysinghe, 1998).
Figure 6: Annual time series used in the first temporal disaggregation example
Table 3: Unit roots tests for annual US consumption (1953-1984)
It should be noted that variant 1 is the one considered by SSC in their original paper. However, the nonsignificant intercepts in both variants 1 and 3 (see tab. 6) suggest adopting either variant 2 (dynamic model in levels without intercept) or variant 4 (dynamic model in logarithms without intercept). As shown in tab. 7, where the results obtained via the classical approaches of Chow and Lin, Fernández and Litterman are also considered26, variant 4 of the dynamic model offers the best performance in terms of errors in levels and quarterly percentage changes, while it is outperformed by the Litterman estimates if annual percentage changes are considered.
The visual inspection of actual and estimated (through variant 4) levels and percentage changes (fig. 7) confirms that the results are very good in terms of levels and annual changes, while the ability of the estimated quarterly changes to capture the 'true' dynamics of the series is less pronounced.
26 The reported estimates for Chow and Lin and Litterman are different from (and correct) those in SSC.
Table 4: Unit roots tests for annual US disposable income (1953-1984)
Table 5: Residual-based cointegration tests: ADF(1) on Greene's data
Figure 7: Actual (continuous line) and estimated (through variant 4 dynamic model, dotted line) levels and
percentage changes of US personal consumption
Finally, it should be noted that the results obtained using the method by Salazar et al. (last row of tab. 6 and last column of tab. 7) are practically the same as those obtained according to SSC. As a consequence, the discrepancies between the two estimates according to the variant 4 dynamic model (fig. 8) are practically negligible.
Table 6: Estimates of the parameters of the auxiliary annual regression on Greene's data
Figure 8: Discrepancies between two estimates (obtained through variant 4 dynamic model according to either
SSC or Salazar et al.) of US personal consumption
8.2
Monthly disaggregation of quarterly Italian industrial value added
The series to be disaggregated and the chosen indicator are represented in fig. 9. As confirmed by the unit roots tests (tab. 8 and 9), both series are I(1). Moreover, the residual-based ADF test (tab. 10) is consistent with the hypothesis of cointegration.
Table 7: Performance indicators of the disaggregated estimates
Figure 9: Quarterly time series used in the second temporal disaggregation example
Tab. 11 contains the parameter estimates for dynamic models in both levels and logarithms, namely according to variants 1, 2 and 3 (that is, the model in levels with and without intercept, and the model in logs with intercept, which in this case turns out to be significant). Concentrating on the estimates obtained
through variants 2 and 3, we find that the HF estimated values are very similar, as the discrepancies
reported in fig. 10 clearly show.
To conclude this practical example, the estimated monthly series of Italian industrial value added is
represented in fig. 11. Estimates obtained following SSC and Salazar et al. are practically the same in
this case too, and have not been reported.
Table 8: Unit roots tests for quarterly Italian industrial value added (1970q1-2001q3)
Table 9: Unit roots tests for quarterly Italian industrial production (1970q1-2001q3)
Table 10: Residual-based cointegration tests: ADF(4) on Italian data
Table 11: Estimates of the auxiliary quarterly regression on Italian data
Figure 10: Discrepancies between two estimates (obtained through variants 2 and 3 of a dynamic model) of
monthly Italian industrial value added
Figure 11: Monthly Italian industrial value added estimated through variant 3 dynamic model
References
Aadland D.M. (2000), Distribution and interpolation using transformed data, Journal of Applied Statistics, 27: 141-156.
Abeysinghe T. (2000), Modeling variables of different frequencies, International Journal of Forecasting,
16: 117-119.
Abeysinghe T. and A.S. Tay (2000), Dynamic regressions with variables observed at different frequencies
(mimeo).
Al-Osh M. (1989), A dynamic linear model approach for disaggregating time series data, Journal of
Forecasting, 8: 85-96
Astolfi R., D. Ladiray, G.L. Mazzi, F. Sartori and R. Soares (2000), A monthly indicator of GDP for the
Euro-zone, Eurostat (mimeo).
Barcellan R. and T. Di Fonzo (1994), Disaggregazione temporale e modelli ARIMA, Società Italiana di
Statistica, Atti della XXXVII Riunione Scientifica, 2: 355-362.
Bloem A., R.J. Dippelsman and N.Ø. Mæhle (2001), Quarterly National Accounts Manual. Concepts, data sources, and compilation, Washington DC, International Monetary Fund.
Boot J.C.G., W. Feibes and J.H.C. Lisman (1967), Further methods of derivation of quarterly figures
from annual data, Cahiers Economiques de Bruxelles, 36: 539-546.
Bournay J. and G. Laroque (1979), Réflexions sur la méthode d’élaboration des comptes trimestriels,
Annales de l’INSEE, 36: 3-30.
Chan W.-S. (1993), Disaggregation of annual time-series data to quarterly figures: a comparative study, Journal of Forecasting, 12: 677-688.
Chow G. and A.L. Lin (1971), Best linear unbiased interpolation, distribution and extrapolation of time
series by related series, The Review of Economics and Statistics, 53: 372-375.
Chow G. and A.L. Lin (1976), Best linear unbiased estimation of missing observations in an economic
time series, Journal of the American Statistical Association, 71: 719-721.
Davidson R. and J.G. MacKinnon (1993), Estimation and inference in econometrics, Oxford, Oxford
University Press.
Denton F.T. (1971), Adjustment of monthly or quarterly series to annual totals: An approach based on
quadratic minimization, Journal of the American Statistical Association, 66: 99-102.
Di Fonzo T. (1987), La stima indiretta di serie economiche trimestrali, Padova, Cleup.
Di Fonzo T. (1994), Temporal disaggregation of a system of time series when the aggregate is known. Optimal vs. adjustment methods, paper presented at the INSEE-Eurostat Quarterly National Accounts workshop, Paris-Bercy, December 1994.
Engle R.F. and C.W. Granger (1987), Co-integration and error correction: representation, estimation and testing, Econometrica, 55: 251-276.
Eurostat (1999), Handbook of quarterly national accounts, Luxembourg, European Commission.
Fernández R.B. (1981), A methodological note on the estimation of time series, The Review of Economics
and Statistics, 63: 471-478.
Ginsburgh V.A. (1973), A further note on the derivation of quarterly figures consistent with annual data,
Applied Statistics, 22: 368-374.
Gómez V. (2000), Estimating missing observations in ARIMA models with the Kalman filter, forthcoming
in Annali di Statistica, Rome, Istat.
Greene W.H. (1997), Econometric analysis, Upper Saddle River, Prentice-Hall.
Gregoir S. (1995), Propositions pour une désagrégation temporelle basée sur des modèles dynamiques
simples, INSEE (mimeo).
Gudmundsson G. (1999), Disaggregation of annual flow data with multiplicative trends, Journal of
Forecasting, 18:33-37.
Guerrero V.M. (1990), Temporal disaggregation of time series: an ARIMA-based approach, International
Statistical Review, 58: 29-46.
Guerrero V.M. and J. Martinez (1995), A recursive ARIMA-based procedure for disaggregating a time
series variable using concurrent data, TEST, 2: 359-376.
Harvey A.C. (1980), On comparing regression models in levels and first differences, International Economic Review, 21: 707-720.
Harvey A.C. (1989), Forecasting, structural time series and the Kalman filter, Cambridge University
Press, Cambridge.
Harvey A.C. (1990), The econometric analysis of time series, Philip Allan, Deddington, Oxford.
Harvey A.C. and S.J. Koopman (1997), Multivariate Structural Time Series Models (with comments),
in C. Heij, J.M. Shumacher, B. Hanzon and C. Praagman (eds.), System Dynamics in Economic and
Financial Models, Wiley, New York: 269-298.
Hotta L.K. and K.L. Vasconcellos (1999), Aggregation and disaggregation of structural time series models, Journal of Time Series Analysis, 20: 155-171.
Jacobs J. (1994), ’Dividing by 4’: a feasible quarterly forecasting method?, Department of Economics,
University of Groningen (mimeo).
Klein L.R. (1958), The estimation of distributed lags, Econometrica, 26: 553-565.
Litterman R.B. (1983), A random walk, Markov model for the distribution of time series, Journal of
Business and Economic Statistics, 1: 169-173.
Maddala G.S. (2001), Introduction to econometrics, third edition, New York, Wiley.
Moauro G. and G. Savio (2001), Disaggregation of time series using common components models,
(mimeo).
Newey W. and K. West (1994), Automatic lag selection in covariance matrix estimation, Review of
Economic Studies, 61: 631-653.
Pinheiro M. and C. Coimbra (1992), Distribution and Extrapolation of Time Series by Related Series
Using Logarithms and Smoothing Penalties, Gabinete de Estudios Económicos, Instituto Nacional de
Estatistica, Working Paper.
Proietti T. (1999), Distribution and interpolation revisited: a structural approach, Statistica, 58: 411-432.
Rossana R.J. and J. Seater (1995), Temporal aggregation and economic time series, Journal of Business
& Economic Statistics, 13: 441-451.
Salazar E.L., R.J. Smith and M. Weale (1997), Interpolation using a Dynamic Regression Model: Specification and Monte Carlo Properties, NIESR Discussion Paper n. 126.
Salazar E.L., R.J. Smith, M. Weale and S. Wright (1994), Indicators of monthly national accounts,
presented at the I.N.S.E.E.-EUROSTAT ’Quarterly National Accounts Workshop’, Paris-Bercy, 5-6 December, 1994 (mimeo).
Salazar E.L., R.J. Smith, M. Weale and S. Wright (1998), A monthly indicator of UK GDP (mimeo).
Santos Silva J.M.C. and F.N. Cardoso (2001), The Chow-Lin method using dynamic models, Economic
Modelling, 18: 269-280.
Tserkezos D.E. (1991), A distributed lag model for quarterly disaggregation of the annual personal
disposable income of the Greek economy, Economic Modelling, 8: 528-536.
Tserkezos D.E. (1993), Quarterly disaggregation of the annualy known Greek gross industrial product
using related series, Rivista Internazionale di Scienze Economiche e Commerciali, 40: 457-474.
van der Ploeg F. (1985), Econometrics and inconsistencies in the national accounts, Economic Modelling,
2: 8-16.
Wei W.W.S. and D.O. Stram (1990), Disaggregation of time series models, Journal of the Royal Statistical Society, Series B, 52: 453-467.
Appendix A. The general dynamic formulation of Salazar et al.
Salazar et al. (1997, 1998) consider the following non-linear dynamic regression model linking the K observed indicator variables x_{jt,u}, j = 1, ..., K, to the unobserved HF interpoland y_{t,u}:
\[
\alpha(L)\,f(y_{t,u}) = \beta_0 + \sum_{j=1}^{K} \beta_j(L)\,x_{jt,u} + \varepsilon_{t,u}, \qquad u = 1,\dots,s, \quad t = 0,1,\dots,T, \tag{50}
\]
where \(\alpha(L) = 1 - \sum_{i=1}^{p}\alpha_i L^i\) and \(\beta_j(L) = 1 - \sum_{k=1}^{q_j}\beta_{j,k} L^k\) are scalar lag polynomials of orders p and q_j respectively, operating on the transformed unobserved HF dependent variable, f(y_{t,u}), and on the observed HF indicator variables x_{jt,u}, j = 1, ..., K. The functional form f(·) used in constructing the interpoland in (50) is assumed known.
The possibility that the dependent variable in (50) is a non-linear function of the interpoland yt,u reflects
a frequent occurrence in applied macro-econometric research; for example, a logarithmic transformation
is often employed. Of course, the exogenous indicator variables {xjt,u } may themselves also be transformations of other underlying variables. It is assumed that the lag lengths p and qj , j = 1, . . . , K, are
chosen sufficiently large so that the error terms may be assumed to possess zero mean, constant variance,
and to be serially uncorrelated and uncorrelated with lagged values of f (yt,u ) and current and lagged
values of {xjt,u }.
The regression equation (50) is quite general. For example, if α_i = 0 for all i > 0, then the model is essentially static in the level of f(y_{t,u}). If α_1 = 1 and α_i = 0 for all i > 1, then the model involves the HF first difference of f(y_{t,u}). Other values for the parameters {α_i} allow a general specification of the dynamics in (50). In the special case in which the sum of the coefficients on the dependent variable is unity, \(\sum_i \alpha_i = 1\), the left hand side of (50) may be re-expressed as a scalar lag polynomial of order p − 1 operating on the first difference of the dependent variable f(y_{t,u}). When \(\sum_{i=1}^{p}\alpha_i \neq 1\), there is a long-run relationship linking f(y_{t,u}) and {x_{jt,u}}, j = 1, ..., K; in particular, if f(y_{t,u}) and {x_{jt,u}}, j = 1, ..., K, are difference stationary, there exists a co-integrating relationship between f(y_{t,u}) and {x_{jt,u}}, j = 1, ..., K. Furthermore, in this case, a test of the restriction \(\sum_{i=1}^{p}\alpha_i = 1\) corresponds to a test of the null hypothesis that there is no co-integrating relationship (Engle and Granger, 1987).
A straightforward application of lag polynomials can transform the dynamic model so that only observed
frequencies appear. Indeed, it is possible to transform (50) into a regression equation involving only s-order
lags of f (yt,u ) by pre-multiplying α(L) by a suitable polynomial function of the HF lag operator L whose
coefficients depend on {αi }. Salazar et al. (1998) consider only the case in which the maximum lag
length is p = 1, but they claim that a more general solution for p > 1 can be found by factoring α(L) in
terms of its roots and treating each factor using the method valid for p = 1 (see Astolfi et al., 2000).
Now, let \(\lambda(L) = 1 + \lambda_1 L + \dots + \lambda_{(s-1)p} L^{(s-1)p}\) be a lag polynomial of order (s − 1)p such that
\[
\lambda(L)\,\alpha(L) = \pi(L),
\]
where π(L) is the lag polynomial of order ps
\[
\pi(L) = 1 - \pi_1 L - \dots - \pi_{ps} L^{ps}
\]
such that π_k = 0 for k = s(j − 1) + 1, ..., sj − 1, j = 1, 2, ..., p, that is
\[
\pi(L) = 1 - \pi_s L^s - \pi_{2s} L^{2s} - \dots - \pi_{ps} L^{ps}. \tag{51}
\]
Notice that, pre-multiplying model (50) by the transformation polynomial λ(L), the HF unobserved model is converted into the LF observed model
\[
\pi(L)\,f(y_\tau) = \lambda(1)\beta_0 + \lambda(L)\sum_{j=1}^{K}\beta_j(L)\,x_{j\tau} + \lambda(L)\varepsilon_\tau, \qquad \tau = (p+1)s, (p+2)s, \dots, Ts,
\]
where, as in section 7, for notational convenience we let τ = s(t − 1) be the index running on the HF periods. For example, if p = 1 the transformation polynomial is
\[
\lambda(L) = 1 + \alpha_1 L + \dots + \alpha_1^{s-1} L^{s-1},
\]
as we showed in section 6, where α_1 = φ. As a consequence, π_s = α_1^s and π_j = 0, j ≠ s.
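As a quick numerical check of the p = 1 case, the short Python sketch below (an illustration added here, not part of the original treatment; the values of φ and s are arbitrary) multiplies λ(L) by α(L) and verifies that only the power L^s survives, with coefficient −φ^s.

import numpy as np

# For p = 1 and s = 4, alpha(L) = 1 - phi*L and the transformation polynomial is
# lambda(L) = 1 + phi*L + phi^2*L^2 + phi^3*L^3; their product should contain only
# powers of L that are multiples of s, i.e. 1 - phi^4*L^4.
phi, s = 0.7, 4
alpha = np.array([1.0, -phi])                  # coefficients of (1, L)
lam = np.array([phi ** j for j in range(s)])   # 1, phi, phi^2, phi^3
pi = np.convolve(lam, alpha)                   # coefficients of lambda(L)*alpha(L)
print(pi)                                      # [1, 0, 0, 0, -phi^4]
assert np.allclose(pi[1:s], 0.0) and np.isclose(pi[s], -phi ** s)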
Abeysinghe and Tay (2000)27 provide the general solution to the problem of finding the coefficients of
the lag transformation polynomial as functions of the autoregressive parameters in the original model
(50).
Appendix B. Derivation of the solution to the constrained optimization problem (38)
Partial derivatives of the Lagrangean function (39) with respect to y_h^1 and y_h^2 are, respectively,
\[
\frac{\partial \mathcal{L}}{\partial y_h^1} = 2A_1'A_1 y_h^1 + 2A_1'A_2 y_h^2 - 2A_1' z_h^* - 2C_1'\lambda_1,
\qquad
\frac{\partial \mathcal{L}}{\partial y_h^2} = 2A_2'A_1 y_h^1 + 2A_2'A_2 y_h^2 - 2A_2' z_h^* - 2C_2'\lambda_2. \tag{52}
\]
Equating (52) to zero and solving for λ_1 and λ_2 gives:
\[
\begin{cases}
C_1'\lambda_1 = A_1'A_1 y_h^1 + A_1'A_2 y_h^2 - A_1' z_h^* \\
C_2'\lambda_2 = A_2'A_1 y_h^1 + A_2'A_2 y_h^2 - A_2' z_h^*
\end{cases} \tag{53}
\]
Pre-multiplying (53) by the block-diagonal matrix
\[
\begin{bmatrix}
C_1\left(A_1'A_1\right)^{-1} & 0 \\
0 & C_2\left(A_2'A_2\right)^{-1}
\end{bmatrix}
\]
and using the constraints C_1 y_h^1 = y_l^1 and C_2 y_h^2 = y_l^2, we obtain
\[
\begin{cases}
C_1\left(A_1'A_1\right)^{-1}C_1'\,\lambda_1 = y_l^1 + C_1\left(A_1'A_1\right)^{-1}A_1'A_2\, y_h^2 - C_1\left(A_1'A_1\right)^{-1}A_1' z_h^* \\
C_2\left(A_2'A_2\right)^{-1}C_2'\,\lambda_2 = y_l^2 + C_2\left(A_2'A_2\right)^{-1}A_2'A_1\, y_h^1 - C_2\left(A_2'A_2\right)^{-1}A_2' z_h^*
\end{cases}
\]
The Lagrange multipliers are thus given by
\[
\begin{aligned}
\lambda_1 &= \left[C_1\left(A_1'A_1\right)^{-1}C_1'\right]^{-1}\left[y_l^1 + C_1\left(A_1'A_1\right)^{-1}A_1'A_2\, y_h^2 - C_1\left(A_1'A_1\right)^{-1}A_1' z_h^*\right], \\
\lambda_2 &= \left[C_2\left(A_2'A_2\right)^{-1}C_2'\right]^{-1}\left[y_l^2 + C_2\left(A_2'A_2\right)^{-1}A_2'A_1\, y_h^1 - C_2\left(A_2'A_2\right)^{-1}A_2' z_h^*\right].
\end{aligned} \tag{54}
\]
Equating (52) to zero and substituting (54) gives the following system
\[
\begin{bmatrix}
A_1'A_1 & T_{12} \\
T_{21} & A_2'A_2
\end{bmatrix}
\begin{bmatrix}
y_h^1 \\ y_h^2
\end{bmatrix}
=
\begin{bmatrix}
A_1' z_h^* + C_1'\left[C_1\left(A_1'A_1\right)^{-1}C_1'\right]^{-1}\left[y_l^1 - C_1\left(A_1'A_1\right)^{-1}A_1' z_h^*\right] \\
A_2' z_h^* + C_2'\left[C_2\left(A_2'A_2\right)^{-1}C_2'\right]^{-1}\left[y_l^2 - C_2\left(A_2'A_2\right)^{-1}A_2' z_h^*\right]
\end{bmatrix} \tag{55}
\]
27 Abeysinghe and Tay (2000) consider a slightly different notation, using α(L) = 1 + α_1 L + ... + α_p L^p instead of α(L) = 1 − α_1 L − ... − α_p L^p.
where
\[
\begin{aligned}
T_{12} &= A_1'A_2 - C_1'\left[C_1\left(A_1'A_1\right)^{-1}C_1'\right]^{-1}C_1\left(A_1'A_1\right)^{-1}A_1'A_2, \\
T_{21} &= A_2'A_1 - C_2'\left[C_2\left(A_2'A_2\right)^{-1}C_2'\right]^{-1}C_2\left(A_2'A_2\right)^{-1}A_2'A_1.
\end{aligned}
\]
Expression (40) is just the solution of system (55), that is
\[
\begin{bmatrix}
\hat y_h^1 \\ \hat y_h^2
\end{bmatrix}
=
\begin{bmatrix}
A_1'A_1 & T_{12} \\
T_{21} & A_2'A_2
\end{bmatrix}^{-1}
\begin{bmatrix}
A_1' z_h^* + C_1'\left[C_1\left(A_1'A_1\right)^{-1}C_1'\right]^{-1}\left[y_l^1 - C_1\left(A_1'A_1\right)^{-1}A_1' z_h^*\right] \\
A_2' z_h^* + C_2'\left[C_2\left(A_2'A_2\right)^{-1}C_2'\right]^{-1}\left[y_l^2 - C_2\left(A_2'A_2\right)^{-1}A_2' z_h^*\right]
\end{bmatrix}.
\]
Temporal Disaggregation by State Space Methods: Dynamic Regression
Methods Revisited
di Tommaso Proietti (Dipartimento di Scienze Statistiche, Università di Udine)
Abstract
The paper documents and illustrates state space methods that implement time series disaggregation by regression methods, with dynamics that depend on a single autoregressive
parameter. The most popular techniques for the distribution of economic flow variables, such
as Chow-Lin, Fernández and Litterman, are encompassed by this unifying framework.
The state space methodology offers the generality that is required to address a variety of
inferential issues, such as the role of initial conditions, which are relevant for the properties of
the maximum likelihood estimates and for the derivation of encompassing representations
that nest exactly the traditional disaggregation models, and the definition of a suitable set of
real time diagnostics on the quality of the disaggregation and revision histories that support
model selection.
The exact treatment of temporal disaggregation by dynamic regression models, when the
latter are formulated in the logarithms, rather than the levels, of an economic variable, is
also provided.
The properties of the profile and marginal likelihood are investigated and the problems with
estimating the Litterman model are illustrated. In the light of the nonstationary nature
of the economic time series usually entertained in practice, the suggested strategy is to fit
an autoregressive distributed lag model, which, under a reparameterisation and suitable initial conditions, nests both the Chow-Lin and the Fernández model, thereby incorporating
our uncertainty about the presence of cointegration between the aggregated series and the
indicators.
1
Introduction
Temporal disaggregation methods play an important role for the estimation of short term economic
indicators. This is certainly the case for a number of European countries (Eurostat, 1999), including
France, Italy, Spain, Belgium and Portugal, whose national statistical institutes make extensive use of
those methods, among which the Chow-Lin (Chow and Lin, 1971) procedure stands out prominently,
for constructing the quarterly national economic accounts from annual figures, using a set of indicators
available at the quarterly frequency. As a consequence, a large share of the Euro area quarterly gross
domestic product is actually estimated by disaggregation techniques.
Interpolation and distribution are the two facets of the disaggregation problem. The former deals with
the estimation of the missing values of a stock variable at points in time that have been systematically
skipped by the observation process; the latter arises when measurements over flow variables is in the
form of a linear aggregate, typically the total or the average over s consecutive periods. We shall consider
as the leading case of interest the situation when the aggregate series, concerning the annual totals of an
economic variable, have to be distributed across the quarters, using related series that are available for
the shorter subperiod.
This paper concentrates on a set of dynamic regression methods that depend on a single autoregressive
parameter and a regression kernel aiming at capturing the role of related indicators, and that encompass
the most popular techniques such as Chow-Lin (1971), Fernández (1981) and Litterman (1983), which
are based on a regression model with autocorrelated errors generated respectively by a first order
autoregressive (AR) process, a random walk and an ARIMA(1,1,0) process. In addition, we consider
autoregressive distributed lag models (ADL) that nest the traditional models. See Di Fonzo (2003) for a
review and additional references. Discussion will be limited to the disaggregation problem; interpolation
poses slightly different issues and will not be considered further.
The class of models investigated is very restricted and its choice requires some motivation. First and
foremost, the investigation focuses on popular methods that have widespread application. During the
process of implementing them, for comparison with more sophisticated methods, we felt that certain
aspects were relatively unexplored, despite the large literature on the topic. For instance, from the
empirical standpoint, what seemed to be a plausible, stripped-to-the-bone representation in the absence of
cointegration (Engle and Granger, 1987) between the series and the indicator variable, i.e. the Litterman
model, could not be estimated reliably, which called for further analysis. The role of initial conditions
and deterministic components was another unsettled issue.
Secondly, we adhere to the idea that the disaggregated model should be kept relatively simple: as it will
become clear in the course of the discussion, with particular reference to the Litterman model, parsimonious modelling is particularly compelling here, since the aggregate data may not be very informative
on the parameters of the disaggregate model.
Finally, there are excellent papers covering other methodologies, among which we mention Harvey and
Chung (2000) and Moauro and Savio (2002). These references implement a genuinely multivariate time
series approach to the disaggregation problem, which overcomes some of the limitations of the regression
based methods, namely the assumption of exogeneity of the indicators, see also Harvey (1989, sec 8.7.1),
and the assumption that the indicators are measured free of measurement error.
The unifying framework of the paper is the state space representation and the associated methods. The
statistical treatment follows Harvey (1989, ch. 6, sec. 3): starting from the state space representation of the disaggregated model, the state space model handling the aggregated observations is derived by augmenting the state vector with a cumulated variable that is only partially observed. This converts the disaggregation problem into a missing values problem, which can be addressed by skipping
certain updating operations in the filtering and smoothing equations.
Particular attention is devoted to the exact initialisation of the models, and the treatment of regression
effects and initial conditions as fixed or diffuse. Diffuseness embodies uncertainty about initial conditions
or parameters and model nonstationarity. The issue is of fundamental importance for the properties of
the maximum likelihood estimates and for nesting exactly the regression methods within more general
ADL models. As far as the latter are concerned, the illustrative examples presented in the paper provide
further support for the modelling strategy outlined in Hendry and Mizon (1978), which leads to entertaining the Chow-Lin or the Fernández model provided certain common factor restrictions prove valid.
The augmented Kalman filter and smoother proposed by de Jong (1991) is the key algorithm for evaluating the likelihood, and for defining the set of time series innovations, that can be used for diagnostic
checking and addressing the issue of revision of the real time estimates when the aggregate information
accrues. The relevance of diagnostic checking is usually neglected in the literature, one reason being that
the innovations are not automatically available from the implementation of these methods in a classical
regression framework. The role of revision histories for model selection is illustrated using a real life
example.
The paper also provides an exact treatment of temporal disaggregation by dynamic regression models
when the disaggregated model is formulated in the logarithms, rather than the levels, of an economic
variable. This is usually the case for flows measured on a ratio scale, such as production and income; the
logarithm provides the natural transformation under which the usual assumptions concerning the linear
model (linearity, homoscedasticity and normality of errors) are plausible.
The paper is structured as follows: section 2 introduces the main disaggregated models and their state
space representation. Section 3 discusses how the latter is modified as a consequence of temporal aggregation. The statistical treatment for linear disaggregation methods is the topic of section 4, dealing
with evaluation of the likelihood, marginalisation of regression effects, diagnostic checking, filtering and
smoothing. Section 5 addresses the problem of distribution of flows that are modelled on a logarithmic
scale.
The reader that is more interested in the applications may skip some of the more technical parts in
sections 3-5, which nevertheless form an integral part of the paper, and move to section 6, which contains
five illustrations concerning the role of different assumptions on initial conditions, the virtues of nesting
the traditional procedures in more general autoregressive distributed lag models, the problems with the
estimation of the Litterman model, and the use of nonlinear disaggregation techniques for ensuring that
the estimated series can take only admissible values.
Finally, a separate appendix documents a set of functions, written in Ox (see Doornik, 2001), implementing the procedures discussed in this paper.
2
Disaggregated Time Series Models
The disaggregated time series models considered in this paper admit the following state space representation:
\[
\begin{aligned}
y_t &= z'\alpha_t + x_t'\beta, & & t = 1,\dots,n, \\
\alpha_t &= T\alpha_{t-1} + W_t\beta + H\epsilon_t, & & t = 2,\dots,n, \qquad \epsilon_t \sim \mathrm{NID}(0,\sigma^2), \\
\alpha_1 &= a_1 + W_1\beta + H_1\epsilon_1, & & \beta \sim \mathrm{N}(b,\,\sigma^2 V).
\end{aligned} \tag{1}
\]
It should be noticed that the measurement equation does not feature a measurement error, and that the
system matrices are time invariant. Both restrictions can be easily relaxed.
The vectors xt and the matrices Wt contain exogenous regressors that enter respectively the measurement
equation and the transition equation and zero elements corresponding to effects that are absent from one
or the other equations. They are usually termed ”indicators” or ”related variables” in the literature.
The initial state vector, α1 , is expressed as a function of fixed and known effects (a1 ), random stationary
effects (H1 ǫ1 , where the notation stresses that H1 may differ from H), and regression effects, W1 β.
Two assumptions can be made concerning β: (i) β is considered as a fixed, but unknown vector (V → 0);
this is suitable if it is deemed that the transition process governing the states has started at time t = 1;
(ii) β is a diffuse random vector, i.e. it has an improper distribution with a mean of zero (b = 0) and an
arbitrarily large variance matrix (V −1 → 0). This is suitable if the process has started in the indefinite
past.
The first case has been considered by Rosenberg (1973), who showed that β can be concentrated out
of the likelihood function, whereas (ii) is considered in de Jong (1991). Diffuseness expresses parameter
uncertainty or the nonstationarity of a particular state component and entails marginalising the likelihood
with respect to the parameter vector β.
The representation is sufficiently rich to accommodate the traditional linear disaggregation techniques
proposed in the literature. The rest of this section introduces the main models currently in use, discussing
its state space representation and the initialisation issue, which turns out to be crucial for the comparison
of the different specifications.
2.1
The Chow-Lin model
The Chow-Lin (Chow and Lin, 1971, CL henceforth) disaggregation method is based on the assumption
that yt can be represented by a linear regression model with first order autoregressive errors:
yt = αt + x′t β, αt = φαt−1 + ǫt , ǫt ∼ NID(0, σ 2 ),
(2)
with |φ| < 1 and α1 ∼ N(0, σ 2 /(1 − φ2 )).
The model is thus a particular case of (1), with scalar α_t and system matrices z = 1, T = φ, H = 1. As far as
the initial conditions are concerned, as αt is a stationary zero mean AR(1) process, it is assumed that
the process applies since time immemorial, giving α1 ∼ N 0, σ 2 /(1 − φ2 ) , which amounts to setting:
a1 = 0, W1 = 0, and H1 = (1 − φ2 )−1/2 .
If some elements of xt are nonstationary, the CL model postulates full cointegration between them and
the series yt .
Deterministic components (e.g. a linear trend) are handled by including appropriate regressors in the set x_t, e.g. by setting x_t = [1, t, x_{3t}, ..., x_{kt}]', and writing \(y_t = \mu + \gamma t + \sum_j \beta_j x_{jt} + \alpha_t\), with the first two elements of β being denoted µ and γ.
Alternatively, they can be accommodated in the transition equation, which becomes α_t = φα_{t−1} + m + gt + ǫ_t. The state space form corresponding to this case features W_t = [1, t, 0'], for t > 1, whereas W_1 = [(1 − φ)^{-1}, (1 − 2φ)/(1 − φ)^2, 0']. The first two elements of the vector of exogenous regressors x_t are zero, since m and g do not enter the measurement equation.
In fact, if it is assumed that the new transition model has applied since time immemorial,
\[
\alpha_1 = \frac{m}{1-\phi} + g\left(\frac{1}{1-\phi} - \sum_{j=1}^{\infty} j\phi^j\right) + \frac{\epsilon_1}{1-\phi L}
        = \frac{m}{1-\phi} + \frac{1-2\phi}{(1-\phi)^2}\, g + \frac{\epsilon_1}{1-\phi L},
\]
recalling that \(\sum_{j=0}^{\infty} j\phi^j = \phi/(1-\phi)^2\); L is the lag operator, such that \(L^j y_t = y_{t-j}\).
Under fixed regression coefficients, the two alternative representations are exactly equivalent, with
\[
\mu = \frac{1}{1-\phi}\, m - \frac{\phi}{(1-\phi)^2}\, g, \qquad \gamma = \frac{1}{1-\phi}\, g. \tag{3}
\]
A difference arises with respect to the definition of the marginal likelihood when these coefficients are
diffuse, as we shall see in section 4.1.
First of all, it ought to be noticed that first order differences ∆yt = yt − yt−1 eliminate the constant term,
whereas second order differences are required, ∆2 yt , so as to eliminate dependence on the coefficient of
the linear trend. This is true of both parameterisations, with one notable exception that arises for the
second one, when φ = 1, in which case ∆2 αt = g + ∆ǫt .
Denoting σγ2 = Var(γ) and σg2 = Var(g), from (3) we find that, for instance, σγ2 = σg2 /(1 − φ)2 , so that a
diffuse γ, σγ2 → ∞, arises both for σg2 → ∞ and φ → 1.
2.2
The Litterman and Fernández models
According to the Litterman (1983) model, the temporally disaggregated process is a regression model
with ARIMA(1,1,0) disturbances:
\[
y_t = x_t'\beta + u_t, \qquad \Delta u_t = \phi\,\Delta u_{t-1} + \epsilon_t. \tag{4}
\]
Litterman explicitly assumes that the ut process has started off at time t = 0 with u0 = ∆u0 = 0
(Litterman, 1983, last paragraph of page 170). This is usually inadequate, unless the set of indicators
includes a constant (which would capture the effect of the initial value); the inclusion of a linear trend
amounts to allowing for non zero drift in the ARIMA(1,1,0) process.
The Fernández (1981) model arises in the particular case when φ = 0 and thus ut is a random walk.
The state space representation of (4) is obtained by defining the state vector and system matrices as follows:
\[
\alpha_t = \begin{bmatrix} u_{t-1} \\ \Delta u_t \end{bmatrix}, \qquad
z' = [1,\; 1], \qquad
T = \begin{bmatrix} 1 & 1 \\ 0 & \phi \end{bmatrix}, \qquad
H = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
The Litterman initialisation implies u_1 = u_0 + φ∆u_0 + ǫ_1 = ǫ_1, which is implemented by setting:
\[
a_1 = 0, \qquad W_1 = 0, \qquad H_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
Alternatively, including u_{−1} in the vector β as its first element, in which case x_t features a zero element in the first position, and assuming that the stationary process has started in the indefinite past, the initial conditions are:
\[
W_1 = \begin{bmatrix} 1 & 0' \\ 0 & 0' \end{bmatrix}, \qquad
H_1\epsilon_1 = \frac{1}{\sqrt{1-\phi^2}}\begin{bmatrix} 1 \\ \phi \end{bmatrix}\epsilon_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix}\epsilon_1,
\]
with ǫ_0 ∼ NID(0, σ²) independent of ǫ_1. This follows from writing
\[
\begin{bmatrix} u_0 \\ \Delta u_1 \end{bmatrix}
= \begin{bmatrix} u_{-1} \\ 0 \end{bmatrix}
+ \begin{bmatrix} 1 \\ \phi \end{bmatrix}\Delta u_0
+ \begin{bmatrix} 0 \\ 1 \end{bmatrix}\epsilon_1,
\]
and taking ∆u_0 ∼ N(0, σ²/(1 − φ²)), ǫ_1 ∼ N(0, σ²). The diffuse nature of u_{−1} arises from the nonstationarity of the model.
It should be noticed that in this second setup we cannot include a constant in xt , since this effect is
captured by u−1 .
Finally, the ARIMA(1,1,0) process can be extended to include a constant and a trend in ∆ut = φ∆ut−1 +
m + gt + ǫt ; the parameters m and g are incorporated in the vector β and the matrices W1 and Wt are
easily extended; for instance, if β = [u−1 , m, g, β2′ ]′ where β2 corresponds to the regression effects affecting
only the measurement equation,
\[
W_1 = \begin{bmatrix} 1 & 0 & 0 & 0' \\ 0 & \dfrac{1}{1-\phi} & \dfrac{1-2\phi}{(1-\phi)^2} & 0' \end{bmatrix}, \qquad
W_t = \begin{bmatrix} 0 & 0 & 0 & 0' \\ 0 & 1 & t & 0' \end{bmatrix}, \qquad
x_t = [0,\, 0,\, 0,\, x_{2t}']'.
\]
The alternative is to include a trend in the measurement equation; however, the remarks concerning the
inclusion of a constant when the starting value is already incorporated in the vector β continue to hold.
2.3
Autoregressive Distributed Lag Models
Following Hendry and Mizon (1978), it is well known that both the CL and Litterman models can be
nested within a more general dynamic regression model.
Consider the Autoregressive Distributed Lag model known as ADL(1,1), which takes the form:
\[
y_t = \phi y_{t-1} + m + gt + x_t'\beta_0 + x_{t-1}'\beta_1 + \epsilon_t, \qquad \epsilon_t \sim \mathrm{NID}(0,\sigma^2). \tag{5}
\]
Under suitable assumptions about initial conditions, the ADL(1,1) model nests the CL regression model
with AR(1) errors: in particular if
β1 = −φβ0
the AR polynomial and the distributed lag polynomial (β0 + β1 L) share a ”common factor”, and the
model (5) can be rewritten as yt = x′t β0 + αt where αt is the AR(1) process given in (2). The ADL(1,0)
arises instead if β1 = 0.
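The following Python sketch (added here for illustration; the simulated design, sample size and variable names are assumptions, not part of the text) verifies numerically that, under the common factor restriction β_1 = −φβ_0 and a matching starting value, the ADL(1,1) recursion reproduces the regression with AR(1) errors.

import numpy as np

rng = np.random.default_rng(1)

# With beta_1 = -phi*beta_0 the ADL(1,1) recursion y_t = phi*y_{t-1} + beta_0*x_t
# + beta_1*x_{t-1} + eps_t coincides with y_t = beta_0*x_t + alpha_t, alpha_t AR(1).
n, phi, beta0 = 200, 0.6, 1.5
x = rng.normal(size=n)
eps = rng.normal(size=n)
alpha = np.zeros(n)
y_adl = np.zeros(n)
alpha[0] = eps[0]                                    # common starting value, no constant/trend
y_adl[0] = beta0 * x[0] + eps[0]
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + eps[t]
    y_adl[t] = (phi * y_adl[t - 1] + beta0 * x[t]
                - phi * beta0 * x[t - 1] + eps[t])   # beta_1 = -phi*beta_0
assert np.allclose(y_adl, beta0 * x + alpha)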
The benefits of this representation versus a simple regression with AR(1) errors are thoroughly discussed
in Hendry and Mizon (1978), especially as a more effective modelling framework for avoiding the occurrence of spurious regressions between economic time series. See also Banerjee et al. (1993, chapter
2). Section 6.2 illustrates some other interesting features relating more specifically to the disaggregation
problem.
The state space representation is
\[
y_t = \alpha_t, \qquad \alpha_t = \phi\alpha_{t-1} + W_t\beta + \epsilon_t, \tag{6}
\]
with system matrices z' = 1, T = φ, H = 1, W_t = [1, t, x_t', x_{t−1}']; notice that, differently from the CL model, the regression effects are all included in the transition equation. The β vector has elements β = [m, g, β_0', β_1']'.
As for initial conditions, assuming that the process started in the indefinite past, exact nesting of the CL model occurs if one posits:
\[
y_1 = \alpha_1 = \frac{1}{1-\phi}\, m + \frac{1-2\phi}{(1-\phi)^2}\, g + \frac{1}{1-\phi}\, x_1'(\beta_0+\beta_1) + \frac{1}{1-\phi L}\,\epsilon_1, \tag{7}
\]
which corresponds to setting
\[
a_1 = 0, \qquad
W_1 = \frac{1}{1-\phi}\left[1, \; \frac{1-2\phi}{1-\phi}, \; x_1', \; x_1'\right], \qquad
H_1 = \frac{1}{\sqrt{1-\phi^2}}.
\]
Other initialisations are possible:
• y1 can be considered as a fixed value, which would amount to assuming that the state space model
(6) holds from t = 2 onwards, with α2 ∼ N(φy1 + m + 2g + x′2 β0 + x′1 β1 , σ 2 ), (in terms of the state
space representation a2 = φy1 , W2 = [1, 2, x′2 , x′1 ], H2 = 1). This solution has no practical relevance
in our case since, due to temporal aggregation, y1 is not available.
• y1 ∼ N (c + x′1 β, σ 2 ): y1 is random and the process is supposed to have started at time t = 0 with
a value which is fixed, but unknown; c is an additional parameter that is included in the vector β,
e.g. as the first element, so that β = [c, m, g, β0′ , β1′ ]′ .
The corresponding state space form has a_1 = 0, W_1 = [1, 1, 1, x_1', 0'], H_1 = 1, and W_t = [0, 1, t, x_t', x_{t−1}'], t > 1.
• Notice that when y_1 is random and the process is supposed to have started in the indefinite past:
\[
y_1 = \frac{1}{1-\phi}\, m + \frac{1-2\phi}{(1-\phi)^2}\, g + \sum_{j=1}^{\infty}\phi^{j-1}\left(x_{2-j}'\beta_0 + x_{1-j}'\beta_1\right) + \frac{1}{1-\phi L}\,\epsilon_1.
\]
This poses the issue either of making x_t available before the initial time and truncating the infinite sum on the right hand side or, in general, of back-casting the regressors. Were x_t a random walk, the initialisation would be as in (7).
Encompassing the nonstationary case
The treatment has hitherto focused on the stationary case. Formally, the ADL(1,1) with φ = 1 and
β1 = −β0 nests the Fernández model using ∆xt as an indicator. Nevertheless, to allow for diffuse
initialisation, due to nonstationarity, it is preferable to use a different parameterisation, which is obtained
by replacing m and g from (3):
\[
y_t = \phi(y_{t-1} + \gamma) + (1-\phi)(\mu + \gamma t) + x_t'\beta_0 + x_{t-1}'\beta_1 + \epsilon_t. \tag{8}
\]
The process can be initialised with y_1 = µ + γ + (1 − φ)^{-1} x_1'(β_0 + β_1) + (1 − φ²)^{-1/2} ǫ_1; treating µ as diffuse makes it possible to incorporate the uncertainty about the cointegrating relationship between the aggregate
series and the indicators, and effectively amounts to estimating the ADL model in first differences. The
drift parameter γ can also be marginalised out of the likelihood (see Shephard, 1993), in which case
the marginal likelihood is based on the second order differences of the observations; a problem arises
instead with the marginalisation of the effects associated to the indicators; as a matter of fact, unless
the common factor restriction is enforced, the data transformation which marginalises the parameters
β0 , β1 depends on φ.
Since later on we shall argue that the further flexibility of the Litterman model is more apparent than
real, due to the problem one faces in its estimation, fitting (8) is the suggested strategy, if it is desired that
the inferential process, concerning parameter estimation and disaggregation, embodies the uncertainty
on the time series characterisation of the dynamic relationship between the variables.
The ADL(1,1) model can also be formulated in the first differences of yt :
\[
\Delta y_t = \phi\,\Delta y_{t-1} + x_t'\beta_0 + x_{t-1}'\beta_1 + \epsilon_t, \qquad \epsilon_t \sim \mathrm{WN}(0,\sigma^2).
\]
When x_t is the first difference of the variables entering the Litterman model, the latter is nested under the common factor restriction β_1 = −φβ_0. The Fernández model arises in two circumstances: when x_t is in levels, for φ = 0 and β_0 + β_1 = 0; otherwise, when x_t is in changes, for φ = 0 and β_1 = 0.
The treatment of initial conditions proceeds along similar lines to the Litterman model. The state space
form features two state elements: αt = [yt−1 , ∆yt ]′ , with yt = [1, 1]αt , and the transition equation is the
same as for Litterman model (see section 2.2), the only difference being the presence of regression effects
in the transition equation.
3
Temporal aggregation and State space form
Assume now that y_t is not observed, but the temporally aggregated series \(\sum_{j=0}^{s-1} y_{\tau s - j}\) is available at times τ = 1, 2, ..., [n/s], where [k] denotes the integral part of k.
Following the approach proposed by Harvey (1989, sec. 6.3), the state space representation of the disaggregated model is adapted to the observational constraint by defining the cumulator variable
\[
y_t^c = \psi_t\, y_{t-1}^c + y_t, \qquad
\psi_t = \begin{cases} 0, & t = s(\tau-1)+1, \ \tau = 1,\dots,[n/s], \\ 1, & \text{otherwise.} \end{cases} \tag{9}
\]
In the case of quarterly flows whose annual total is observed (s = 4) the cumulator variable takes the following values:
\[
\begin{aligned}
y_1^c &= y_1, & y_2^c &= y_1+y_2, & y_3^c &= y_1+y_2+y_3, & y_4^c &= y_1+y_2+y_3+y_4, \\
y_5^c &= y_5, & y_6^c &= y_5+y_6, & y_7^c &= y_5+y_6+y_7, & y_8^c &= y_5+y_6+y_7+y_8,
\end{aligned}
\]
and so on.
Only a systematic sample of every s-th value of the y_t^c process is observed, y_{τs}^c, τ = 1, ..., [n/s], so that all
the other values are missing.
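A minimal sketch of the cumulator construction in (9), assuming quarterly flows aggregated to annual totals (s = 4); the function and variable names are illustrative.

import numpy as np

def cumulate(y, s):
    # Cumulator variable (9): resets at the start of each low-frequency period.
    yc = np.empty_like(y, dtype=float)
    for t, v in enumerate(y):
        yc[t] = v if t % s == 0 else yc[t - 1] + v
    return yc

y = np.arange(1.0, 9.0)       # y_1, ..., y_8: two 'years' of quarterly flows
yc = cumulate(y, s=4)
print(yc)                     # [1, 3, 6, 10, 5, 11, 18, 26]
observed = yc[3::4]           # only every s-th value is observed: [10, 26]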
The state space representation for y_t^c is derived as follows: first, replacing y_t = z'α_t + x_t'β in (9) and substituting α_t from the transition equation, rewrite
\[
y_t^c = \psi_t\, y_{t-1}^c + z'T\alpha_{t-1} + (z'W_t + x_t')\beta + z'H\epsilon_t;
\]
then, appending y_t^c to the original state vector, and defining α_t^* = [α_t', y_t^c]',
\[
y_t^c = z^{*\prime}\alpha_t^*, \qquad
\alpha_t^* = T_t^*\alpha_{t-1}^* + W_t^*\beta + H^*\epsilon_t, \qquad
\alpha_1^* = a_1^* + W_1^*\beta + H_1^*\epsilon_1, \tag{10}
\]
with z^{*\prime} = [0' \; 1], and
\[
T_t^* = \begin{bmatrix} T & 0 \\ z'T & \psi_t \end{bmatrix}, \quad
W_t^* = \begin{bmatrix} W_t \\ z'W_t + x_t' \end{bmatrix}, \quad
H^* = \begin{bmatrix} H \\ z'H \end{bmatrix}, \quad
a_1^* = \begin{bmatrix} a_1 \\ z'a_1 \end{bmatrix}, \quad
W_1^* = \begin{bmatrix} W_1 \\ z'W_1 + x_1' \end{bmatrix}, \quad
H_1^* = \begin{bmatrix} H_1 \\ z'H_1 \end{bmatrix}.
\]
Notice that the regression effects are all contained in the transition equation, and the measurement
equation has no measurement noise. Moreover, the transition matrix T_t^* is time varying.
For the CL model (2), α_t^* = [α_t, y_t^c]',
\[
T_t^* = \begin{bmatrix} \phi & 0 \\ \phi & \psi_t \end{bmatrix}, \quad
W_t^* = \begin{bmatrix} 0 \\ x_t' \end{bmatrix}, \quad
H^* = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
a_1^* = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
W_1^* = \begin{bmatrix} 0 \\ x_1' \end{bmatrix}, \quad
H_1^* = \frac{1}{\sqrt{1-\phi^2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]
4
Statistical treatment
The statistical treatment of model (10) is based upon the augmented Kalman filter (KF) due to de Jong
(1991, see also de Jong and Chu-Chun-Lin, 1994), suitably modified to take into account the presence of
missing values, which is easily accomplished by skipping certain updating operations.
The algorithm enables exact inferences in the presence of fixed and diffuse regression effects; the parameters β can be concentrated out of the likelihood function, whereas the diffuse case is accommodated
by simple modification of the likelihood. See Koopman (1997) and Durbin and Koopman (2001) for an
alternative exact approach to the initialisation under diffuseness and a comparison of the approaches.
The usual KF equations are augmented by additional recursions which apply the same univariate KF to
k series of zero values, where k is the dimension of the vector β, with different regression effect in the
state equation, provided by the elements of Wt .
Using the initial conditions in (10), and further defining A_1^* = W_1^*, P_1^* = H_1^* H_1^{*\prime}, q_1 = 0, s_1 = 0, S_1 = 0, the augmented Kalman filter consists of the following equations and recursions. For t = 1, ..., n, if t = τs, τ = 1, ..., [n/s] (y_t^c is available):
\[
\begin{aligned}
v_t &= y_t^c - z^{*\prime} a_t^*, & V_t' &= -z^{*\prime} A_t^*, \\
f_t &= z^{*\prime} P_t^* z^*, & K_t &= T_{t+1}^* P_t^* z^* / f_t, \\
a_{t+1}^* &= T_{t+1}^* a_t^* + K_t v_t, & A_{t+1}^* &= W_{t+1}^* + T_{t+1}^* A_t^* + K_t V_t', \\
P_{t+1}^* &= T_{t+1}^* P_t^* T_{t+1}^{*\prime} + H^* H^{*\prime} - K_t K_t' f_t, & & \\
q_{t+1} &= q_t + v_t^2 / f_t, & s_{t+1} &= s_t + V_t v_t / f_t, \\
S_{t+1} &= S_t + V_t V_t' / f_t, & d_{t+1} &= d_t + \ln f_t.
\end{aligned} \tag{11}
\]
Else, for t ≠ τs (y_t^c is missing),
\[
\begin{aligned}
a_{t+1}^* &= T_{t+1}^* a_t^*, & A_{t+1}^* &= W_{t+1}^* + T_{t+1}^* A_t^*, & P_{t+1}^* &= T_{t+1}^* P_t^* T_{t+1}^{*\prime} + H^* H^{*\prime}, \\
q_{t+1} &= q_t, & s_{t+1} &= s_t, & S_{t+1} &= S_t.
\end{aligned} \tag{12}
\]
The symbol V_t' denotes a row vector with k elements. The quantities q_t, S_t, s_t accumulate weighted sums of squares and cross-products that serve the estimation of β via generalised regression.
It should be noticed that the quantities ft , Kt (Kalman gain) and Pt do not depend on the observations and that the first two are not computed when ytc is missing. Missing values imply that updating
operations, related to the new information available, are skipped.
The augmented KF computes all the quantities that are necessary for the evaluation of the likelihood
function.
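To fix ideas on how the missing-value logic operates, the following stripped-down Python sketch runs a plain (non-augmented) Kalman filter for the CL cumulator model (10), with β, φ and σ² treated as known and written in update form rather than in the prediction form of (11)-(12); it is only a didactic approximation of the full augmented filter, and all names are illustrative.

import numpy as np

def cl_cumulator_filter(y_obs, x, beta, phi, sigma2, s=4):
    # State is [alpha_t, y_t^c]'; the update step runs only when the annual
    # total is observed, i.e. when t + 1 is a multiple of s.
    n = len(x)
    z = np.array([0.0, 1.0])                        # only the cumulator is observed
    a = np.array([0.0, 0.0])                        # pre-sample state mean
    P = np.diag([sigma2 / (1.0 - phi ** 2), 0.0])   # alpha_0 stationary, cumulator reset
    H = np.array([1.0, 1.0])                        # H* = [1, 1]' for the CL model
    loglik, filtered = 0.0, []
    for t in range(n):
        psi = 0.0 if t % s == 0 else 1.0            # reset at the start of each year
        T = np.array([[phi, 0.0], [phi, psi]])
        W = np.array([0.0, x[t] * beta])            # regression effect on the cumulator
        a = T @ a + W                               # prediction step
        P = T @ P @ T.T + sigma2 * np.outer(H, H)
        if (t + 1) % s == 0:                        # annual total available
            v = y_obs[(t + 1) // s - 1] - z @ a     # innovation
            f = z @ P @ z
            K = P @ z / f
            a = a + K * v                           # update step
            P = P - np.outer(K, K) * f
            loglik += -0.5 * (np.log(2.0 * np.pi * f) + v ** 2 / f)
        filtered.append(a.copy())
    return np.array(filtered), loglik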
4.1
Maximum Likelihood Estimation
Under the fixed effects model, as shown in Rosenberg (1973), maximising the likelihood with respect to β and σ² yields:
\[
\hat\beta = -S_{n+1}^{-1} s_{n+1}, \qquad \mathrm{Var}(\hat\beta) = S_{n+1}^{-1}, \qquad
\hat\sigma^2 = \frac{q_{n+1} - s_{n+1}'\, S_{n+1}^{-1}\, s_{n+1}}{[n/s]}. \tag{13}
\]
The profile likelihood is
\[
L_c = -0.5\left\{ d_{n+1} + [n/s]\left(\ln\hat\sigma^2 + \ln(2\pi) + 1\right) \right\}; \tag{14}
\]
it is a function of the parameter φ alone. Thus maximum likelihood estimation can be carried out via a
grid search over the interval (-1,1).
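In code, the grid search can be sketched as follows; profile_loglik is a hypothetical callable that runs the augmented KF for a given φ and returns the value of the concentrated log-likelihood (14), and the grid is an assumption.

import numpy as np

def grid_search_phi(profile_loglik, lo=-0.99, hi=0.99, npts=199):
    # Maximise the concentrated likelihood by scanning phi over (-1, 1);
    # returns the maximising phi and the corresponding likelihood value.
    grid = np.linspace(lo, hi, npts)
    values = np.array([profile_loglik(phi) for phi in grid])
    return grid[values.argmax()], values.max()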
When β is diffuse (de Jong, 1991), the maximum likelihood estimate of the scale parameter is
\[
\hat\sigma^2 = \frac{q_{n+1} - s_{n+1}'\, S_{n+1}^{-1}\, s_{n+1}}{[n/s] - k},
\]
and the diffuse profile likelihood, denoted L_∞, takes the expression:
\[
L_\infty = -0.5\left\{ d_{n+1} + ([n/s]-k)\left(\ln\hat\sigma^2 + \ln(2\pi) + 1\right) + \ln|S_{n+1}| \right\}. \tag{15}
\]
The diffuse likelihood is based on a linear transformation (e.g. first order differences) that makes the likelihood of the transformed data invariant to β. This yields estimators of φ with better small sample properties, as will be illustrated in section 6.1, in agreement with Tunnicliffe-Wilson (1986) and Shephard and Harvey (1990).
Some care about the parameterisation chosen should be exercised when φ is close to 1: this point can be
illustrated with respect to the simple CL model with a constant term, which admits two representations:
(A) yt = µ+ǫt /(1−φL) and (B) yt = φyt−1 +m+ǫt; for both (A) and (B) assume y1 = µ+(1−φ2 )−1/2 ǫ1 .
As hinted in section 2.1, taking first differences yields a strictly noninvertible ARMA(1,1) process that
does not depend on µ in case (A) and on m in case (B), except when φ = 1, for which ∆yt = m + ǫt and
further differencing would be required to get rid of the nuisance parameter m. Hence, setting aside for
a moment the temporal aggregation problem, L(∆y2 , . . . , ∆yn ) cannot be made independent of m when
φ = 1.
Denoting by Lµ and Lm the two diffuse likelihoods under (A) and (B), it can be shown (proof available
from the author) that Lµ = Lm + 2 ln(1 − φ), which has relevant implications if φ → 1 (see section 6.1);
all other inferences are the same.
If it is suspected that φ = 1 is a likely occurrence, due to the nonstationarity of the series and the lack
of cointegration with the indicators, only diffuseness under case (A) provides the correct solution.
Hence, a difference arises in the definition of the diffuse likelihood according to the parameterisation of the
deterministic kernel. The ambiguity arises from the fact that the transformation adopted (differencing)
is dependent on the parameter φ. See Severini (2000, sec. 8.3.4) for a general treatment of marginal
likelihoods based on parameter dependent transformations.
The ambiguity is resolved anyway if (B) is reparameterised as yt = φyt−1 + (1 − φ)µ + ǫt , t > 1, in which
case the two diffuse likelihoods can be shown to be equivalent.
4.2
Diagnostic checking and disaggregated estimates
Diagnostics and goodness of fit are based on the innovations, which are given by \(\tilde v_t = v_t - V_t' S_t^{-1} s_t\), with variance \(\tilde f_t = f_t + V_t' S_t^{-1} V_t\). As illustrated in section 6.3, the standardised innovations \(\tilde v_t/\sqrt{\tilde f_t}\) can be used to check for residual autocorrelation and departures from the normality assumption. They are also a good indicator of the process of revision of the disaggregated estimates.
The filtered, or real time, estimates of the state vector and their estimation error matrix are computed as follows: \(\tilde\alpha^*_{t|t} = a_t^* - A_t^* S_t^{-1} s_t + P_t^* z^* \tilde v_t / f_t\), \(P^*_{t|t} = P_t^* + A_t^* S_t^{-1} A_t^{*\prime} - P_t^* z^* z^{*\prime} P_t^* / f_t\).
The smoothed estimates are obtained from the augmented smoothing algorithm proposed by de Jong (1988), appropriately adapted to handle missing values. Defining r_n = 0, R_n = 0, N_n = 0, for t = n, ..., 1, if t = τs, τ = 1, ..., [n/s] (y_t^c is available):
\[
\begin{aligned}
r_{t-1} &= z^* v_t/f_t + (T_{t+1}^* - K_t z^{*\prime})'\, r_t, \qquad
R_{t-1} = z^* V_t'/f_t + (T_{t+1}^* - K_t z^{*\prime})'\, R_t, \\
N_{t-1} &= z^* z^{*\prime}/f_t + (T_{t+1}^* - K_t z^{*\prime})'\, N_t\, (T_{t+1}^* - K_t z^{*\prime}).
\end{aligned}
\]
Else, for t ≠ τs (y_t^c is missing),
\[
r_{t-1} = T_{t+1}^{*\prime} r_t, \qquad R_{t-1} = T_{t+1}^{*\prime} R_t, \qquad N_{t-1} = T_{t+1}^{*\prime} N_t\, T_{t+1}^*.
\]
The smoothed estimates are obtained as
\[
\tilde\alpha^*_{t|n} = a_t^* + A_t^*\tilde\beta + P_t^*(r_{t-1} + R_{t-1}\tilde\beta), \qquad
P^*_{t|n} = P_t^* + A_t^* S_{n+1}^{-1} A_t^{*\prime} - P_t^* N_{t-1} P_t^*.
\]
This provides disaggregated estimates of the cumulator ỹ_t^c (corresponding to the last component of the state vector); the latter can be decumulated by inverting (9) so as to provide estimates of the disaggregated series; however, its estimation error variance is not available.
To remedy this situation, the strategy is to augment the state vector by appending y_t to it; the new transition equation is derived straightforwardly by writing:
\[
y_t = z'\alpha_t + x_t'\beta = z'T\alpha_{t-1} + (x_t' + z'W_t)\beta + z'H\epsilon_t.
\]
5
Logarithmic transformation and nonlinear disaggregation
Time series models for flow variables measured on a ratio scale (such as production, turnover, value added,
revenues, etc.) are usually formulated in terms of the logarithmic transformation of the disaggregated
values.
The assumptions underlying the disaggregated model (additivity of effects, normality and homoscedasticity of errors) appear more suitable for the logarithms, rather than the levels, of economic flows. See
also Banerjee et al. (1993), section 6.3, for further arguments and discussion concerning the modelling
of the logarithms versus the levels of an economic time series.
Assuming that yt denotes the logarithm of the disaggregated values, for which the dynamic regression
model yt = z ′ αt + x′t β of section 2 applies, the aggregate value of a flow can be expressed in terms of the
y’s as follows:
\[
Y_\tau = \sum_{j=0}^{s-1}\exp(y_{\tau s - j}), \qquad \tau = 1, \dots, [n/s]; \tag{16}
\]
For instance, when s = 4, the linear time series model is formulated for the logarithms of the quarterly
values, yt , but the available data are only annual, originating from the sum of the levels of the flow
variable over the four quarters making up the year.
It should be noticed that (16) is a nonlinear aggregation constraint which is ultimately responsible for the nonlinearity of the aggregated model. The general statistical treatment of disaggregation under the class of Box-Cox (1964) transformations is provided in Proietti (2004); Proietti and Moauro (2003) present an application to the problem of estimating an index of coincident indicators and a monthly GDP series using data with different observation frequencies within the Stock and Watson (1991) dynamic factor model.
Statistical inference is carried out by an iterative method that at convergence provides a disaggregated
series {ŷt , t = 1, . . . , n}, that is a constrained posterior mode estimate of the unknown {yt }, which
satisfies the nonlinear observational constraint (16) exactly.
Defining the cumulator on the original scale of measurement,
\[
Y_t^c = \psi_t\, Y_{t-1}^c + \exp(y_t), \tag{17}
\]
and denoting by ỹ_t = z'α̃_t + x_t'β̃, t = 1, ..., n, a trial disaggregated series (which need not satisfy the constraint (16)), let us consider the first order Taylor approximation of exp(y_t) = exp(z'α_t + x_t'β) around the trial values [α̃', β̃']', giving:
\[
Y_t^c = \psi_t\, Y_{t-1}^c + \tilde z_t'\alpha_t + \tilde x_t'\beta + \tilde d_t, \qquad
\tilde z_t' = \exp(\tilde y_t)\, z', \quad \tilde x_t' = \exp(\tilde y_t)\, x_t', \quad \tilde d_t = (1-\tilde y_t)\exp(\tilde y_t). \tag{18}
\]
Conditional on ỹ_t, equation (18) is linear and, replacing α_t with the right hand side of the transition equation, it is used to augment the state space representation (1). The only relevant change with respect to the linear case is the inclusion of the sequence d̃_t and the time varying nature of the system matrices.
Hence, conditional on ỹ_t the linear Gaussian approximating model (LGAM) is formulated for the state vector α_t^* = [α_t', Y_t^c]' as follows:
\[
Y_t^c = [0' \; 1]\,\alpha_t^*, \qquad
\alpha_t^* = \tilde T_t^*\alpha_{t-1}^* + [0', \tilde d_t]' + \tilde W_t^*\beta + \tilde H^*\epsilon_t, \qquad
\alpha_1^* = a_1^* + [0', \tilde d_1]' + \tilde W_1^*\beta + \tilde H_1^*\epsilon_1, \tag{19}
\]
where
\[
\tilde T_t^* = \begin{bmatrix} T & 0 \\ \tilde z_t'T & \psi_t \end{bmatrix}, \quad
\tilde W_t^* = \begin{bmatrix} W_t \\ \tilde z_t'W_t + \tilde x_t' \end{bmatrix}, \quad
\tilde H^* = \begin{bmatrix} H \\ \tilde z_t'H \end{bmatrix}, \quad
a_1^* = \begin{bmatrix} a_1 \\ \tilde z_1'a_1 \end{bmatrix}, \quad
\tilde W_1^* = \begin{bmatrix} W_1 \\ \tilde z_1'W_1 + \tilde x_1' \end{bmatrix}, \quad
\tilde H_1^* = \begin{bmatrix} H_1 \\ \tilde z_1'H_1 \end{bmatrix}.
\]
Applying the augmented KF and smoother of section 4 a new disaggregated series {ŷt } is computed.
Setting ỹt = ŷt and replacing the latter into (19) a new LGAM is obtained; iterating the process until
convergence provides the required solution. The likelihood of the nonlinear model is approximated by
that of the LGAM at convergence. An illustration is provided in section 6.5.
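A minimal sketch of the linearisation step in (18), assuming y_trial holds the current trial path ỹ_t; the outer loop (build the LGAM, run the augmented KF and smoother of section 4, update the trial path) is only indicated in the comments, and the function name is an assumption.

import numpy as np

def linearise_constraint(y_trial):
    # First-order expansion of exp(y_t) around the trial path (18): returns the
    # scale exp(y~_t) applied to z' and x_t', and the intercept d~_t of the LGAM.
    e = np.exp(y_trial)
    return e, (1.0 - y_trial) * e

# One outer iteration of the scheme: given y_trial, build the LGAM with the
# quantities above, run the augmented KF/smoother to get y_hat, then set
# y_trial = y_hat and repeat until the change in y_trial is negligible.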
As shown in Proietti (2004), the iterative algorithm outlined above is a sequential linear constrained
optimisation method, see Gill et al. (1989, sec. 7), that rests upon the linearisation of the constraint
around a trial value, which does not need to be feasible. It can be viewed as a particular case of
the recursive conditional mode by extended Kalman filtering and smoothing proposed by Durbin and
Koopman (1992, 2001) and Fahrmeir (1992).
We stress that the iterative method differs from the technique known as extended Kalman filter (see
Anderson and Moore, 1979, Harvey and Pierse, 1984, sec. 5), which is the non-iterative method that
replaces the unknown states affecting the system matrices by the conditional expectation of αt given the
past observations up to time t − 1 in the Taylor expansion.
As a matter of fact, straight application of the extended Kalman filter and smoother to the linearised
model would not lead to a feasible estimate of α, and a subsequent adjustment would be required to
enforce the aggregation constraint. This approach was adopted by Proietti (1998), who distributed
the approximation errors according to the Denton method (Denton, 1971). Salazar et al. (1997), see
also Mitchell et al. (2004), Aadland (2000), and Di Fonzo (2003) provide alternative approximate non
iterative solutions.
According to our approach, on the other hand, the evaluation of the likelihood is based on a linearised
model that has the same posterior mode for the states α, and thus for the missing observations, as the
true nonlinear model.
6
Illustrations
This section provides five illustrations. The first two concern the CL disaggregation method and deal
with (i) the role of diffuse initial conditions and the marginal likelihood for the estimation of the autoregressive
parameter; (ii) the benefits of nesting CL within a more general ADL model.
The third uses a real time experiment to set up revision histories that help in choosing between different
methods and/or deterministic kernels.
The fourth concerns the properties of the maximum likelihood estimator of the AR parameter in the
Litterman model, in particular, when the true φ is equal to zero (Fernandez, 1981).
Our final illustration deals with the nonlinear disaggregation on the logarithmic scale of a time series
characterised by the presence of small positive values, for which linear disaggregation methods fail to
provide admissible estimates.
The real life applications refer to a dataset, consisting of a set of annual series and the corresponding
quarterly indicators, made available by the Italian National Statistical Institute, Istat, which has recently
established a commission aiming at the assessment of its current temporal disaggregation methodology
and practices, that are based primarily, if not exclusively, on the CL procedure.
A set of Ox28 functions implementing the methods presented in this paper is available from the author upon request and is documented briefly in the appendix.
6.1
The CL model with diffuse regression effects
We report the results of two small, but representative, Monte Carlo experiments dealing with the estimation of the AR parameter, φ, under the two alternative assumptions on the regression effects, β, fixed or
diffuse.
The first consists of simulating M = 1000 series of length n = 120 from the true model yt = µ + xt + ut ,
with µ = 0.5, xt is the random walk process ∆xt = 0.5 + ζt , ζt ∼ NID(0, 1), and ut is a stationary AR(1)
process with φ = 0.75: ut = 0.75ut−1 + ǫt , ǫt ∼ NID(0, 0.8).
The generated series are then aggregated with aggregation period s = 4, and the CL model with a
constant term and regression effects is estimated treating β respectively as a fixed unknown vector and
a random diffuse vector. We refer to section 4.1 for the choice of the parameterisation of the constant
term.
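A sketch of one replication of this first design is given below (the seed, the helper name and the use of NumPy are assumptions made here for illustration).

import numpy as np

rng = np.random.default_rng(0)

def simulate_design_a(n=120, phi=0.75, mu=0.5, s=4):
    # y_t = mu + x_t + u_t, with x_t a random walk with drift 0.5 and u_t a
    # stationary AR(1) with phi = 0.75 and innovation variance 0.8.
    x = np.cumsum(0.5 + rng.normal(0.0, 1.0, n))
    u = np.empty(n)
    u[0] = rng.normal(0.0, np.sqrt(0.8 / (1.0 - phi ** 2)))   # stationary start
    for t in range(1, n):
        u[t] = phi * u[t - 1] + rng.normal(0.0, np.sqrt(0.8))
    y = mu + x + u
    y_annual = y.reshape(-1, s).sum(axis=1)                   # aggregation, s = 4
    return y, x, y_annual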
The distributions of the estimated φ coefficients are presented in the first panel of figure 1. Both
estimators suffer from a downward bias, but this is larger in the fixed case (−0.17, against −0.05 in the diffuse case). The estimation mean square errors are respectively 0.206 and 0.052 for the fixed
and diffuse case, implying that assuming diffuse effects greatly improves the estimates.
In the second Monte Carlo experiment the disaggregated data are generated by the random walk with
drift: yt = yt−1 + 0.5 + ǫt, ǫt ∼ NID(0, 0.5). The other design parameters remain the same and there are
no exogenous effects.
The generating model has φ = 1 and is thus nonstationary. Again the treatment of µ as a diffuse effect
in the estimation provides a substantive improvement for the properties of the estimates, as highlighted
by the comparison of the two distributions of the φ estimates over the M replications, represented in the
second panel of figure 1.
The bias and the estimation mean square error amount respectively to −0.09 and 0.012 for the fixed
case and to -0.05 and 0.005 for the diffuse case.
6.2
Common factor analysis: the ADL(1,1) model versus Chow-Lin
The estimation of the quarterly total production series for the Italian insurance sector (Istat A2 series)
from the annual series, carried out by Istat using the CL method, is considered problematic due to the
poor adequacy of the quarterly indicator. The latter is reproduced, along with the annual figures to be
distributed, in the first panel of figure 2. The plot reveals that the indicator is highly volatile towards
the end of the sample, being more prone to outliers and level shifts.
28 Ox is a matrix programming language developed by J.A. Doornik (2001).
[Figure 1, panel (a): MC experiment a), y_t = µ + x_t + u_t, φ = 0.75; panel (b): MC experiment b), y_t ∼ RW (φ = 1); each panel compares the 'Fixed' and 'Diffuse' cases.]
Figure 1: Histograms and nonparametric density estimates of the estimated φ coefficient of the CL model
under fixed and diffuse effects.
In a situation like the present one, a crucial concern is the ”volatility transfer” from the indicators to
the disaggregated estimates, that originates from the application of the CL procedure.
Before turning to this issue, let us present the solutions provided by the ADL(1,1) and Chow-Lin models with
regression effects represented by a deterministic trend and the indicator xt . It should be recalled from
section 2.3 that the former nests the latter.
As far as the ADL(1,1) model is concerned, the maximised profile likelihood (plotted in the second panel of figure 2) is -204.30, which corresponds to the value φ̂ = 0.72. Moreover, β̂0 = −0.04 (standard error 0.08), β̂1 = 0.17 (standard error 0.09), σ̂² = 11607. This would suggest dropping xt while retaining xt−1 (due to the nonstationarity of xt, collinearity, and hence the reliability of the standard errors, is an issue here).
For the Chow-Lin model the estimated AR parameter (restricting the grid search to the positive range) is 0.43, with the maximised profile likelihood taking the value -212.64; also, β̂ = 0.40 (standard error 0.03), σ̂² = 44712.
Hence, the likelihood ratio statistic of the restriction β1 = −φβ0 in the ADL model takes the value 16.69 and clearly leads to a rejection. This implies that the CL specification is not supported in this example.
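The likelihood ratio comparison just described amounts to the following few lines (the 16.68 obtained from the rounded likelihoods matches the 16.69 reported above up to rounding):

from scipy.stats import chi2

ll_adl11, ll_chow_lin = -204.30, -212.64      # maximised profile log-likelihoods reported above
lr = 2.0 * (ll_adl11 - ll_chow_lin)           # one restriction: beta_1 = -phi * beta_0
print(lr, chi2.ppf(0.95, df=1), chi2.sf(lr, df=1))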
The plot of the disaggregate series (figure 2) reveals interesting interpretative features of the two methods.
In particular, the CL estimates are affected to a greater extent by the volatility of the indicator, whereas
the sharp movements in the indicator appear more smoothed in the ADL estimates.
Now, recall that for scalar xt the CL disaggregation is based on yt = βxt + ut with ut ∼ AR(1) (possibly with a non-zero mean), whereas for the ADL(1,1) case,

yt = β0 Σj φ^j xt−j + β1 Σj φ^j xt−j−1 + ut.
The first two terms on the right hand side can be interpreted as a distributed lag function of a filtered
version of xt , where the current and past values of xt receive weights that decline geometrically over
time.
If the indicator is affected by measurement error, so that it can be written xt = x̄t + ηt , η ∼ NID(0, ση2 ),
and the signal x̄t is generated by a random walk with disturbance variance σx2 = (1 − φ)2 ση2 /φ, then it
can be shown that E(x̄t |xt , xt−1 , . . .) is proportional to xt + φxt−1 + φ2 xt−2 + · · · = (1 − φL)−1 xt .
Our second example supports instead the CL restriction and deals with the total production of the
Wholesale and Retail Trade sector (Istat B4 series), whose annual values are plotted in the first panel
of figure 3, along with the quarterly indicator constructed by Istat.
As far as the ADL(1,1) model is concerned, the maximised profile likelihood (plotted in the second panel of figure 3) is -241.52, which corresponds to the value φ̂ = 0.72. Moreover, β̂0 = 0.55, β̂1 = −0.43 (standard error 0.12), σ̂² = 1.8583 × 10^5.
For the Chow-Lin model the estimated AR parameter is essentially the same (0.72), with the maximised profile likelihood taking the value -241.76; also, β̂ = 0.49 (standard error 0.08), σ̂² = 1.8671 × 10^5.
Thus the LR statistic of the restriction β1 = −φβ0 is 0.47 and clearly does not lead to a rejection of the null.
Figure 3 makes clear that the two models produce the same disaggregated series.
6.3 Diagnostics and reliability of estimates
The European System of National and Regional Accounts (par. 12.04) envisages the following selection
criterion:
The choice between the different indirect procedures must above all take into account the
minimisation of the forecast error for the current year, in order that the provisional annual
estimates correspond as closely as possible to the final figures.
This criterion, which refers to the discrepancy between the estimates not using the last aggregate data and those incorporating it, is ambiguous in that it considers the estimates incorporating the yearly figure as final, whereas any disaggregation filter will also use future aggregate observations, the CL with φ = 0 being the only notable exception.
On the other hand, it clearly points out that the decision between alternative methods should be based on
a careful assessment of the revision of the estimates as the new total, sometimes referred to as benchmark,
becomes available.
Figure 2: Total Production Insurance sector (Istat A2 series). Comparison between the ADL(1,1) model and Chow-Lin. Panels: original series (yct, xt), profile log-likelihoods of the ADL(1,1) and Chow-Lin models against φ, smoothed estimates, annualised growth rates.

Figure 3: Total Production Wholesale and Retail Trade (Istat B4 series). Comparison between the ADL(1,1) model and Chow-Lin. Panels: original series (yct, xt), profile log-likelihoods against φ, smoothed estimates, annualised growth rates.

In this illustration we compare the revision histories of three methods of disaggregating the total production of the Metals sector (Istat B2 series), available for the years 1977-2003 and plotted in the first panel of figure 4 along with its indicator: CL with regression effects represented by a constant and the indicator; CL with a linear trend and the indicator; the Fernández model with a constant and the indicator.
The filtered and smoothed estimates arising from the Fernández model are reproduced in figure 4, whereas figure 5 plots the standardised innovations, their estimated density and correlogram, and compares the revision history with that of the CL model with a trend.
The revision histories are generated as follows: starting from 1992 we perform a rolling forecast experiment such that, at the beginning of each year, we make predictions for its four quarters using the information available up to that point, and revise the estimates concerning the four quarters of the previous year. This assumes that the annual aggregate for year τ accrues between the end of quarter τs and the beginning of quarter τs + 1. At the end of the experiment 12 sets of predictions are available for four horizons (one quarter to four quarters); these are compared with the revised estimates, which incorporate the annual aggregate information. The models are re-estimated at each new annual observation.
Table 1 presents summary statistics pertaining to the revision histories at the four horizons.
Table 1: Revision history for Istat series B2 (years 1992-2003).

                        Mean percentage revision error
Model                  1 step   2 steps   3 steps   4 steps
Chow-Lin (constant)      1.44      1.97      2.22      2.23
Chow-Lin (trend)         1.36      1.68      1.85      1.73
Fernández                0.67      0.83      0.94      0.88

                        Mean revision error
Model                  1 step   2 steps   3 steps   4 steps
Chow-Lin (constant)    259.67    372.11    435.34    443.47
Chow-Lin (trend)       250.70    324.80    372.80    353.21
Fernández              111.75    147.85    177.93    173.44

                        Mean absolute revision error
Model                  1 step   2 steps   3 steps   4 steps
Chow-Lin (constant)    404.13    574.00    673.56    693.95
Chow-Lin (trend)       414.32    560.54    649.26    647.21
Fernández              366.79    525.71    632.25    673.81

                        Mean square revision error
Model                  1 step   2 steps   3 steps   4 steps
Chow-Lin (constant)    247363    503008    695426    743491
Chow-Lin (trend)       336728    589829    793115    742075
Fernández              215468    422454    603165    672839
The first, rather obvious, piece of evidence is that the performance of the methods deteriorates as the horizon increases, the CL with a trend being a partial exception. Secondly, the random walk (Fernández) specification outperforms the others according to all the measures presented.
The plot of the percentage revision errors in the last panel of figure 5 points out that the extent of the
revision can be anticipated from the standardised innovations; it also reveals that the performance of
the CL with trend model is strongly influenced by the inability to predict the 1996 expansion.
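The summary measures of table 1 can be computed from the matrices of preliminary and revised estimates along the following lines; note that the exact definition of the percentage measure is not spelled out in the text, so taking the mean revision relative to the revised levels is an assumption of mine:

import numpy as np

def revision_summary(prelim, revised):
    # prelim, revised: (years x horizons) arrays of preliminary and revised estimates
    r = revised - prelim
    return {
        "mean percentage revision error": 100.0 * (r / revised).mean(axis=0),  # assumed definition
        "mean revision error": r.mean(axis=0),
        "mean absolute revision error": np.abs(r).mean(axis=0),
        "mean square revision error": (r ** 2).mean(axis=0),
    }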
Figure 4: Total Production Metals (Istat B2 series). Filtered and smoothed estimates of the disaggregated series obtained by the Fernández model. Panels: original series (yct, xt), filtered estimates ỹt|t with upper and lower confidence limits, smoothed estimates ỹt|T with upper and lower confidence limits, yearly growth rates ∆4 ln(ỹt|T).
Figure 5: Total Production Metals (Istat B2 series). Standardised innovations for the Fernández model and revision histories for the Chow-Lin and Fernández models. Panels: Kalman filter innovations with ±2 s.e. bands, density of the standardised innovations, correlogram (ACF) of the standardised innovations, percentage revision errors for the Chow-Lin (trend) and Fernández models.
6.4 Litterman model: small sample properties of φ estimates
The previous illustration showed that the Fernández model outperformed the Chow-Lin specification in
terms of the revision errors; as the former is also a particular case of the Litterman model arising when
the AR parameter is zero, one would expect that, when the series is not cointegrated at the long run
frequency with the indicator, the correct strategy would be to fit the Litterman model in the first place
and to entertain the Fernández model only when φ is not significantly different from zero.
This section investigates the properties of the maximum likelihood estimates of the φ parameters of the
Litterman model, and concludes that these are fairly problematic at the very least.
For the series considered in the above illustration (Istat B2 series) the estimate of φ turned out to be -0.73, even though it is not significantly different from zero; this can be considered a fairly emblematic outcome, as confirmed by our empirical experience with the Istat dataset: out of 16 series investigated, 10 φ estimates were in the range (-1, -0.5), 3 in (-0.5, 0), only one in (0, 0.5) and three were greater than 0.5. In spite of this polarisation, the likelihood was very flat and the estimates were not significantly different from zero.
The reason is that for s = 4, despite the φ parameter being theoretically identified, the data are not very
informative about the AR parameter, even in large samples, unless the true value is indeed close to the
extremes of the stationary range.
The behaviour of the large sample likelihood function of the Litterman model can be investigated using
the approach by Palm and Nijman (1984).
Letting zt denote the sum of s consecutive observations, zt = S(L)yt, with S(L) = 1 + L + L² + ··· + L^{s−1}, originating from the ARIMA(1,1,0) process ∆yt = φ∆yt−1 + ǫt, it immediately follows that

∆s zt = φ^s ∆s zt−s + S(L)²(1 + φL + φ²L² + ··· + φ^{s−1}L^{s−1})ǫt,

where ∆s = 1 − L^s.
The first differences of the aggregated process arise as a systematic sample with step s of ∆s zt, and are the ARMA(1,1) process ∆Zτ = φ^s ∆Zτ−1 + (1 + θL)ξτ, where the lag operator applies to the τ index, that is ∆Zτ = Zτ − Zτ−1, with AR parameter φ^s and moving average parameter θ given by the invertible root of the quadratic equation θ/(1 + θ²) = ρ(s), where ρ(s) = γ(s)/γ(0) and γ(j) is the lag-j autocovariance of the process S(L)²(1 + φL + φ²L² + ··· + φ^{s−1}L^{s−1})ǫt; σ²ξ = γ(0)/(1 + θ²).
Letting gφ(ω) = (1 + θ² + 2θ cos ω)/(1 + φ^{2s} − 2φ^s cos ω), so that gφ(ω)σ²ξ denotes the spectral generating function of ∆Zτ, the large sample log-likelihood evaluated at φ̃ is

L(φ̃) ≈ −(1/2)(n/s) [ 1 + ln( ∫₀^π (gφ(ω)/gφ̃(ω)) σ²ξ dω ) ]     (20)

The integral represents the variance of the residual ξ̃τ = (1 + θ̃L)⁻¹(1 − φ̃L)∆Zτ for the Litterman model with φ̃.
When φ = 0 the true disaggregated data are generated by the Fernández model (a random walk), and
the aggregate model for Zt is IMA(1,1).
In figure 6 we use the effective device, adopted by Palm and Nijman (1984), of plotting L(φ̃) for the values of the argument that are not significantly different from the true value φ = 0, i.e. those for which 2[L(0) − L(φ̃)] < χ²0.95(1) (obviously L(0) is a global maximum), where χ²0.95(1) is the 95th percentile of the chi-square distribution with 1 degree of freedom (3.84); for the remaining values the fixed level L(0) − 0.5 χ²0.95(1) is plotted.

Figure 6: Large sample profile log-likelihood for the φ coefficient when the true disaggregated model is a random walk (Fernández model), for aggregation intervals s = 3, 4, 12.
The plot immediately reveals that the likelihood is very flat, and that the range of values not significantly
different from zero covers a large part of the stationary region of the parameter. When the aggregation
interval is s = 3 a secondary mode of the likelihood appears, and the situation worsens as s increases.
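The curves in figure 6 can be reproduced numerically from the formulas above. The sketch below (in Python rather than Ox) follows that route for the case φ = 0; the π-normalisation of the integral, the integration grid and the guard on |ρ(s)| are implementation choices of mine:

import numpy as np

def ma_coefficients(phi, s):
    # coefficients of S(L)^2 (1 + phi*L + ... + phi^{s-1} L^{s-1}) applied to eps_t
    S = np.ones(s)
    return np.convolve(np.convolve(S, S), phi ** np.arange(s))

def acov(c, j):
    # lag-j autocovariance of an MA process with coefficients c and unit innovation variance
    return float(c[: len(c) - j] @ c[j:]) if j < len(c) else 0.0

def implied_annual_arma(phi, s):
    # annual model for Delta Z_tau: AR parameter phi^s, MA parameter theta, variance sigma2_xi
    c = ma_coefficients(phi, s)
    rho_s = np.clip(acov(c, s) / acov(c, 0), -0.499, 0.499)   # real invertible root needs |rho(s)| < 1/2
    theta = (1.0 - np.sqrt(1.0 - 4.0 * rho_s**2)) / (2.0 * rho_s) if rho_s != 0 else 0.0
    return theta, acov(c, 0) / (1.0 + theta**2)

def sgf(omega, phi, theta, s):
    # spectral generating function of the annual ARMA(1,1), up to sigma2_xi
    return (1 + theta**2 + 2 * theta * np.cos(omega)) / (1 + phi**(2 * s) - 2 * phi**s * np.cos(omega))

def large_sample_loglik(phi_tilde, phi_true=0.0, s=4, n=120, ngrid=2000):
    th0, s2_xi = implied_annual_arma(phi_true, s)
    th1, _ = implied_annual_arma(phi_tilde, s)
    om = np.linspace(1e-4, np.pi, ngrid)
    resid_var = s2_xi * np.mean(sgf(om, phi_true, th0, s) / sgf(om, phi_tilde, th1, s))
    return -0.5 * (n / s) * (1.0 + np.log(resid_var))

grid = np.linspace(-0.95, 0.95, 191)
ll = np.array([large_sample_loglik(p) for p in grid])
flat_region = grid[2 * (ll.max() - ll) < 3.84]   # values not significantly different from phi = 0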
The previous analysis rests upon a large sample approximation. The small sample properties of the
MLE of φ in the Litterman model when the true value is φ = 0 and the aggregation interval is equal
to s = 4 are investigated by a MC experiment, designed as follows: M = 1000 series of length n = 120
are simulated from the process yt = xt + ut , where ut is a random walk ∆ut = ǫt , ǫt ∼ NID(0, 0.5), and
∆xt = 0.5 + ζt , ζt ∼ NID(0, 1).
The series are aggregated into the sum of four consecutive values (s = 4, giving 30 yearly observations) and the Litterman model with a constant term (assuming unknown initial conditions) is fitted; for comparison we also estimate two encompassing dynamic regression models, namely the ADL(1,1) model in differences using the levels of xt as an indicator, ∆yt = φ∆yt−1 + m + β0xt + β1xt−1 + ǫt, and the ADL(1,1) using first differences of xt as an explanatory variable, ∆yt = φ∆yt−1 + m + β0∆xt + β1∆xt−1 + ǫt.
The densities of the estimates of φ are displayed in figure 7, which reveals that the distribution of the Litterman estimates is inherently bimodal, so that the bulk of the estimates is far away from the true value; the true value is close to a minimum of the density.
The situation improves substantially when the ADL model using xt is adopted; the estimates are less biased and the mode is close to the true value; moreover, although not reported here, the estimates of the regression coefficients satisfy β0 + β1 = 1. Finally, the ADL(1,1) using ∆xt improves somewhat, but not decisively.

Figure 7: Histograms and nonparametric density estimates of the estimated φ coefficient for disaggregated data generated according to yt = xt + ut, where ut is a random walk, ∆ut = ǫt, ǫt ∼ NID(0, 0.5), and ∆xt = 0.5 + ζt, ζt ∼ NID(0, 1). Panels: Litterman model, ADL(1,1) for ∆yt using xt, ADL(1,1) for ∆yt using ∆xt.
In conclusion, the maximum likelihood estimates of the φ parameter characterising the Litterman model should be treated with great care; it is usually fruitful to plot the profile likelihood, highlighting the set of values not significantly different from the maximiser, in order to gauge the reliability of the point estimates. It is also important to compare results with the ADL model in first differences.
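Operationally, once the profile log-likelihood has been evaluated on a grid, the set of values not significantly different from the maximiser can be flagged as follows (the function name and interface are illustrative):

import numpy as np

def profile_region(phi_grid, loglik, crit=3.84):
    # crit = 3.84 is the 95th percentile of the chi-square distribution with 1 degree of freedom
    phi_grid, loglik = np.asarray(phi_grid), np.asarray(loglik)
    mask = 2.0 * (loglik.max() - loglik) < crit    # phi values within the 95% profile-likelihood region
    return phi_grid[np.argmax(loglik)], mask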
6.5 Nonlinear Disaggregation
Our last illustration concerns the annual tax revenues on methane gas (Istat series A1), displayed in the first panel of figure 8; the series is disaggregated by Istat using the Chow-Lin method, with an indicator measuring the quantity of methane gas sold (seasonally unadjusted, also reproduced in figure 8).
We leave aside the issue posed by the presence of seasonality. It suffices to say that Istat produces a "seasonally unadjusted" version of the quarterly accounts that postulates that the seasonality in the aggregated series is proportional to that in the indicator, the factor of proportionality being the regression coefficient β; if the indicator is integrated at the seasonal frequencies, the underlying assumption is that the disaggregated series and the indicator are seasonally cointegrated at those frequencies.
The disaggregation of the annual revenue totals provides an interesting testbed since, due to the small
numbers involved, the estimated quarterly series can take on negative values, which are outside the
admissible range of the series. As a matter of fact, standard application of the Chow-Lin procedure
with a constant term (φ̂ = 0.36 and β̂ = 0.98, with standard error 0.04) yields the disaggregated series plotted in the second panel of figure 8, which indeed becomes negative in correspondence with a few seasonal troughs.
There are two arguments in favour of the nonlinear disaggregation procedure outlined in section 5. First
and foremost, working on the logarithmic scale guarantees that the levels of the estimated quarterly
series take only positive values, within the admissible range; moreover, the logarithmic transformation
mitigates if not eliminates the heteroscedasticity of the series. At least the logarithms of xt do not
display the increase in the amplitude of the seasonal fluctuations which characterises the levels; see the
last panel of figure 8.
When the nonlinear CL model with a constant term is fitted to the series, the parameter estimates are φ̂ = 0.27, m̂ = −0.53, and β̂ = 1.09. The disaggregated estimates are obtained by the iterative algorithm described in section 5; they are displayed in the middle panel of figure 8, whereas their logarithm is shown in the last graph.
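As a crude illustration of the positivity argument only (this is not the exact iterative algorithm of section 5), one could exponentiate a preliminary estimate of the logarithmic series and rescale each year pro rata to meet the annual totals; both steps preserve positivity:

import numpy as np

def prorata_from_logs(z_hat, Y_annual):
    # z_hat: preliminary quarterly estimate of log y (length divisible by 4);
    # Y_annual: annual totals. Exponentiation and the positive yearly scaling
    # factors guarantee a strictly positive quarterly series.
    y = np.exp(z_hat).reshape(-1, 4)
    factors = Y_annual / y.sum(axis=1)
    return (y * factors[:, None]).ravel()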
7 Conclusions
The paper has revisited the problem of disaggregating economic flows by dynamic regression models
using a state space framework.
It has discussed the state space formulation of the traditional disaggregation methods, with special
attention to the initialisation issue. The latter is crucial for a correct implementation of the methods
and for their nesting within more general dynamic specifications.
The associated filtering and smoothing algorithms, suitably modified to allow for the treatment of missing values and of fixed and diffuse regression effects, provide the unifying tools for the statistical treatment: likelihood evaluation, diagnostic checking and, ultimately, disaggregation.
The empirical findings based on real life case studies and Monte Carlo experimentation can be summarised
as follows:
• The use of the marginal likelihood has been found to be beneficial for the Chow-Lin model when
the AR parameter is close to unity.
• Likelihood inference on the AR coefficient of the Litterman model proves to be very unreliable.
• Nesting the traditional disaggregation models within more general dynamic specifications, such as
the ADL in levels and first differences, is a good modelling strategy.
• The exact nonlinear disaggregation problem arising when the disaggregated model is formulated in the logarithms of the series can be solved in a feasible and computationally attractive way, requiring only the iteration of routine linear smoothing operations.
Given the previous discussion on the properties of the profile and marginal likelihood and on the difficulty
in estimating the Litterman model, and considering the nonstationary nature of the economic time series
usually entertained in practice, the suggested strategy is to fit the ADL(1,1) model, which, under a
reparameterisation and suitable initial conditions, nests both the Chow-Lin and the Fernández model,
thereby incorporating our uncertainty about the presence of cointegration between the aggregated series
and the indicators.
Figure 8: Tax Revenues Methane Gas (Istat A1 series). Comparison between linear and nonlinear Chow-Lin disaggregation. The linear quarterly series yields negative, inadmissible estimates. Panels: annual series and indicator xt, smoothed quarterly series (linear and nonlinear), logarithms of the indicator and of the nonlinear smoothed series.
Acknowledgements
I greatly benefited from discussion with Filippo Moauro and Tommaso di Fonzo. I also thank Giulia
Genovino and Elisa Stoffino for assistance.
References
Aadland D.M. (2000). Distribution and Interpolation Using Transformed Data, Journal of Applied
Statistics, 27, 141-56.
Anderson, B.D.O., and Moore, J.B. (1979). Optimal Filtering, Englewood Cliffs NJ: Prentice-Hall.
Banerjee, A., Dolado, J., Galbraith, J.W., and Hendry, D.F. (1993). Co-integration, error-correction, and
the econometric analysis of non-stationary data, Advanced Texts in Econometrics, Oxford University
Press, Oxford, UK.
Box, G.E.P., and Cox, D.R. (1964). An analysis of transformations (with discussion), Journal of the
Royal Statistical Society, B 26, 211-246.
Chow, G., and Lin, A. L. (1971). Best Linear Unbiased Interpolation, Distribution and Extrapolation
of Time Series by Related Series, The Review of Economics and Statistics, 53, 4, 372-375.
Di Fonzo T. (2003), Temporal disaggregation of economic time series: towards a dynamic extension,
European Commission (Eurostat) Working Papers and Studies, Theme 1, General Statistics.
de Jong, P. (1989). Smoothing and interpolation with the state space model, Journal of the American
Statistical Association, 84, 1085-1088.
de Jong, P. (1991). The diffuse Kalman filter, Annals of Statistics, 19, 1073-1083.
de Jong, P., and Chu-Chun-Lin, S. (1994). Fast Likelihood Evaluation and Prediction for Nonstationary
State Space Models, Biometrika, 81, 133-142.
Eurostat (1999). Handbook on quarterly accounts, Luxembourg.
Denton F.T. (1971). Adjustment of monthly or quarterly series to annual totals: An approach based on
quadratic minimization, Journal of the American Statistical Association, 66, 99-102.
Di Fonzo, T. (2003). Temporal disaggregation using related series: log-transformation and dynamic
extension, Rivista Internazionale di Scienze Economiche e Commerciali, 50, 371-400.
Doornik, J.A. (2001). Ox 3.0 - An Object-Oriented Matrix Programming Language, Timberlake Consultants Ltd: London.
Durbin, J., and Koopman, S.J. (1992). Filtering, smoothing and estimation for time series models
when the observations come from exponential family distributions. Mimeo, Department of Statistics,
London School of Economics.
Durbin J., and Koopman, S.J. (2001). Time Series Analysis by State Space Methods, Oxford University
Press: New York.
Engle, R.F., and Granger, C.W.J. (1987). Co-integration and Error Correction: Representation, Estimation and Testing, Econometrica, 55, 251-276.
Fahrmeir, L. (1992). Posterior mode estimation by extended Kalman filtering for multivariate dynamic
generalised linear models. Journal of the American Statistical Association 97, 501-509.
Fernández, P. E. B. (1981). A methodological note on the estimation of time series, The Review of
Economics and Statistics, 63, 3, 471-478.
Gill, P.E., Murray, W., Saunders, M.A., and Wright, M.H. (1989). Constrained nonlinear programming,
in G.L. Nemhauser, A.H.G. Rinnooy Kan, M.J. Todd (eds.), Handbooks in Operations Research and
Management Science, Vol. 1, Optimization, 171-210, Elsevier, Amsterdam, 1989.
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge
University Press: Cambridge.
Harvey, A.C. and Chung, C.H. (2000). Estimating the underlying change in unemployment in the UK, Journal of the Royal Statistical Society, Series A, 163, 303-339.
Harvey, A.C., and Pierse R.G. (1984). Estimating Missing Observations in Economic Time Series.
Journal of the American Statistical Association, 79, 125-131.
Hendry, D.F., and Mizon, G.E. (1978). Serial correlation as a convenient simplification, not a nuisance:
A comment on a study of the demand for money by the Bank of England. Economic Journal, 88,
549-563.
Koopman, S.J. (1997). Exact initial Kalman filtering and smoothing for non-stationary time series
models, Journal of the American Statistical Association, 92, 1630-1638.
Litterman, R. B. (1983). A random walk, Markov model for the distribution of time series, Journal of
Business and Economic Statistics, 1, 2, pp. 169-173.
Mitchell, J., Smith, R.J., Weale, M.R., Wright, S. and Salazar, E.L. (2004). An Indicator of Monthly
GDP and an Early Estimate of Quarterly GDP Growth. Discussion Paper 127 (revised) National
Institute of Economic and Social Research.
Moauro F. and Savio G. (2002). Temporal Disaggregation Using Multivariate Structural Time Series
Models. Forthcoming in the Econometric Journal.
Palm, F.C. and Nijman, T.E. (1984). Missing observations in the dynamic regression model, Econometrica, 52, 1415-1436.
Proietti T. (1998). Distribution and interpolation revisited: a structural approach, Statistica, 58, 411-432.
Proietti, T. (2004). On the estimation of nonlinearly aggregated mixed models. Working paper, Department of Statistics, University of Udine.
Proietti T. and Moauro F. (2003). Dynamic Factor Analysis with Nonlinear Temporal Aggregation
Constraints. Mimeo.
Rosenberg, B. (1973). Random coefficient models: the analysis of a cross-section of time series by
stochastically convergent parameter regression, Annals of Economic and Social Measurement, 2, 399-428.
Salazar, E.L., Smith, R.J. and Weale, M. (1997). Interpolation Using a Dynamic Regression Model:
Specification and Monte Carlo Properties, NIESR Discussion Paper n. 126.
Severini, Thomas A. (2000). Likelihood methods in statistics, Oxford statistical science series, 22, Oxford
University Press.
Shephard, N.G., and Harvey, A.C. (1990). On the probability of estimating a deterministic component
in the local level model, Journal of Time Series Analysis, 11, 339-347.
Stock, J.H., and Watson M.W. (1991). A probability model of the coincident economic indicators. In
Leading Economic Indicators, Lahiri K, Moore GH (eds); Cambridge University Press: New York.
Tunnicliffe-Wilson, G. (1989). On the use of marginal likelihood in time series model estimation, Journal
of the Royal Statistical Society, Series B, 51, 15-27.
Appendix A. Description of the Ox Programme
The Ox programme TDSsm.ox contains functions for the statistical treatment of disaggregation using dynamic
regression models.
The models implemented are identified by an appropriate string, assigned to the variable sSel, obtained by concatenating three components: the Model type, the Deterministic type and the Diffuse type.
The options supported for the Model type are listed below:
Acronym     Model type                      Notes
"CL"        Chow-Lin
"ADL10"     ADL(1,0) in levels              Initialisation based on (7)
"ADL10x"    ADL(1,0) in levels              Initialisation: y1 ∼ N(c + x′1β, σ²)
"ADL11"     ADL(1,1) in levels              Initialisation based on (7)
"ADL11x"    ADL(1,1) in levels              Initialisation: y1 ∼ N(c + x′1β, σ²)
"L0"        Litterman                       u0 = ∆u0 = 0 (zero initialisation)
"L"         Litterman                       u1 in β, ∆u0 ∼ N(0, σ²/(1 − φ²))
"ADL10D"    ADL(1,0) in 1st differences
"ADL11D"    ADL(1,1) in 1st differences
Deterministic components may enter the model via the matrix of the regressors (see below) or are automatically
included in the transition equation by appending a c to the model string for a constant term or a t for a linear
trend.
For instance, sSel = "CLt" selects the Chow-Lin model with a linear trend; sSel = "ADL11Dc" selects the
ADL(1,1) model in first differences with a constant term, ∆yt = φ∆yt−1 + m + ǫt .
Notice that "L0" can only be used in its original specification, so that if it is desired to include a deterministic
trend into the Litterman original specification, this has to be done by including [1, t] in the set of regressors xt .
If further the elements of β are diffuse a d is appended to the model string. For instance, sSel = "Ltd" identifies
the Litterman model with a linear trend and diffuse regression effects.
The series under investigation, yct, is subject to missing values; for instance, in the quarterly case it features the following elements:

{".", ".", ".", yc4, ".", ".", ".", yc8, ".", ".", ".", yc12, ...}

where "." is a missing value and yc4τ is the annual total for the τ-th year.
Along with the string selecting a model, the key ingredients are listed below:
Ox Variable    Symbol
cn             n
cs             s
vyf            {yct, t = 1, ..., n} (1 × n vector)
dphi           φ
mx             [x1, x2, ..., xn] (k × n matrix)
When no indicator variables are available, mx = <> (an empty matrix). Notice also that if the mx matrix has xt = [1, t, x†t′]′ as its column elements, selecting sSel = "CL" yields the same results as sSel = "CLt" using only x†t as regressors. As stressed in section 4.1, a difference arises under diffuse effects: sSel = "CLd" using xt = [1, t, x†t′]′ is no longer the same as sSel = "CLtd" with x†t as regressors, the first being preferable when it is suspected that φ is close to unity.
Linear disaggregation
The linear disaggregation methods described in the paper are implemented by the set of Ox functions described below. Each function declaration lists its arguments in parentheses and is followed by a brief comment.
IndicatorVariable(const cn, const cs) Generates a row vector of length n with elements ψt , t = 1, . . . , n.
Cumulator(const mY, const cs) Generates the row vector with the cumulator values yct = ψt yct−1 + yt from the series yt passed in argument as a row vector (a Python analogue is sketched at the end of this list).
DeCumulator(const mCY, const cs) Generates the row vector of the decumulated sequence yt from ytc .
SelectModel(const sSel) Set the global model selection parameters from the identifying string.
SetStateSpaceForm(const vyf, const dphi, const mx, const cs, const sSel) Builds the system matrices
z ∗ , T ∗ , H ∗ and the elements of a1 , W1∗ , H1∗ , of the state space representation for the model identified by
sSel and returns the matrix [W2 , W3 , . . . , Wn+1 ]
SsfLogLikc(const vyf, const dphi, const mx, const cs, const sSel) Evaluates the profile likelihood Lc
or the diffuse profile likelihood, L∞ at φ, given respectively by (14) and (15), by means of the augmented
Kalman filter (equations (11)).
SsfProfileLikelihood(const vyf, const mx, const cs, const sSel) Uses the previous function for evaluating the profile likelihood over the interval (-0.99,0.99) and plots it versus φ. The horizontal line is drawn
at the maximum minus one half of the 95-th percentile of the chi square distribution with one degree of
freedom. Values of φ in the region where the likelihood is above the line do not differ significantly from
the maximiser at 5% level.
GridSearch(const vyf, const mx, const cs, const sSel, ...) Performs estimation of the φ parameter
via a grid search over the interval (-1, 1). The user may modify the range of the search specifying a
different lower bound or a different range.
SsfInnovations(const vyf, const dphi, const mx, const cs, const sSel) Runs the augmented Kalman
filter (equations (11)) and computes the innovations ṽt , along with their variance f˜t , as described in
section 4.2.
SsfFilteredEst(const vyf, const dphi, const mx, const cs, const sSel) Computes the real time or filtered estimates for the state space model augmented by appending yt to the state vector. The last element
of the filtered state is the real time estimate of the disaggregated series. Only the estimation error variance
for this element is returned.
SsfSmoothedEst(const vyf, const dphi, const mx, const cs, const sSel) Computes the smoothed estimates for the state space model augmented by appending yt to the state vector. The last element of the
smoothed state vector provides the estimate of the disaggregated series. Only the estimation error variance
for this element is returned.
The function implements the smoothing algorithm proposed by de Jong (1988), appropriately adapted to handle missing values, as discussed in section 4.2.
LinearDisaggregation(const vyf, const mx, const cs, const sSel) This function performs maximum likelihood estimation, plots the profile likelihood, computes the real time and smoothed estimates of the disaggregated series and plots them along with 95% confidence bounds, and computes the standardised innovations, plotting them along with their correlogram and nonparametric density estimate.
Nonlinear disaggregation
The functions implementing the disaggregation with a nonlinear temporal aggregation constraint, arising when the model is specified in the logarithms, are described below. Here, vYf denotes the 1 × n vector {".", ".", ".", Yc4, ".", ".", ".", Yc8, ".", ".", ".", Yc12, ...}.
LGAMLogLikc(const vYf, const dphi, const mx, const cs, const sSel, const vyhat) Evaluates the profile likelihood (under fixed and diffuse effects) for the linear and Gaussian approximating model (19) using
{ỹt } for the Taylor expansion.
LGAMSmoothedEst(const vYf, const dphi, const mx, const cs, const sSel, const vyhat) Computes the
smoothed estimates of the disaggregate series for the linear and Gaussian approximating model (19) based
on {ỹt }.
SequentialPostMode(const vYf, const dphi, const mx, const cs, const sSel, const vyhat) Starting from
a trial disaggregated series {ỹt }, computes the final feasible estimate of the disaggregated series iterating
until convergence the constrained linear sequential algorithm described in section 5.
LGAMGridSearch(const vYf, const mx, const cs, const sSel, ...) Performs estimation of the φ parameter via a grid search over the interval (-1, 1). The user may modify the range of the search specifying a
different lower bound or a different range.
SsfNLProfileLikelihood(const vYf, const mx, const cs, const sSel) Evaluates the profile likelihood over
the interval (-0.99,0.99) and plots it versus φ.
Temporal Disaggregation Techniques of Time Series by Related Series: a
Comparison by a Monte Carlo experiment
di Anna Ciammola, Francesca Di Palma e Marco Marini (ISTAT)
Abstract
This work presents a comparison of different techniques for disaggregating annual flow time series
by a quarterly related indicator, based on a Monte Carlo experiment. A first goal of the study
is related to the estimation of the autoregressive parameter implied by the solution proposed by
Chow and Lin (1971), which is the technique most widely used by National Statistical Institutes (NSIs). Three estimation approaches, the most recurrent in the literature, have been considered: the inversion of the relationship linking the first-order aggregated autocorrelation and the autoregressive parameter at the highest frequency (Chow and Lin, 1971), the maximization of the log-likelihood (Bournay and Laroque, 1979), and the minimization of the sum of squared residuals (Barbone, Bodo, and Visco, 1981). We evaluate the accuracy of the estimated autoregressive parameter from these
approaches and compare the disaggregated series obtained with the simulated ones. Then, the
comparison is extended to other regression-based techniques based on the proposals by Fernández
(1981), Litterman (1983), Santos Silva and Cardoso (2001) and Di Fonzo (2002). Nearly one hundred
and fifty scenarios were designed, in order to detect the conditions that allow each technique to
obtain the best disaggregation (in terms of in-sample and out-of-sample accuracy), verify whether a
technique outperforms the other ones and evaluate the efficiency of the parameter estimates obtained
maximizing the log-likelihood and minimizing the sum of squared residuals.
1 Introduction
The frequency at which official statistics are released by National Statistical Institutes (NSI) or other
data producers is decided on the basis of several factors: the nature of the underlying phenomenon,
the burden of respondents, budgetary constraints, etc. It follows that official time series are often
available at lower frequency than users would like. The lack of high frequency indicators could be
overcome with the help of mathematical, statistical or econometric techniques to interpolate, distribute
or extrapolate the missing values at the desired frequency. Such methods go under the name of temporal
disaggregation (or benchmarking) techniques. Often, NSI themselves rely on such methods when a direct
estimation approach cannot be accomplished: the indirect approach followed by some European NSI in
the estimation of quarterly national accounts (QNA) is a clear example. This task is usually performed
using techniques which base the disaggregation on one (or more) indicator series, available at a higher frequency and somehow related to the objective series. Chow and Lin (1971) derive a general formulation of the disaggregation problem. They obtain a least-squares optimal solution in the context of a linear regression model involving the missing series and the related indicators; moreover, they suggest imposing a first-order autoregressive structure on the residual term. This solution requires an estimation of the
autoregressive parameter at the high-frequency (HF) level, which, however, can only be inferred from the relationship between the variables at the lower frequency (LF). Since the temporal aggregation alters much
of the properties of the HF autoregressive process, an exact identification from the LF residuals is not
possible.
Different strategies have been developed to get an estimate of the autoregressive parameter from the LF
data: the most applied procedures are those proposed by Chow and Lin (1971), Bournay and Laroque
(1979) , and Barbone, Bodo, and Visco (1981) and will be illustrated in the next section. In the
meantime, other authors have proposed alternative restrictions on the DGP of the disturbance series in
the HF regression model. Fernández (1981) proposes a random walk model for the disturbances that
avoids the estimation of parameters at the HF level. Litterman (1983) refines the Fernández solution by
introducing a Markov process to take account of serial correlation in the residuals. Wei and Stram (1990)
encompass the three solutions, generalizing the restriction to the class of ARIMA (AutoRegressive
Integrated Moving Average) processes. Recently, some authors have proposed techniques based on
dynamic regression models in the identification of the relationship linking the series to be estimated and
the related indicators. We refer to the works of Salazar, Smith, and Weale (1997), Salazar, Smith, Weale,
and Wright (1997), Santos Silva and Cardoso (2001), and Di Fonzo (2002).
An empirical comparison of the performances of temporal disaggregation techniques might be obtained
by using real-world data. Many series of interest are however observed only at annual/quarterly level,
so that any judgement on the performance of a method can merely be done by measuring distances
between the disaggregated series and the related quarterly/monthly indicators. This paper instead
presents evidence based on a simulation study which investigates the relative quality of the estimates
from alternative solutions and estimation methods. The objective series are derived as sum of two
components, the indicator series and the disturbance series both simulated at the HF level. Hence, the
series of interests are completely known at the desired frequency and can be used as benchmarks for
evaluating the disaggregated series.
Comparative studies of disaggregation techniques with simulated time series have already been developed
but, in our opinion, they are based on too restrictive hypotheses concerning the data generation process
used to simulate the series and the type and the number of alternative methods compared. Chan (1993)
compares the quarterly disaggregation procedure by Wei and Stram (1990) with other five methods
which do not make use of related indicators. A similar exercise has been recently presented by Feijoo,
Caro, and Quintana (2003), which consider a more refined simulation design to take account of seasonal
components in the simulated series. Pavia, Vila, and Escuder (2003) perform a simulation experiment in
order to assess the quality of the estimates obtained through the disaggregation procedure proposed by
Chow and Lin (1971), but only one estimation strategy (similar to that suggested by Chow and Lin) has
been used. Finally, Caro, Feijoó, and Quintana (2003) extend the comparison to other proposals based
on the best linear unbiased solution given by Chow-Lin. The evaluation of the methods is fairly different
with respect to our work because they do not consider any admissibility condition of the solutions;
furthermore, the method based on the dynamic regression model is not taken into consideration in their
analysis.
The last two references are strictly connected with our work. The scenarios considered are similar to
those composed in our experiment (in section 3 we will explain the main differences). In a first exercise
we compare the performances of the three estimation methods for the Chow-Lin solution mentioned
earlier. The methods are assessed in terms of estimation accuracy of the autoregressive parameter (and
of the disaggregated series) under a Markov-process hypothesis for the disturbance series and different
DGPs for the indicator series. Then, the comparison is extended to other regression-based proposals
based on the Chow-Lin approach. In this second exercise we also introduce an integration of order one
in the disturbance series, as implied by the Fernández and Litterman proposals. This allows us to verify whether each method works properly when the simulated and the assumed HF disturbance series are coherent.
The dynamic solution proposed by Santos Silva and Cardoso (2001) and re-arranged by Di Fonzo (2002)
is also included in our analysis, even if we must say that the comparison with the other solutions is unfair
because the reference model used in the simulation is essentially static.
The structure of the paper is as follows. In the next section we introduce the statement of the disaggregation problem, providing a review of the methods we intend to compare. Section 3 describes the
simulation design used in our experiment. In Section 4 we present and discuss the most interesting
results obtained from the two exercises. Conclusions are drawn in the final section.
2 A brief review of temporal disaggregation methods
The objective of any temporal disaggregation technique is to derive an estimate of the underlying high-frequency (HF) observations of an observed low-frequency (LF) time series. This problem is also known
as interpolation (for stocks) or distribution (for flows) of time series (in this paper we only consider
distribution of flow time series so we will refer to it hereafter). Let us denote a vector of LF data by
yl = (y1 , y2 , . . . , yT )′
and the corresponding vector of (missing) HF observations by
yh = (y1,1 , y1,2 , . . . , yT,s−1 , yT,s )′
with s the periodicity of yh . The naive solution of the problem is to divide each value of yl by s. Such
a solution is reasonable if we suppose a straight line connecting the HF periods between subsequent
LF observations. This is certainly not the case for economic time series, because seasonal, cyclical and
irregular components do influence their movements at sub-annual frequencies. How then is it possible
to get more acceptable results both in the statistical and economic sense? The problem have been
approached in different manners by the literature. Di Fonzo (1987) provides a useful classification of the
methods into the following two categories:
• methods which derive disaggregation of the LF values using mathematical criteria or time series
models;
• methods which exploit external variables observed at the desired frequency.
The main references for the first approach are Boot, Feibes, and Lisman (1967), Lisman and Sandee (1964)
and Wei and Stram (1990). The latter proposal is more theoretically founded because the distribution
problem is solved on the basis of an ARIMA representation of the series to be disaggregated. The
methods belonging to the second group exploit the information from related time series. A further
classification in this group is between two-step adjustment methods and optimal methods. The former
techniques obtain a preliminary disaggregated series which does not fulfil the temporal constraint; a
second step is then required to adjust the HF series to the LF totals. The proposal by Denton (1971) is
the best-known two-step adjustment procedure. The solutions in the second group are optimal in the
least-squares sense because they solve the preliminary estimation and adjustment steps in the context of
a statistical regression model which involves LF variables and HF related series.
Since optimal methods are the primary interest of this paper, we illustrate in detail the proposals (along
with estimation methods) we intend to compare. Let us first introduce some basic notation. Suppose
a n × k matrix of related time series Xh is available, with n ≥ sT . If n = sT we face a distribution
(or interpolation) problem; for n > sT , the last (n − sT ) HF sub-periods need to be extrapolated. Any
deterministic term (constant, trend or similar) might or might not be included in Xh . The following
regression model is assumed at the HF level
yh = Xh β + uh
(1)
where β is the vector of regression coefficients and uh is the disturbance series. As we will see later, the
optimal solution differs according to the hypothesis on the underlying DGP of uh. For the time being, suppose

E(uh |Xh) = 0,   E(uh u′h |Xh) = Vh,

without specifying any form for Vh.
Pre-multiplying both members of model (1) by the T × n aggregation matrix C, defined as C = IT ⊗ 1′, where 1 is the s × 1 vector of ones, we obtain the LF counterpart of (1)

Cyh = CXh β + Cuh,  i.e.  yl = Xl β + ul,    (2)

with E(ul u′l |Xh) = CVh C′ = Vl.
Being observable, model (2) can now be estimated by standard techniques. The optimal solution (in the BLUE sense) is formally obtained through the expressions

ŷh = Xh β̂ + Vh C′ Vl⁻¹ (yl − Xl β̂)    (3)

β̂ = [X′l Vl⁻¹ Xl]⁻¹ X′l Vl⁻¹ yl    (4)

where β̂ is the least squares estimator of β in the LF regression (2). The estimator of β and, consequently, the estimated series ŷh are conditioned by the form of Vh. If Vh = In σ²ε, expression (4) corresponds to the OLS formula and (3) becomes

ŷh = Xh β̂ + C′(CC′)⁻¹(yl − Xl β̂);

since CC′ = sIT, then

ŷh = Xh β̂ + (1/s) C′ (yl − Xl β̂),
obtaining the naive solution we have started from. A non-spherical form of the noise is thus essential: the
problem is that this form is unknown. Two alternative strategies can be used to define the form of Vh .
First, an estimate of Vh can be inferred by the empirical measure of the aggregate covariance matrix
Vl . The form of Vh is thus suggested by the data at hand. This is the approach followed by Wei and
Stram (1990). However, two orders of problems arise from this approach. Firstly, the covariance matrix
of the HF disturbances cannot be uniquely identified from the relationship Vl = CVh C′ . Next, the
approach relies heavily on ARIMA model identification for the aggregate series. Economic time series
have generally a small sample size, so that the estimated autocorrelations at the LF level (say, annual)
have poor sample properties. As an alternative approach, some authors proposed to restrict the DGP of
uh to well-known structure in the class of ARIMA processes. The pioneers of this approach are surely
Chow and Lin (1971). Their work has had an enviable success in the field of temporal disaggregation:
some European NSI currently base the compilation of their QNA on this method (for example Italy,
France, and Belgium). The Chow-Lin solution is in fact understandable, easy to apply, fast and robust:
features that are very appealing from the standpoint of a data producer.
Chow and Lin (1971) present a common solution to the problems of distribution, interpolation and
extrapolation using the theory of best linear unbiased estimation. Moreover, they suggest the simple
Markov process for uh
ut = ρut−1 + εt
(5)
for t = 1, ..., n and u0 = 0. It follows that the covariance matrix Vh has a Toeplitz form,

Vh = (σ²ε / (1 − ρ²)) [ ρ^{|i−j|} ]i,j=1,...,n ,

i.e. the matrix whose generic (i, j) element is σ²ε ρ^{|i−j|} / (1 − ρ²),
with E(ε²t) = σ²ε. The matrix Vh would be completely defined if the autoregressive parameter ρ were known; in this case β̂ is the GLS estimator of β. The real problem is that ρ is not known and must be estimated: it follows that β̂, conditional on ρ̂, is a feasible GLS estimator of β. If ρ̂ = 0, the matrix Vh is diagonal and the distribution of the annual discrepancies (yl − Xl β̂) is simply obtained by dividing each value by four, inducing the spurious jumps in the series we would like to avoid. Different estimated values of ρ imply different estimates of β̂ and, consequently, different estimated disaggregations.
Different estimation methods have been proposed to obtain an estimate of ρ from LF variables. We
concentrate here on three approaches, the most recurrent in the literature. A first method was proposed in the paper of Chow and Lin (1971) itself. Their method considers the relationship between ρ and the elements of the aggregated covariance matrix V̂l: they propose a strategy based on the relationship between the autoregressive coefficient at the monthly level and the first autocorrelation computed from the quarterly errors, which is the element [1,2] of V̂l. The strategy originally proposed by Chow and Lin cannot be immediately extended to the problem of quarterly disaggregation of annual figures, as
indicated by Bournay and Laroque (1979). The quarterly autoregressive coefficient ρ and the first-order
autocorrelation of the annual disturbances φa1 are related through the following expression:
φa1 = ρ(ρ + 1)(ρ² + 1)² / [2(ρ² + ρ + 2)].    (6)
An iterative procedure is then applied to derive an estimate of ρ. Starting from an initial estimate of φa1 obtained from the OLS residuals of (2), the value of ρ̂ is repeatedly updated by plugging the new values of φa1 into expression (6). The iterations end when ρ̂ converges to a stable value within a fixed precision level.
The aggregation of a quarterly first-order autoregressive process yields an ARMA(1,1) process at the annual frequency. This means that φa1 depends on both the AR and the MA coefficients, so that, because of the MA part, there is no one-to-one correspondence between the two coefficients (the annual autoregressive parameter is simply given by ρ⁴). This is the reason why the iterative procedure of Chow-Lin does not obtain a solution for some values of φa1. This occurs for φa1 < −0.13; moreover, when −0.13 < φa1 ≤ 0 equation (6) has two solutions, as can easily be verified in the plot of φa1 against ρ shown in Figure 1. In these cases a quarterly disaggregation cannot be achieved.

Figure 1: Plot of φa1 against ρ.
Bournay and Laroque (1979) present an alternative estimation procedure based on the maximization of
the log-likelihood. Assuming normality of the residuals, the log-likelihood of a regression model with
AR(1) disturbances can be defined as
log L(ρ̂; β̂) = (n/2) [−1 − ln(2π/n)] − (n/2) ln(û′l Vl⁻¹ ûl) − (1/2) ln(|Vl|).
The log-likelihood can be maximized with respect to ρ in the region of stationarity (−1, 1). The optimization is obtained through an iterative computation of the matrix Vh and of the vectors β and ul over a grid of values of ρ. The ML estimate of ρ is that for which log L(ρ̂; β̂) is maximum over this grid.
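A naive Python transcription of this grid-search strategy, combining (3)-(5) with the concentrated log-likelihood above, might look as follows (for illustration only; constants not depending on ρ are dropped, and this is not the production code of any NSI):

import numpy as np

def chow_lin_fgls(y_l, X_h, s, rho):
    # one evaluation of the feasible GLS solution for a given rho
    T, n = len(y_l), X_h.shape[0]
    C = np.kron(np.eye(T), np.ones((1, s)))                            # T x n aggregation matrix
    idx = np.arange(n)
    Vh = rho ** np.abs(np.subtract.outer(idx, idx)) / (1.0 - rho**2)   # AR(1) covariance, eq. (5)
    Vl = C @ Vh @ C.T
    Xl, Vl_inv = C @ X_h, np.linalg.inv(Vl)
    beta = np.linalg.solve(Xl.T @ Vl_inv @ Xl, Xl.T @ Vl_inv @ y_l)    # GLS estimator, eq. (4)
    u = y_l - Xl @ beta
    loglik = -0.5 * T * np.log(u @ Vl_inv @ u) - 0.5 * np.linalg.slogdet(Vl)[1]
    y_h = X_h @ beta + Vh @ C.T @ Vl_inv @ u                           # disaggregated series, eq. (3)
    return beta, loglik, y_h

def chow_lin_ml(y_l, X_h, s=4, grid=np.linspace(-0.99, 0.99, 199)):
    # Bournay-Laroque: pick the rho maximising the concentrated log-likelihood over a grid
    fits = [chow_lin_fgls(y_l, X_h, s, r) for r in grid]
    best = int(np.argmax([f[1] for f in fits]))
    return grid[best], fits[best]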
The third approach is that outlined in Barbone, Bodo, and Visco (1981). The authors propose to choose the value of ρ minimizing the sum of squared residuals, û′l Vl⁻¹ ûl. They refer to it as an Estimated Generalized Least Squares (EGLS) estimator. Di Fonzo (1987) shows that this solution appears to give better results than ML when sharp movements are present in the series.
As we mentioned earlier, the quarterly disaggregation of annual national accounts aggregates is obtained by some European NSIs through the application of the Chow-Lin technique. The variant by Barbone, Bodo, and Visco (1981) is currently applied by ISTAT, the Italian NSI, in the compilation of quarterly national accounts. An algorithm similar to that of Chow and Lin (1971) is used by INE, the Spanish NSI, while the Bournay and Laroque solution is adopted by the Belgian statistical agency. A comparative assessment of the three estimation approaches is the objective of our first Monte Carlo experiment.
So far, we have presented an outline of the literature connected to Chow and Lin's suggestion of using an AR(1) structure for uh. Some criticisms of this solution come from Fernández (1981). The specification of a variance-covariance matrix is impossible because the data are not observed at the HF
level. Furthermore, the AR(1) hypothesis might introduce an artificial step between the last period
of one year and the first period of the next. The alternative structure for the HF noise proposed by
Fernández is the random walk model
ut = ut−1 + εt
(7)
with u0 = 0.
The advantage of this solution is that the form of Vh is completely known without requiring any estimation procedure. In fact, the initial condition is sufficient to guarantee its existence. Letting D denote the n × n first-difference matrix, with 1 on the main diagonal, −1 on the first subdiagonal and 0 elsewhere, the matrix Vh is defined as

Vh = σ²ε (D′D)⁻¹ = σ²ε [ min(i, j) ]i,j=1,...,n ,

i.e. the matrix whose generic (i, j) element is σ²ε min(i, j).
Model (1) with hypothesis (7) implies that β is not a cointegrating vector for yh and Xh: the objective variable and the related series are thus to be modelled in difference form.
An interesting extension of the Fernández proposal is obtained by the use of the logarithmic transformation of yt (Di Fonzo, 2002). The absence of additivity of log-transformed variables can be worked
around by using Taylor approximations and benchmarking techniques to fulfil temporal constraints if
discrepancies are relatively large. Setting zh = log(yh ) and expressing the HF model in first differences,
we obtain
∆zh = ∆Xh β + εh ,
(8)
a model expressed in terms of rates of change of yh (approximated by its logarithmic difference). This
solution seems appealing because many economic models are expressed in terms of growth rates of the
variable of interest. Moreover, from the aggregation of model (8) we obtain a model expressed in terms
of the low-frequency rate of change (approximately). In other terms, the estimated model at the LF level is fully coherent with the theoretical model assumed at the HF level.
A further modification of the procedure of Fernández (1981) is offered by Litterman (1983). In several
applications he found that the random walk assumption for the monthly error term did not remove all
of the serial correlation. As an alternative, he suggests the random walk Markov model
ut = ut−1 + et,   et = ψ et−1 + εt,    (9)
with |ψ| < 1 and the initial conditions u0 = e0 = 0, for t = 1, ..., n. He compares this method with both
the Chow-Lin and Fernández solutions on some real world economic time series; his results indicate that
hypothesis (9) is more accurate than others when the estimated Markov parameter ψ̂ is positive.
Both Fernández and Litterman impose fixed conditions on the history of the disturbance process. During the work of the ISTAT commission (see footnote ??), Proietti (2004) and Di Fonzo (2005b) investigate the role of the starting conditions in dealing with nonstationary disturbances and provide a new parametrization of the problem which does not require the assumption that the starting value be fixed.
The methods illustrated above are all based on a static regression model between the variable of interest
and the related indicators. This can be considered a serious drawback when the relationships are dynamic, as are those usually encountered in applied econometric work. In recent years there have been several
proposals to extend the use of dynamic regression models to temporal disaggregation. Di Fonzo (2002)
provides a complete technical review of this line of research. Gregoir (1995) and Salazar, Smith, and
Weale (1997) propose a simple dynamic regression model, but the algorithm needed to calculate estimates
and standard errors are rather complicated. The same linear dynamic model has been developed by
Santos Silva and Cardoso (2001) with a simpler estimation procedure. From the dynamic model
yt = φyt−1 + x′t β + εt
(10)
Santos Silva and Cardoso (2001) apply the recursive substitution suggested by Klein (1958) obtaining
the transformed model
yt = Σ_{i=0}^{t−1} (φ^i x′t−i) β + φ^t η + ut,    ut = φ ut−1 + εt.    (11)
The truncation remainder η is considered as a fixed parameter and can be estimated from the data. Since the transformed model (11) is a static regression model with AR(1) disturbances, the classical Chow and Lin procedure can be applied to obtain an estimate of φ and, consequently, of the disaggregated series in accordance with the dynamic regression model (10).
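The construction of the transformed regressors in (11) reduces to a simple recursion; a sketch is given below (the subsequent aggregation and Chow-Lin GLS step are as in the earlier code):

import numpy as np

def ssc_transformed_regressors(X_h, phi):
    # filtered indicators sum_{i=0}^{t-1} phi^i x_{t-i} plus the column phi^t
    # multiplying the truncation remainder eta (treated as an extra fixed parameter)
    n, k = X_h.shape
    Xf = np.zeros_like(X_h, dtype=float)
    Xf[0] = X_h[0]
    for t in range(1, n):
        Xf[t] = X_h[t] + phi * Xf[t - 1]      # recursive form of the truncated filter
    eta_col = phi ** np.arange(1, n + 1)      # phi^t, t = 1, ..., n
    return np.column_stack([Xf, eta_col])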
3 The simulation design
A simulation exercise in the context of temporal disaggregation requires (at least) the generation of two
variables, the low-frequency benchmark and one (or more) high-frequency indicator. To simplify notation,
we only consider quarterly disaggregation of annual series by means of a single related indicator. We
denote the annual variable as yt and the quarterly indicator as xt,h , with t = 1, . . . , T and h = 1, . . . , 4.
The two series must be related somehow. We use the following static relationship between the indicator
series xt,h and the quarterly variable yt,h
yt,h = α + βxt,h + ut,h    (12)
where α and β are parameters to be generated and ut,h represents a quarterly series generated independently
of xt,h. The series ut,h is denoted as the disturbance series. Since we are dealing with the distribution
problem, the annual benchmark yt is easily obtained by summing up the quarters yt,1 , ..., yt,4 for any
t = 1, ..., T .
The relationship (12) ensures the strong exogeneity of the indicator series with respect to yt . The signal
xt,h is “perturbed” by the noise ut,h, which can be derived from a process of any nature (stationary,
integrated, seasonal, etc.). Clearly, the more complex the structure of ut,h, the weaker the signal
preserved in yt. It follows that the exogeneity of xt,h is a strong condition of our simulation model, as it does not
allow a joint interaction between variables and related indicators over time. Though this solution
may appear not very appealing from a theoretical point of view, the scale of our experiment calls for a
simpler approach.
As a matter of fact, a simulation exercise based on the model (12) consists of various steps:
1. generation of both the indicator series xt,h and the disturbance term ut,h fulfilling the orthogonality
condition;
2. generation of the parameters α and β;
3. derivation of the dependent series yt,h in accordance with different levels of adequacy of the model
fit;
4. computation of the annual benchmark yt ;
5. temporal (monthly or quarterly) disaggregation of yt based on the related indicator xt,h by different
disaggregation techniques and estimation methods;
6. comparison of the estimated HF data ŷt,h (and the estimated parameters α̂ and β̂) with the
generated data yt,h (and the true parameters α and β) in order to assess the accuracy of the
estimates.
Here we deal with the first four steps, whereas the last ones are described in the next section as they
concern the results and their assessment.
As far as the first step is concerned, the indicator series and the disturbance term are both generated
according to ARIMA models. An ARIMA model can be expressed, apart from a constant, as
φ(L)Φ(Ls )(1 − L)d (1 − Ls )D xt,h = θ(L)Θ(Ls )at,h ,
(13)
where: (i) φ(L) = 1 − φ1 L − ... − φp Lp, Φ(Ls) = 1 − Φ1 Ls − ... − ΦP LsP, θ(L) = 1 − θ1 L − ... − θq Lq
and Θ(Ls) = 1 − Θ1 Ls − ... − ΘQ LsQ are finite polynomials in the lag operator L, of orders p, P, q and
Q; (ii) s = 4 (12) for quarterly (monthly) observations; (iii) at,h is a sequence of NIID(0, σ2) variables.
A multitude of ARIMA specifications can be defined from (13). To choose among them, hundreds of series
currently used in the process of estimating QNA were analysed using the procedure TRAMO-SEATS
(Gomez and Maravall, 1997). The ARIMA models, automatically identified, were ranked according to
the number of series they were fitted on.29 In our experiment, these empirical results allow us to consider
very simple models with d, D, p, q, Q = 0, 1, P = 0, s = 4 and σ2 = 1; a constant term is added in
order to obtain positive data. These data are then multiplied by a scale factor to guarantee a minimum
amount of volatility in the indicators. Table 1 displays the models chosen to generate the indicators and
their coefficients (φ, θ and Θ). These are fixed to the average of the parameter estimates coming from
the real series.
29 The series currently used to estimate QNA and analysed in this paper are indicators of industrial production,
compensation of employees and household consumption. The latter indicators, in index-number form, are only used
to produce QNA data and are not released to final users.
Table 1: ARIMA models and parameters used to generate the indicator series.
ARIMA model        φ     θ     Θ     Seasonal adjustment   Name
(0,1,1)(0,1,1)     -     0.4   0.6   no / yes              I1 / I1sa
(0,1,0)(0,1,1)     -     -     0.6   no / yes              I2 / I2sa
(1,0,0)(0,1,1)     0.4   -     0.4   no / yes              I3 / I3sa
(0,1,1)            -     0.4   -     no                    I4
A seasonally adjusted version of the indicator series is obtained by applying TRAMO-SEATS.30 In order
to best reproduce the ordinary compilation of QNA, the seasonal adjustment is carried out by identifying
the ARIMA models and estimating their parameters automatically. Clearly, the estimated models may
differ from the ARIMA processes used to generate the raw series.
The innovation series at,h is derived from a standardized normal distribution using the GAUSS function
rndn based on the algorithm proposed by Kinderman and Ramage (1976). A Ljung-Box test is performed
to verify the randomness of at,h : when the Ljung-Box statistic computed on the first 16 autocorrelations
exceeds the value corresponding to a 10% probability level, the generated series is discarded and replaced
with a new one. In order to reduce the effect of the initial condition a0 = 0, 4T + 100 observations are
first generated for the disturbance term and the first 100 observations are then discarded from the final
indicator series.
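To fix ideas, a minimal sketch of how one indicator series could be generated along these lines is given below. The original experiment was coded in GAUSS; this is a Python/NumPy transcription for illustration only: the airline-type ARIMA(0,1,1)(0,1,1)4 specification shown, the constant and scale values, and the helper names `simulate_indicator` and `ljung_box_ok` are assumptions of ours.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_ok(a, n_lags=16, alpha=0.10):
    # Ljung-Box statistic on the first n_lags autocorrelations; the draw is
    # accepted only if it stays below the chi-square critical value at the 10% level.
    a = np.asarray(a, dtype=float)
    n = len(a)
    a = a - a.mean()
    denom = np.sum(a ** 2)
    q = 0.0
    for k in range(1, n_lags + 1):
        r_k = np.sum(a[k:] * a[:-k]) / denom
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q <= chi2.ppf(1.0 - alpha, df=n_lags)

def simulate_indicator(n_quarters, theta=0.4, Theta=0.6, const=10.0, scale=5.0,
                       burn_in=100, seed=None):
    # ARIMA(0,1,1)(0,1,1)_4 indicator with N(0,1) innovations:
    # (1 - L)(1 - L^4) x_t = (1 - theta L)(1 - Theta L^4) a_t.
    rng = np.random.default_rng(seed)
    n_tot = n_quarters + burn_in
    while True:                      # redraw until the innovations pass the Ljung-Box screen
        a = rng.standard_normal(n_tot + 5)
        if ljung_box_ok(a):
            break
    # moving-average part: w_t = a_t - theta a_{t-1} - Theta a_{t-4} + theta*Theta a_{t-5}
    w = a[5:] - theta * a[4:-1] - Theta * a[1:-4] + theta * Theta * a[:-5]
    # undo the regular and seasonal differences: x_t = x_{t-1} + x_{t-4} - x_{t-5} + w_t
    x = np.zeros(n_tot)
    for t in range(n_tot):
        x[t] = w[t]
        if t >= 1:
            x[t] += x[t - 1]
        if t >= 4:
            x[t] += x[t - 4]
        if t >= 5:
            x[t] -= x[t - 5]
    # drop the burn-in, add a constant and a scale factor (illustrative values)
    return const + scale * x[burn_in:]
```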
As far as the model for the disturbance series is concerned, we have to distinguish two different contexts.
The first context is the comparison of the three estimation methods used in the Chow-Lin solution; here
the disturbance series are generated from a simple first-order autoregressive model:
ut,h = ρut−1,h + εt,h .
(14)
Since different values of ρ may produce strong changes in the properties of the estimation methods, we
simulate nineteen configurations of (14) with ρ = 0.05, 0.10, . . . , 0.90, 0.95. Hereinafter, we denote this
simulation exercise by E1.
When the purpose of the simulation exercise is the comparison among various disaggregation techniques,
the disturbance series are not only generated from (14), but also from a simple random walk model
(assumed in the Fernández approach)
ut,h = ut−1,h + εt,h    (15)
and from a Markov random walk model (assumed in the Litterman approach)
ut,h = ut−1,h + et,h,    et,h = φ et−1,h + εt,h.    (16)
Because of the computational complexity of the experiment, in this second exercise, denoted by E2, we
simulate three configurations of (14), (15) and (16) with ρ, φ = 0.1, 0.5, 0.9 (see table 2).
30 The generation of seasonal series and their subsequent seasonal adjustment could seem a pointless complication, as in
short-term economic analysis infra-annual data are mainly used in seasonally adjusted form. Actually, such a process meets the
requirements of the European regulations: in accordance with the European System of Accounts (ESA95), NSIs are required to
produce both raw and seasonally adjusted QNA.
Table 2: ARIMA models and parameters used to generate the error series in exercise E2.
ARIMA model   parameter       Name
(1,0,0)       ρ = 0.1         C1
(1,0,0)       ρ = 0.5         C5
(1,0,0)       ρ = 0.9         C9
(0,1,0)       -               F
(1,1,0)       φ = 0.1         L1
(1,1,0)       φ = 0.5         L5
(1,1,0)       φ = 0.9         L9
The innovation series εt,h is drawn with the same properties as at,h.
Two transformations are applied to the generated disturbance series: (i) a constant term is added to
obtain positive data; (ii) the standard deviation, σu, is changed in order to perturb the signal of the
indicator series. The former transformation modifies the average of the series; the latter implies a
modification of the coefficient of determination R2, according to the formula
σu∗ = [ (1 − R2)/R2 · β2 σx2 ]^{1/2}.    (17)
The final error series are then derived as
u∗ = (σu∗/σu) u.    (18)
Clearly, for larger values of R2 , we expect better results from all the estimation methods and the disaggregation techniques. To avoid useless and overlapping results, in the experiment E1 the coefficient of determination is fixed to R2 = 0.9, whereas in E2 three different levels are considered with R2 = 0.3, 0.6, 0.9.
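A minimal sketch of the rescaling in (17)–(18) is given below, again in Python rather than the GAUSS used in the experiment; `rescale_disturbance` is a hypothetical helper name.

```python
import numpy as np

def rescale_disturbance(u, x, beta, r2_target):
    # Equation (17): disturbance standard deviation implied by the target R^2
    # of the static model y = alpha + beta*x + u.
    sigma_u_star = np.sqrt((1.0 - r2_target) / r2_target * beta ** 2 * np.var(x))
    # Equation (18): rescale the generated disturbance accordingly.
    return u * sigma_u_star / np.std(u)
```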
Finally, to complete expression (12) we have to choose the values for the constant α and the regression
coefficient β. In order to understand their effects on the simulation results, we tried two ways. First,
we extracted different pairs of values for (α, β) from uniform distributions; then, we kept the
coefficients fixed at α = 100000 and β = 1 throughout the experiments. Given the similarity of the results
achieved, we chose the latter approach. Moreover, on the one hand, the large constant makes the
simulated series yt,h similar to a generic QNA aggregate in value and helps the interpretation of results
in terms of growth rates; on the other hand, fixed regression coefficients allow us to assess bias and
standard error of the respective estimates.
The quarterly simulated series yt,h is then aggregated over time using the matrix expression
y0 = (IT ⊗ 1′4) y,
where y0 = (y1, y2, ..., yT)′, y = (y1,1, y1,2, y1,3, y1,4, ..., yT,3, yT,4)′, IT is the identity matrix of dimension T and 14 = (1 1 1 1)′.
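In the same spirit, the construction of the quarterly target series and of its annual benchmark can be sketched as follows (Python; `x` and `u_star` are the indicator and the rescaled disturbance defined above, and the function name is ours).

```python
import numpy as np

def build_target_and_benchmark(x, u_star, alpha=100000.0, beta=1.0):
    # Equation (12): quarterly target series.
    y = alpha + beta * x + u_star
    # Annual benchmark y0 = (I_T kron 1_4') y, i.e. the sum of the four quarters of each year.
    T = len(y) // 4
    agg = np.kron(np.eye(T), np.ones((1, 4)))   # T x 4T aggregation matrix
    y0 = agg @ y[:4 * T]
    return y, y0
```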
The number of years used in the experiments is T = 26 (104 quarters): twenty-five years are used
for the estimation of the regression model, while the last year is set aside to evaluate the forecasting
performance of the methods. Combining the indicator and the disturbance series, 261 scenarios are
generated in all: 114 of them concern exercise E1, the remaining 147 exercise E2. For each
scenario, 500 pairs (a quarterly indicator and an annual benchmark) are first generated and then
processed using two or three estimation methods and various disaggregation approaches, running almost
two million disaggregations!
The grid search for the autoregressive parameter is performed in the interval [−0.999, 0.999]. To reduce
the computational time, we adopt a two-stage scanning procedure. In each stage a 25-step grid search is
used to optimize the objective function; when the first stage yields a solution with |ρ̂| > 0.92, a finer
grid of 51 steps is used in the second stage. This choice will become clearer in Section 4; here we only note that a finer grid
near the bounds ±1 is required because some solutions (i.e. |ρ̂| > 0.998) are considered non-admissible
and rejected.
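The two-stage scanning can be sketched as below. The exact placement of the second-stage grid is not spelled out in the text, so the refinement around the first-stage optimum is our reading of the procedure, and `objective` stands for whichever criterion (sum of squared residuals or negative log-likelihood) is being minimized.

```python
import numpy as np

def two_stage_grid_search(objective, lo=-0.999, hi=0.999,
                          n_grid=25, n_fine_near_bound=51):
    # Stage 1: coarse 25-point grid over the whole admissible interval.
    grid1 = np.linspace(lo, hi, n_grid)
    r1 = grid1[np.argmin([objective(r) for r in grid1])]
    # Stage 2: refine around the stage-1 optimum; use a 51-point grid when the
    # optimum lies near the bounds (|rho| > 0.92), 25 points otherwise.
    n2 = n_fine_near_bound if abs(r1) > 0.92 else n_grid
    step = (hi - lo) / (n_grid - 1)
    grid2 = np.linspace(max(lo, r1 - step), min(hi, r1 + step), n2)
    return grid2[np.argmin([objective(r) for r in grid2])]
```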
All the experiments are performed in GAUSS 4.0 and the temporal disaggregations are carried out using
the GAUSS routine TIMEDIS.g (Di Fonzo, 2005a). Two personal computers with an AMD Athlon XP 2400
processor (2.4 GHz) and 240 MB of RAM are employed.
4 The simulation results
The following two sub-sections show the main results of exercises E1 and E2. The differences in the
experimental design have already been described; we now specify which methods have been taken into
account and how the results have been evaluated in each experiment.
A common element of both exercises is the distinction between admissible and non-admissible solutions.
We consider a solution acceptable when the following two conditions are simultaneously fulfilled: the
estimated value of the autoregressive coefficient lies in the region of stationarity and the disaggregated
series does not contain any negative value. The rationale behind this choice is both theoretical and
pragmatic. When a stationary autoregressive component is assumed for the disturbance term, we
cannot accept solutions on the boundaries of the interval (namely ρ̂ = −0.999 or ρ̂ = 0.999, and analogously for φ̂). In
these cases a theoretical assumption is very likely to be violated and the resulting disaggregations must
not be considered. On the other hand, the presence of negative values in the quarterly disaggregation
when the annual variable assumes only positive values is an unpleasant result, especially when it is in
contrast with the definition of the variable (like GDP or consumption).
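In code, the screening amounts to a simple check of the kind sketched below (Python; the 0.999 bound mirrors the grid used in Section 3, and the function name is ours).

```python
import numpy as np

def is_admissible(ar_hat, y_disagg, bound=0.999):
    # Keep a solution only if the estimated AR parameter lies strictly inside the
    # stationarity region and the disaggregated series contains no negative value.
    return abs(ar_hat) < bound and bool(np.all(np.asarray(y_disagg) >= 0.0))
```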
The question of admissibility of the solutions is particularly relevant in exercise E1, the comparative
study of the three estimation methods for the Chow-Lin approach illustrated in section 2, hereafter
denoted by CL (Chow and Lin, 1971), ML (Bournay and Laroque, 1979) and SSR (Barbone, Bodo,
and Visco, 1981). As we noticed in section 2, the CL method provides non-admissible solutions for certain
values of φa1. The other estimation methods can instead provide estimated values of ρ on the boundaries.
We noted in the experiments that the condition ρ̂ = 0.999 occurs only with SSR, while ML presents
several solutions with ρ̂ = −0.999. Therefore, each method is characterized by a single non-admissibility
condition. Differently from the second exercise, we restricted the region of admissibility to (−0.9, 0.999):
in fact, the disaggregated series obtained by ML show excessively erratic movements when ρ̂ < −0.9 and are
excluded from the calculation of the aggregate statistics.
The estimation methods are evaluated in terms of their accuracy in reproducing the simulated coefficients
of the regression model and in terms of the quality of the resulting disaggregations, in both the in-sample period
(100 quarters) and the out-of-sample period (4 quarters). Accordingly, the results are organized in two
separate sections.
The former concerns the estimation of the three coefficients (α, β, ρ). The estimated regression coefficients
α and β are compared to the fixed values used in the experiments, 100000 and 1 respectively. A boxplot
representation is used to compare the estimates of β. Given the huge amount of results, measures of
aggregation across the time and the experiment dimensions are needed and will be explained later in
this section.
The second exercise (E2) is a comparison of the performances of several techniques based on the solutions proposed by Chow and Lin (1971), Fernández (1981), Litterman (1983), Santos Silva and Cardoso
(2001) and Di Fonzo (2002). Table 3 identifies the selected methods with acronyms; they differ in the
regression model, the disturbance model, the estimation method, the deterministic term, the starting
condition and the logarithmic transformation used. However, some configurations will not be considered
in the following tables because the results obtained were not very interesting; for example, the logarithmic
transformation of the objective series never yielded significant improvements. To
simplify the readability of the tables we will only show the results for the methods printed in boldface,
which can be considered the best performing in our exercises.
Table 3: Disaggregation methods considered in exercise E2.
acronym            regression   disturbance     estimation   deterministic   starting     logarithm
                   model        model           method       term            condition
CL ssr             static       ARIMA(1,0,0)    SSR          constant        fixed        no
CL ssr -c          static       ARIMA(1,0,0)    SSR          none            fixed        no
CL ml              static       ARIMA(1,0,0)    ML           constant        fixed        no
CL ml -c           static       ARIMA(1,0,0)    ML           none            fixed        no
FER                static       ARIMA(0,1,0)    -            constant        fixed        no
FER -c             static       ARIMA(0,1,0)    -            none            fixed        no
FER nsc            static       ARIMA(0,1,0)    -            constant        estimated    no
LFER               static       ARIMA(0,1,0)    -            constant        fixed        yes
LFER -c            static       ARIMA(0,1,0)    -            none            fixed        yes
LIT ssr            static       ARIMA(1,1,0)    SSR          constant        fixed        no
LIT ssr -c         static       ARIMA(1,1,0)    SSR          none            fixed        no
LLIT ssr           static       ARIMA(1,1,0)    SSR          constant        fixed        yes
LLIT ssr -c        static       ARIMA(1,1,0)    SSR          none            fixed        yes
LIT ml             static       ARIMA(1,1,0)    ML           constant        fixed        no
LIT ml -c          static       ARIMA(1,1,0)    ML           none            fixed        no
LLIT ml            static       ARIMA(1,1,0)    ML           constant        fixed        yes
LLIT ml -c         static       ARIMA(1,1,0)    ML           none            fixed        yes
LIT nsc            static       ARIMA(1,1,0)    ML           constant        estimated    no
ADL(1,0) ssr       ADL(1,0)     WN              SSR          constant        fixed        no
ADL(1,0) ssr -c    ADL(1,0)     WN              SSR          none            fixed        no
ADL(1,0) ml        ADL(1,0)     WN              ML           constant        fixed        no
ADL(1,0) ml -c     ADL(1,0)     WN              ML           none            fixed        no
Again, we verify the estimation accuracy of the regression coefficients. However, to make the comparison
fair we only relate estimated and simulated coefficients when the simulated disturbance model is coherent
with the one assumed by each method. For example, the estimate of β provided by CL ssr is
compared with the simulated one only for the experiments with an ARIMA(1,0,0) process for the disturbance
series; similar considerations hold for the other two static solutions, while the coefficients estimated
by ADL(1,0) (Autoregressive Distributed Lag) are not comparable with any simulated counterpart.
The quality of the disaggregated series is tested by standard measures of comparison. We evaluate the
goodness-of-fit of the quarterly series both in-sample and out-of-sample. Denoting with yt the simulated
series and with ŷt the estimated disaggregation, we compute for each method the root mean square
percentage error on the levels (RMSPEL) and the root mean square error on the first differences (RMSE1)
as
RMSPEL = [ (1/100) Σ_{t=1}^{100} ( 100 (ŷt − yt)/yt )2 ]^{1/2}
RMSE1 = [ (1/99) Σ_{t=2}^{100} ( δŷt − δyt )2 ]^{1/2}
where δŷt = 100 (ŷt − ŷt−1)/ŷt−1 and δyt = 100 (yt − yt−1)/yt−1.
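A direct transcription of the two measures is sketched below (Python; `y` is the simulated quarterly series and `y_hat` the estimated disaggregation over the 100 in-sample quarters, and the function names are ours).

```python
import numpy as np

def rmspel(y_hat, y):
    # Root mean square percentage error on the levels.
    pe = 100.0 * (y_hat - y) / y
    return np.sqrt(np.mean(pe ** 2))

def rmse1(y_hat, y):
    # Root mean square error on the quarterly growth rates.
    g_hat = 100.0 * np.diff(y_hat) / y_hat[:-1]
    g = 100.0 * np.diff(y) / y[:-1]
    return np.sqrt(np.mean((g_hat - g) ** 2))
```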
In exercise E1 we also compute a restricted RMSE for the rates of change corresponding to the first
quarters. These are in fact the most critical quarters, in which there might be spurious jumps introduced
by a bad disaggregation of the annual figures.
Besides, we analyse the forecasting performance of the methods over the four quarters dropped out from
the estimation period. The four extrapolated quarters are annually aggregated to derive the annual
extrapolated figure. The percentage error of the extrapolated level (PEL) is computed, while the quarterly growth rates corresponding to the four extrapolated quarters are evaluated with their mean error
(ME1) and root mean square error. The former permits to highlight bias in the forecasts, while the
latter coincides with the expression given above.
The aggregation across experiments is made by the computation of simple averages and standard errors
of the statistics. In order to have a synthetic view on the quality of the methods, we finally construct
ranking of the methods based on the averaged statistics.
4.1 Exercise E1: a comparison of alternative estimation methods for the Chow-Lin solution
The admissibility of the solutions
Table 4 presents the percentage of non-admissible solutions for the three methods in the different scenarios.
These percentages are indicative of the robustness of the disaggregation methods, understood as
their ability to give acceptable results in a wide range of situations.
Firstly, the largest percentage of non-admissible solutions is found for CL, while it is very low for SSR.
The number of non-admissible solutions is related to the value of ρ: while ML and CL show a strong
inverse relationship, the percentage of non-admissible solutions for SSR remains bounded for any value
of ρ.
For ρ = 0.1 more than 60% of solutions are discarded for CL; the percentage decreases to 30% for ρ = 0.5
while for ρ = 0.9 a full admissibility of the solutions is obtained. The percentage for ML, around 30% for
ρ ≤ 0.5, rapidly decreases to zero. SSR shows lower percentages for all values of ρ. Moreover, it is worth
noting that the non-admissibility condition set for SSR is less dangerous. In fact, when ρ̂ = 0.999 the
estimated series is coherent with the simulated ones. This cannot be said for disaggregations obtained
with ρ̂ < −0.9; in this case the estimated series is affected by sudden changes between positive and
negative values which destroy the dynamic of the original series.
These results show that the minimization of the sum of squared residuals is the optimization procedure
providing the highest percentage of admissible results. The statistics in the following tables and figures
are computed considering only the admissible solutions.
The estimation of parameters
Figure 2 shows the boxplots of the estimates of β obtained by SSR, ML and CL. Different values
of ρ are displayed in columns, while the rows refer to the different indicator models (I1, I2 and I3, not
seasonally adjusted). The percentage of admissible solutions is shown in brackets for each method. It is
clear from the graphs that the best estimates of β are achieved in the scenario ρ = 0.1; as ρ increases,
the boxes (and the whiskers) become wider and wider. No regularities are
detected across the different indicator models. Looking at the estimation methods, SSR shows
the poorest performance, while ML and CL give roughly the same results.
[Figure 2 about here: boxplots of the β estimates for SSR, ML and CL; columns ρ = 0.1, 0.5, 0.9, rows indicator models I1, I2, I3; percentages of admissible solutions in brackets.]
Figure 2: Boxplots of the estimates of β.
Table 5 shows the percentage bias and error, averaged across the 500 experiments, relative to the GLS
estimators of α and β. Confirming the results in the previous graphs, a lower standard error of the
estimates for CL and ML is noticed relative to that of SSR, which always presents a larger volatility.
In Figure 3 the estimated and simulated values of ρ are compared for the three estimation methods
(only I1 is considered, as the graphs for the other indicator models are very similar). The average and the
standard error of the estimates ρ̂ obtained across the experiments are computed and plotted, respectively
the solid line and the dot-dashed line. The dotted line represents the ideal situation, corresponding to
ρ̂ = ρ.
The left panel shows that the SSR estimates are stable around 0.90 with very low variability and
therefore almost unaffected by the simulated autoregressive disturbance. Estimating ρ with
SSR thus practically amounts to constraining the parameter in the interval (0.90, 0.95). Conversely, both ML
and CL give estimates which are positively related to ρ, with a very large variability across the experiments.
In particular, for ρ < 0.50 ML obtains several disaggregations with negative values of ρ̂. A reduction of
the standard error can be noticed for ρ > 0.80. A high standard error is also observed for CL, but
no solution with ρ̂ < 0 is actually obtained. The accuracy and stability of the estimates improve slightly
for ρ > 0.50.
[Figure 3 about here: one panel per estimation method (ML, SSR, CL), plotting the average and standard deviation of ρ̂ against the simulated ρ.]
Figure 3: Average and standard deviation of ρ̂ for different values of ρ.
The disaggregated series
The performances of the estimation methods are compared by evaluating the relative accuracy of the
estimated disaggregations in reproducing the simulated series, in both in-sample (100 quarters) and
out-of-sample (4 quarters) periods.
The in-sample comparison among the methods has been made for both raw and seasonally adjusted
series. We report in Table 6 the RMSPE on the levels (L) of the raw and seasonally adjusted series and the
RMSE on the growth rates (G1) of the seasonally adjusted series. These series are also evaluated
quarter by quarter: the table shows the RMSE of the growth rates relative to the first quarters
(G1Q1), in order to evaluate the ability of the different methods to avoid spurious jumps between the
fourth quarter of one year and the first quarter of the next.
Considering I1, larger values of ρ improve the accuracy of the disaggregated series: the
average statistics (on levels and growth rates) at ρ = 0.1 are more than twice those at ρ = 0.9. The statistics
shown by SSR and CL are equivalent for any value of ρ. Instead, those of ML are always larger for low
values of ρ, with higher standard deviations; for ρ = 0.9 the results become indistinguishable from those
of SSR and CL. The RMSE on the first quarters reveals some problems of ML for low values of ρ (for
ρ = 0.1, 8.39% for ML against 5.06% for CL). This implies a larger presence of jumps in the series
estimated by ML, an unpleasant property in the distribution of a time series.
Similar comments can be made for I2 and I3. No difference arises among the methods when ρ = 0.9
while, if ρ < 0.9, a greater ability of CL and SSR can be observed. The statistics for I3 are the lowest:
this can be explained by the simpler structure of the model, because of the absence of an integrated
component. Comparing the statistics for raw and seasonally adjusted series, no effect on the performance
seems to be attributable to the seasonal component.
In brief, the in-sample results show the same performances for ρ ≥ 0.9 while, if ρ < 0.9, CL and SSR
outperform ML. This statement is confirmed by Table 7, which shows the percentage of times each
method achieves the best RMSPEL. Slight changes in this proportion can be noticed across the scenarios. The
percentage of successes for SSR decreases as ρ increases, while the opposite is true for ML; no clear
relationship is noticed with the three indicator models considered. The percentage with the best RMSPEL for ML
varies from 6.2% (ρ = 0.15) to 25.4% (ρ = 0.95). The percentage relative to SSR is always greater than
50% for ρ ≤ 0.55, and even when ρ > 0.55 SSR largely outperforms the other methods. The percentage
with the best RMSPEL for CL is around 30% and does not show any clear relationship with ρ.
Combining the results from Tables 6 and 7, we would give our preference to SSR, characterized by a low
percentage of non-admissible solutions and a better adequacy of the estimated series.
The out-of-sample performance is assessed for both raw and seasonally adjusted series by computing the
percentage error of the annual extrapolated level (derived as the sum of the four extrapolated quarters).
The ME and RMSE of the growth rates for the extrapolated quarters are also computed. The former
statistic is needed to evaluate the accuracy of the annualized forecast based on the indicator
series, while by analysing the errors in terms of growth rates we try to evaluate the ability of the different
methods to reproduce the short-term information of the indicator.
Table 8 presents the forecasting results. The forecasting accuracy differs significantly across
indicator models. We notice a better quality of the extrapolated figures with I3, while the worst results
are obtained with I2. ML is the most accurate estimation method when the goal is to estimate the annual
value of the objective series, while the quarterly movements are often replicated with better accuracy by
SSR and CL. For example, in the scenario I2 with ρ = 0.1, ML shows a mean absolute percentage error
of 3.85%, while CL and SSR show 4.06% and 5.34% respectively. On the contrary, the RMSE on the
quarterly growth rates of ML is considerably higher (14.17% against 10.77% for CL and 11.66% for SSR). A
further interesting result is that all the estimation methods underestimate the true level of the quarterly
series in scenario I2, especially for low values of ρ.
4.2 Exercise E2: a comparison of alternative regression-based techniques
The admissibility of the solutions
Table 9 shows the percentage of admissible solutions for five disaggregation approaches (CL, FER, LIT,
LIT nsc and ADL(1,0)) and two estimation methods (SSR and ML). Because of the large number of
scenarios simulated in exercise E2, the results are presented aggregated by indicator model and by
disturbance model, whereas the last column refers to all the scenarios.
The table contains percentages both in boldface and in normal font. The percentages printed in bold type
show the proportion of series for which the disaggregation approach indicated in the first column fulfils the
admissibility conditions. The percentages printed in normal font, instead, refer to the proportion
of series for which both approaches indicated in the first column fulfil the admissibility conditions; in
other words, they show the size of the subsets of series for which two approaches give admissible solutions
simultaneously. As will be seen later, this helps the interpretation of the results concerning the
disaggregation performances and the comparison among the various approaches. Therefore we do not
describe them in this section and we only consider the results printed in bold type.
From the percentages in the last column, it can be seen that the Fernández solution always fulfils the
admissibility conditions, as it does not require the estimation of any autoregressive parameter. For CL
the percentage of admissible solutions exceeds 85% and it depends on the estimation method: 85.8% for
SSR and 98.4% for ML31 . An opposite result comes from LIT: the percentage is 98.4% for SSR, drops to
55% for ML and gets worse by estimating the starting condition (49%). Similarly, ADL(1,0) ssr (91.5%)
outperforms ADL(1,0) ml (78.9%).
This regularity is confirmed by the percentages presented in boldface in the columns 2-8, where the
results are aggregated by indicator model. This means that the ARIMA models used to generate the
indicator series, in particular their integration order and their seasonality, do not influence the number
of admissible solutions (except for ADL(1,0) ssr).
From the aggregation by disturbance model (columns 9-15, figures in boldface) it is seen that the percentage of admissible solutions depends upon the ARIMA model and the estimation method. In fact,
for ML, the larger ρ, φ and the integration order, the fewer the solutions to be discarded. An opposite
regularity is detected for SSR.
The estimation of parameters
As we stressed in Section 3, the use of fixed parameters to generate the disturbance series allows us to
assess the estimation accuracy. Table 10 shows the average bias and the standard error of the estimated
coefficients β̂, ρ̂ (for CL) and φ̂ (for LIT). These statistics are computed only when the temporal disaggregation
fulfils the admissibility conditions.
As regards β, the CL approach gives the best estimates. In particular, the bias of the ML estimates is
larger than that of the SSR estimates, but the former are less unstable than the latter. For the LIT approach the
ML method reduces the bias and the standard error of the SSR estimates; the estimation of the starting
condition does not improve the ML results (the discrepancy between LIT ml and LIT nsc is negligible).
As far as the estimation of ρ is concerned (see the upper left corner in the second panel of Table 10),
the results are remarkably affected by the estimation method. Firstly, SSR overestimates the generated
value of ρ, even when ρ = 0.9 (this confirms the results of exercise E1), while ML underestimates the
generated ρ. Secondly, the standard error of the estimates decreases for larger values of ρ. With regard
to φ (see the lower right corner of the table), LIT ssr results are analogous to those achieved by CL
ssr and do not need further discussion. LIT ml performs very well, particularly for large values of the
generated parameter, and better than LIT nsc.
The disaggregated series
The accuracy of the disaggregation techniques can be evaluated through the statistics shown in tables
11-14. In order to understand better the properties of the disaggregation methods, the results are shown
by indicator and disturbance model. The results by different R2, on the contrary, are not considered, as
they confirm our expectations: the larger the coefficient of determination, the better the disaggregated
series. The tables also distinguish in-sample and out-of-sample periods; in the former we show the
average (first line) and standard error (second line, italic font) of RMSPEL across the experiments, while
in the latter we report the same aggregated measures of the annual absolute percentage error (APEL).
31 The result is apparently in contrast with that found in the previous exercise because here we extended the admissibility
interval of ρ̂ to [−0.999, 0.999].
Following the same reasoning introduced above (see Table 9), we arrange the data in a way that allows fair
comparisons of the relative performances of the methods. The tables include eight sub-tables,
one for each method shown in the first row (in boldface). Each sub-table shows the statistics relative
to the method denoted in the first column, computed only on its admissible solutions (see Table 9).
The next rows show the same statistics for the other methods, but calculated on the subset of
solutions for which the method in the first row has provided admissible results. Clearly, the number
of experiments considered for these other methods can be at most equal to that of the
method in the first row. Consider the following examples, which clarify how to read the figures
across the tables. From Table 11 we notice that CL ml, with 97.7% of admissible solutions, obtains a
global average RMSPEL of 7.6 for I1 (see the panel in the upper right corner of the table) but produces
a higher statistic, 8.5 (see the panel in the upper left corner of the table), when only the common
solutions with CL ssr (82.4%) are selected. At the same time, from Table 13 we see that CL ml improves
its accuracy for C1 when it is crossed with ADL(1,0) ml (from 16.2 to 13.6, with 96.6% and 66.1% of
admissible results respectively).
By crossing the admissible solutions we are able to compare each method with the others on a common
set of experiments. In this way we try to help those methods (like FER) which provide acceptable
disaggregations in experiments where other methods normally fail. Obviously, when crossed with
FER the statistics of the other methods do not change. But we are also able to discover, for instance,
that the global accuracy of FER for I2 (6.0) is much better when only the common solutions with LIT
nsc (2.8, with 48.3% of admissible results) are considered.
Unfortunately for the reader, there is a huge amount of figures in the following tables. This makes it rather
complicated to get an immediate idea of the relative positions of the methods. An attempt to rank
the methods by considering both accuracy measures and percentages of admissible results is illustrated
at the end of this section. Here we instead try to highlight some interesting aspects of the results in
Tables 11-14.
The eight sub-tables of Table 11 show the in-sample accuracy of the method in the first row by different
DGPs for indicator models, compared to those of the other methods in the common set of experiments.
The scenarios I1 and I2 are those for which the disaggregation methods meet major difficulties, in both
raw and seasonally adjusted versions. The reason is probably connected with the inclusion of a second-order
integrated component in the indicator series, while I3 and I4 both contain a first-order integrated
process. An interesting result can be noted for I3: the static solutions obtain their best accuracy for
this model among the seasonal series while, on the contrary, a higher level of RMSPEL is obtained by
ADL(1,0).
CL ssr improves over CL ml in all scenarios. A lower average level of RMSPEL is always achieved by the
SSR estimation of ρ; moreover, the volatility obtained with the ML estimation is almost twice as large. Opposite
considerations apply to the Litterman solution: estimating by maximum likelihood we always achieve
better results than with SSR. The dynamic solution ADL(1,0) (both SSR and ML) never outperforms the
results given by the static solutions.
Only LIT nsc and LIT ml improve over the results by FER (for example, 2.9 and 3.2 against 5.9 for
I1). However, we stress again that these statistics are based on a different number of experiments
because of the non-admissible results (48.9% and 55.1% for the same example). The comparison with the
Fernández solution is better achieved by looking at the relative accuracy on the common solutions with
the other methods. From this perspective we find that FER is almost always in the first positions. If
we look at the tables relative to LIT nsc and LIT ml, we note that FER improves its accuracy, obtaining
roughly the same performance. It is also clear that the common solutions with LIT nsc or LIT ml are those
for which the static methods give very similar results; this view makes clear why the good performance
of the Litterman solution is only apparent.
Table 13 analyses the results by different DGPs for the disturbance series. The scenarios C1 and C5 are
those in which the disaggregation techniques have the worst results: the level of RMSPEL is considerably
reduced when the disturbance series contain an integrated process. This evidence confirms the results
found in exercise E1.
To synthesize the results, we try to order the methods considering both the accuracy of the disaggregation and
the capacity to give admissible results. Firstly, we assign ranks (in ascending order) to the methods in
each sub-table. Then, we take the rank of the method placed in the first row. The resulting list does not
take into account the different number of solutions on which the statistics are calculated. We now
produce a modified classification in which the methods with a lower number of admissible results are
penalized. Denote the initial rank of a generic method by r. This is adjusted through the formula
ra = r + 2^{3(1 − AS/TS)} − 1    (19)
where AS is the number of admissible solutions and TS is the total number of experiments. The
greater the number of non-admissible results, the more the method is penalized. When AS = TS, no
adjustment is made. If AS = 0, the maximum penalization (7) is added to the original rank. The final
order is achieved by sorting the methods by ra.
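Under the reconstruction of (19) given above, the adjustment can be written as follows (Python sketch; `adjusted_rank` is a hypothetical name).

```python
def adjusted_rank(rank, n_admissible, n_total):
    # No penalty when all solutions are admissible; +7 when none is.
    return rank + 2.0 ** (3.0 * (1.0 - n_admissible / n_total)) - 1.0
```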
Table 15 shows the adjusted ranking ra by indicator model, for both the distribution and the extrapolation
periods. We recall that each line represents the adjusted ranking of each method in the corresponding
sub-table; this explains why the same rank can be associated with different methods. The simple average of
the ranks is shown in the last column of the tables; the methods are listed in ascending order of this
average. The technique by Fernández is the best method in the interpolation period (1.3); for 6 out of 7
indicator models FER provides the most accurate distributions. CL ssr is in the second position (2.1),
while LIT nsc (2.3) and LIT ml (2.7) are surely the most penalized by the adjustment for non-admissible
disaggregations. CL ml is almost always in the last place (7.1), but the worst result is provided by
ADL(1,0) ml (7.3).
In spite of its bad position in the distribution problem, CL ml turns out to be the best method as concerns
the accuracy of the extrapolated annual figures. FER, on the other hand, shows a much lower performance
(5.0), surpassed by LIT nsc (1.4), LIT ml (2.4) and CL ssr (4.1).
Table 16 shows the same rankings organized by disturbance model. This analysis places CL ssr
in the first position (2.0), thanks to its better accuracy with AR(1) disturbances, while FER stays in the second
position (2.9). The two Chow-Lin solutions occupy the first places in extrapolation, with CL ml the
best one when the AR(1) model is used for the disturbance series. FER reaches the second place when the
simulated disturbance is I(1), coherently with the theoretical assumption of the method. Similarly, LIT
nsc is the best method for L9 in both the distribution and the extrapolation cases.
5 Conclusions
It is not simple to derive general conclusions from simulation exercises. We acknowledge that the results
might differ when changing the simulation design, for example the length of the series or the models for the
indicators and disturbances. Nevertheless, some remarks can be made about the properties of the methods we
have tested. In particular, we refer to those properties desirable for the needs of a data producer: existence
of admissible solutions, estimation accuracy of the parameters and goodness of fit with respect to
the objective series.
In the first exercise we compare three estimation approaches for the autoregressive parameter assumed
in the Chow-Lin solution. In the scenarios considered the maximum likelihood procedure (ML) and the
method suggested by Chow and Lin (CL) show a larger amount of non-admissible solutions than the
minimization of the sum of squared residuals (SSR). This is particularly true for low values of the AR
parameter. However, it is important to stress that the AR parameter rarely assumes low values with
real time series and that the percentage of successes with ML and CL should improve with real-world
variables.
Considering the admissible solutions, CL and ML provide better estimates of the regression coefficients
than SSR. In general the estimates improve for decreasing values of ρ. Opposite considerations hold for
the estimation of ρ. Both ML and CL give better results as the simulated ρ increases; they turn out to be
rather volatile with respect to SSR. In particular, for ρ < 0.50 ML is likely to provide negative estimates
of ρ. The estimates of ρ by SSR are approximately around 0.90–0.95, with little volatility and almost
independently of the true value of the AR parameter. Even though this guarantees a good reliability of
the estimated series, it is not theoretically correct, as the parameter ρ is almost always overestimated.
Finally, the estimated series obtained by SSR are generally the closest to the simulated series in the
in-sample period, while ML obtains slight improvements over the other estimation methods in the
extrapolation case. It must be considered that for high values of ρ the results of the methods tend to
coincide.
In the second exercise we extend the comparison to other proposals based on the Chow-Lin formulation of
the disaggregation problem. Considering both admissibility of results and accuracy of disaggregation, we
find that the Fernández approach gives the most satisfactory results in the in-sample analysis, while it
yields intermediate results in the out-of-sample analysis. As far as the forecasting accuracy is concerned,
it is the Chow-Lin solution with maximum likelihood estimation of the autoregressive parameter which
outperforms the other methods. The admissible results for the Litterman proposal are very accurate,
but the number of solutions with φ̂ = −0.999 is too high. The results from the dynamic solutions do
not compete with those from the static techniques, but this is certainly connected with the fact that the
simulation model used in the experiments is static.
References
Barbone, L., G. Bodo, and I. Visco (1981): “Costi e profitti in senso stretto: un’analisi su serie
trimestrali, 1970-1980,” Bollettino della Banca d’Italia, 36, 465–510.
Boot, J. C., W. Feibes, and J. Lisman (1967): “Further methods of derivation of quarterly figures
from annual data,” Applied Statistics, 16(1), 65–75.
Bournay, J., and G. Laroque (1979): “Réflexions sur la méthode d’élaboration des comptes
trimestriels,” Annales de l’INSEE, 36, 3–29.
Caro, A., S. Feijoó, and D. Quintana (2003): “A simulation study for the quarterly disaggregation
of flow variables,” mimeo.
Chan, W. (1993): “Disaggregation of Annual Time-series Data to Quarterly Figures: a Comparative
Study,” Journal of Forecasting, 12, 677–688.
Chow, G., and A. Lin (1971): “Best linear unbiased interpolation, distribution, and extrapolation of
time series by related series,” The Review of Economics and Statistics, 53, 372–375.
Denton, F. (1971): “Adjustment of monthly or quarterly series to annual totals: an approach based
on quadratic minimization,” Journal of the American Statistical Association, 66(333), 99–102.
Di Fonzo, T. (1987): La stima indiretta di serie economiche trimestrali. Padova, CLEUP editore.
(2002): “Temporal disaggregation of economic time series: towards a dynamic extension,”
Dipartimento di Scienze Statistiche, Università di Padova, working paper 2002.17.
(2005a): “TIME DISaggregation of univariate economic time series,” mimeo.
(2005b): “The treatment of starting conditions in Fernández and Litterman models for temporal
disaggregation by related series,” mimeo.
Feijoo, S., A. Caro, and D. Quintana (2003): “Methods for quarterly disaggregation without
indicators; a comparative study using simulation,” Computational Statistics and Data Analysis, 43,
63–78.
Fernández, R. (1981): “A methodological note on the estimation of time series,” The Review of
Economics and Statistics, 63(3), 471–478.
Gomez, V., and A. Maravall (1997): “Program TRAMO and SEATS: Instructions for the user,”
mimeo.
Gregoir, S. (1995): “Propositions pour une désagrégation temporelle basée sur des modèles dynamiques simples,” INSEE, mimeo.
Kinderman, A. J., and J. G. Ramage (1976): “Computer Generation of Normal Random Numbers,” Journal of
the American Statistical Association, 71(356), 893–896.
Klein, L. (1958): “The estimation of distributed lags,” Econometrica, 26, 553–565.
Lisman, J. H. C., and J. Sandee (1964): “Derivation of quarterly figures from annual data,” Applied
Statistics, 13(2), 87–90.
Litterman, R. (1983): “A random walk, Markov model for the distribution of time series,” Journal of
Business and Economic Statistics, 1(2), 169–173.
Pavia, J. M., L. E. Vila, and R. Escuder (2003): “On the performance of the Chow-Lin procedure
for quarterly interpolation of annual data: Some Monte-Carlo analysis,” Spanish Economic Review, 5,
291–305.
Proietti, T. (2004): “Temporal disaggregation by state space methods: dynamic regression methods
revisited,” mimeo.
Salazar, E., R. Smith, and M. Weale (1997): “Interpolation using a dynamic regression model:
specification and Monte Carlo properties,” Discussion Paper no. 126.
Salazar, E., R. Smith, M. Weale, and S. Wright (1997): “A monthly indicator of GDP,” National
Institute Economic Review, pp. 84–90.
Santos Silva, J., and F. Cardoso (2001): “The Chow-Lin method using dynamic models,” Economic
Modelling, 18, 269–280.
Wei, W. W. S., and D. O. Stram (1990): “Disaggregation of time series model,” Journal of the Royal
Statistical Society, Series B, 52, 453–467.
Table 4: Percentage of non-admissible solutions for different scenarios.
                            I1                    I2                    I3
       condition      ρ=0.1  0.5   0.9     0.1   0.5   0.9     0.1   0.5   0.9
CL     not invertible  62.2  28.2  0.0    63.2  29.2  0.4    62.8  25.8  0.0
ML     ρ̂ < −0.9        32.0  27.0  0.2    31.8  26.8  0.2    25.0  24.0  0.4
SSR    ρ̂ = 0.999        2.6   2.8  8.2     2.6   2.2  4.8     0.0   0.2  8.8
Table 5: Average bias and error (%) of the estimated regression coefficients.
ρ = 0.1
I2
I1
CL
ML
SSR
Bias
0.59
-0.01
0.25
α
b
Error
4.99
4.81
7.86
Bias
-0.20
-0.07
-0.05
βb
Error
1.38
1.35
2.67
Bias
0.70
-0.78
-0.48
α
b
CL
ML
SSR
Bias
-0.59
-1.06
-1.19
Error
6.90
7.82
11.73
Bias
-0.01
0.04
0.05
βb
Error
2.20
2.29
3.88
Bias
0.60
0.01
-0.46
α
b
CL
ML
SSR
Bias
0.08
0.11
-1.22
Error
17.09
17.07
21.53
Error
1.34
1.37
2.84
Bias
0.08
-0.10
-0.32
Error
1.34
1.26
2.58
Error
9.53
9.26
15.26
Bias
-0.25
-0.26
0.07
βb
Error
4.92
4.89
6.34
Bias
1.59
1.41
-0.94
α
b
Error
21.70
21.72
28.06
Bias
-0.08
0.14
0.39
βb
Error
1.88
1.84
3.66
I3
Bias
-0.10
-0.06
0.28
βb
Error
2.13
2.09
3.59
Bias
-0.18
-0.28
-0.24
α
b
Error
2.03
2.06
3.50
ρ = 0.9
I2
I1
α
b
Bias
-0.11
0.13
0.41
α
b
ρ = 0.5
I2
I1
α
b
Error
5.53
6.35
10.97
I3
βb
Bias
0.22
0.42
0.42
βb
Error
2.86
2.95
5.0
I3
Bias
-0.12
-0.09
0.42
βb
Error
4.48
4.47
5.79
Bias
0.26
0.24
0.23
α
b
Error
2.67
2.64
2.74
Bias
-0.34
-0.31
-0.29
βb
Error
3.82
3.77
3.79
Table 6: Performance measures in the in-sample period: average error and standard deviation of RMSE
(%).
raw
I1
seasonal adjusted
raw
ρ = 0.1
I2
seasonal adjusted
raw
I3
seasonal adjusted
L
L
G1
G1Q1
L
L
G1
G1Q1
L
L
G1
G1Q1
CL
3.74
(1.87)
3.71
(1.89)
5.15
(2.49)
5.06
(2.64)
4.63
(3.94)
4.62
(3.94)
5.96
(3.42)
5.75
(3.81)
0.90
(0.21)
0.64
(0.17)
0.91
(0.25)
0.89
(0.26)
ML
5.0
(2.94)
4.96
(2.96)
6.45
(3.53)
8.39
(6.02)
6.36
(4.66)
6.36
(4.69)
7.97
(4.94)
10.65
(11.34)
1.18
(0.40)
0.84
(0.30)
1.14
(0.34)
1.40
(0.65)
3.81
(1.94)
3.77
(1.97)
5.19
(2.52)
5.14
(2.70)
4.67
(3.29)
4.65
(3.32)
6.09
(3.21)
6.04
(3.64)
0.94
(0.23)
0.66
(0.17)
0.94
(0.25)
0.93
(0.27)
SSR
raw
I1
seasonal adjusted
raw
ρ = 0.5
I2
seasonal adjusted
raw
I3
seasonal adjusted
L
L
G1
G1Q1
L
L
G1
G1Q1
L
L
G1
G1Q1
CL
3.02
(1.56)
2.99
(1.59)
4.25
(2.11)
3.63
(1.86)
3.56
(2.09)
3.55
(2.12)
4.91
(2.49)
4.24
(2.39)
0.73
(0.17)
0.52
(0.13)
0.76
(0.20)
0.66
(0.19)
ML
4.02
(3.03)
3.98
(3.04)
5.35
(3.56)
6.49
(7.27)
4.92
(3.94)
4.90
(3.98)
6.46
(4.59)
8.07
(10.64)
1.02
(0.57)
0.73
(0.42)
1.00
(0.49)
1.17
(0.96)
3.04
(1.54)
3.01
(1.58)
4.29
(2.14)
3.62
(1.84)
3.63
(2.15)
3.62
(2.18)
4.99
(2.56)
4.30
(2.45)
0.79
(0.24)
0.53
(0.14)
0.78
(0.21)
0.67
(0.19)
SSR
raw
I1
seasonal adjusted
raw
ρ = 0.9
I2
seasonal adjusted
raw
I3
seasonal adjusted
L
L
G1
G1Q1
L
L
G1
G1Q1
L
L
G1
G1Q1
1.52
CL
(0.86)
1.50
(0.88)
2.19
(1.23)
1.71
(1.00)
1.84
(1.28)
1.83
(1.30)
2.58
(1.58)
2.00
(1.21)
0.43
(0.18)
0.26
(0.09)
0.40
(0.13)
0.32
(0.11)
1.52
(0.86)
1.50
(0.88)
2.19
(1.23)
1.72
(1.01)
1.86
(1.31)
1.84
(1.33)
2.60
(1.60)
2.04
(1.39)
0.42
(0.18)
0.26
(0.08)
0.40
(0.13)
0.32
(0.11)
1.53
(0.86)
1.49
(0.88)
2.18
(1.22)
1.71
(0.99)
1.85
(1.32)
1.83
(1.34)
2.58
(1.59)
2.00
(1.21)
0.43
(0.20)
0.26
(0.09)
0.40
(0.13)
0.32
(0.11)
ML
SSR
Table 7: Percentage of times with the best RMSPEL: in-sample period.
                                       I1
ρ      0.05   0.15   0.25   0.35   0.45   0.55   0.65   0.75   0.85   0.95
CL     22.8   24.4   22.0   25.2   31.4   30.0   27.4   33.6   31.4   27.6
ML     10.8    8.2    9.4   11.2   11.2   13.4   20.2   21.0   22.4   22.0
SSR    66.4   67.4   68.6   63.6   57.4   56.6   52.4   45.4   46.2   50.4

                                       I2
ρ      0.05   0.15   0.25   0.35   0.45   0.55   0.65   0.75   0.85   0.95
CL     23.0   26.6   23.0   27.4   27.6   25.8   25.4   30.6   31.8   26.6
ML      6.8    6.2    9.2    9.6   12.6   16.0   16.4   18.6   20.0   22.8
SSR    70.2   67.2   67.8   63.0   59.8   58.2   58.2   50.8   48.2   50.6

                                       I3
ρ      0.05   0.15   0.25   0.35   0.45   0.55   0.65   0.75   0.85   0.95
CL     24.4   23.4   27.0   30.4   29.4   29.0   30.4   29.8   29.0   34.2
ML      9.6    9.2   11.8   11.0   17.8   19.6   25.2   24.0   23.8   25.4
SSR    66.0   67.4   61.2   58.6   52.8   51.4   44.4   46.2   47.2   40.4
Table 8: Performance measures in the out-of-sample period: average and standard deviation of some
statistics (%).
raw
apeL
2.78
CL
(0.04)
3.02
ML
(0.05)
4.05
SSR
(0.07)
I1
seasonal adjusted
apeL
me1
rmse1
2.76
-0.60
7.12
(0.04)
(3.15)
(8.76)
3.01
0.22
8.77
(0.05)
(4.19) (11.82)
4.04
-0.37
7.62
(0.07)
(3.44)
(9.03)
raw
apeL
4.07
CL
(0.07)
3.65
ML
(0.06)
4.57
SSR
(0.09)
I1
seasonal adjusted
apeL
me1
rmse1
4.04
-0.20
6.02
(0.07)
(2.80)
(7.24)
3.62
0.25
6.65
(0.06)
(4.54) (11.14)
4.53
-0.17
5.97
(0.08)
(3.33)
(7.02)
raw
apeL
3.95
CL
(0.07)
3.93
ML
(0.07)
3.89
SSR
(0.07)
I1
seasonal adjusted
apeL
me1
rmse1
3.93
-0.11
3.11
(0.07)
(2.54)
(4.05)
3.91
-0.11
3.11
(0.07)
(2.53)
(4.06)
3.84
-0.18
3.19
(0.07)
(2.57)
(4.12)
raw
apeL
4.06
(0.08)
3.85
(0.08)
5.34
(0.10)
ρ = 0.1
I2
seasonal adjusted
apeL
me1
rmse1
4.07
-1.34
10.77
(0.07)
(5.86)
(16.24)
3.84
-1.32
14.17
(0.08)
(9.87)
(25.83)
5.26
-1.97
11.66
(0.10)
(8.67)
(21.22)
raw
apeL
0.45
(0.00)
0.47
(0.01)
0.65
(0.01)
I3
seasonal adjusted
apeL
me1
rmse1
0.32
0.01
0.91
(0.00)
(0.25)
(0.41)
0.34
0.01
1.08
(0.00)
(0.25)
(0.59)
0.47
0.01
0.93
(0.01)
(0.26)
(0.43)
raw
apeL
6.26
(0.16)
6.05
(0.15)
7.30
(0.17)
ρ = 0.5
I2
seasonal adjusted
apeL
me1
rmse1
6.20
-1.27
9.29
(0.16)
(8.82)
(20.40
5.96
-0.61
10.91
(0.15)
(9.58)
(23.51
7.19
-1.34
9.30
(0.17)
(8.01)
(18.68
raw
apeL
0.60
(0.01)
0.62
(0.01)
0.81
(0.01)
I3
seasonal adjusted
apeL
me1
rmse1
0.44
0.00
0.70
(0.01)
(0.25)
(0.32)
0.45
0.01
0.91
(0.01)
(0.26)
(0.65)
0.58
0.00
0.74
(0.01)
(0.27)
(0.34)
raw
apeL
5.67
(0.11)
5.63
(0.11)
5.95
(0.12)
ρ = 0.9
I2
seasonal adjusted
apeL
me1
rmse1
5.64
-0.16
4.37
(0.11)
(3.45)
(6.23)
5.59
-0.17
4.36
(0.11)
(3.44)
(6.24)
5.84
-0.48
4.53
(0.12)
(4.20)
(6.71)
raw
apeL
0.60
(0.01)
0.59
(0.01)
0.65
(0.01)
I3
seasonal adjusted
apeL
me1
rmse1
0.44
0.01
0.37
(0.01)
(0.18)
(0.20)
0.43
0.01
0.37
(0.01)
(0.17)
(0.19)
0.46
0.00
0.38
(0.01)
(0.19)
(0.20)
Table 9: Percentages of admissible solutions.
by indicator model
CL ssr
CL ml
FER
LIT ssr
LIT ml
LIT nsc
ADL(1,0) ssr
ADL(1,0) ml
CL ssr - CL ml
CL ssr - FER
CL ssr - LIT ssr
CL ssr - LIT ml
CL ssr - LIT nsc
CL ssr - ADL(1,0) ssr
CL ssr - ADL(1,0) ml
CL ml - FER
CL ml - LIT ssr
CL ml - LIT ml
CL ml - LIT nsc
CL ml - ADL(1,0) ssr
CL ml - ADL(1,0) ml
FER - LIT ssr
FER - LIT ml
FER - LIT nsc
FER - ADL(1,0) ssr
FER - ADL(1,0) ml
LIT ssr - LIT ml
LIT ssr - LIT nsc
LIT ssr - ADL(1,0) ssr
LIT ssr - ADL(1,0) ml
LIT ml - LIT nsc
LIT ml - ADL(1,0) ssr
LIT ml - ADL(1,0) ml
LIT nsc - ADL(1,0) ssr
LIT nsc - ADL(1,0) ml
ADL ssr - ADL(1,0) ml
by disturbance model
I1
I2
I3
I1sa
I2sa
I3sa
I4
C1
C5
84.7
97.7
100.0
97.5
55.1
48.9
87.7
78.8
82.4
84.7
82.7
41.4
35.6
77.0
65.7
97.7
95.4
55.1
48.9
85.4
78.3
97.5
55.1
48.9
87.7
78.8
53.6
47.4
85.9
76.9
48.6
43.8
51.9
38.0
46.0
67.5
86.1
97.2
100.0
97.6
54.1
48.3
87.5
78.0
83.4
86.1
84.3
41.8
36.3
77.8
66.2
97.2
94.9
54.1
48.3
84.7
77.4
97.6
54.1
48.3
87.5
78.0
52.6
46.8
85.7
76.1
48.2
42.6
50.8
37.2
45.8
66.6
86.3
99.4
100.0
99.3
55.7
50.1
96.4
77.7
85.7
86.3
86.1
42.6
37.1
83.8
68.1
99.4
98.7
55.7
50.1
95.8
77.4
99.3
55.7
50.1
96.4
77.7
55.0
49.4
95.7
77.2
50.0
52.7
45.6
47.2
41.2
74.6
84.7
97.8
100.0
97.8
55.1
48.9
87.8
78.9
82.6
84.7
82.9
41.5
35.6
77.1
65.9
97.8
95.7
55.1
48.9
85.6
78.4
97.8
55.1
48.9
87.8
78.9
53.6
47.5
86.3
77.0
48.7
43.8
51.6
38.0
45.8
67.8
86.2
97.2
100.0
97.7
54.1
48.3
87.5
78.2
83.4
86.2
84.4
41.8
36.3
77.9
66.4
97.2
94.9
54.0
48.3
84.7
77.6
97.7
54.1
48.3
87.5
78.2
52.5
46.8
85.8
76.3
48.2
42.6
50.7
37.2
45.8
66.8
86.3
100.0
100.0
99.3
55.7
50.0
98.7
80.5
86.2
86.3
86.1
42.6
37.1
85.3
71.7
100.0
99.3
55.7
50.0
98.7
80.5
99.3
55.7
50.0
98.7
80.5
55.1
49.4
98.1
80.1
49.9
54.8
44.0
49.2
39.6
79.7
86.3
99.7
100.0
99.4
54.9
48.7
95.2
80.4
86.0
86.3
86.1
41.8
35.9
82.8
70.8
99.7
99.1
54.9
48.7
94.9
80.3
99.4
54.9
48.7
95.2
80.4
54.3
48.1
94.7
80.0
48.7
50.9
45.2
44.9
40.3
77.1
98.3
96.6
100.0
99.5
1.8
1.0
99.8
66.7
95.0
98.3
97.8
1.4
0.5
98.1
66.0
96.6
96.1
1.7
0.9
96.4
66.1
99.5
1.8
1.0
99.8
66.7
1.8
1.0
99.2
66.6
0.8
1.8
1.5
1.0
0.6
66.6
98.4
92.7
100.0
98.3
4.0
2.1
99.4
60.6
91.2
98.4
96.7
3.6
1.8
97.8
60.0
92.7
91.5
3.9
2.1
92.2
58.7
98.3
4.0
2.1
99.4
60.6
3.9
2.1
97.8
60.1
1.9
4.0
3.4
2.1
1.8
60.4
C9
93.0
99.7
100.0
98.8
47.0
32.0
96.4
81.6
92.6
93.0
91.8
43.0
28.3
89.8
77.0
99.7
98.4
47.0
32.0
96.0
81.6
98.8
47.0
32.0
96.4
81.6
46.4
31.6
95.3
80.5
31.7
44.8
41.5
30.4
28.6
78.8
F
L1
L5
L9
92.0
100.0
100.0
99.4
70.6
61.4
91.0
84.1
92.0
92.0
91.5
63.8
55.1
84.6
78.0
100.0
99.4
70.6
61.4
90.9
84.1
99.4
70.6
61.4
91.0
84.1
70.1
61.0
90.5
83.6
61.2
62.7
62.5
54.3
54.9
76.5
91.8
100.0
100.0
99.1
71.2
61.8
90.0
85.0
91.8
91.8
90.9
63.9
55.5
83.7
78.6
100.0
99.0
71.2
61.8
90.0
85.0
99.1
71.2
61.8
90.0
85.0
70.6
61.2
89.2
84.2
61.8
62.8
63.6
53.9
55.6
76.4
86.2
100.0
100.0
98.8
90.3
85.0
86.6
87.5
86.2
86.2
85.3
76.9
71.9
75.8
76.5
100.0
98.8
90.3
85.0
86.6
87.5
98.8
90.3
85.0
86.6
87.5
89.3
84.0
85.8
86.4
85.0
77.6
80.4
72.4
76.1
75.5
40.9
100.0
100.0
94.7
99.9
99.9
77.7
87.0
40.9
40.9
38.7
40.9
40.9
32.0
38.7
100.0
94.7
99.9
99.9
77.7
87.0
94.7
99.9
99.9
77.7
87.0
94.6
94.6
74.5
82.3
99.9
77.6
86.9
77.6
86.9
65.9
Total
85.8
98.4
100.0
98.4
55.0
49.0
91.5
78.9
84.2
85.8
84.7
41.9
36.3
80.3
67.8
98.4
96.9
54.9
49.0
90.0
78.6
98.4
55.0
49.0
91.5
78.9
53.8
47.9
90.3
77.7
48.9
47.3
48.6
41.7
43.5
71.4
Table 10: Average bias and standard error of the estimated coefficients.
CL ssr
s.e.
CL ml
s.e.
FER
s.e.
LIT ssr
s.e.
LIT ml
s.e.
LIT ml nsc
s.e.
C1
-0.002
0.286
0.000
0.161
-
C5
0.001
0.342
-0.002
0.229
-
CL ssr
s.e.
CL ml
s.e.
LIT ssr
s.e.
LIT ml
s.e.
LIT ml nsc
s.e.
C1
0.808
0.809
-0.602
0.839
-
C5
0.427
0.428
-0.562
0.906
-
β
C9
F
L1
0.002
0.365
0.007
0.321
- -0.004
0.378
0.044
2.342
- -0.013
0.436
- -0.012
0.412
ρ (for CL) and φ (for LIT)
C9
F
L1
0.073
0.075
-0.089
0.183
0.816
0.817
0.224
0.450
0.158
0.450
L5
-0.001
2.029
-0.004
0.453
-0.004
0.431
L9
-0.012
0.854
-0.003
0.505
-0.005
0.485
L5
0.431
0.432
-0.015
0.262
-0.096
0.360
L9
0.074
0.076
-0.040
0.090
-0.047
0.103
Table 11: In-sample period: RMSPEL for different DGPs for indicators.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ssr
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
6.3
5.2
8.5
9.6
6.4
5.3
7.7
6.4
3.4
2.7
3.2
2.5
7.5
5.0
7.1
7.0
I2
6.7
5.6
8.7
9.8
6.7
5.6
7.3
6.0
3.5
2.9
3.2
2.6
7.5
5.5
6.9
7.1
I3
4.8
4.1
6.0
6.7
4.8
4.2
5.2
4.8
2.9
2.7
2.7
2.5
10.0
3.2
8.8
5.3
I1sa
6.1
5.3
8.3
9.6
6.1
5.3
6.3
5.4
3.2
2.6
2.9
2.4
6.6
5.2
6.2
6.8
I2sa
6.6
5.6
8.6
9.8
6.6
5.6
6.7
5.7
3.4
2.9
3.1
2.6
7.1
5.7
6.5
7.0
I3sa
2.2
1.9
3.3
4.6
2.2
1.9
2.3
2.0
1.2
1.0
1.1
0.9
3.7
1.6
3.5
3.7
I4
3.4
2.9
4.9
6.4
3.4
2.9
3.5
3.0
1.8
1.4
1.6
1.3
5.0
2.3
4.8
4.4
CL ml
s.d.
CL ssr
s.d.
FER
s.d.
LIT ssr
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
7.6
9.3
6.1
5.1
5.6
5.0
7.0
6.2
3.2
2.7
2.9
2.6
7.0
4.8
6.5
6.6
I2
7.9
9.5
6.3
5.3
5.8
5.3
6.5
5.7
3.2
2.8
2.9
2.6
6.8
5.2
6.2
6.6
I3
5.4
6.5
4.7
4.1
4.3
4.1
4.6
4.6
2.4
2.6
2.2
2.4
9.9
3.3
8.8
5.2
I1sa
7.4
9.3
5.9
5.1
5.4
5.0
5.6
5.1
2.8
2.6
2.6
2.4
6.0
5.0
5.5
6.4
I2sa
7.8
9.5
6.3
5.4
5.7
5.3
5.9
5.3
3.0
2.8
2.8
2.6
6.4
5.3
5.7
6.6
I3sa
2.9
4.4
2.2
1.9
2.0
1.9
2.0
2.0
1.0
0.9
0.9
0.9
3.6
1.5
3.4
3.5
I4
4.3
6.1
3.3
2.9
3.0
2.8
3.1
2.9
1.5
1.4
1.3
1.3
4.8
2.2
4.6
4.2
FER
s.d.
CL ssr
s.d.
CL ml
s.d.
LIT ssr
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
5.9
5.2
6.3
5.2
7.6
9.3
7.2
6.4
3.2
2.7
2.9
2.6
7.2
5.0
6.6
6.6
I2
6.1
5.5
6.7
5.6
7.9
9.5
6.8
6.0
3.2
2.9
2.9
2.7
7.1
5.4
6.3
6.7
I3
4.3
4.1
4.8
4.1
5.4
6.5
4.7
4.7
2.4
2.6
2.2
2.4
9.9
3.3
8.8
5.2
I1sa
5.6
5.2
6.1
5.3
7.4
9.3
5.8
5.3
2.8
2.6
2.6
2.4
6.2
5.2
5.6
6.5
I2sa
6.0
5.6
6.6
5.6
7.8
9.5
6.2
5.6
3.1
2.8
2.8
2.6
6.7
5.6
5.9
6.7
I3sa
2.0
1.9
2.2
1.9
2.9
4.4
2.0
2.0
1.0
0.9
0.9
0.9
3.6
1.5
3.4
3.5
I4
3.0
2.8
3.4
2.9
4.3
6.1
3.1
3.0
1.5
1.4
1.3
1.3
4.8
2.3
4.6
4.2
LIT ssr
s.d.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
7.2
6.4
6.3
5.2
7.6
9.2
5.8
5.2
3.2
2.7
2.9
2.6
7.2
4.9
6.6
6.6
I2
6.8
6.0
6.7
5.6
7.9
9.5
6.1
5.5
3.2
2.9
2.9
2.7
7.1
5.4
6.4
6.7
I3
4.7
4.7
4.8
4.1
5.4
6.5
4.3
4.1
2.5
2.6
2.3
2.4
9.9
3.3
8.8
5.2
I1sa
5.8
5.3
6.1
5.2
7.4
9.3
5.6
5.2
2.8
2.6
2.6
2.4
6.2
5.2
5.6
6.5
I2sa
6.2
5.6
6.6
5.6
7.8
9.5
6.0
5.5
3.1
2.8
2.8
2.6
6.7
5.6
5.9
6.8
I3sa
2.0
2.0
2.2
1.9
2.9
4.4
2.0
1.9
1.0
0.9
0.9
0.9
3.6
1.5
3.4
3.5
I4
3.1
3.0
3.4
2.9
4.3
6.1
3.0
2.8
1.5
1.4
1.4
1.3
4.8
2.3
4.6
4.2
continued on next page
continued from previous page
147
LIT ml
s.d.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ssr
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
3.2
2.7
3.4
2.6
3.1
2.9
3.1
2.7
4.1
3.9
2.9
2.6
4.6
2.8
4.2
2.8
I2
3.2
2.9
3.5
2.9
3.2
3.0
3.2
2.9
3.7
3.4
2.9
2.6
4.1
2.9
3.8
2.9
I3
2.4
2.6
2.9
2.7
2.5
2.5
2.5
2.5
2.6
2.8
2.2
2.4
9.4
3.2
8.3
3.8
I1sa
2.8
2.6
3.2
2.6
2.9
2.8
2.8
2.6
2.9
2.7
2.6
2.4
3.3
2.5
3.1
2.6
I2sa
3.1
2.8
3.4
2.9
3.1
3.0
3.1
2.8
3.2
2.9
2.8
2.6
3.5
2.9
3.2
3.0
I3sa
1.0
0.9
1.2
1.0
1.0
0.9
1.0
0.9
1.0
1.0
0.9
0.9
3.1
0.9
2.5
1.8
I4
1.5
1.4
1.8
1.4
1.5
1.4
1.5
1.4
1.5
1.4
1.3
1.3
3.8
1.1
3.3
1.9
LIT nsc
s.d.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ssr
s.d.
LIT ml
s.d.
ADL(1,0) ssr
s.d.
ADL(1,0) ml
s.d.
I1
2.9
2.6
3.1
2.4
2.9
2.8
2.9
2.6
3.8
3.7
2.9
2.6
4.3
2.7
4.0
2.6
I2
2.9
2.7
3.2
2.6
2.9
2.8
2.9
2.7
3.4
3.2
2.9
2.6
3.8
2.7
3.5
2.6
I3
2.2
2.4
2.7
2.4
2.3
2.4
2.3
2.3
2.4
2.5
2.2
2.3
9.4
3.2
8.3
3.8
I1sa
2.6
2.4
2.9
2.4
2.6
2.8
2.6
2.4
2.7
2.5
2.6
2.4
3.1
2.4
2.8
2.3
I2sa
2.8
2.6
3.1
2.6
2.8
2.8
2.8
2.6
2.9
2.7
2.8
2.6
3.2
2.7
3.0
2.7
I3sa
0.9
0.9
1.1
0.9
0.9
0.9
0.9
0.8
0.9
0.9
0.9
0.9
3.1
0.8
2.5
1.7
I4
1.3
1.3
1.6
1.3
1.4
1.2
1.4
1.2
1.4
1.3
1.4
1.3
3.7
1.1
3.2
1.9
ADL(1,0) ssr
s.d.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ssr
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ml
s.d.
I1
7.2
5.0
6.7
5.3
8.4
9.6
6.4
5.3
7.8
6.6
3.4
2.9
3.2
2.8
7.1
7.0
I2
7.1
5.4
7.1
5.7
8.7
9.9
6.7
5.7
7.3
6.1
3.5
3.0
3.2
2.8
6.9
7.1
I3
9.9
3.3
4.8
4.2
5.4
6.6
4.4
4.2
4.7
4.7
2.4
2.5
2.2
2.3
8.7
5.3
I1sa
6.2
5.2
6.5
5.3
8.2
9.6
6.1
5.3
6.3
5.4
3.1
2.7
2.8
2.5
6.1
6.8
I2sa
6.7
5.6
7.0
5.7
8.6
9.9
6.6
5.7
6.7
5.8
3.3
3.0
3.0
2.8
6.5
7.0
I3sa
3.6
1.5
2.2
1.9
2.9
4.4
2.0
1.9
2.0
2.0
1.0
0.9
0.9
0.9
3.4
3.5
I4
4.8
2.3
3.4
2.9
4.4
6.2
3.1
2.9
3.2
3.0
1.5
1.4
1.3
1.2
4.6
4.3
ADL(1,0) ml
s.d.
CL ssr
s.d.
CL ml
s.d.
FER
s.d.
LIT ssr
s.d.
LIT ml
s.d.
LIT nsc
s.d.
ADL(1,0) ssr
s.d.
I1
6.6
6.6
5.3
4.5
5.8
7.1
4.9
4.5
6.1
5.7
3.2
2.7
2.9
2.5
6.2
4.3
I2
6.3
6.7
5.5
4.8
6.0
7.4
5.0
4.7
5.6
5.1
3.2
2.9
2.9
2.6
5.9
4.6
I3
8.8
5.2
4.5
3.8
5.2
6.0
4.1
3.8
4.4
4.3
2.6
2.6
2.4
2.4
9.7
3.2
I1sa
5.6
6.5
5.1
4.5
5.7
7.2
4.6
4.4
4.8
4.6
2.8
2.5
2.6
2.4
5.2
4.5
I2sa
5.9
6.7
5.4
4.8
6.0
7.4
4.9
4.7
5.0
4.7
3.1
2.8
2.8
2.6
5.5
4.8
I3sa
3.4
3.5
2.2
1.9
3.0
4.4
2.0
1.9
2.0
2.0
1.0
1.0
0.9
0.9
3.5
1.5
I4
4.6
4.2
3.2
2.7
4.2
5.8
2.9
2.7
3.0
2.8
1.5
1.4
1.4
1.3
4.6
2.1
Table 12: Out-of-sample period: Absolute annual percentage error for different DGPs for indicators.
(Columns: the estimation methods CL ssr, CL ml, FER, LIT ssr, LIT ml, LIT nsc, ADL(1,0) ssr and ADL(1,0) ml, each with its s.d.; rows: the indicator DGPs I1, I2, I3, I1sa, I2sa, I3sa and I4.)
Table 13: In-sample period: RMSPEL for different DGPs for disturbances.
(Columns: the estimation methods CL ssr, CL ml, FER, LIT ssr, LIT ml, LIT nsc, ADL(1,0) ssr and ADL(1,0) ml, each with its s.d.; rows: the disturbance DGPs C1, C5, C9, F, L1, L5 and L9.)
Table 14: Out-of-sample period: Absolute annual percentage error for different DGPs for disturbances.
(Columns: the estimation methods CL ssr, CL ml, FER, LIT ssr, LIT ml, LIT nsc, ADL(1,0) ssr and ADL(1,0) ml, each with its s.d.; rows: the disturbance DGPs C1, C5, C9, F, L1, L5 and L9.)
Table 15: Ranking (adjusted for non admissible solutions) by different DGPs for indicators.

In-sample period
         FER  CL ssr  LIT nsc  LIT ml  LIT ssr  ADL(1,0) ssr  CL ml  ADL(1,0) ml
I1        1     1        4        4        7          6          8         8
I2        1     1        4        4        6          6          8         8
I3        1     1        1        2        3          8          6         7
I1sa      1     3        3        2        5          5          8         7
I2sa      3     5        2        3        5          5          8         7
I3sa      1     3        1        2        3          8          6         7
I4        1     1        1        2        3          8          6         7
Total    1.3   2.1      2.3      2.7      4.6        6.6        7.1       7.3

Out-of-sample
         CL ml  LIT nsc  LIT ml  CL ssr  FER  ADL(1,0) ml  ADL(1,0) ssr  LIT ssr
I1         1      1        2       5      5        5            7           8
I2         1      1        2       5      5        5            6           8
I3         1      2        3       3      5        6            7           8
I1sa       1      1        2       5      5        4            7           8
I2sa       1      1        2       5      5        5            6           8
I3sa       1      2        3       3      5        6            7           8
I4         1      2        3       3      5        6            7           8
Total     1.0    1.4      2.4     4.1    5.0      5.3          6.7         8.0
Table 16: Ranking (adjusted for non admissible solutions) by different DGPs for disturbances.

In-sample period
         CL ssr  FER  LIT ml  LIT nsc  CL ml  LIT ssr  ADL(1,0) ml  ADL(1,0) ssr
C1          1     2      2       3        5       3         5             4
C5          1     2      3       4        6       3         5             4
C9          1     2      5       4        5       6         7             8
F           2     3      5       5        5       6         7             8
L1          2     3      5       5        5       6         7             8
L5          3     4      4       4        3       6         7             8
L9          4     4      2       1        5       4         6             7
Total      2.0   2.9    3.7     3.7      4.9     4.9       6.3           6.7

Out-of-sample
         CL ssr  CL ml  LIT nsc  FER  LIT ml  ADL(1,0) ml  ADL(1,0) ssr  LIT ssr
C1          3      1       6      5      6         2            4           6
C5          4      1       6      5      6         2            4           6
C9          4      1       4      6      5         3            7           8
F           2      5       4      2      5         7            6           8
L1          2      4       4      5      5         6            7           8
L5          1      5       2      4      4         7            6           7
L9          5      5       1      4      2         7            6           3
Total      3.0    3.1     3.9    4.4    4.7       4.9          5.7         6.6
Temporal Disaggregation Procedures: a Forecast-based Evaluation
Experiment on some Istat series
di Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova),
Miguel Jerez (Universidad Complutense de Madrid) e Filippo Moauro (ISTAT)
Abstract
The paper documents a comparison between temporal disaggregation procedures through a forecasting experiment based on Istat series. The series are taken from the database of the Istat Study Commission on temporal disaggregation methods; the competing procedures are the Sutse approach and a number of classical regression methods: the Chow and Lin, Fernandez and Litterman procedures and the ADL(1,0) and ADL(1,1) specifications of autoregressive distributed lag models, both in levels and in first differences. The main evidence is that the Sutse approach and the Fernandez and Chow and Lin procedures including deterministic components outperform the other methods. Finally, the study documents a further improvement in performance for all the regression methods when the logarithmic transformation and diffuse initialization are taken into account.
New Features for Time Series Temporal Disaggregation in the Modeleasy+
Environment
di Giuseppe Bruno e Giancarlo Marra (Banca d’Italia)
Abstract
Temporal disaggregation techniques play a fundamental role in applied economic analysis. In this paper we describe the enhancements implemented in the DISAGGR command for the temporal disaggregation of time series, available in the Modeleasy+ software package. In particular, we provide users with the most popular extensions of the Chow-Lin (1971) method based on a given residual variance-covariance matrix, such as Fernández, Litterman and the Santos Silva and Cardoso dynamic extension. Furthermore, we make available Guerrero's (1990) ARIMA data-based method along with an automatic ARIMA identification. One point of strength of the implementation is its usability for the mass production of temporally disaggregated time series without the burden of interpreted looping structures.
1 Introduction
The high costs of collecting statistical information often push the affected institutions to conduct large
sample surveys less frequently (e.g. annually instead of quarterly). Furthermore, different information
providers may release data at different paces and/or with different policies. For these and other reasons,
the data for the economic analysis are often available as time series of different frequencies.
To avoid a significant loss of information, models are specified at the highest available frequency; data
available at a lower frequency must therefore be temporally disaggregated in some way, trying to achieve
some desired statistical properties.
Two general approaches have been proposed in the literature to fulfill this purpose: methods which
rely essentially upon purely time series models to derive a smooth path for the unobserved series, and
methods based on the use of related indicators at a higher frequency.
In this note we will consider an implementation of the latter class of these methods in the computing
framework provided by the Speakeasy-Modeleasy+ environment.
This extendable package has been used since the '70s both at the Bank of Italy and at ISTAT on the different platforms available from time to time (mainframes running MVS/OS, departmental systems running UNIX, personal computers running Windows and, finally, any machine running Linux). It provides both an interactive computing shell and a simple but complete programming language. Users can
write their own procedures using the internal interpreted programming language, but can also enlarge the command vocabulary of the package with precompiled functions (called "linkules" and usually written in FORTRAN), which are dynamically loaded into the environment and are much faster than interpreted procedures.
The DISAGGR command we will discuss is built in this second way, and its seamless integration with
the Speakeasy-Modeleasy+ environment is one of its points of strength. This environment provides
a comprehensive set of commands for manipulating time series with linear and nonlinear operations,
along with flexible graphical and reporting capabilities. It also features a powerful ”metacommand”
implementing an implicit looping which allows a fast processing of large sets of time series using a single
line of code (see the usage of the command MULTCALL in the examples reported in the Appendices).
2 Time series disaggregation in Speakeasy-Modeleasy+
Although more than 30 years have passed since the seminal work of Chow and Lin (1971), their method
for disaggregating time series has been the work-horse for many National Statistical Offices and Central
Banks, and is still the most widely used.
Very few general purpose statistical packages have devoted great attention to the issues concerned with
time disaggregation. Therefore, the advanced users of these packages in the academic and institutional
world developed several special purpose programs, usually as add-on enhancements of those packages.
The wide use of the Speakeasy package at both the Research Department of the Bank of Italy and the
Italian National Statistical Office (ISTAT) led to the development of a command (DISAGGR) within this
package implementing the basic Chow-Lin (1971) and Denton (1971) algorithms.
Given the theoretical evolution in the field, and after an overview on the programs available nowadays
(the examination of DYNCHOW, a program developed by EUROSTAT to be used with the GAUSS
package, was particularly useful), we decided to update the original DISAGGR command, enhancing it
with the most important algorithms published in the statistical and econometric literature since its first
release.
In the classical framework put forward by Chow-Lin (1971), the time disaggregation problem can be
represented using the following high frequency model:
y_t = β′x_t + u_t     (1)

where:
y_t is the unobserved high frequency variable of s · N components, of which we only know the N aggregated values y_l = c′ · [y_{l,1}, ..., y_{l,s}]′, l = 1, ..., N, with y_{l,k} = y_{(l−1)·s+k}, k = 1, ..., s, c being an s-element disaggregation vector as defined below and s = high frequency / low frequency the frequency disaggregation ratio (FDR), i.e. the number of high frequency observations in each low frequency period;
β is a p-dimensional vector of parameters;
x_t is a p-dimensional vector of observable indicators;
u_t is an unobservable stochastic term.
158
Starting from the original Chow-Lin method, based on model (1) with a stochastic term following a first order autoregressive process u_t = ρ u_{t−1} + ε_t, with |ρ| < 1 and ε_t ∼ NID(0, σ_ε²), we added to the DISAGGR command the methods proposed by the following authors:

Fernández (1981), who postulates an error process following a simple random walk: u_t = u_{t−1} + ε_t, with ε_t ∼ NID(0, σ_ε²) and initial condition u_0 = 0;

Litterman (1983), who postulates an error process described by u_t = u_{t−1} + a_t, with a_t = φ a_{t−1} + ε_t, ε_t ∼ NID(0, σ_ε²) and initial conditions a_{−1} = a_0 = 0;

Santos Silva and Cardoso (2001) (henceforth SSC), who provide a dynamic extension of the original Chow-Lin method by including the lagged endogenous variable among the explanatory variables: y_t = φ y_{t−1} + β′x_t + ε_t, with |φ| < 1 and ε_t ∼ NID(0, σ_ε²);

Guerrero (1990), who proposes a data-driven method which produces the Best Linear Unbiased Estimator of the disaggregated series given a preliminary estimate and the constraints induced by the low frequency observations.
All of the previous algorithms can be used with one of the three following aggregation criteria, corresponding to different definitions of the aggregation matrix C:

SUM: the given low frequency values are the sums of the high frequency values in the corresponding period, i.e. the unknown series is the distribution of a series of flows; c = (1, 1, ..., 1);

AVERAGE: the given low frequency values are the means of the high frequency values in the corresponding period, i.e. the unknown series is the distribution of a series of indexes; c = (1/s, 1/s, ..., 1/s), where s is the FDR;

STOCK: the given low frequency values are the values of the high frequency series at the beginning or at the end of the corresponding period, i.e. the unknown series is the interpolation of a series of stocks; c = (1, 0, ..., 0) or c = (0, 0, ..., 1).

Given the aggregation vector c, the aggregation matrix C(N, s · N) is defined by the Kronecker product C = I_N ⊗ c′, where I_N is the identity matrix of order N.
In order to provide forecasting capabilities, the disaggregation procedures also allow the extrapolation of series of stocks or flows as long as high frequency indicator data are available.
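The construction of the aggregation matrix is simple enough to be sketched directly. The following minimal Python example, assuming numpy, builds C = I_N ⊗ c′ for the three criteria; the function name and defaults are illustrative and not part of the DISAGGR command.

import numpy as np

def aggregation_matrix(N, s, mode="SUM"):
    """Build the (N, s*N) aggregation matrix C = I_N kron c' for the three
    aggregation criteria described above (SUM, AVERAGE, STOCK)."""
    if mode == "SUM":            # flows: low-frequency value is the sum
        c = np.ones(s)
    elif mode == "AVERAGE":      # indexes: low-frequency value is the mean
        c = np.full(s, 1.0 / s)
    elif mode == "STOCK":        # stocks: value at the end of the period
        c = np.zeros(s); c[-1] = 1.0
    else:
        raise ValueError("unknown aggregation mode")
    return np.kron(np.eye(N), c)             # one shifted copy of c per row

# Example: quarterly-to-annual (s = 4) aggregation of 3 years of data
C = aggregation_matrix(N=3, s=4, mode="SUM")
x = np.arange(1, 13, dtype=float)            # a high-frequency series
print(C @ x)                                 # -> annual sums [10. 26. 42.]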
3 Technical description of the algorithms
In this section we give a basic description of the different procedures offered by the new version of the DISAGGR command. In section 3.1 we consider the Chow-Lin, Fernández, Litterman and Santos Silva and Cardoso methods, which are based on a given variance-covariance matrix. Section 3.2 is devoted to the Guerrero algorithm, which is based on different assumptions.
3.1 Methods based on a priori residual variance-covariance matrix
When using any of the first four disaggregation procedures mentioned above, the user can estimate the parameters of model (1) or (2) and choose the objective function by selecting the appropriate options as follows.
OPTIMIZE LIKELIHOOD will maximize the log-likelihood function:

l(β, σ_ε² | ρ) = −(N/2) ln(2πσ_ε²) − (1/2) ln|V_l(ρ)| − (1/(2σ_ε²)) (y_l − CXβ̂(ρ))′ V_l(ρ)^{−1} (y_l − CXβ̂(ρ))     (2)

where N is the number of low frequency observations.

OPTIMIZE SSR will minimize the weighted least squares criterion

SSR(ρ) = (y_l − CXβ̂(ρ))′ V_l(ρ)^{−1} (y_l − CXβ̂(ρ))     (3)
as originally proposed in Barbone et al. (1981).
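The two objective functions can be sketched compactly. The following Python fragment, assuming numpy and a user-supplied function returning V_l(ρ), evaluates the GLS quadratic form (3) and the log-likelihood (2) with σ_ε² concentrated out; the names and the concentration step are illustrative and do not reproduce the DISAGGR code.

import numpy as np

def objectives(rho, y_low, X_low, v_low_of_rho):
    """Return (SSR, log-likelihood) for a candidate rho, as in (3) and (2)."""
    Vl = v_low_of_rho(rho)                     # low-frequency covariance, up to sigma^2
    Vinv = np.linalg.inv(Vl)
    beta = np.linalg.solve(X_low.T @ Vinv @ X_low, X_low.T @ Vinv @ y_low)   # GLS beta(rho)
    resid = y_low - X_low @ beta
    ssr = float(resid @ Vinv @ resid)                                        # eq. (3)
    N = len(y_low)
    sigma2 = ssr / N                           # illustrative ML concentration of sigma^2
    _, logdet = np.linalg.slogdet(Vl)
    loglik = -0.5 * (N * np.log(2 * np.pi * sigma2) + logdet + ssr / sigma2) # eq. (2)
    return ssr, loglik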
In the formulae above, boldface is used to represent vectors, the subscript l refers to temporally aggregated quantities and the matrix C is the temporal aggregation matrix linking the observed low frequency vector y_l with the corresponding high frequency vector y_h to be estimated, so that y_l = C y_h. Once the function to optimize has been chosen, the high frequency variance-covariance matrices are defined. A remarkable improvement over the former version is the direct evaluation of the high frequency variance-covariance matrices from their definition, avoiding any matrix inversion and thus achieving a higher numerical accuracy. The elements of the matrices are defined as follows:
1) Chow-Lin: V_{Chow-Lin}[i, j] = ρ^{|i−j|} / (1 − ρ²)

2) Fernández: V_{Fernandez}[i, j] = min(i, j)

3) Litterman:
   V_{Litterman}[i, j] = 1                                  if i = j = 1
                       = Σ_{k=0}^{i−1} φ^k                  if j = 1, i > 1
                       = Σ_{k=0}^{j−1} φ^k                  if i = 1, j > 1
                       = Σ_{l=0}^{i−1} Σ_{k=0}^{j−1} Z(l, k)  if i > 1, j > 1

where the matrix Z = (A′A)^{−1}, A being the lower bidiagonal matrix with 1 on the main diagonal and −φ on the first subdiagonal:

A = [  1    0    0   ...   0 ]
    [ −φ    1    0   ...   0 ]
    [  0   −φ    1   ...   0 ]
    [  ...            ...    ]
    [  0   ...   0   −φ    1 ]
All three high frequency variance-covariance matrices above are defined up to the factor σ_ε², the variance of the white noise residual. The low frequency variance-covariance matrices are built from these high frequency matrices using the standard aggregation matrices, according to V_l = C V_h C′.
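The three high frequency matrices can be evaluated directly from their definitions, as the following Python sketch (assuming numpy) illustrates; for the Litterman case an equivalent closed form for the covariance of the AR(1) increments is used instead of the block sums of Z = (A′A)^{−1}, so the fragment restates the formulas above rather than reproducing the actual DISAGGR code.

import numpy as np

def v_chowlin(n, rho):
    """AR(1) covariance (up to sigma_eps^2): rho^|i-j| / (1 - rho^2)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

def v_fernandez(n):
    """Random-walk covariance (up to sigma_eps^2): V[i, j] = min(i, j), 1-based."""
    idx = np.arange(1, n + 1)
    return np.minimum.outer(idx, idx).astype(float)

def v_litterman(n, phi):
    """ARIMA(1,1,0) covariance (up to sigma_eps^2) built without inversions:
    u = S a, with a_t = phi*a_{t-1} + eps_t and a_0 = 0, so that
    Cov(a)[i, j] = phi^|i-j| * (1 - phi^(2*min(i, j))) / (1 - phi^2)."""
    i = np.arange(1, n + 1)
    cov_a = (phi ** np.abs(i[:, None] - i[None, :])
             * (1 - phi ** (2 * np.minimum.outer(i, i))) / (1 - phi ** 2))
    S = np.tril(np.ones((n, n)))       # cumulate the stationary AR(1) increments
    return S @ cov_a @ S.T

# Low-frequency counterpart, as in V_l = C V_h C' (C from the earlier sketch,
# sigma_eps^2 factored out):  Vl = C @ v_chowlin(n, rho) @ C.T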
Using these elements, the GLS method is applied to estimate the β coefficients. When required by the algorithm (Chow-Lin, Litterman and SSC), the optimization is carried out numerically, evaluating the objective function over a set of values of the autoregressive parameter of the model (either ρ or φ). This set is defined iteratively, under the control of parameters provided by the user, who can specify:
Niters: the number of iterations;
Nsamples: the number of samples to be taken in the current interval;
RhoLeft, RhoRight: the bounds of the initial interval.

In each iteration j after the first one, a new interval is defined, centered around the current "best value" of ρ and of width ∆_j = ∆_{j−1} / Nsamples.
Using the appropriate values, the user can uniformly scan the initial interval (Niters = 1 and e.g.
Nsamples = 101), or, relying on the smoothness of the objective function (to be proved), make the search
focus in the neighbourhood of the current ”best value” (for example Niters = 3 and Nsamples = 10),
reducing the computational effort and increasing the numerical accuracy of the result.
The Chow-Lin method has been extended with dynamic features, as proposed in Santos Silva and Cardoso (2001), by using a high-frequency Autoregressive Distributed Lag (ADL) model of order (1, 0). Starting from model (2), this method has been implemented reusing most of the code for the static version of Chow-Lin.
3.2 A data based method: the Guerrero approach
The last procedure available as an option of the DISAGGR command is the one put forward by Guerrero. In his 1990 paper, Guerrero proposed an ARIMA-based approach. Here the starting point is a preliminary estimate, W_t, and an ARIMA representation of the unobserved high frequency time series Z_t, namely φ(B)d(B)Z_t = τ(B)a_{Z,t}, where a_{Z,t} is the zero-mean Gaussian white noise driving process.
Given these two basic elements, the linear Minimum Mean Square Error estimator subject to the low frequency constraints is derived. In practical applications the preliminary estimate W_t is usually obtained through the use of linearly related series X_t, which are observed over all the N · s high frequency time periods of interest. An ARIMA model is fitted to the preliminary estimate W_t and this model is assumed to hold for the unknown series to be disaggregated. Once this ARIMA model is established, we compute the pure moving average representation of the forecast error

Z_t − E_0(Z_t) = Σ_{j=0}^{t−1} θ_j a_{Z,t−j}     (4)

This expression is obtained by applying Wold's Decomposition Theorem after suitable differencing to remove any nonstationarity.
The moving average coefficients are then arranged in the N·s-by-N·s lower triangular matrix of weights:

θ = [      1          0        0    ...         0 ]
    [     θ_1         1        0    ...         0 ]
    [     θ_2        θ_1       1    ...         0 ]
    [      ...                       ...          ]
    [  θ_{N·s−1}  θ_{N·s−2}   ...   θ_2   θ_1   1 ]
Once this matrix is defined, the Best Linear Unbiased Estimator of Z_t, given the preliminary estimate W_t and the aggregation constraints, is obtained according to

Ẑ_t = W_t + Â · (Y − C · W),     Â = θPθ′C′ (CθPθ′C′)^{−1}     (5)

where C is the aggregation matrix and P is the errors variance-covariance matrix conditioned on the preliminary estimate. The task of fitting an ARIMA model to a time series has been carried out by using the TRAMO/SEATS package (see Gomez and Maravall (1998)). For this purpose, a procedure for calling the TRAMO/SEATS program and automatically extracting the regular and seasonal ARIMA parameters from the relevant output files has been developed.
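The adjustment in (5) is easy to prototype once the MA weights of the fitted ARIMA model are available. The following Python sketch, assuming numpy, takes the weights as given and uses an identity matrix only as a placeholder for P; all names are illustrative.

import numpy as np

def guerrero_adjust(W, C, theta_weights, P=None):
    """Sketch of equation (5): given a preliminary high-frequency estimate W, the
    aggregation matrix C and the MA weights (theta_0 = 1, theta_1, ...), return a
    function mapping the low-frequency data Y to the constrained estimate."""
    n = len(W)
    theta = np.zeros((n, n))                 # lower triangular matrix of MA weights
    for j, w in enumerate(theta_weights[:n]):
        theta += w * np.eye(n, k=-j)
    if P is None:                            # P: covariance of the preliminary-estimate
        P = np.eye(n)                        #    error (identity used as a placeholder)
    A_hat = theta @ P @ theta.T @ C.T @ np.linalg.inv(C @ theta @ P @ theta.T @ C.T)
    def estimate(Y):
        return W + A_hat @ (Y - C @ W)       # Z_hat = W + A_hat (Y - C W)
    return estimate

# usage sketch: Z_hat = guerrero_adjust(W, C, theta_weights)(Y)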
4 Extension for models with log-transformed variables
Many econometric applications based on macroeconomic time series require the logarithmic transformation of the original data to achieve better statistical properties (e.g. stationarity of first differences). Under the logarithmic transformation additivity no longer holds, because of its nonlinearity.
In order to deal with this issue, as suggested in Aadland (2000), setting z_{t,u} = log(y_{t,u}) and starting from the basic identity for the logarithm of the s-period geometric sum of the basic series, log(Π_{u=1}^{s} y_{t,u}) = Σ_{u=1}^{s} log(y_{t,u}), we have approximated this expression with the first order Taylor series approximation

log(Π_{u=1}^{s} y_{t,u}) ≈ s [log(Σ_{u=1}^{s} y_{t,u}) − log(s)]     (6)

This adjustment is used to correct the dependent variable of the observable low frequency model. Once the logarithm of the disaggregated variable has been computed, its level is obtained in two steps. First we carry out the exponentiation: ŷ_{t,u} = exp(ẑ_{t,u}). Because of the first order approximation above, this estimate of the levels does not satisfy the aggregation constraint, so that y_t − Σ_{u=1}^{s} ŷ_{t,u} = r_t ≠ 0. Then, as suggested in Proietti (1999), the residuals r_t are distributed over the high frequency periods by means of the Denton (1971) first order additive benchmarking procedure. This procedure is needed only for the distribution of flows, while no adjustment is required for stock interpolation.
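The correction of the dependent variable and the computation of the aggregation discrepancy can be sketched as follows (Python with numpy; the Denton benchmarking step is only referenced, not reproduced, and the function names are illustrative).

import numpy as np

def log_adjusted_low_frequency(y_low, s):
    """Correction in equation (6): the dependent variable of the observable
    (logarithmic) low-frequency model is taken as s * [log(y_low) - log(s)],
    the first-order approximation of the sum of the s unobserved logs."""
    return s * (np.log(y_low) - np.log(s))

def aggregation_discrepancy(y_low, z_hat, s):
    """After exponentiating the disaggregated logs, the levels no longer add up
    exactly; r_t is the residual to be spread with Denton's first-order additive
    benchmarking procedure."""
    y_hat = np.exp(z_hat).reshape(-1, s)      # high-frequency levels, by low-frequency period
    return y_low - y_hat.sum(axis=1)          # r_t = y_t - sum_u yhat_{t,u}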
5 Examples
We have carried out some performance evaluations with this new version of the DISAGGR command. Preliminary results seem very satisfactory. Using a 3.2 GHz Pentium 4 platform with 1.0 Gbyte of RAM, we disaggregated 8 time series with 65 observations in 1.8 seconds of CPU time for the Chow-Lin, Litterman and SSC methods, whereas it took only 0.5 seconds for the Fernández method.32
Appendix A lists the help document available online. The other two appendices show some examples featuring the main options and an example of a single-line command for operating on a batch of time series. Some examples of the disaggregation command on a single series are then shown, along with the output that can be obtained. Finally, we show some results of the application of the Guerrero method to some economic time series provided by ISTAT. For each picture we show the indicator along with the disaggregated time series produced using the Chow-Lin and Guerrero methods.
32 The Chow-Lin, Litterman and SSC iterative methods have been applied with Niters = 2, Nsamples = 20.
References

Aadland, D. M. (2000): "Distribution and Interpolation Using Transformed Data," Journal of Applied Statistics, 27, 141–156.

Barbone, L., G. Bodo, and I. Visco (1981): "Costi e profitti in senso stretto: un'analisi su serie trimestrali," Bollettino Economico, Banca d'Italia.

Chow, G., and A. Lin (1971): "Best Linear Interpolation, Distribution and Extrapolation of Time Series by Related Series," Review of Economics and Statistics, 53, 372–375.

Denton, F. (1971): "Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization," Journal of the American Statistical Association, 66, 99–102.

Fernández, R. (1981): "A Methodological Note on the Estimation of Time Series," Review of Economics and Statistics, 63, 471–476.

Gomez, V., and A. Maravall (1998): "TRAMO SEATS: Instructions for the Users," Discussion Paper 383, Banco de España.

Guerrero, V. (1990): "Temporal Disaggregation of Time Series: An ARIMA-based Approach," International Statistical Review, 58, 29–46.

Litterman, R. (1983): "A Random Walk, Markov Model for the Distribution of Time Series," Journal of Business and Economic Statistics, 1, 169–173.

Proietti, T. (1999): "Distribution and Interpolation Revisited: A Structural Approach," Statistica, 58, 411–432.

Santos Silva, J., and F. Cardoso (2001): "The Chow-Lin Method Using Dynamic Models," Economic Modelling, 18, 269–280.
Appendix A.
Modeleasy+ ”help document” for the DISAGGR command
DISAGGR distributes or interpolates data into a finer grid.
The command
YY=DISAGGR(Y,[C,] X1,...,Xn [:options])
returns in YY the distribution (or interpolation) of Y at a higher
frequency.
Y is assumed to contain a set of equally spaced (e.g. annual) data.
X1,...,Xn must identify sets of data with a higher frequency
(e.g. quarterly) modelling the distribution of YY.
Once aggregated, X1,...,Xn are used as independent variables in a
GLS regression of Y.
Using the keyword C as the 2nd argument, the regression will include
a constant term.
The estimates of the high frequency values are obtained using the
regression coefficients and the residuals from the low-frequency model.
The data can be all timeseries or all real arrays objects.
Each X1,...,Xn can also be a nameliteral object whose values identify
the data objects.
If some data object is 2-dimensional, its rows or columns are
considered as the regressors, depending on whether the BYROW
(default) or BYCOLUMN option is used.
The following objects are returned in the work area :
BETA     : the coefficients of the GLS regression
SBETA    : the standard errors of the coefficients
FITTED1  : the fitted values at low frequency
ERRORS1  : the corresponding residuals
FITTED4  : the fitted values at high frequency
ERRORS4  : the corresponding residuals
VARIAN4  : the variances of the high frequency disaggregated series
RHO      : the coefficient of the autoregressive process
PHI      : the coefficient of the dynamic process
Each of these objects can be given a different name using the
optional clause :
newname = standardname
e.g. MYBETA =BETA returns the coefficients into MYBETA.
MYSBETA=SBETA returns the standard errors into MYSBETA.
Other options are :
CHOWLIN, SSC, FERNANDEZ, GUERRERO, LITTERMAN :
control the disaggregation method
(mutually exclusive, default : CHOWLIN )
SUM, FLOWS, MEAN, STOCK, INTERPOL, BOPSTOCK, EOPSTOCK:
control the disaggregation mode.
(mutually exclusive, default : SUM ;
SUM and FLOWS are equivalent;
STOCK, INTERPOL and EOPSTOCK are equivalent)
RHOIS
= rho
: use the given value for RHO.
(not used for FERNANDEZ,GUERRERO)
FRATIO = N
: defines the frequency ratio of the disaggregation
(default : N = 4)
EXTENDED
: use the high frequency data exceeding the range of
existence of the low frequency data to compute
"extrapolations" for the high frequency results
(YY,FITTED4,ERRORS4).
LOGMODEL
: use the log transformation in the high frequency
model for the dependent variable.
LOGLOGMOdel
: use the log transformation in the high frequency
model for both the dependent and the explanatory
variables.
Optimization settings:
Note that optimization settings are not used for
FERNANDEZ and GUERRERO methods, and when RHO
is given a value using the RHOIS option.
OPTIMIZE SSR
: search for and use the value of RHO that minimize
the weighted sum of squared low frequency residuals.
OPTIMIZE LIKELIHOOD :
search for and use the value of RHO that maximize
the log-likelihood of estimated low frequency
residuals distribution.
ITERLIM = Niters,Nsamples :
defines that the optimum RHO has to be determined
in Niters iterations, and in each step Nsamples
samples must be taken. (default : ITERLIM=2,10)
Iterative process can also be driven using the command directing
variable ITERLIM defined as above. In this case, also the extrema
of the search interval can be specified, as the third and the fourth
elements of the ITERLIM array:
ITERLIM = Niters, Nsamples, RhoLeft, RhoRight
In-line options override the effect of command directing variables.
Options related to printed output :
LIST and/or PLOT control the production of some printed output.
OUTPUT=Filename : the printed output, requested by LIST and/or
PLOT options, is written in the file Filename.
Such a file is overwritten each time.
If not specified, any printed output is appended to
a system dependent default output file.
Objects that should be present in workarea:
When using the GUERRERO method, the following objects must be defined:

Parameters of the ARIMA model for the indicator:
W_DDR : degrees of differencing, regular
W_DDS : degrees of differencing, seasonal
W_ARR : AR operator, regular
W_MAR : MA operator, regular
W_ARS : AR operator, seasonal
W_MAS : MA operator, seasonal

Parameters of the ARIMA model for the discrepancy:
D_DDR : degrees of differencing, regular
D_DDS : degrees of differencing, seasonal
D_ARR : AR operator, regular
D_MAR : MA operator, regular
D_ARS : AR operator, seasonal
D_MAS : MA operator, seasonal
Appendix B.
Example of Modeleasy+ session for a single series disaggregation
A SIMPLE DISAGGR COMMAND followed by a selection of the output
********** Modeleasy+ commands ****
yd1 = disaggr(ay3,c, ai3:CHOWLIN list OPTIMIZE LIKELIHOOD )
tstat = a1d(beta)/a1d(sbeta)
rlab = (’coeffic’ , ’t-statist’)
clab = namelist( constant, indicator)
htabula( beta,tstat:
& title=" regression coefficients and t-stat ",
& rowlabel= rlab, collabel=clab)
********** Modeleasy+ output ****
+----------------------------------------+
!   regression coefficients and t-stat   !
+------------+-------------+-------------+
!            !  CONSTANT   !  INDICATO   !
+------------+-------------+-------------+
!  COEFFIC   !   -144.04   !    33.287   !
+------------+-------------+-------------+
!  T-STATIS  !   -1.8104   !    26.726   !
+------------+-------------+-------------+
********** Modeleasy+ commands ****
tabit rho serho
********** Modeleasy+ output ****
+-----+--------+
| RHO | SERHO |
+-----+--------+
| .81 | .11286 |
+-----+--------+
********** Modeleasy+ commands ****
yd1 = disaggr(ay3,c, ai3: list FERNANDEZ )
tstat = a1d(beta)/a1d(sbeta)
rlab = (’coeffic’ , ’t-statist’)
clab = namelist( constant, indicator)
htabula( beta,tstat:
& title=" regression coefficients and t-stat ",
& rowlabel= rlab, collabel=clab)
********** Modeleasy+ output ****
+----------------------------------------+
!   regression coefficients and t-stat   !
+------------+-------------+-------------+
!            !  CONSTANT   !  INDICATO   !
+------------+-------------+-------------+
!  COEFFIC   !    113.46   !    30.456   !
+------------+-------------+-------------+
!  T-STATIS  !    1.1653   !    4.2548   !
+------------+-------------+-------------+
********** Modeleasy+ commands ****
tabit rho serho
********** Modeleasy+ output ****
+-----+-------+
| RHO | SERHO |
+-----+-------+
|  1  |   0   |
+-----+-------+
********** Modeleasy+ commands ****
yli = disaggr( by3,c, bi3: list LITTERMAN OPTIMIZE likelihood)
tstat = a1d(beta)/a1d(sbeta)
rlab = (’coeffic’ , ’t-statist’)
clab = namelist( constant, indicator)
htabula( beta,tstat:
& title=" regression coefficients and t-stat ",
& rowlabel= rlab, collabel=clab)
********** Modeleasy+ output ****
+----------------------------------------+
!   regression coefficients and t-stat   !
+------------+-------------+-------------+
!            !  CONSTANT   !  INDICATO   !
+------------+-------------+-------------+
!  COEFFIC   !   -43.388   !    70.077   !
+------------+-------------+-------------+
!  T-STATIS  !   -.16947   !    8.5574   !
+------------+-------------+-------------+
********** Modeleasy+ command ****
tabit rho serho
********** Modeleasy+ output ****
+------+--------+
| RHO | SERHO |
+------+--------+
| .297 | .18377 |
+------+--------+
********** Modeleasy+ commands ****
ysc = disaggr( ay1,c, ai1: list SSC
OPTIMIZE likelihood)
tstat = a1d(beta)/a1d(sbeta)
rlab = (’coeffic’ , ’t-statist’)
clab = namelist( constant, indicator, t_remain)
htabula( beta,tstat:
& title=" regression coefficients and t-stat ",
& rowlabel= rlab, collabel=clab)
********** Modeleasy+ output ****
+-------------------------------------------------+
!        regression coefficients and t-stat       !
+----------+------------+------------+------------+
!          !  CONSTANT  !  INDICATO  !  T_REMAIN  !
+----------+------------+------------+------------+
! COEFFIC  !  -10.336   !   .65976   !   100.86   !
+----------+------------+------------+------------+
! T-STATIS !  -.54152   !   24.951   !   .15638   !
+----------+------------+------------+------------+
********** Modeleasy+ commands ****
tabit phi sephi
********** Modeleasy+ output ****
+------+--------+
| PHI | SEPHI |
+------+--------+
| .331 | .19262 |
+------+--------+
Appendix C.
Example of Modeleasy+ session for batch disaggregation
$ Using the default option: Chow-Lin, SUM as aggregation criterion,
$ ITERLIM=2,10,[-.99,.99]
$ the interval -.99,.99 is divided in ten parts. The maximizing rho is
$ used to center a new interval [rho-1.98/10,rho+1.98/10]
yar=disaggr(X1 C XN )
ya =disaggr (X1 C XN :list plot )
tabit ya yy1 yy ya1
$ Using the FERNANDEZ algorithm.
yar=disaggr(X1 C XN : FERNANDEZ SUM)
$ Using the LITTERMAN algorithm. The forecast for an incomplete year
$ is requested. The range over which rho is searched is divided in 20
$ sections three times. Each time the section is 20 times smaller
yl=disaggr(Y C X : SUM LITTERMAN EXTENDED)
$ Restricting the Chow-Lin to a given value of rho
yc=disaggr(Y C X : SUM CHOWLIN RHOIS 0.76 )
$ Default option for the SSC method
yc=disaggr(Y C X : DYNAMIC )
yc=disaggr(Y C X : DYNAMIC EXTENDED MEAN )
$Using the GUERRERO algorithm.
yg = disaggr(YV C XI: SUM GUERRERO)
$ Example to show the disaggregation of a batch of time series
xin = namelist(ACOAPRY,CFAFFY,CFAFFRY,AMMFAMY,PAUAMM,AMAAPRY,TOPEAY,CIGA)
xout = makename(’dou’ ints(noels(xin)))
MULTCALL xout DISAGGR xin c trend: sum dynamic extended
$ with these three statements I disaggregate the namelist xin with the
$ SSC method and extend the estimates in the final incomplete year.
indl = namelist(amfarq CFABCD cfabcrd ammoaq trend ammatrq indicat trend)
MULTCALL xout DISAGGR xin c indl: mean litterman
Figure 1: Methane gas tax. Disaggregated series obtained with the Guerrero and Chow-Lin methods (left scale) and the related indicator (right scale), 1980-2004.

Figure 2: Car makers production. Disaggregated series obtained with the Guerrero and Chow-Lin methods (left scale) and the related indicator (right scale), 1976-2004.

Figure 3: Trade sector production. Disaggregated series obtained with the Guerrero and Chow-Lin methods (left scale) and the related indicator (right scale), 1976-2004.
The Starting Conditions in Fernàndez and Litterman Models of Temporal
Disaggregation by Related Series
di Tommaso Di Fonzo (Dipartimento di Scienze Statistiche, Università di Padova)
Abstract
A discussion of the way in which two classic static methods of temporal disaggregation by related series deal with the starting conditions on the nonstationary disturbance term.
1 Introduction
In this note it is shown that the static regression-based framework for temporal disaggregation developed by Stram and Wei (1986) can help in understanding the role of the starting conditions used by Fernàndez (1981) and Litterman (1983) to deal with nonstationary disturbances (a random walk and an ARIMA(1,1,0) process, respectively).
The work is organized as follows. In section 2 the results by Stram and Wei (1986) are briefly summarized and extended to cover the extrapolation case. In section 3 the model by Fernàndez (1981) is formulated and the estimator of the target series is derived according to the approach shown previously. The model by Litterman (1983) is dealt with in section 4, and first conclusions are presented in section 5.
In the near future this note should be somewhat enlarged, both from a theoretical and from a practical point of view. On the theoretical side, a more thorough discussion of the role of starting conditions will be entertained, following the ideas by Proietti (2004)33. Moreover, the approach by Stram and Wei will be used to give a closed form solution to the disaggregation model (called Fcombo) based on an idea by Miguel Jerez, which 'combines' two temporal disaggregation models based on a first differences regression model, the former with white noise disturbances (which in this note is shown to be equivalent to the Fernàndez model with a constant included as related series), and the latter with MA(1) disturbances.
On the empirical side, a small simulation experiment will be performed in order to assess the effect of different choices for the starting values on the estimated parameters and on the disaggregated values. Finally, the impact on the estimates of some real-life series taken from the Istat databases ABC and Trasfind will be analyzed.
33 A note (written in Italian) on this point is already available. I can send it on request to interested people.
2 A regression framework with ARIMA disturbances: the solution by Stram and Wei (1986)
Let x_t, t = 1, ..., mn, be a 'basic' series (i.e., temporally disaggregated, e.g. a quarterly series) generated by a Gaussian ARIMA(p, d, q) process, where n indicates the number of years and m represents the number of infra-annual periods (m = 4 for quarterly series, m = 12 for monthly series, m = 3 if the basic series is monthly and the low-frequency period is the quarter). Define w_t = (1 − B)^d x_t, t = d + 1, ..., mn, the stationary series obtained by differencing the basic series d times, where B is the backshift operator working at the basic (high) frequency.
Now, let us indicate with y_T, T = 1, ..., n, the low-frequency aggregated series obtained as the non-overlapping sum of m successive values of the basic series:

y_T = Σ_{i=m(T−1)+1}^{mT} x_i = (1 + B + B² + ... + B^{m−1}) x_{mT},     T = 1, ..., n.

It can be shown that this series is generated by an ARIMA(p, d, r) process, with r = p + d + 1 + ⌊(q − p − d − 1)/m⌋.
Let u_T = (1 − B)^d y_T, T = d + 1, ..., n, where B is now the backshift operator working at the low frequency. Given that

u_T = (1 + B + B² + ... + B^{m−1})^{d+1} (1 − B)^d x_{mT},     T = d + 1, ..., n,

we can establish the following vector relationships between x_t, w_t, y_T and u_T:

w = △^d_{mn} x,     u = C^d w,     u = △^d_n y,

where x, w, y and u are (nm × 1), ((nm − d) × 1), (n × 1) and ((n − d) × 1), respectively. △^d_{mn} is a ((mn − d) × mn) matrix performing differences of order d of an (mn × 1) vector and △^d_n is a ((n − d) × n) matrix performing differences of order d of an (n × 1) vector. In general, for d > 0, △^d_r is a ((r − d) × r) matrix given by

△^d_r = [ δ_0  δ_1  ...   δ_d     0   ...     0      0  ]
        [  0   δ_0  ...  δ_{d−1} δ_d  ...     0      0  ]
        [  ...                                          ]
        [  0    0   ...    0      0   ...  δ_{d−1}  δ_d ]

where δ_i, i = 0, ..., d, is the coefficient of B^i in (B − 1)^d = Σ_{i=0}^{d} C(d, i) (−1)^{d−i} B^i, C(d, i) denoting the binomial coefficient, that is:

δ_i = C(d, i) (−1)^{d−i},     i = 0, ..., d.
For example, for n = 5 and d = 1 and d = 2, respectively, it is

△¹₅ = [ −1  1  0  0  0 ]          △²₅ = [ 1 −2  1  0  0 ]
      [  0 −1  1  0  0 ]                [ 0  1 −2  1  0 ]
      [  0  0 −1  1  0 ]                [ 0  0  1 −2  1 ]
      [  0  0  0 −1  1 ]
As regards the matrix C^d, it is a ((n − d) × (mn − d)) matrix containing the coefficients of B^i in the polynomial (1 + B + B² + ... + B^{m−1})^{d+1} = Σ_{i=0}^{(m−1)(d+1)} c_i B^i. Denoting c = (c_0, c_1, ..., c_{(m−1)(d+1)})′, C^d is the banded matrix whose l-th row (l = 1, ..., n − d) contains the vector c′ starting in column (l − 1)m + 1, with zeros elsewhere.

1

C = 0
0
1
2 3
0 0
0 0
4 3
0 1
0 0
2 1
2 3
0 0
0 0
4 3
0 1
0 0
2 1
2 3
0
0
4
0 0
0 0
3 2

0
0 .
1
The vector combining the differenced high and low frequency series has covariance matrix

Cov [ w ]  =  [ V_w        V_w (C^d)′ ]
    [ u ]     [ C^d V_w    V_u        ]

with V_u = C^d V_w (C^d)′. For the moment we assume that V_w is known. Given the Gaussianity of the basic process generating x_t, we can write E(w|u) = ŵ as:

ŵ = V_w (C^d)′ V_u^{−1} u.
Now, to derive an estimate of x we call y* = (y_{n−d+1}, ..., y_n)′ the (d × 1) vector containing the last d values of the vector y and use the relationship

[ w  ]   [           △^d_{mn}            ]
[ y* ] = [ 0_{d,m(n−d)}    I_d ⊗ 1′_m    ] x     (1)
For example, for d = 1, m = 4 and n = 4, relationship (1) stacks the fifteen first differences w_t = x_t − x_{t−1}, t = 2, ..., 16, and the last annual total y_4 = x_13 + x_14 + x_15 + x_16: the (16 × 16) matrix on the right-hand side has the rows of △¹₁₆ in its first fifteen rows and the row vector (0, ..., 0, 1, 1, 1, 1) in the last one.
Stram and Wei (1986) show that the matrix

[           △^d_{mn}            ]
[ 0_{d,m(n−d)}    I_d ⊗ 1′_m    ]

has full rank, and then we can write34

x̂ = [           △^d_{mn}            ]^{−1} [ ŵ  ]
    [ 0_{d,m(n−d)}    I_d ⊗ 1′_m    ]      [ y* ]

or, expressed in terms of the aggregated levels y,

x̂ = [           △^d_{mn}            ]^{−1} [ V_w (C^d)′ V_u^{−1} △^d_n ]
    [ 0_{d,m(n−d)}    I_d ⊗ 1′_m    ]      [ 0_{d,n−d}        I_d      ]  y     (2)
Expression (2) can be extended to handle the extrapolation case as well, that is when x_t ranges from 1 to T, T > mn. The relationship between the vectors u and w can be expressed as

u = [ C^d   0_{(n−d),(T−mn)} ] w  ⇔  u = C^{d*} w,

where now the vector w = △^d_T x has dimension ((T − d) × 1). Expression (1) becomes

[ w  ]   [              △^d_T                        ]
[ y* ] = [ 0_{d,m(n−d)}   I_d ⊗ 1′_m   0_{d,T−mn}    ] x.

Keeping these results together, the final estimator of x in the extrapolation case is given by:

x̂ = [              △^d_T                        ]^{−1} [ V_w (C^{d*})′ V_u^{−1} △^d_n ]
    [ 0_{d,m(n−d)}   I_d ⊗ 1′_m   0_{d,T−mn}    ]      [ 0_{d,n−d}            I_d     ]  y     (3)
The estimates given by (2) and (3) have several 'good' properties (see Stram and Wei, 1986). However, we are interested in the extension to the case in which the basic ARIMA process has a regression nucleus, which is rather simple. In this case the reference model becomes

x = Zβ + ε,

where ε_t ∼ ARIMA(p, d, q). The auxiliary aggregated regression is obtained by pre-multiplying the previous relationship (i) by △^d_{mn}, to get stationary disturbances, and then (ii) by C^d, to get observed quantities on both sides of the regression relationship, thus obtaining

C^d △^d_{mn} x = C^d △^d_{mn} Zβ + C^d △^d_{mn} ε  ⇔  △^d_n C_0 x = △^d_n C_0 Zβ + u  ⇔  △^d_n y = △^d_n C_0 Zβ + u,

with u = C^d △^d_{mn} ε = C^d w, where C_0 denotes the (n × nm) aggregation matrix transforming high-frequency levels into low-frequency sums. Using the GLS estimate of β, given by

β̂ = [(△^d_n C_0 Z)′ (C^d V_w (C^d)′)^{−1} (△^d_n C_0 Z)]^{−1} (△^d_n C_0 Z)′ (C^d V_w (C^d)′)^{−1} △^d_n y,
34 Stram and Wei don’t show that the result is invariant with respect to any choice of the d elements of vector y to be
used to estimate the disaggregated levels starting from the estimates of the differenced high-frequency series. However, it
is easy to check that using the first d values of y instead of the last ones gives the same results.
we can estimate the aggregate disturbances as r = y − C_0 Z β̂ and then use (2) to obtain an estimate of the high-frequency residuals as

ε̂ = [           △^d_{mn}            ]^{−1} [ V_w (C^d)′ V_u^{−1} △^d_n ]
    [ 0_{d,m(n−d)}    I_d ⊗ 1′_m    ]      [ 0_{d,n−d}        I_d      ]  r.
The final high-frequency estimates are then given by x̂ = Z β̂ + ε̂.
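The procedure just described lends itself to a compact matrix implementation. The following numpy sketch is an illustration of the d = 1 case only (with V_w = I it corresponds to the Fernàndez-type specification of the next section); it is not the code actually used by the authors, and all names are of my own choosing. It performs the GLS step, distributes the aggregate residuals through (2) and checks that the estimates respect the temporal aggregation constraint:

```python
import numpy as np

def diff_matrix(k):
    """(k-1) x k first-difference operator: (D @ v)[i] = v[i+1] - v[i]."""
    return np.eye(k)[1:] - np.eye(k)[:-1]

def disaggregate_d1(y, Z, m=4, Vw=None):
    """Temporal disaggregation with a regression nucleus and d = 1 (sketch).
    y: (n,) low-frequency totals; Z: (mn, k) high-frequency regressors;
    Vw: covariance of the differenced disturbances (identity -> Fernandez)."""
    n, mn = len(y), Z.shape[0]
    if Vw is None:
        Vw = np.eye(mn - 1)
    C0 = np.kron(np.eye(n), np.ones(m))            # (n, mn): sums m high-freq values
    D_hf, D_lf = diff_matrix(mn), diff_matrix(n)
    C1 = (D_lf @ C0) @ np.linalg.pinv(D_hf)        # maps w to the differenced aggregates
    Vu = C1 @ Vw @ C1.T
    Vu_inv = np.linalg.inv(Vu)
    X, y_diff = D_lf @ C0 @ Z, D_lf @ y            # aggregated, differenced regression
    XtVi = X.T @ Vu_inv
    beta = np.linalg.solve(XtVi @ X, XtVi @ y_diff)   # GLS estimate of beta
    r = y - C0 @ Z @ beta                          # aggregate residuals
    top = Vw @ C1.T @ Vu_inv @ (D_lf @ r)          # estimate of the differenced residuals
    S = np.vstack([D_hf, np.r_[np.zeros(mn - m), np.ones(m)]])
    eps = np.linalg.solve(S, np.r_[top, r[-1]])    # invert relationship (1), as in (2)
    x_hat = Z @ beta + eps
    assert np.allclose(C0 @ x_hat, y)              # the annual totals are reproduced
    return x_hat, beta
```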
3 The Fernàndez model with unknown starting condition

The high-frequency basic model is35
$$ x_t = z_t'\beta + \varepsilon_t, \qquad (1 - B)\varepsilon_t = w_t, \qquad w_t \sim WN(0, \sigma^2), \qquad t = 1, \ldots, mn. $$
Using the notation and the results shown in the previous section, we have
$$ x = Z\beta + \varepsilon, \qquad w = \Delta^1_{mn}\varepsilon, \qquad V_w = \sigma^2 I_{mn-1}, $$
and thus the auxiliary aggregated regression model is
$$ \Delta^1_n y = \Delta^1_n C^0 Z\beta + u, \qquad u = C^1 \Delta^1_{mn}\varepsilon = C^1 w. $$
It should be noted that the covariance matrix V_u = σ²C¹(C¹)' is known up to σ². For example, for d = 1, m = 4 and n = 5,
$$ C^1 (C^1)' = \begin{bmatrix} 44 & 10 & 0 & 0 \\ 10 & 44 & 10 & 0 \\ 0 & 10 & 44 & 10 \\ 0 & 0 & 10 & 44 \end{bmatrix}. $$
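A short check of where these entries come from (my own derivation from the definitions above, not a passage of the original text): with d = 1 and m = 4, each element of u = C¹w is the first difference of two consecutive annual sums, so that
$$ u_\tau = y_\tau - y_{\tau-1} = w_{4\tau-6} + 2w_{4\tau-5} + 3w_{4\tau-4} + 4w_{4\tau-3} + 3w_{4\tau-2} + 2w_{4\tau-1} + w_{4\tau}, $$
and, with V_w = σ²I, Var(u_τ) = σ²(1 + 4 + 9 + 16 + 9 + 4 + 1) = 44σ² and Cov(u_τ, u_{τ+1}) = σ²(3·1 + 2·2 + 1·3) = 10σ², while covariances at longer lags vanish, which reproduces the matrix above.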
Put in other words, the auxiliary aggregated regression model features the first differences of the interpoland and of the aggregated related series, and has an autocorrelated MA(1) disturbance term whose covariance matrix is known up to σ². The estimation of this model via GLS (≡ ML) is straightforward.
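The same structure can be cross-checked numerically. The sketch below (illustrative code, not part of the Commission's software) builds C¹ both from the triangular weights derived above and from the identity Δ¹_n C⁰ = C¹ Δ¹_mn, and reproduces the 44/10 pattern for m = 4, n = 5:

```python
import numpy as np

m, n = 4, 5
mn = m * n
# direct construction: row tau has weights (1,2,3,4,3,2,1) on w_{4tau-6},...,w_{4tau}
C1 = np.zeros((n - 1, mn - 1))
for r in range(n - 1):
    C1[r, m * r : m * r + 2 * m - 1] = [1, 2, 3, 4, 3, 2, 1]
# equivalent construction from the aggregation and differencing operators
C0 = np.kron(np.eye(n), np.ones(m))
D_hf = np.eye(mn)[1:] - np.eye(mn)[:-1]
D_lf = np.eye(n)[1:] - np.eye(n)[:-1]
assert np.allclose(C1, (D_lf @ C0) @ np.linalg.pinv(D_hf))
print((C1 @ C1.T).astype(int))   # [[44 10 0 0] [10 44 10 0] [0 10 44 10] [0 0 10 44]]
```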
4 The Litterman model with unknown starting condition

In this case the high-frequency basic model is
$$ x_t = z_t'\beta + \varepsilon_t, \qquad (1 - \phi B)(1 - B)\varepsilon_t = a_t, \qquad a_t \sim WN(0, \sigma^2), \qquad t = 1, \ldots, mn. $$
With obvious notation, we can write
$$ x = Z\beta + \varepsilon, \qquad w = \Delta^1_{mn}\varepsilon, \qquad V_w = \frac{\sigma^2}{1 - \phi^2}\,\Omega, $$
35 In this and in the next section, I omit the discussion about the models by Fernàndez and Litterman with fixed (known
and unknown) starting conditions. For details, see Proietti (2004).
where Ω is the ((nm − 1) × (nm − 1)) Toeplitz matrix given by
$$ \Omega = \begin{bmatrix} 1 & \phi & \cdots & \phi^{mn-3} & \phi^{mn-2} \\ \phi & 1 & \cdots & \phi^{mn-4} & \phi^{mn-3} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \phi^{mn-3} & \phi^{mn-4} & \cdots & 1 & \phi \\ \phi^{mn-2} & \phi^{mn-3} & \cdots & \phi & 1 \end{bmatrix}. $$
The auxiliary aggregated regression model is thus given by
$$ \Delta^1_n y = \Delta^1_n C^0 Z\beta + u, \qquad u = C^1 \Delta^1_{mn}\varepsilon = C^1 w, \qquad V_u = \frac{\sigma^2}{1 - \phi^2}\, C^1 \Omega (C^1)'. $$
Put in other words, the auxiliary aggregated regression model features the first differences of the interpoland and of the aggregated related series, and has an autocorrelated ARMA(1,1) disturbance term whose covariance matrix is known up to σ² and φ. This is a consequence of the fact that the high-frequency, first-differenced regression model has an AR(1) disturbance term. The estimation of this model can be carried out via ML by concentrating β and σ² out of the log-likelihood function (either through a non-linear optimization routine or a grid-search procedure over φ).
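As an illustration of the grid-search route (a sketch under my own naming conventions, reusing the aggregation matrices introduced above; it is not the authors' code), the concentrated log-likelihood can be evaluated on a grid of values of φ and maximised directly:

```python
import numpy as np
from scipy.linalg import toeplitz

def litterman_loglik(phi, y, Z, m=4):
    """Concentrated log-likelihood (up to a constant) of the aggregated Litterman
    model at a given phi; beta and sigma^2 are concentrated out by GLS. Sketch only."""
    n, mn = len(y), Z.shape[0]
    D_hf = np.eye(mn)[1:] - np.eye(mn)[:-1]
    D_lf = np.eye(n)[1:] - np.eye(n)[:-1]
    C0 = np.kron(np.eye(n), np.ones(m))
    C1 = (D_lf @ C0) @ np.linalg.pinv(D_hf)
    Omega = toeplitz(phi ** np.arange(mn - 1))       # AR(1) correlations of w
    V = C1 @ Omega @ C1.T / (1.0 - phi ** 2)         # V_u up to the scale sigma^2
    Vi = np.linalg.inv(V)
    X, y_diff = D_lf @ C0 @ Z, D_lf @ y
    beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y_diff)
    e = y_diff - X @ beta
    N = n - 1
    sigma2 = (e @ Vi @ e) / N                        # concentrated sigma^2
    return -0.5 * (N * np.log(sigma2) + np.linalg.slogdet(V)[1])

# grid search over the stationarity region, as suggested in the text:
# phis = np.linspace(-0.99, 0.99, 199)
# phi_hat = phis[np.argmax([litterman_loglik(p, y, Z) for p in phis])]
```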
5 Conclusions
With respect to the way in which these two methods have been implemented up to now by Enrique
Quilis (TD-Matlab), Tommaso Proietti (TDSsm-Ox), Giuseppe Bruno (Dynchow-Speakeasy) and myself
(Dynchow and Timedis-Gauss), the parameterization according to Stram and Wei (1986) does not need
the assumption that the starting value be fixed. Remember that both Fernàndez and Litterman assume
the starting value fixed and known equal to zero. Thanks to the work of Tommaso Proietti, the links
between fixed-and-known and fixed-and-unknown parameterisations for these two models have become
clear.
In particular, the original formulation by Fernàndez (known, fixed to zero starting value of the disturbance
term) encompasses the apparently more general formulation with unknown and fixed starting value,
provided a constant term is considered as an independent variable of the high-frequency regression
model.
Another important result, from my point of view, is that I was finally able to reinterpret the general state-space formulation of these models used by E4-Matlab (Terceiro et al., 2000) in terms of regression models, which helps me in comparing the results obtained using different software.
It remains to explore the issue of the differences in the estimates (parameters and disaggregated values) obtained with models that are 'nominally' equal but in practice, sometimes markedly, different.
References
Fernàndez R.B. (1981), A methodological note on the estimation of time series, The Review of Economics
and Statistics, 63, pp. 471-478.
Litterman R.B. (1983), A random walk, Markov model for the distribution of time series, Journal of
Business and Economic Statistics, 1, pp. 169-173.
Proietti T. (2004), Temporal disaggregation by state space methods: dynamic regression methods revisited (mimeo).
Stram D.O. and W.W.S. Wei (1986), A methodological note on the disaggregation of time series totals,
Journal of Time Series Analysis, 7, 4, pp. 293-302.
Terceiro J., J.M. Casals, M. Jerez, G.R. Serrano and S. Sotoca (2000), Time series analysis using
MATLAB, Including a complete MATLAB Toolbox (mimeo).
Temporal Disaggregation and Seasonal Adjustment
di Tommaso Proietti (Dipartimento di Scienze Statistiche, Università di Udine) e
Filippo Moauro (ISTAT)
Abstract
The paper discusses the main issues arising in the construction of quarterly national accounts estimates, adjusted for seasonality and calendar effects, obtained by disaggregating
the original annual actual measurements using related monthly indicators.
It proposes and implements an approach that hinges upon the estimation of a bivariate basic
structural time series model at the monthly frequency, accounting for the presence of seasonality and calendar components. The monthly frequency enables more efficient estimation of
the calendar component.
The main virtue of this approach is to enable adjustment and temporal disaggregation to be
carried out simultaneously. The proposed methodology also complies with the recommendations made by the Eurostat - European Central Bank task force on the seasonal adjustment
of Quarterly National accounts.
1 Introduction

This paper is concerned with the temporal disaggregation of economic flow series that are available only at the annual frequency of observation; the resulting quarterly or monthly estimates incorporate the information available from related indicators at the higher frequency, but since the indicators are affected by seasonal and calendar variation, there arises the problem of adjusting the estimates for those effects.
Seasonality and calendar components explain a relevant part of the fluctuations of economic aggregates. While the former refers to the intra-year movements in economic activity caused by various factors, among which climatic and institutional ones are prominent, calendar effects result from essentially three sources (see Cleveland and Devlin, 1980, Bell and Hillmer, 1983): i) weekly periodicity: the level of economic activity depends on the day of the week. The aggregation of weekly seasonal effects into a monthly series is referred to as trading day (TD) or working day (WD) effects, according to whether the series refers to sales or production. ii) moving festivals, such as Easter, which change their position in the calendar from year to year. iii) the different length of the month or quarter: once TD/WD and seasonal effects are accounted for, what remains is the leap year effect.
Providing quarterly national accounts estimates corrected for seasonality and calendar components satisfies a well established information need for both business cycle and structural analyses; this is officially
recognised in Eurostat’s Handbook of National accounts (Eurostat, 1999). A task force established by
Eurostat and the European Central Bank (Eurostat, 2002) has also set forth some guidelines for calendar
adjustment, some of which motivate this contribution: in particular, the use of regression methods is
recommended in the place of proportional adjustment, with the regressors constructed so as to take into
account the country specific holidays; when available, adjustment should be performed on monthly series,
as calendar effects are more easily identified at that frequency.
The Italian Statistical Institute, Istat, started trading day adjustment of quarterly national accounts in June 2003 and has published seasonally adjusted and trading day corrected series since then. See Istat (2003) for a description of the methodology. The French methodology is documented in Insee (2004). Essentially, the current practice involves at least three operations: a separate seasonal and calendar adjustment of the indicator series, and two temporal disaggregations of the annual aggregate using the two versions of the indicator. The disaggregation method adopted is based on the technique proposed by Chow and Lin (1971).
We argue that this is unnecessarily complicated; indeed, the main aim of the paper is to show that all these operations can easily be brought under the same umbrella. Within the unifying framework represented by the estimation of a multivariate structural time series model formulated at the higher frequency,
seasonal adjustment of the indicators and the correction for calendar variation are carried out in one
step. The multivariate setup also provides a more consistent framework for using the information on
related series.
The plan of the paper is the following: the next section introduces the disaggregated basic structural model with regression effects which lies at the basis of our approach. Section 3 discusses the effects of
temporal aggregation on the seasonal component and considers the consequences on modelling and data
dissemination policies. The modelling of the calendar component is considered in section 4. Section 5
illustrates the statistical treatment of the model, whereas section 6 presents a real life example.
2 The Bivariate Basic Structural Model
The basic structural model (BSM henceforth), proposed by Harvey and Todd (1983) for univariate time
series and extended by Harvey (1989) to the multivariate case, postulates an additive decomposition of
the series into a trend, a seasonal and an irregular component. Its name stems from the fact that it
provides a satisfactory fit to a wide range of seasonal time series, thereby playing a role analogous to the
Airline model in an unobserved components framework; see also Maravall (1985).
Without loss of generality we focus on a bivariate series yt = [y1t , y2t ]′ , where t is time in months; in
the sequel y1t will represent the indicator series, whereas y2t is subject to temporal aggregation, being
observed only at the annual frequency.
The BSM is such that each of the component series has the following representation:
$$ y_{it} = \mu_{it} + \gamma_{it} + x_{it}'\delta_i + \epsilon_{it}, \qquad i = 1,2; \; t = 1, \ldots, n, \qquad \epsilon_{it} \sim NID(0, \sigma^2_{i\epsilon}), $$
where the series specific trend, μit, is a local linear component:
$$ \begin{aligned} \mu_{i,t+1} &= \mu_{it} + \beta_{it} + \eta_{it}, \qquad & \eta_{it} &\sim N(0, \sigma^2_{i\eta}), \\ \beta_{i,t+1} &= \beta_{it} + \zeta_{it}, \qquad & \zeta_{it} &\sim N(0, \sigma^2_{i\zeta}). \end{aligned} \qquad (1) $$
The disturbances ηit and ζit are mutually and serially uncorrelated, but are contemporaneously correlated
with the disturbances ηjt and ζjt , respectively, affecting the same equation of the trend for the other
series.
The seasonal component, γit, arises from the combination of six stochastic cycles defined at the seasonal frequencies λj = 2πj/s, j = 1, ..., 6, λ1 representing the fundamental frequency (corresponding to a period of 12 monthly observations) and the remaining being the five harmonics (corresponding to periods of 6 months, i.e. two cycles in a year, 4 months, i.e. three cycles in a year, 3 months, i.e. four cycles in a year, 2.4 months, i.e. five cycles in a year, and 2 months):
$$ \gamma_{it} = \sum_{j=1}^{6} \gamma_{ijt}, \qquad \begin{bmatrix} \gamma_{ij,t+1} \\ \gamma^*_{ij,t+1} \end{bmatrix} = \begin{bmatrix} \cos\lambda_j & \sin\lambda_j \\ -\sin\lambda_j & \cos\lambda_j \end{bmatrix} \begin{bmatrix} \gamma_{ij,t} \\ \gamma^*_{ij,t} \end{bmatrix} + \begin{bmatrix} \omega_{ij,t} \\ \omega^*_{ij,t} \end{bmatrix}, \quad j = 1, \ldots, 5, \qquad (2) $$
and γ_{i6,t+1} = −γ_{i6,t} + ω_{i6,t}. For the i-th series, the disturbances ω_{ijt} and ω*_{ijt} are normally and independently distributed with common variance σ²_{iω} for j = 1, ..., 5, whereas Var(ω_{i6,t}) = 0.5σ²_{iω} (see Proietti, 2000, for further details).
The symbol ǫit denotes the irregular component, which is taken to be series specific, in that it is also
uncorrelated with ǫjt . This restriction, which is not critical and can be removed at will, attributes this
source of variation to series specific measurement error.
The vector xt is a K × 1 vector of regressors accounting for calendar effects, which will be specified in section 4, and δi is a vector of unknown regression coefficients for the i-th series.
According to the model specification, the indicator variable y1t and the national account flow y2t form a Seemingly Unrelated Time Series Equations system (Harvey, 1989). There is no cause and effect relationship between them, but they are subject to the same underlying economic environment. In particular, the first series can be viewed as a partial, possibly noisier, measurement of the same underlying phenomenon.
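To fix ideas on the structure just described, the sketch below assembles the transition matrix and observation vector of a single (univariate) monthly BSM with a local linear trend and a trigonometric seasonal. It is only an illustration with names of my own choosing; the bivariate SUTSE model of the paper stacks two such blocks with contemporaneously correlated disturbances.

```python
import numpy as np

def bsm_system(s=12):
    """Transition matrix T and observation vector z of a univariate BSM:
    local linear trend + trigonometric seasonal at frequencies 2*pi*j/s."""
    blocks = [np.array([[1.0, 1.0],
                        [0.0, 1.0]])]                      # trend: (level, slope)
    for j in range(1, s // 2 + 1):
        lam = 2.0 * np.pi * j / s
        if j < s // 2:
            blocks.append(np.array([[ np.cos(lam), np.sin(lam)],
                                    [-np.sin(lam), np.cos(lam)]]))  # cycle (gamma_j, gamma_j*)
        else:
            blocks.append(np.array([[-1.0]]))              # last harmonic: gamma_{6,t+1} = -gamma_{6,t} + omega
    dim = sum(b.shape[0] for b in blocks)
    T = np.zeros((dim, dim))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        T[pos:pos + k, pos:pos + k] = b
        pos += k
    z = np.zeros(dim)
    z[0] = 1.0                                             # the level enters the measurement
    z[2::2] = 1.0                                          # ... together with each gamma_j
    return T, z

# y_t = z' alpha_t + x_t' delta + eps_t,   alpha_{t+1} = T alpha_t + disturbances
```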
3 The effects of temporal aggregation on the seasonal component

The flow series y2t is not observed; the actual observations pertain to the yearly series
$$ Y_{2\tau} = \sum_{k=0}^{11} y_{2,12\tau - k}, \qquad \tau = 1, 2, \ldots, [n/12], $$
where [a/b] denotes the integer part of a/b.
As the sum of 12 consecutive values of γ2t is a zero mean invertible moving average process of order equal to 10 months, it immediately follows that the aggregation of γ2t, $\sum_{k=0}^{11}\gamma_{2,12\tau-k}$, yields a pure white noise which, without the aid of external information on the indicator series, would be indistinguishable from the aggregation of the series specific measurement error, that is $\sum_{k=0}^{11}\epsilon_{2,12\tau-k}$.
As the seasonal disturbances in y2t are contemporaneously correlated with those driving the seasonal component in the indicator, in principle the bivariate model could identify the component resulting from the aggregation of γ2t as the white noise source of variation that is independent of $\sum_{k=0}^{11}\epsilon_{2,12\tau-k}$ and which is due to the interaction with the disturbances ω1jt.
However, in the situations typically occurring in practice, where seasonality has a slow and weak evolution and sample sizes are small, this source of variation is negligible to an extent that trying to disentangle it from the measurement error would be asking too much of the available data.
One possibility is to assume it away, as will soon be argued. An alternative feasible strategy is to borrow the seasonal pattern from the indicator. This is what prevails in the current practice adopted by national statistical offices, which produce disaggregate estimates according to the scheme ŷ2t = b0 + b1 y1t + et, where b0, b1 are the generalised least squares estimates of the regression coefficients based on the Chow and Lin (1971) model, and et is the distribution residual.
The estimates ŷ2t are referred to as "raw"; trading day adjusted series are produced by the same scheme, in which y1t is replaced by a corrected series. The assumption underlying these operations is that the seasonal component in the national accounts aggregate is proportional to that in the indicator, the factor of proportionality being the same b1 that relates the annual series.
The conditions under which the seasonal behaviour of the aggregate series can be borrowed from y1t via standard generalised regression are indeed rather stringent: not only are common seasonal features required, but also a restricted covariance structure in the nonseasonal component.
Denoting by zt = [z1t, z2t]' the nonseasonal component, we rewrite the disaggregate bivariate model as yit = zit + γit, i = 1, 2. Assume now that γ2t = λγ1t (common seasonal component) and that the nonseasonal component follows a seemingly unrelated system of equations:
$$ z_t = \frac{\theta(L)}{\phi(L)}\,\kappa_t, \qquad \kappa_t \sim NID(0, \Sigma_\kappa), \qquad \Sigma_\kappa = \begin{bmatrix} \sigma^2_{1\kappa} & \sigma_{12,\kappa} \\ \sigma_{12,\kappa} & \sigma^2_{2\kappa} \end{bmatrix}, \qquad (3) $$
where θ(L) and φ(L) are suitable scalar lag polynomials.
If zt results from the sum of several orthogonal components, $z_t = \sum_j \frac{\theta_j(L)}{\phi_j(L)}\kappa_{jt}$, $\kappa_{jt} \sim NID(0, \Sigma_{j\kappa})$, such as zt = μt + εt, then (3) requires homogeneity (see Harvey, 1989, sec. 8.3), which amounts to Σ_{jκ} = q_j Σ_κ, where Σ_κ is a constant matrix and q_j is a proportionality factor which equals 1 for a selected component.
If, further, σ_{12,κ} = λσ²_{2κ}, then it is possible to write
$$ y_{2t} = \lambda y_{1t} + z^*_{1t}, \qquad z^*_{1t} = \frac{\theta(L)}{\phi(L)}\,\kappa^*_{1t}, \qquad \kappa^*_{1t} \sim NID\bigl(0, \sigma^2_{1\kappa} - \lambda^2\sigma^2_{2\kappa}\bigr) $$
and thus we can safely attribute the portion λ of the seasonality in the indicator to the aggregate series.
The restrictions under consideration are testable, say by the LR principle, although the properties of such a test are yet to be investigated.
[This test will become an integral part of the modelling strategy in a later version of the paper.]
We believe that the strategy of giving up the idea of estimating the seasonality in y2t altogether is more neutral. Thus, in the sequel we shall assume that
$$ \sum_{k=0}^{11} \gamma_{2,12\tau - k} = 0, \qquad (4) $$
in lieu of $E\Bigl[\sum_{k=0}^{11} \gamma_{2,12\tau - k}\Bigr] = 0$. Notice that (4) holds strictly when seasonality is deterministic (that is, when σ²_{2ω} = 0).
In the light of the previous discussion, the "raw" series are more a statistical artifact than a useful addition to the supply of official economic statistics. If the primary interest of the investigation were the
seasonal fluctuations on their own, it is more sensible and informative to study the monthly indicators
from the outset.
A final important point arises as a consequence of (4). The simplification preserves the accounting
relationship that the sum of the disaggregated series over 12 months adds up exactly to the annual total,
which would not hold otherwise. As for the series corrected for the calendar component, this would sum
up to the annual estimate with the calendar effects removed.
In conclusion the proposed solution has the additional merit of complying with the recommendation of
the Eurostat/ECB task force concerning time consistency with annual data (recommendation 3.c):
Time consistency of adjusted data should be maintained for practical reasons. The reference
aggregates should be the annual total of quarterly raw data for seasonally adjusted data and
annual total of quarterly data corrected for trading day effects for seasonally and trading day
adjusted data. Exceptions from the time consistency may be acceptable if the seasonality is
rapidly changing.
In situations where seasonality is not rapidly changing, our assumption seems plausible.
4 Calendar components

Calendar effects have been introduced as regression effects in the model equation for yit. Three sets of regressors are defined to account for each of the three sources of variation mentioned in the introduction.
Trading day (working day) effects occur when the level of activity varies with the day of the week, e.g. it is lower on Saturdays and Sundays.
Letting Djt denote the number of days of type j, j = 1, ..., 7, occurring in month t and assuming that the effect of a particular day is constant, the differential trading day effect for series i is given by
$$ TD_{it} = \sum_{j=1}^{6} \delta_{ij}\,(D_{jt} - D_{7t}). $$
The regressors are the differential numbers of days of type j, j = 1, ..., 6, compared to the number of Sundays, to which type 7 is conventionally assigned. The Sunday effect on the i-th series is then obtained as $-\sum_{j=1}^{6}\delta_{ij}$. This expedient ensures that the TD effect is zero over a period corresponding to multiples of the weekly cycle.
The regressors are then corrected to take into account the national calendars: for instance, if Christmas falls on a Monday, one unit should be deducted from D1t and reassigned to D7t if, for that particular application, a holiday can be assimilated to a Sunday. This type of correction is recommended by Eurostat and is adopted in this paper, giving
$$ TD_{it} = \sum_{j=1}^{6} \delta^*_{ij}\,(D^*_{jt} - D^*_{7t}). $$
It is often found that the effects of the working days from Monday to Friday are not significantly different from one another, and that it helps to avoid collinearity among the regressors to assume that δ*_{ij} = δ*_i for j = 1, ..., 5; in such a case a single regressor can validly be employed, writing
$$ TD_{it} = \delta^*_i D^*_t, \qquad D^*_t = \sum_{j=1}^{5} D^*_{jt} - \frac{5}{2}\,D^*_{7t}. $$
The only moving festival in the Italian case concerns Easter; its effect is modelled as Et = δht, where ht is the proportion of the 7 days before Easter that fall in month t. Subtracting 1/12 from ht yields a regressor h*_t = ht − 1/12 which has zero mean over the calendar year.
Finally, the length of month (LOM) regressor results from subtracting from the number of days in each month, $\sum_j D_{jt}$, its long run average, which is 365.25/12.
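For concreteness, the three regressors can be computed for a given month with standard calendar arithmetic. The sketch below is an illustration only: the national-holiday reallocation recommended by Eurostat is omitted, and the Easter date must be supplied externally (e.g. from an Easter algorithm); all names are of my own choosing.

```python
import numpy as np
from calendar import monthrange
from datetime import date, timedelta

def calendar_regressors(year, month, easter_date=None):
    """Single trading-day regressor D*_t, centred Easter regressor h*_t and LOM
    regressor for month t, without the national-holiday correction."""
    _, ndays = monthrange(year, month)
    days = [date(year, month, d) for d in range(1, ndays + 1)]
    counts = np.bincount([d.weekday() for d in days], minlength=7)  # Mon..Sun
    td = counts[:5].sum() - 2.5 * counts[6]          # Mon-Fri minus (5/2) x Sundays
    h = 0.0
    if easter_date is not None:                      # share of the 7 days before Easter
        before = [easter_date - timedelta(days=k) for k in range(1, 8)]
        h = sum(d.year == year and d.month == month for d in before) / 7.0
    lom = ndays - 365.25 / 12.0                      # length-of-month regressor
    return td, h - 1.0 / 12.0, lom

# e.g. calendar_regressors(2004, 4, easter_date=date(2004, 4, 11))
```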
What are the consequences of temporal aggregation from the monthly frequency to the annual one? The
holiday effect becomes constant (hτ = 1, h*τ = 0), whereas the LOM regressor takes the value 3/4 in
leap years and -1/4 in normal years, describing a four year cycle, which is an identifiable though not
necessarily significant effect.
As shown by Cleveland and Devlin, the presence of trading day effects in a monthly time series induces
a peak in the spectrum at the frequency 0.348 × 2π in radians, and a secondary peak at 0.432 × 2π.
For yearly data the relevant frequencies are 0.179 × 2π and 0.357 × 2π, corresponding to periods of 5.6 and 2.80 years, respectively. Hence the presence of a calendar component in yearly data
produces peaks at the frequencies 0.358π (TD), 0.5π (leap year), 0.714π (TD) and π (leap year).
In conclusion, the calendar component has detectable effects on an annually aggregated time series; thus, one possibility is to leave the vector δ*_2 measuring the corresponding effects unrestricted. An alternative parsimonious strategy is to assume that δ*_2 = κδ*_1 for a scalar κ, which amounts to assuming that the calendar effects on the second series are proportional to those affecting the first. This would require the estimation of a single coefficient. The difference with respect to the unrestricted approach is that the disaggregated time series including the calendar component would feature the Easter effect, which would otherwise be absent.
5 Statistical treatment

The state space methodology provides the necessary inferences: the estimation of the unknown parameters, such as the variances of the disturbances driving the components and the regression coefficients; the estimation of the disaggregated values y2t and the assessment of their reliability. Moreover, diagnostic checking can be carried out on the model's innovations, so as to detect, and possibly take corrective action against, any departure from the stated assumptions.
As a first step, the monthly bivariate model, with temporal aggregation concerning solely the second
variable, is cast in the state space form using an approach due to Harvey (1989, sec. 6.3, 2001), which
translates the aggregation problem into a missing value problem. According to this approach, the
following cumulator variable is defined for the second variable:
$$ y^c_{2t} = \psi_t\, y^c_{2,t-1} + y_{2t}, \qquad \psi_t = \begin{cases} 0, & t = 12(\tau - 1) + 1, \;\; \tau = 1, \ldots, [n/12], \\ 1, & \text{otherwise}. \end{cases} \qquad (5) $$
In the case of monthly flows whose annual total is observed,
$$ \begin{aligned} y^c_{2,1} &= y_{2,1}, & y^c_{2,2} &= y_{2,1} + y_{2,2}, & \ldots, \;\; y^c_{2,12} &= y_{2,1} + \cdots + y_{2,12}, \\ y^c_{2,13} &= y_{2,13}, & y^c_{2,14} &= y_{2,13} + y_{2,14}, & \ldots, \;\; y^c_{2,24} &= y_{2,13} + \cdots + y_{2,24}, \end{aligned} $$
and so on. Only a systematic sample of every s-th value of the $y^c_{2t}$ process is observed, namely $y^c_{2,12\tau}$, $\tau = 1, \ldots, [n/12]$, so that all the remaining values are missing.
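The cumulator construction and the implied pattern of missing values are easy to mimic. The following sketch (illustrative only, with names of my own choosing) builds y^c_{2t} from a monthly series and marks the December cumulations, the only values that would actually be observed when just annual totals are available:

```python
import numpy as np

def cumulator(y2_monthly, s=12):
    """Cumulator variable of equation (5) and the observed/missing pattern."""
    n = len(y2_monthly)
    psi = np.array([0.0 if t % s == 0 else 1.0 for t in range(n)])  # resets every January
    yc = np.empty(n)
    for t in range(n):
        yc[t] = (psi[t] * yc[t - 1] if t > 0 else 0.0) + y2_monthly[t]
    observed = np.full(n, np.nan)                    # everything missing ...
    observed[s - 1::s] = yc[s - 1::s]                # ... except the annual cumulations
    return yc, observed
```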
The cumulator is included in the state vector and the state space representation is formed. The associated algorithms, in particular the Kalman filter and smoother, are used for likelihood evaluation and for the estimation of the missing observations, and thus of the disaggregated values of the series. The smoothed estimates of the monthly series are then aggregated to the quarterly frequency. All the computations concerning the illustrations presented in the next section were carried out in Ox36. The statistical treatment of the model was performed using the augmented Kalman filter and smoother due to de Jong (1991; see also de Jong and Chu-Chun-Lin, 1994), suitably modified to take into account the presence of missing values, which is accomplished by skipping certain updating operations. More technical details, which we purposely omit for brevity, and computer programmes are available from the authors.
6 Illustrations

This section presents an illustration based on Italian time series released by Istat, dealing with the problem of disaggregating the annual production of the food and beverages sector, resulting from the National Accounts (NA), given the availability of the monthly industrial production index for the same sector. The monthly index has base year 2000 = 100, is seasonally unadjusted, and covers the period from January 1970 to June 2004; the yearly NA aggregate is expressed at constant 1995 prices and is available for the years from 1970 to 2003. The original series are plotted in figure 1.
The first step of the analysis was to estimate the bivariate basic structural model under temporal aggregation. Maximum likelihood estimation produced the following parameter estimates:
$$ \hat{\sigma}^2_{1\eta} = 0.342, \quad \hat{\sigma}^2_{1\epsilon} = 4.474, \quad \hat{\sigma}^2_{1\omega} = 0.024, \quad \hat{\sigma}^2_{2\eta} = 0.004, \quad \hat{\sigma}^2_{2\epsilon} = 0.225, \quad \hat{\rho}_\eta = 0.567, $$
with σ²_{1ζ} = σ²_{2ζ} = 0, where the suffix 1 denotes the industrial production index and 2 the output, and ρη represents the correlation between η1t and η2t; the maximised log-likelihood is equal to −1138.839.
These results show that for both the series the trend features a constant slope, since its disturbance
variance is zero; as a result the trend is a bivariate random walk with a constant drift, with positively,
but not perfectly, correlated disturbances (ρη is estimated equal to 0.567). This suggests that the series
are not cointegrated.
The non-zero value of the seasonal variance parameter σ²_{1ω} indicates that the seasonal pattern changes over the sample period. The seasonal pattern extracted for the monthly industrial production index is plotted in figure 2 along with the resulting seasonally adjusted series.
The model specification also included three regressors representing the calendar effects: the single trading day regressor D*_t, the Easter variable h*_t using seven days before Easter, and the length of the month (LOM) variable; the trading day variable accounts for Italian specific holidays (e.g. New Year's Day, Easter Monday, First of May, 8th of December, Christmas, etc.). The estimated coefficients for the industrial production index, denoted respectively by δ̂*_1, δ̂_{1,Easter} and δ̂_{1,LOM}, are
$$ \hat{\delta}^*_1 = 0.674 \;(0.045), \qquad \hat{\delta}_{1,Easter} = -1.673 \;(0.651), \qquad \hat{\delta}_{1,LOM} = 1.357 \;(0.944), $$
where the standard errors are reported in parentheses. The overall effect is shown in figure 2. All the parameters are significant, with the exception of LOM, and have the expected sign. For the second series the calendar effects have been restricted to be proportional to the coefficients of the indicator: the estimated scale factor is 0.090, with a standard error of 0.012. We may thus conclude that the calendar effect on the NA aggregate is significant.

36 Ox is a matrix programming language developed by J.A. Doornik (2001).

Figure 1: Annual production and monthly industrial production index for food and beverages.
The smoothed estimates of the disaggregate NA production series are available at both the monthly and
quarterly observation frequency. They are presented in unadjusted form in figure 3, along with their
95% upper and lower confidence limits. The size of the confidence interval indicates the reliability of the estimates and embodies the uncertainty surrounding the estimation of the calendar effects (but not the uncertainty ascribed to the estimation of the hyperparameters, namely the variance parameters).
The quarterly estimates, adjusted for calendar effects, are presented and compared to the raw ones in the last two panels of figure 3. The last plot refers to the estimated growth rates on an annual basis and highlights not only that the adjusted series is smoother, but also that the adjustment influences the location and sharpness of turning points.
Figure 2: Monthly and quarterly calendar effects estimated for the food and beverages production series. [Panels: seasonally adjusted monthly IPI; seasonal component of the monthly IPI; seasonally adjusted (including calendar adjustment) monthly IPI; calendar component (monthly and quarterly); annual calendar component in the aggregate.]
Figure 3: Monthly and quarterly disaggregate production series. [Panels: monthly estimates with 95% confidence interval; quarterly estimates with 95% confidence interval; quarterly estimates with calendar adjustment (raw vs. calendar adjusted); yearly growth rates (raw vs. calendar adjusted).]
7 Conclusions
This article has proposed a disaggregation strategy for the estimation of quarterly national account
series that has several advantages over current practice. The strategy is a novel application of the ideas
contained in Harvey (1989) and Harvey and Chung (2000).
The estimates arise from fitting a multivariate structural time series model formulated at the monthly frequency, which relates the national accounts series to the corresponding monthly indicator. The monthly frequency allows more accurate estimation of the calendar effects.
Maximum likelihood estimation of the unknown parameters, the estimation of the disaggregated observations and their reliability, diagnostic checking and the assessment of goodness of fit, are achieved
through the state space methodology.
The approach automatically yields "raw" and adjusted estimates of the disaggregated series, without the need to iterate the disaggregation procedure.
Simultaneity and statistical modelling render the proposed strategy more transparent.
From a more philosophical standpoint, the approach has the merit of moving away from the exogeneity assumption underlying the disaggregation methods based on a regression framework, such as Chow and Lin (1971), according to which the indicator is considered as an explanatory variable.
Although we have illustrated the bivariate case, which is nevertheless the leading case of interest for statistical agencies, the approach extends immediately to higher dimensional systems and other frequencies of observation.
Acknowledgments
The authors wish to thank Tommy di Fonzo, Cosimo Vitale, and Marco Marini for discussion on the
issues covered in the paper. The usual disclaimers apply.
References
Chow, G., and Lin, A. L. (1971). Best Linear Unbiased Interpolation, Distribution and Extrapolation
of Time Series by Related Series, The Review of Economics and Statistics, 53, 4, 372-375.
de Jong, P. (1991). The diffuse Kalman filter, Annals of Statistics, 19, 1073-1083.
de Jong, P., and Chu-Chun-Lin, S. (1994). Fast Likelihood Evaluation and Prediction for Nonstationary
State Space Models, Biometrika, 81, 133-142.
Eurostat (1999). Handbook on quarterly accounts, Luxembourg.
Eurostat (2002). Main results of the ECB/Eurostat Task Force on Seasonal Adjustment of Quarterly
National Accounts in the European Union. Informal working group on Seasonal Adjustment Fifth
meeting Luxembourg, 25-26 April 2002.
Doornik, J.A. (2001). Ox 3.0 - An Object-Oriented Matrix Programming Language, Timberlake Consultants Ltd: London.
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge
University Press: Cambridge.
Harvey, A.C. and Chung, C.H. (2000). Estimating the underlying change in unemployment in the UK, Journal of the Royal Statistical Society, Series A, 163, 303-339.
Insee (2004). Methodology of French quarterly national accounts. Available at
http://www.insee.fr/en/indicateur/cnat trim/methodologie.htm
Istat (2003). Principali caratteristiche della correzione per i giorni lavorativi dei conti economici trimestrali. Direzione centrale della Contabilità Nazionale. Available at www.istat.it, Istat, Rome.
Proietti T. (2000). Comparing seasonal components for structural time series models. International
Journal of Forecasting, 16, 2, p. 247-260.
Appendix. The Databases used by the Commission
di Francesca Di Palma e Marco Marini (ISTAT)

Abstract

This appendix describes the annual national accounts series and the related quarterly indicators used in the exercises carried out by the Commission for the comparative evaluation of the different temporal disaggregation techniques.
1 Introduction

One of the main objectives of the Commission was to compare empirically the performance of the method currently in use in national accounts with that of other, more recent temporal disaggregation techniques. To this end, two databases shared by the members of the Commission were prepared, from which one or more pairs of interest (aggregate and reference indicator) could be extracted in order to carry out comparative exercises. The first database includes different national accounts aggregates, taken from the demand, supply and income sides. As discussed in the following section, these variables were grouped according to how critical the estimation process implicit in the temporal disaggregation by means of the corresponding reference indicator is. The second database was instead prepared to assess how much a possible change in the quarterly disaggregation method could affect the estimates of the main economic aggregates of the national accounts. For this purpose, the data on the manufacturing branches needed to replicate the estimation of value added at constant prices following the double deflation procedure (output, value added, input and output prices, and industrial production indices) were collected.
2 The ABC series

The judgement on the accuracy of a quarterly disaggregation method can often depend on the type of relationship existing between the national accounts aggregate to be disaggregated and the reference indicator. Indeed, the statistical relationships estimated in the quarterly accounts are of various kinds: there are cases in which the aggregate is strictly tied to the indicator by quasi-accounting constraints, others in which the variables show a good statistical correlation, and still others for which the relationship is disturbed by external factors of more or less marked intensity. The comparison between different alternative techniques, which are candidates to become the methodological pivot of the quarterly accounts, cannot therefore disregard the use of real cases routinely dealt with by quarterly accounts researchers. For this reason a shared database for the empirical exercises of the Commission was created beforehand. It contains several pairs of variables of interest (aggregate and indicator), chosen on the basis of the experience gained over the years by national accounts researchers.
To best represent the problems existing in the current estimation procedure of the accounts, 16 national accounts series with the corresponding indicators were selected. These pairs were divided into three groups according to how critical the underlying estimation process is (A: high criticality, B: medium criticality, C: low criticality); the assignment to the groups was made on the basis of subjective evaluations by ISTAT researchers, this judgement being linked to the estimation problems faced in the current quarterly disaggregation activity. The chosen variables are listed below, while the following pages indicate the short-term information used in the disaggregation and show a plot of the annual growth rates as an element for assessing the degree of agreement between the two variables.
The ABC database: a classification based on the closeness of fit of the indicator

Group A: High criticality
A1 - Tax on natural gas (methane)
A2 - Output of insurance services
A3 - Output of financial auxiliaries
A4 - Household consumption of education services

Group B: Medium criticality
B1 - Actual output of monetary and credit intermediation
B2 - Production of metals and fabricated metal products
B3 - Manufacture of motor vehicles, trailers and semi-trailers
B4 - Output of wholesale and retail trade and repairs
B5 - Output of the post and telecommunications sector
B6 - Output of the food, beverages and tobacco industry
B7 - Intermediate consumption of central government
B8 - Stamp duty
B9 - Household consumption of cars and motorcycles

Group C: Low criticality
C1 - Output of agricultural crops
C2 - Capital income paid to the Rest of the World
C3 - Compensation of employees received from the Rest of the World
[For each pair listed below, the original report shows a figure with the annual growth rates of the aggregate and of the indicator over the period 1981-2003.]

Series A1 - Tax on natural gas (methane) (current prices). Indicator used: quantity sold subject to excise duty (internal Istat source).
Series A2 - Output of insurance services (current prices). Indicator used: actual accrued premiums received by insurance companies (source ISVAP) and income from the investment of insurance technical reserves.
Series A3 - Output of financial auxiliaries (current prices). Indicator used: deterministic trend.
Series A4 - Household consumption of education services (constant prices). Indicator used: household budget survey data for the education services function (source Istat).
Series B1 - Actual output of monetary and credit intermediation (current prices). Indicator used: weighted average of employment in the sector and of the banks' quarterly reports on net revenues from services (sources ISTAT and Banca d'Italia).
Series B2 - Production of metals and fabricated metal products (current prices). Indicator used: industrial production index of the corresponding activity, inflated with the output price (source ISTAT).
Series B3 - Manufacture of motor vehicles, trailers and semi-trailers (current prices). Indicator used: industrial production index of the corresponding activity, inflated with the output price (source ISTAT).
Series B4 - Output of wholesale and retail trade and repairs (constant prices). Indicator used: amount of trade margins (source ISTAT).
Series B5 - Output of the post and telecommunications sector (current prices). Indicator used: weighted average of turnover in the telecommunications sector and of turnover of postal activities (source ISTAT).
Series B6 - Output of the food, beverages and tobacco industry (current prices). Indicator used: industrial production index of the corresponding activity, inflated with the output price (source ISTAT).
Series B7 - Intermediate consumption of central government (current prices). Indicator used: payment commitments (source Ragioneria generale dello Stato).
Series B8 - Stamp duty (current prices). Indicator used: stamp duty receipts (source Ragioneria generale dello Stato).
Series B9 - Household consumption of cars and motorcycles (constant prices). Indicator used: expenditure on cars registered by households (calculations based on data from the Ministero dei Trasporti, UNRAE and the magazine Quattroruote).
Series C1 - Output of agricultural crops (constant prices). Indicator used: quarterly estimate of about 90% of total crop production (40 products).
Series C2 - Capital income paid to the Rest of the World (current prices). Indicator used: imported capital income (source Banca d'Italia-UIC).
Series C3 - Compensation of employees received from the Rest of the World (current prices). Indicator used: exported compensation of employees (source Banca d'Italia-UIC).
3 The industrial transformation series

During the first meeting of the Commission, following the presentation of the ABC database, the need was expressed to assess the impact on one or more reference variables of a possible change in the strategy for the temporal disaggregation of national accounts aggregates. The procedure currently in use for estimating the supply-side aggregates (based on double deflation) makes it very difficult to compute this impact on GDP at constant prices. It was therefore proposed to choose a subset of branches, for which the entire production process would be studied, to be used to evaluate the impact of the different techniques on value added at constant prices. This subset can be well represented by the branches making up the industrial transformation sector (15 sectors according to the classification scheme currently adopted in the quarterly accounts). It was therefore decided to provide all the members of the Commission with a database containing all the variables, both inputs and outputs, of the production process.