Processing the flights data

The flights data were downloaded from BTS - U.S. Department of Transportation (US DOT), Bureau of Transportation Statistics. The file for each year was renamed to BTS<year> ( BTS1990.csv, BTS1991.csv, …, BTS2014.csv ) and collected in a subdirectory BST.

The row structure in data tables is the following:

YEAR Year
QUARTER Quarter
MONTH Month
ORIGIN_AIRPORT_ID Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
ORIGIN_AIRPORT_SEQ_ID Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
ORIGIN_CITY_MARKET_ID Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
ORIGIN Origin Airport
ORIGIN_CITY_NAME Origin Airport, City Name
ORIGIN_COUNTRY Origin Airport, Country
ORIGIN_COUNTRY_NAME Origin Airport, Country Name
ORIGIN_WAC Origin Airport, World Area Code
DEST_AIRPORT_ID Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
DEST_AIRPORT_SEQ_ID Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
DEST_CITY_MARKET_ID Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
DEST Destination Airport
DEST_CITY_NAME Dest Airport, City Name
DEST_COUNTRY Destination Airport, Country
DEST_COUNTRY_NAME Destination Airport, Country Name
DEST_WAC Destination Airport, World Area Code
UNIQUE_CARRIER Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
AIRLINE_ID An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
UNIQUE_CARRIER_NAME Unique Carrier Name. When the same name has been used by multiple carriers, a numeric suffix is used for earlier users, for example, Air Caribbean, Air Caribbean (1).
UNIQUE_CARRIER_ENTITY Unique Entity for a Carrier's Operation Region.
REGION Carrier's Operation Region. Carriers Report Data by Operation Region.
CARRIER Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
CARRIER_NAME Carrier Name
CARRIER_GROUP Carrier Group Code. Used in Legacy Analysis
CARRIER_GROUP_NEW Carrier Group New
DISTANCE_GROUP Distance Intervals, every 500 Miles, for Flight Segment
CLASS Service Class
PASSENGERS On-Flight Market Passengers Enplaned
FREIGHT On-Flight Market Freight Enplaned (pounds)
MAIL On-Flight Market Mail Enplaned (pounds)
DISTANCE Distance between airports (miles)
 0 "PASSENGERS"             0.00                     3167.00                66206.00
 1 "FREIGHT"                118745.00                428.00                 3106805.00
 2 "MAIL"                   0.00                     0.00                   0.00
 3 "DISTANCE"               4513.00                  1037.00                3451.00
 4 "UNIQUE_CARRIER"         "ADB"                    "YV"                   "BA"
 5 "AIRLINE_ID"             20110                    20378                  19540
 6 "UNIQUE_CARRIER_NAME"    "Antonov Company"        "Mesa Airlines Inc."   "British Airways Plc"
 7 "UNIQUE_CARRIER_ENTITY"  "9488B"                  "11022"                "9493E"
 8 "REGION"                 "I"                      "L"                    "I"
 9 "CARRIER"                "ADB"                    "YV"                   "BA"
10 "CARRIER_NAME"           "Antonov Company"        "Mesa Airlines Inc."   "British Airways Plc"
11 "CARRIER_GROUP"          0                        2                      0
12 "CARRIER_GROUP_NEW"      0                        2                      0
13 "ORIGIN_AIRPORT_ID"      11548                    14107                  12478
14 "ORIGIN_AIRPORT_SEQ_ID"  1154804                  1410702                1247802
15 "ORIGIN_CITY_MARKET_ID"  31548                    30466                  31703
16 "ORIGIN"                 "EMA"                    "PHX"                  "JFK"
17 "ORIGIN_CITY_NAME"       "Derby, United Kingdom"  "Phoenix, AZ"          "New York, NY"
18 "ORIGIN_COUNTRY"         "GB"                     "US"                   "US"
19 "ORIGIN_COUNTRY_NAME"    "United Kingdom"         "United States"        "United States"
20 "ORIGIN_WAC"             493                      81                     22
21 "DEST_AIRPORT_ID"        15530                    11874                  12972
22 "DEST_AIRPORT_SEQ_ID"    1553002                  1187404                1297203
23 "DEST_CITY_MARKET_ID"    35530                    31874                  30730
24 "DEST"                   "UXP"                    "GDL"                  "LHR"
25 "DEST_CITY_NAME"         "Bay St. Louis, MS"      "Guadalajara, Mexico"  "London, United Kingdom"
26 "DEST_COUNTRY"           "US"                     "MX"                   "GB"
27 "DEST_COUNTRY_NAME"      "United States"          "Mexico"               "United Kingdom"
28 "DEST_WAC"               53                       148                    493
29 "YEAR"                   2014                     2014                   2014
30 "QUARTER"                1                        2                      2
31 "MONTH"                  1                        5                      5
31 "DISTANCE_GROUP"         10                       3                      7
32 "CLASS",                 "P"                      "F"                    "F"

CSV to Pajek

# flightsNet.py
# transforming the monthly flights data 1990-2014 from
#   http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=260
# into Pajek's multirelational temporal file
# by Vladimir Batagelj, January 22, 2015

import csv, sys, os, datetime
global air, airports, cmp, companies

def indAirport(name):
  global air, airports
  if not(name in air.keys()):
    air[name] = len(air)+1
    airports.write(str(len(air))+' "'+name+'"\n')
  return air[name]

def indCompany(name):
  global cmp, companies
  if not(name in cmp.keys()):
    cmp[name] = len(cmp)+1
    companies.write(str(len(cmp))+' "'+name+'"\n')
  return cmp[name]

print("MakeAirTen *********")
print('start:',datetime.datetime.now().ctime())
workdir = 'E:/data/airports/'
os.chdir(workdir)

arcs   = open('arcs.tmp', 'w')
arcs.write('*arcs \n')
airports = open('airports.tmp', 'w')
airports.write('*vertices \n')
companies = open('companies.tmp', 'w')
companies.write('*vertices \n')
cmp  = {}; air = {}
line = 0
for year in range(1990,2015):
  podatki = './BTS/BTS'+str(year)+'.csv'  
  with open(podatki, newline='', encoding='utf-8') as dat:
    airReader = csv.reader(dat, delimiter=',', quotechar='"')
    try:
      head = next(airReader)
      for row in airReader:
        arcs.write(str(indCompany(row[4]))+': '+str(indAirport(row[16]))+ \
          ' '+str(indAirport(row[24]))+' '+str(row[0])+' ['+ \
          str(12*(int(row[29])-1990)+int(row[31]))+']\n')  
        line = line+1
    except csv.Error as e:
      sys.exit('file {}, line {}: {}'.format(podatki, airReader.line_num, e))
  print(year,':',datetime.datetime.now().ctime())

print("lines = ",line,"\nairports = ",len(air),"\ncompanies = ",len(cmp))
arcs.close(); airports.close(); companies.close()
print('stop:',datetime.datetime.now().ctime())
>>>
MakeAirTen *********
start: Thu Jan 22 04:45:14 2015
1990 : Thu Jan 22 04:45:15 2015
1991 : Thu Jan 22 04:45:17 2015
1992 : Thu Jan 22 04:45:21 2015
1993 : Thu Jan 22 04:45:25 2015
1994 : Thu Jan 22 04:45:27 2015
1995 : Thu Jan 22 04:45:28 2015
1996 : Thu Jan 22 04:45:30 2015
1997 : Thu Jan 22 04:45:32 2015
1998 : Thu Jan 22 04:45:33 2015
1999 : Thu Jan 22 04:45:35 2015
2000 : Thu Jan 22 04:45:37 2015
2001 : Thu Jan 22 04:45:39 2015
2002 : Thu Jan 22 04:45:43 2015
2003 : Thu Jan 22 04:45:45 2015
2004 : Thu Jan 22 04:45:47 2015
2005 : Thu Jan 22 04:45:49 2015
2006 : Thu Jan 22 04:45:51 2015
2007 : Thu Jan 22 04:45:55 2015
2008 : Thu Jan 22 04:45:57 2015
2009 : Thu Jan 22 04:45:59 2015
2010 : Thu Jan 22 04:46:02 2015
2011 : Thu Jan 22 04:46:04 2015
2012 : Thu Jan 22 04:46:06 2015
2013 : Thu Jan 22 04:46:08 2015
2014 : Thu Jan 22 04:46:08 2015
lines =  1737054
airports =  2776
companies =  633
stop: Thu Jan 22 04:46:11 2015

Pajek to TEN

# transforms a Pajek's temporal network in temporal matrix
# Vladimir Batagelj, 22. Jan 2015

import os, re, pickle, datetime
from TQ import *
os.chdir('E:/data/airports/')
print('start:',datetime.datetime.now().ctime())
net = open('flights.net','r')
n = 2776
ten = open('flights.ten', "w")

ten.write("*nodes "+str(n)+"\n")
while True:
   line = net.readline()
   if line[0]=='*': break
k = 0; names = ['']*n; C = [[(1,296,1)]]*n
while True:
   line = net.readline()
   if (line[0]=='*') or (k>=n): break
   L = re.split('\s+',line.strip())
   j = eval(L[0])-1; names[j] = L[1][1:-1]
   ten.write(L[0]+' '+L[1]+' [(1,296,1)]\n')
   k = k+1
while True:
   line = net.readline()
   if line[0]!='*': break

print('nodes done:',datetime.datetime.now().ctime())
W = [[ [] for u in range(n)] for v in range(n)]
ten.write("*arcs\n")
while line:
   L = re.split('\s+',line.strip())
   u = eval(L[1])-1; v = eval(L[2])-1
#   if v<u: (u,v) = (v,u)
   w = eval(L[3]); t = eval(L[4][1:-1])
   W[u][v] = TQ.sum(W[u][v],[(t,t+1,w)])
   line = net.readline()

for u in range(n):
   for v in range(n):
      if W[u][v]!=[]:
         ten.write(str(u+1)+' '+str(v+1)+' '+str(W[u][v])+'\n')
         
net.close()
ten.close()
print('arcs done:',datetime.datetime.now().ctime())

Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
time = [ v+str(u)[2:] for u in range(1990,2015) for v in Months ][:294]
flights = {}

flights['tit'] = 'Flights'
flights['num'] = n
flights['nam'] = names
flights['act'] = C
flights['tim'] = time
flights['mat'] = W
with open("flights.p","wb") as p: pickle.dump(flights,p)
print('saved:',datetime.datetime.now().ctime())
 
start: Thu Jan 22 23:47:10 2015
nodes done: Thu Jan 22 23:47:10 2015
arcs done: Thu Jan 22 23:53:11 2015
saved: Thu Jan 22 23:53:15 2015

Airports

The airports' attribute data can be found at Open flights site or datahub.

To list of airports in flights.ten available in airports.tmp we assign properties (attributes) from airportsHead.csv . We also include the numeric code from countries.dat for countries:

# constructs airports' attributes
# vladimir Batagelj, January 24, 2015
import os, re, csv, datetime
global air, cnt

def indAirport(name):
  global air
  if not(name in air.keys()): air[name] = len(air)+1
  return air[name]

def indCountry(name):
  global cnt
  if not(name in cnt.keys()): cnt[name] = len(cnt)+1
  return cnt[name]

os.chdir('E:/data/airports')

print('start Construct:',datetime.datetime.now().ctime())
print('Countries')

cnt  = {}; air = {}
countries = open('countries.dat', 'r')
row = countries.readlines()
countries.close()
for r in row: ind = indCountry(r[:-1])


line = 0
with open('airportsHead.csv', newline='', encoding='utf-8') as dat:
  airReader = csv.reader(dat, delimiter=',', quotechar='"')
  try:
    head = next(airReader)
    print('|'.join(head))
    for row in airReader:
      nam = row[1]; cny = row[3]
      iata = row[4]; icao = row[5]; tiz = row[9]
      lat = row[6]; lon = row[7]; alt = row[8]
      icny = indCountry(cny)
      if len(iata)>2:
        air[iata] = ( nam, iata, icao, icny, cny, lat, lon, alt)
      elif len(icao)==3:
        air[icao] = ( nam, iata, icao, icny, cny, lat, lon, alt)  
      else: print('missing data', line, nam)
      line = line+1
  except csv.Error as e:
    sys.exit('file {}, line {}: {}'.format(podatki, airReader.line_num, e))
print("lines = ",line,"\nairports = ",len(air))

attrs = open('airports.atr', 'w')
attrs.write('*vertices \n')
airports = open('airports.tmp', 'r')
rows = airports.readlines()[1:]
airports.close()
for r in rows:
  s = r.split(' ')
  ap = s[1][1:-2]; an = int(s[0])
  if ap in air.keys():
    ( nam, iata, icao, icny, cny, lat, lon, alt) = air[ap]
    attrs.write(s[0]+';"'+nam+'";"'+iata+'";"'+icao+'";'+str(icny)+';"'+ \
      cny+'";'+str(lat)+';'+str(lon)+';'+str(alt)+'\n')          
  else:
    print("not found", an, ap)
    attrs.write(s[0]+';"?";"'+ap+'";"'+ap+'";0;"?";0;0;0\n')               
attrs.close()

print('finished:',datetime.datetime.now().ctime())

Extracting submatrix of interesting airports

import os, re, datetime
from TQ import *
from operator import itemgetter

os.chdir('E:/data/airports')

print('start Extract:',datetime.datetime.now().ctime())
print('Network Flights')
AIR = TQ.Ianus2Mat("./Flights.ten")
print('input done:',datetime.datetime.now().ctime())
mat = AIR['mat']
print('sum =',TQ.MatSummary(mat))
print(AIR.keys())
print('dim =',AIR['dim'])
print('met =',AIR['met'])
print('nam =',AIR['nam'][0:10])
print('tit =',AIR['tit'])
print('typ =',AIR['typ'])
print('tin =',AIR['tin'][0:10])
print('til =',AIR['til'][0:10])
(nr,nc,tmin,tmax) = AIR['dim']
n = len(mat); All = range(n)
names = AIR['nam']

A = [[]]*n; S = [0]*n
for u in range(n):
   a = []
   for v in range(n): 
      a = TQ.sum(a,mat[v][u]) 
   A[u] = a; S[u] = TQ.total(a)
   
P = sorted(enumerate(S),key=itemgetter(1),reverse=True)
for i in range(100):
   print(i+1,names[P[i][0]],P[i][1])

B = [[ [] for u in range(100)] for v in range(100)]
L = []; nam = []
for (i,v) in P[0:100]:
   L.append(i); nam.append(names[i])
for (i,u) in enumerate(L):
   for (j,v) in enumerate(L):
      B[i][j] = mat[u][v]

best = {}
best['mat'] = B
best['dim'] = (100,100,1,295)
best['met'] = AIR['met']            
best['nam'] = nam
best['tit'] = "100 largest airports"
best['typ'] = AIR['typ']
best['til'] = AIR['til'][0:100]
best['tin'] = AIR['tin']

TQ.MatSave(best,file="best100.ten")
print('finished:',datetime.datetime.now().ctime())
pajek/data/temp/airpy.txt · Last modified: 2015/06/24 21:30 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki