====== Processing the flights data ====== The flights data were downloaded from [[http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=260|BTS]] - U.S. Department of Transportation (US DOT), Bureau of Transportation Statistics. The file for each year was renamed to BTS ( BTS1990.csv, BTS1991.csv, ..., BTS2014.csv ) and collected in a subdirectory BST. The row structure in data tables is the following: | YEAR | Year | | QUARTER | Quarter | | MONTH | Month | | ORIGIN_AIRPORT_ID | Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. | | ORIGIN_AIRPORT_SEQ_ID | Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. | | ORIGIN_CITY_MARKET_ID | Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. | | ORIGIN | Origin Airport | | ORIGIN_CITY_NAME | Origin Airport, City Name | | ORIGIN_COUNTRY | Origin Airport, Country | | ORIGIN_COUNTRY_NAME | Origin Airport, Country Name | | ORIGIN_WAC | Origin Airport, World Area Code | | DEST_AIRPORT_ID | Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. | | DEST_AIRPORT_SEQ_ID | Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time. | | DEST_CITY_MARKET_ID | Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market. | | DEST | Destination Airport | | DEST_CITY_NAME | Dest Airport, City Name | | DEST_COUNTRY | Destination Airport, Country | | DEST_COUNTRY_NAME | Destination Airport, Country Name | | DEST_WAC | Destination Airport, World Area Code | | UNIQUE_CARRIER | Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. | | AIRLINE_ID | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. | | UNIQUE_CARRIER_NAME | Unique Carrier Name. When the same name has been used by multiple carriers, a numeric suffix is used for earlier users, for example, Air Caribbean, Air Caribbean (1). | | UNIQUE_CARRIER_ENTITY | Unique Entity for a Carrier's Operation Region. | | REGION | Carrier's Operation Region. Carriers Report Data by Operation Region. | | CARRIER | Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code. | | CARRIER_NAME | Carrier Name | | CARRIER_GROUP | Carrier Group Code. Used in Legacy Analysis | | CARRIER_GROUP_NEW | Carrier Group New | | DISTANCE_GROUP | Distance Intervals, every 500 Miles, for Flight Segment | | CLASS | Service Class | | PASSENGERS | On-Flight Market Passengers Enplaned | | FREIGHT | On-Flight Market Freight Enplaned (pounds) | | MAIL | On-Flight Market Mail Enplaned (pounds) | | DISTANCE | Distance between airports (miles) | 0 "PASSENGERS" 0.00 3167.00 66206.00 1 "FREIGHT" 118745.00 428.00 3106805.00 2 "MAIL" 0.00 0.00 0.00 3 "DISTANCE" 4513.00 1037.00 3451.00 4 "UNIQUE_CARRIER" "ADB" "YV" "BA" 5 "AIRLINE_ID" 20110 20378 19540 6 "UNIQUE_CARRIER_NAME" "Antonov Company" "Mesa Airlines Inc." "British Airways Plc" 7 "UNIQUE_CARRIER_ENTITY" "9488B" "11022" "9493E" 8 "REGION" "I" "L" "I" 9 "CARRIER" "ADB" "YV" "BA" 10 "CARRIER_NAME" "Antonov Company" "Mesa Airlines Inc." "British Airways Plc" 11 "CARRIER_GROUP" 0 2 0 12 "CARRIER_GROUP_NEW" 0 2 0 13 "ORIGIN_AIRPORT_ID" 11548 14107 12478 14 "ORIGIN_AIRPORT_SEQ_ID" 1154804 1410702 1247802 15 "ORIGIN_CITY_MARKET_ID" 31548 30466 31703 16 "ORIGIN" "EMA" "PHX" "JFK" 17 "ORIGIN_CITY_NAME" "Derby, United Kingdom" "Phoenix, AZ" "New York, NY" 18 "ORIGIN_COUNTRY" "GB" "US" "US" 19 "ORIGIN_COUNTRY_NAME" "United Kingdom" "United States" "United States" 20 "ORIGIN_WAC" 493 81 22 21 "DEST_AIRPORT_ID" 15530 11874 12972 22 "DEST_AIRPORT_SEQ_ID" 1553002 1187404 1297203 23 "DEST_CITY_MARKET_ID" 35530 31874 30730 24 "DEST" "UXP" "GDL" "LHR" 25 "DEST_CITY_NAME" "Bay St. Louis, MS" "Guadalajara, Mexico" "London, United Kingdom" 26 "DEST_COUNTRY" "US" "MX" "GB" 27 "DEST_COUNTRY_NAME" "United States" "Mexico" "United Kingdom" 28 "DEST_WAC" 53 148 493 29 "YEAR" 2014 2014 2014 30 "QUARTER" 1 2 2 31 "MONTH" 1 5 5 31 "DISTANCE_GROUP" 10 3 7 32 "CLASS", "P" "F" "F" ===== CSV to Pajek ===== # flightsNet.py # transforming the monthly flights data 1990-2014 from # http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=260 # into Pajek's multirelational temporal file # by Vladimir Batagelj, January 22, 2015 import csv, sys, os, datetime global air, airports, cmp, companies def indAirport(name): global air, airports if not(name in air.keys()): air[name] = len(air)+1 airports.write(str(len(air))+' "'+name+'"\n') return air[name] def indCompany(name): global cmp, companies if not(name in cmp.keys()): cmp[name] = len(cmp)+1 companies.write(str(len(cmp))+' "'+name+'"\n') return cmp[name] print("MakeAirTen *********") print('start:',datetime.datetime.now().ctime()) workdir = 'E:/data/airports/' os.chdir(workdir) arcs = open('arcs.tmp', 'w') arcs.write('*arcs \n') airports = open('airports.tmp', 'w') airports.write('*vertices \n') companies = open('companies.tmp', 'w') companies.write('*vertices \n') cmp = {}; air = {} line = 0 for year in range(1990,2015): podatki = './BTS/BTS'+str(year)+'.csv' with open(podatki, newline='', encoding='utf-8') as dat: airReader = csv.reader(dat, delimiter=',', quotechar='"') try: head = next(airReader) for row in airReader: arcs.write(str(indCompany(row[4]))+': '+str(indAirport(row[16]))+ \ ' '+str(indAirport(row[24]))+' '+str(row[0])+' ['+ \ str(12*(int(row[29])-1990)+int(row[31]))+']\n') line = line+1 except csv.Error as e: sys.exit('file {}, line {}: {}'.format(podatki, airReader.line_num, e)) print(year,':',datetime.datetime.now().ctime()) print("lines = ",line,"\nairports = ",len(air),"\ncompanies = ",len(cmp)) arcs.close(); airports.close(); companies.close() print('stop:',datetime.datetime.now().ctime()) >>> MakeAirTen ********* start: Thu Jan 22 04:45:14 2015 1990 : Thu Jan 22 04:45:15 2015 1991 : Thu Jan 22 04:45:17 2015 1992 : Thu Jan 22 04:45:21 2015 1993 : Thu Jan 22 04:45:25 2015 1994 : Thu Jan 22 04:45:27 2015 1995 : Thu Jan 22 04:45:28 2015 1996 : Thu Jan 22 04:45:30 2015 1997 : Thu Jan 22 04:45:32 2015 1998 : Thu Jan 22 04:45:33 2015 1999 : Thu Jan 22 04:45:35 2015 2000 : Thu Jan 22 04:45:37 2015 2001 : Thu Jan 22 04:45:39 2015 2002 : Thu Jan 22 04:45:43 2015 2003 : Thu Jan 22 04:45:45 2015 2004 : Thu Jan 22 04:45:47 2015 2005 : Thu Jan 22 04:45:49 2015 2006 : Thu Jan 22 04:45:51 2015 2007 : Thu Jan 22 04:45:55 2015 2008 : Thu Jan 22 04:45:57 2015 2009 : Thu Jan 22 04:45:59 2015 2010 : Thu Jan 22 04:46:02 2015 2011 : Thu Jan 22 04:46:04 2015 2012 : Thu Jan 22 04:46:06 2015 2013 : Thu Jan 22 04:46:08 2015 2014 : Thu Jan 22 04:46:08 2015 lines = 1737054 airports = 2776 companies = 633 stop: Thu Jan 22 04:46:11 2015 ===== Pajek to TEN ===== # transforms a Pajek's temporal network in temporal matrix # Vladimir Batagelj, 22. Jan 2015 import os, re, pickle, datetime from TQ import * os.chdir('E:/data/airports/') print('start:',datetime.datetime.now().ctime()) net = open('flights.net','r') n = 2776 ten = open('flights.ten', "w") ten.write("*nodes "+str(n)+"\n") while True: line = net.readline() if line[0]=='*': break k = 0; names = ['']*n; C = [[(1,296,1)]]*n while True: line = net.readline() if (line[0]=='*') or (k>=n): break L = re.split('\s+',line.strip()) j = eval(L[0])-1; names[j] = L[1][1:-1] ten.write(L[0]+' '+L[1]+' [(1,296,1)]\n') k = k+1 while True: line = net.readline() if line[0]!='*': break print('nodes done:',datetime.datetime.now().ctime()) W = [[ [] for u in range(n)] for v in range(n)] ten.write("*arcs\n") while line: L = re.split('\s+',line.strip()) u = eval(L[1])-1; v = eval(L[2])-1 # if v start: Thu Jan 22 23:47:10 2015 nodes done: Thu Jan 22 23:47:10 2015 arcs done: Thu Jan 22 23:53:11 2015 saved: Thu Jan 22 23:53:15 2015 ===== Airports ===== The airports' attribute data can be found at [[http://openflights.org/data.html|Open flights site]] or [[http://datahub.io/dataset/open-flights|datahub]]. To list of airports in ''flights.ten'' available in ''airports.tmp'' we assign properties (attributes) from ''airportsHead.csv'' . We also include the numeric code from ''countries.dat'' for countries: # constructs airports' attributes # vladimir Batagelj, January 24, 2015 import os, re, csv, datetime global air, cnt def indAirport(name): global air if not(name in air.keys()): air[name] = len(air)+1 return air[name] def indCountry(name): global cnt if not(name in cnt.keys()): cnt[name] = len(cnt)+1 return cnt[name] os.chdir('E:/data/airports') print('start Construct:',datetime.datetime.now().ctime()) print('Countries') cnt = {}; air = {} countries = open('countries.dat', 'r') row = countries.readlines() countries.close() for r in row: ind = indCountry(r[:-1]) line = 0 with open('airportsHead.csv', newline='', encoding='utf-8') as dat: airReader = csv.reader(dat, delimiter=',', quotechar='"') try: head = next(airReader) print('|'.join(head)) for row in airReader: nam = row[1]; cny = row[3] iata = row[4]; icao = row[5]; tiz = row[9] lat = row[6]; lon = row[7]; alt = row[8] icny = indCountry(cny) if len(iata)>2: air[iata] = ( nam, iata, icao, icny, cny, lat, lon, alt) elif len(icao)==3: air[icao] = ( nam, iata, icao, icny, cny, lat, lon, alt) else: print('missing data', line, nam) line = line+1 except csv.Error as e: sys.exit('file {}, line {}: {}'.format(podatki, airReader.line_num, e)) print("lines = ",line,"\nairports = ",len(air)) attrs = open('airports.atr', 'w') attrs.write('*vertices \n') airports = open('airports.tmp', 'r') rows = airports.readlines()[1:] airports.close() for r in rows: s = r.split(' ') ap = s[1][1:-2]; an = int(s[0]) if ap in air.keys(): ( nam, iata, icao, icny, cny, lat, lon, alt) = air[ap] attrs.write(s[0]+';"'+nam+'";"'+iata+'";"'+icao+'";'+str(icny)+';"'+ \ cny+'";'+str(lat)+';'+str(lon)+';'+str(alt)+'\n') else: print("not found", an, ap) attrs.write(s[0]+';"?";"'+ap+'";"'+ap+'";0;"?";0;0;0\n') attrs.close() print('finished:',datetime.datetime.now().ctime()) ===== Extracting submatrix of interesting airports ===== import os, re, datetime from TQ import * from operator import itemgetter os.chdir('E:/data/airports') print('start Extract:',datetime.datetime.now().ctime()) print('Network Flights') AIR = TQ.Ianus2Mat("./Flights.ten") print('input done:',datetime.datetime.now().ctime()) mat = AIR['mat'] print('sum =',TQ.MatSummary(mat)) print(AIR.keys()) print('dim =',AIR['dim']) print('met =',AIR['met']) print('nam =',AIR['nam'][0:10]) print('tit =',AIR['tit']) print('typ =',AIR['typ']) print('tin =',AIR['tin'][0:10]) print('til =',AIR['til'][0:10]) (nr,nc,tmin,tmax) = AIR['dim'] n = len(mat); All = range(n) names = AIR['nam'] A = [[]]*n; S = [0]*n for u in range(n): a = [] for v in range(n): a = TQ.sum(a,mat[v][u]) A[u] = a; S[u] = TQ.total(a) P = sorted(enumerate(S),key=itemgetter(1),reverse=True) for i in range(100): print(i+1,names[P[i][0]],P[i][1]) B = [[ [] for u in range(100)] for v in range(100)] L = []; nam = [] for (i,v) in P[0:100]: L.append(i); nam.append(names[i]) for (i,u) in enumerate(L): for (j,v) in enumerate(L): B[i][j] = mat[u][v] best = {} best['mat'] = B best['dim'] = (100,100,1,295) best['met'] = AIR['met'] best['nam'] = nam best['tit'] = "100 largest airports" best['typ'] = AIR['typ'] best['til'] = AIR['til'][0:100] best['tin'] = AIR['tin'] TQ.MatSave(best,file="best100.ten") print('finished:',datetime.datetime.now().ctime())