Agencies
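Downloads + Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
# `zipfile` in the cells below is assumed to be a zipfile.ZipFile already opened on the downloaded GTFS archive (download step not shown)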
Read and format data
%time agencies = pd.read_csv(zipfile.open('agency.txt'))
agencies.tail()
agencies.info()
CPU times: user 4.66 ms, sys: 56 µs, total: 4.72 ms
Wall time: 4.17 ms
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 agency_id 37 non-null int64
1 agency_name 37 non-null object
2 agency_url 37 non-null object
3 agency_timezone 37 non-null object
4 agency_lang 37 non-null object
5 agency_phone 1 non-null object
dtypes: int64(1), object(5)
memory usage: 1.9+ KB
agencies.head()
|   | agency_id | agency_name | agency_url | agency_timezone | agency_lang | agency_phone |
|---|---|---|---|---|---|---|
| 0 | 1 | S-Bahn Berlin GmbH | http://www.s-bahn-berlin.de | Europe/Berlin | de | NaN |
| 1 | 32 | Oberhavel Verkehrsgesellschaft mbH | https://www.ovg-online.de | Europe/Berlin | de | NaN |
| 2 | 47 | Verkehrsbetriebe Brandenburg an der Havel GmbH | http://www.vbbr.de | Europe/Berlin | de | NaN |
| 3 | 84 | Stadtverkehrsgesellschaft mbH Frankfurt (Oder) | http://www.svf-ffo.de | Europe/Berlin | de | NaN |
| 4 | 92 | Havelbus Verkehrsgesellschaft mbH | http://www.havelbus.de | Europe/Berlin | de | NaN |
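`zipfile.open('agency.txt')` works because `ZipFile.open` returns a binary file object that `pd.read_csv` can read directly, so nothing has to be extracted to disk. A minimal self-contained sketch of that pattern, assuming a tiny hypothetical feed built in memory (the two agency rows are copied from the table above; the `buf` and `feed` names are illustrative):

```python
import io
import zipfile

import pandas as pd

# Build a hypothetical two-row GTFS feed in memory (stands in for the downloaded zip).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "agency.txt",
        "agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone\n"
        "1,S-Bahn Berlin GmbH,http://www.s-bahn-berlin.de,Europe/Berlin,de,\n"
        "32,Oberhavel Verkehrsgesellschaft mbH,https://www.ovg-online.de,Europe/Berlin,de,\n",
    )

# ZipFile.open yields a binary file handle; read_csv consumes it without extracting.
with zipfile.ZipFile(buf) as feed:
    with feed.open("agency.txt") as fh:
        agencies = pd.read_csv(fh)

# Empty agency_phone fields are parsed as NaN, as in the real feed.
print(agencies[["agency_id", "agency_name"]])
```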
%time routes = pd.read_csv(zipfile.open('routes.txt'))
routes.tail()
routes = routes.join(agencies[['agency_id','agency_name']].set_index('agency_id'), on='agency_id')
routes.head()
CPU times: user 4.9 ms, sys: 0 ns, total: 4.9 ms
Wall time: 5.45 ms
|   | route_id | agency_id | route_short_name | route_long_name | route_type | route_color | route_text_color | route_desc | agency_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20969_700 | 32 | 823 | NaN | 700 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 1 | 15068_3 | 32 | 848 | NaN | 3 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 2 | 15068_700 | 32 | 848 | NaN | 700 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 3 | 14755_3 | 32 | 834 | NaN | 3 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 4 | 14755_700 | 32 | 834 | NaN | 700 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
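`DataFrame.join(..., on='agency_id')` looks up each route's `agency_id` in the *index* of the right-hand frame, which is why the agency table is re-indexed with `set_index('agency_id')` first. A sketch of the same left join written with `merge`, on hypothetical toy frames; `validate='many_to_one'` is an optional extra check (not used in the notebook) that raises if an `agency_id` were duplicated in the agency table:

```python
import pandas as pd

# Hypothetical miniature versions of the routes and agencies tables.
routes = pd.DataFrame({
    "route_id": ["20969_700", "15068_3", "99999_700"],
    "agency_id": [32, 32, 999],  # 999 has no matching agency
})
agencies = pd.DataFrame({
    "agency_id": [1, 32],
    "agency_name": ["S-Bahn Berlin GmbH", "Oberhavel Verkehrsgesellschaft mbH"],
})

# Notebook style: match a column of the left frame against the right frame's index.
joined = routes.join(agencies.set_index("agency_id"), on="agency_id")

# Equivalent left merge on the shared column, plus a uniqueness check on the right side.
merged = routes.merge(agencies, on="agency_id", how="left", validate="many_to_one")

# Routes without a matching agency get NaN in both versions.
print(joined["agency_name"].equals(merged["agency_name"]))
```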
Agencies per Route Type
routes['agency_name'].value_counts().head()
Berliner Verkehrsbetriebe 254
prignitzbus 85
regiobus Potsdam Mittelmark GmbH 74
Uckermärkische Verkehrsgesellschaft mbH 69
Oberhavel Verkehrsgesellschaft mbH 69
Name: agency_name, dtype: int64
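rename = {2: "Intercity Rail Service", 100: "Railway Service", 109: "Suburban Railway", 400: "Urban Railway Service", 700: "Bus Service", 900: "Tram Service", 1000: "Water Transport Service"}
# Assign the result back: replace(..., inplace=True) on a column selection is chained assignment,
# which recent pandas versions warn about and Copy-on-Write no longer supports.
# Codes not in `rename` (e.g. 3, the basic GTFS bus code) are left unchanged.
routes['route_type'] = routes['route_type'].replace(rename)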
routes.head()
|   | route_id | agency_id | route_short_name | route_long_name | route_type | route_color | route_text_color | route_desc | agency_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20969_700 | 32 | 823 | NaN | Bus Service | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 1 | 15068_3 | 32 | 848 | NaN | 3 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 2 | 15068_700 | 32 | 848 | NaN | Bus Service | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 3 | 14755_3 | 32 | 834 | NaN | 3 | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
| 4 | 14755_700 | 32 | 834 | NaN | Bus Service | NaN | NaN | NaN | Oberhavel Verkehrsgesellschaft mbH |
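`Series.replace` with a dict swaps only the codes it finds and passes every other value through untouched, which is why rows 1 and 3 above still show the raw code 3 (the basic GTFS code for bus, which `rename` does not cover). A sketch on toy data contrasting `replace` with `map`, which turns unmapped codes into NaN so they are easy to list; the shortened `route_type_names` dict is hypothetical:

```python
import pandas as pd

# Hypothetical subset of the code-to-label mapping used above.
route_type_names = {700: "Bus Service", 900: "Tram Service"}

route_type = pd.Series([700, 3, 900, 3])

# replace(): unmapped codes (here 3) pass through unchanged.
replaced = route_type.replace(route_type_names)

# map(): unmapped codes become NaN, which makes them easy to find.
mapped = route_type.map(route_type_names)
unmapped_codes = sorted(route_type[mapped.isna()].unique().tolist())

print(replaced.tolist())  # ['Bus Service', 3, 'Tram Service', 3]
print(unmapped_codes)     # [3]
```

Running the `map` check once on the full `routes` table is a quick way to see which codes the dictionary still misses.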
routes_sorted = routes.groupby(['route_type', 'agency_name']).size().reset_index(name="count")
# Total number of routes per agency across all route types
routes_sorted['total'] = routes_sorted.groupby('agency_name')['count'].transform('sum')
# Lump agencies with fewer than 40 routes in total into a single 'Other' bucket
routes_sorted.loc[routes_sorted['total'] < 40, 'agency_name'] = 'Other'
routes_sorted = routes_sorted.sort_values(['total', 'agency_name', 'count'], ascending=False).drop('total', axis=1)
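# Re-aggregate so that 'Other' is a single row per route type
t = routes_sorted.groupby(['route_type', 'agency_name']).aggregate({'count': 'sum'}).reset_index()
t = t.assign(
    ac=lambda x: x.groupby('route_type')['count'].transform('sum'),  # routes per route type
    share=lambda x: x['count'].div(x['ac'])  # agency's share within its route type
)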
t = t.pivot(index='route_type', columns='agency_name', values='share').fillna(0.0)
# Move the catch-all 'Other' column to the end
t = t[[c for c in t.columns if c != 'Other'] + ['Other']]
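The cells above boil down to three steps: count routes per (route type, agency), relabel agencies with fewer than 40 routes in total as `Other`, and divide each count by its route type's total so every row of `t` sums to 1. A sketch of the same steps on hypothetical toy data, with the threshold lowered to 5 so the effect is visible. It swaps the `assign`/`pivot` pair for `pd.crosstab(..., normalize='index')`, which sums the duplicate `Other` rows and computes the row shares in one call:

```python
import pandas as pd

THRESHOLD = 5  # the notebook uses 40

# Hypothetical routes table: one row per route.
routes = pd.DataFrame({
    "route_type": ["Bus Service"] * 10 + ["Tram Service"] * 2,
    "agency_name": ["A"] * 6 + ["B"] * 3 + ["C"] + ["A"] * 2,
})

# 1) Routes per (route_type, agency_name).
counts = routes.groupby(["route_type", "agency_name"]).size().reset_index(name="count")

# 2) Relabel agencies whose total across all route types is below the threshold.
total = counts.groupby("agency_name")["count"].transform("sum")
counts.loc[total < THRESHOLD, "agency_name"] = "Other"

# 3) Share of each agency within its route type (row-normalised crosstab).
shares = pd.crosstab(
    index=counts["route_type"],
    columns=counts["agency_name"],
    values=counts["count"],
    aggfunc="sum",
    normalize="index",
).fillna(0.0)

# Put the catch-all column last, as the notebook does.
shares = shares[[c for c in shares.columns if c != "Other"] + ["Other"]]
print(shares)
```

Here agencies B (3 routes) and C (1 route) fall below the threshold, so the bus row splits 6 to 4 between `A` and `Other`.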
fig, ax = plt.subplots(figsize=(15, 6))
# Colour-bar ticks as whole percentages
fmt = lambda y, _: f'{y:.0%}'
t.pipe(
    (sns.heatmap, 'data'),
    vmin=0.0,
    vmax=1.0,
    cmap="YlGnBu",
    linewidths=0.1,
    linecolor='black',
    annot=True,
    fmt='0.2%',  # cell annotations as percentages with two decimals
    cbar_kws={'format': mtick.FuncFormatter(fmt)},
    ax=ax
)
ax.set(title='Agency Share per Route Type');
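The heatmap uses two percent formats: `fmt='0.2%'` for the cell annotations, which seaborn applies as a Python format spec, and the `fmt` lambda wrapped in `mtick.FuncFormatter` for the colour-bar ticks, which matplotlib calls with a tick value and a tick position. A small pure-Python sketch of what each produces; the names `annot_fmt` and `tick_fmt` are illustrative:

```python
# Format spec passed to sns.heatmap(fmt=...) for the cell annotations.
annot_fmt = "0.2%"


# Tick formatter in the shape FuncFormatter expects: f(value, position) -> str.
def tick_fmt(y, _pos):
    return f"{y:.0%}"


print(format(0.125, annot_fmt))  # 12.50%
print(tick_fmt(0.25, None))      # 25%
```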
