Agencies

Downloads + Imports

%run "setup.ipynb"
CPU times: user 222 ms, sys: 167 ms, total: 389 ms
Wall time: 3.89 s

Read and format data

%time agencies = pd.read_csv(zipfile.open('agency.txt'))
agencies.tail()
agencies.info()
CPU times: user 4.66 ms, sys: 56 µs, total: 4.72 ms
Wall time: 4.17 ms
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   agency_id        37 non-null     int64 
 1   agency_name      37 non-null     object
 2   agency_url       37 non-null     object
 3   agency_timezone  37 non-null     object
 4   agency_lang      37 non-null     object
 5   agency_phone     1 non-null      object
dtypes: int64(1), object(5)
memory usage: 1.9+ KB
agencies.head()
   agency_id  agency_name                                     agency_url                   agency_timezone  agency_lang  agency_phone
0  1          S-Bahn Berlin GmbH                              http://www.s-bahn-berlin.de  Europe/Berlin    de           NaN
1  32         Oberhavel Verkehrsgesellschaft mbH              https://www.ovg-online.de    Europe/Berlin    de           NaN
2  47         Verkehrsbetriebe Brandenburg an der Havel GmbH  http://www.vbbr.de           Europe/Berlin    de           NaN
3  84         Stadtverkehrsgesellschaft mbH Frankfurt (Oder)  http://www.svf-ffo.de        Europe/Berlin    de           NaN
4  92         Havelbus Verkehrsgesellschaft mbH               http://www.havelbus.de       Europe/Berlin    de           NaN
%time routes = pd.read_csv(zipfile.open('routes.txt'))
routes.tail()

routes = routes.join(agencies[['agency_id','agency_name']].set_index('agency_id'), on='agency_id')
routes.head()
CPU times: user 4.9 ms, sys: 0 ns, total: 4.9 ms
Wall time: 5.45 ms
   route_id   agency_id  route_short_name  route_long_name  route_type  route_color  route_text_color  route_desc  agency_name
0  20969_700  32         823               NaN              700         NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
1  15068_3    32         848               NaN              3           NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
2  15068_700  32         848               NaN              700         NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
3  14755_3    32         834               NaN              3           NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
4  14755_700  32         834               NaN              700         NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
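
The join above relies on every agency_id appearing only once in agency.txt; a duplicated id would silently multiply route rows. A minimal sketch of the same lookup written with merge and an explicit cardinality check follows. The frames `agencies_demo` and `routes_demo` are small hypothetical stand-ins, not the real feed.

```python
import pandas as pd

# Hypothetical stand-ins for agency.txt and routes.txt (only the columns needed here).
agencies_demo = pd.DataFrame({
    "agency_id": [1, 32],
    "agency_name": ["S-Bahn Berlin GmbH", "Oberhavel Verkehrsgesellschaft mbH"],
})
routes_demo = pd.DataFrame({
    "route_id": ["a", "b", "c"],
    "agency_id": [32, 32, 1],
})

# validate="many_to_one" raises pandas.errors.MergeError if agency_id
# is not unique on the right-hand side, instead of duplicating routes.
merged = routes_demo.merge(
    agencies_demo[["agency_id", "agency_name"]],
    on="agency_id",
    how="left",
    validate="many_to_one",
)
print(merged)
```

With how="left", every route is kept even when its agency_id has no match; such rows would get NaN in agency_name.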

Agencies per Route Type

routes['agency_name'].value_counts().head()
Berliner Verkehrsbetriebe                  254
prignitzbus                                 85
regiobus Potsdam Mittelmark GmbH            74
Uckermärkische Verkehrsgesellschaft mbH     69
Oberhavel Verkehrsgesellschaft mbH          69
Name: agency_name, dtype: int64
rename = {2: "Intercity Rail Service", 100: "Railway Service", 109: "Suburban Railway", 400: "Urban Railway Service", 700: "Bus Service", 900: "Tram Service", 1000: "Water Transport Service"}
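# Assign the result back: calling replace(..., inplace=True) on a selected column
# is not guaranteed to update `routes` under pandas copy-on-write.
routes['route_type'] = routes['route_type'].replace(rename)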
routes.head()
   route_id   agency_id  route_short_name  route_long_name  route_type   route_color  route_text_color  route_desc  agency_name
0  20969_700  32         823               NaN              Bus Service  NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
1  15068_3    32         848               NaN              3            NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
2  15068_700  32         848               NaN              Bus Service  NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
3  14755_3    32         834               NaN              3            NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
4  14755_700  32         834               NaN              Bus Service  NaN          NaN               NaN         Oberhavel Verkehrsgesellschaft mbH
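
The output above still contains the raw code 3, because the mapping has no entry for it, so it will show up as its own row in any later grouping by route_type. A small sketch, on made-up values, of how the codes missing from a mapping could be listed and given a visible fallback label. The series `route_type_demo` and the shortened `rename_demo` are illustrative assumptions, not the notebook's data.

```python
import pandas as pd

# Illustrative stand-in for routes['route_type'] before renaming.
route_type_demo = pd.Series([700, 3, 700, 3, 109, 900])

# Shortened version of the mapping used above.
rename_demo = {109: "Suburban Railway", 700: "Bus Service", 900: "Tram Service"}

# Codes that occur in the data but have no label in the mapping.
unmapped = route_type_demo[~route_type_demo.isin(list(rename_demo))].unique().tolist()
print(unmapped)

# map() yields NaN for unmapped codes; fillna() swaps in an explicit fallback label.
labels = route_type_demo.map(rename_demo).fillna(
    "Unmapped (" + route_type_demo.astype(str) + ")"
)
print(labels.tolist())
```

Whether an unmapped code should get its own label or be merged into an existing category is a modelling decision for the feed at hand; the sketch only makes the gap visible.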
routes_sorted = routes.groupby(['route_type', 'agency_name']).size().reset_index(name="count")
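# Total number of routes per agency across all route types.
routes_sorted['total'] = routes_sorted.groupby('agency_name')['count'].transform('sum')
# Agencies with fewer than 40 routes overall are pooled into 'Other'.
routes_sorted.loc[routes_sorted['total'] < 40, 'agency_name'] = 'Other'
routes_sorted = routes_sorted.sort_values(['total', 'agency_name', 'count'], ascending=False).drop('total', axis=1)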
t = routes_sorted.groupby(['route_type', 'agency_name']).aggregate({'count': 'sum'}).reset_index()
t = t.assign(
    ac = lambda x: x.groupby(['route_type'])['count'].transform('sum'),
    share = lambda x: x['count'].div(x['ac'])
)
t = t.pivot(index='route_type', columns='agency_name', values='share')
# Move the 'Other' column to the last position.
t.insert(len(t.columns)-1, 'Other', t.pop("Other"))
t.fillna(0.0, inplace=True)
fig, ax = plt.subplots(figsize=(15,6))
cmap = sns.light_palette(sns_c[0])
fmt = lambda y, _: f'{y :0.0%}'
t.pipe((sns.heatmap, 'data'), 
        vmin=0.0,
        vmax=1.0,
        cmap="YlGnBu",
        linewidths=0.1, 
        linecolor='black',
        annot=True, 
        fmt='0.2%',
        cbar_kws={'format': mtick.FuncFormatter(fmt)},
        ax=ax
    )
ax.set(title='Agency Share per Route Type');
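[Figure: heatmap "Agency Share per Route Type"; rows are route types, columns are agencies, cells show each agency's share of routes from 0% to 100%]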
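
As a cross-check on the share table, the same kind of row-normalised table can be produced in one step with pd.crosstab. This is a sketch on a hypothetical toy frame (`routes_demo`, agencies "A" and "B"), not the notebook's method, and it assumes the pooling of small agencies into 'Other' has already been applied to agency_name.

```python
import pandas as pd

# Hypothetical toy routes table with the two columns used for the heatmap.
routes_demo = pd.DataFrame({
    "route_type": ["Bus Service", "Bus Service", "Bus Service", "Tram Service"],
    "agency_name": ["A", "A", "B", "A"],
})

# normalize="index" divides each row by its row total,
# giving each agency's share within a route type.
share = pd.crosstab(
    routes_demo["route_type"],
    routes_demo["agency_name"],
    normalize="index",
)
print(share)
```

Combinations that never occur come out as 0 directly, so no separate fillna step is needed, and each row sums to 1.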