Routes

Downloads + Imports

%run "setup.ipynb"
CPU times: user 211 ms, sys: 176 ms, total: 387 ms
Wall time: 4.01 s

Read and format data

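# `zipfile` is an open archive object, presumably created in setup.ipynb
%time routes = pd.read_csv(zipfile.open('routes.txt'))
# Number of distinct values per column (unique() counts NaN as a value)
routes.apply(lambda x: x.unique().size, axis=0)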
CPU times: user 5.64 ms, sys: 141 µs, total: 5.78 ms
Wall time: 5.25 ms
route_id            1269
agency_id             34
route_short_name     912
route_long_name        2
route_type             8
route_color            4
route_text_color       2
route_desc             2
dtype: int64
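
A note on reading these counts: `unique().size` treats NaN as a distinct value, so a count of 2 for a mostly empty column such as `route_long_name` may mean one real value plus NaN. The sketch below uses a small made-up table (the values are illustrative, not taken from the feed) to compare it with `nunique()`, which ignores NaN by default.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the routes table (values are made up for illustration).
toy = pd.DataFrame({
    "route_short_name": ["823", "848", "848", "834"],
    "route_long_name": [np.nan, np.nan, "Airport Express", np.nan],
    "route_desc": [np.nan, np.nan, np.nan, np.nan],
})

# unique() keeps NaN as one of the distinct values ...
with_nan = toy.apply(lambda col: col.unique().size, axis=0)
# ... while nunique() drops NaN unless dropna=False is passed.
without_nan = toy.nunique()

print(pd.DataFrame({"unique().size": with_nan, "nunique()": without_nan}))
```

If NaN should not count as a category, `routes.nunique()` is the shorter call; `routes.nunique(dropna=False)` should give the same numbers as the `apply` version.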
routes.head()
   route_id   agency_id  route_short_name  route_long_name  route_type  route_color  route_text_color  route_desc
0  20969_700  32         823               NaN              700         NaN          NaN               NaN
1  15068_3    32         848               NaN              3           NaN          NaN               NaN
2  15068_700  32         848               NaN              700         NaN          NaN               NaN
3  14755_3    32         834               NaN              3           NaN          NaN               NaN
4  14755_700  32         834               NaN              700         NaN          NaN               NaN
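
The archive object used in the read step comes from setup.ipynb, which is not shown here. As a rough, self-contained sketch of the same step, the block below builds a tiny in-memory archive (a stand-in for the real feed), opens `routes.txt` from it, and reads the identifier columns as strings. The name `demo_routes` and the explicit `dtype` mapping are assumptions for illustration, not part of the original notebook.

```python
import io
import zipfile as zf  # aliased: the notebook uses the name `zipfile` for an open archive

import pandas as pd

# Build a tiny in-memory archive so the sketch runs without a real feed.
buffer = io.BytesIO()
with zf.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "routes.txt",
        "route_id,agency_id,route_short_name,route_type\n"
        "20969_700,32,823,700\n"
        "15068_3,32,848,3\n",
    )
buffer.seek(0)

with zf.ZipFile(buffer) as archive:
    with archive.open("routes.txt") as handle:
        # Read identifier columns as strings so values such as "007" keep their form.
        demo_routes = pd.read_csv(
            handle,
            dtype={"route_id": str, "agency_id": str, "route_short_name": str},
        )

print(demo_routes.dtypes)
```

Leaving `route_type` numeric keeps it compatible with an integer-keyed label mapping like the one used in the next section.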

Routes per route type

fig, ax = plt.subplots()
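# Human-readable labels for the route_type codes
rename = {
    2: "Intercity Rail Service",
    100: "Railway Service",
    109: "Suburban Railway",
    400: "Urban Railway Service",
    700: "Bus Service",
    900: "Tram Service",
    1000: "Water Transport Service",
}
# Assign the result back; in-place replace on a selected column is unreliable in recent pandas
routes['route_type'] = routes['route_type'].replace(rename)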
routes \
    .groupby(['route_type']) \
    .agg(n=('route_id', 'count')) \
    .reset_index() \
    .sort_values('n', ascending=False) \
    .assign(share= lambda x: x['n'] / x['n'].sum()) \
    .pipe((sns.barplot, 'data'), 
        x='share', 
        y='route_type',
        color=sns_c[2],
        edgecolor=sns_c[2],
        ax=ax
    )
fmt = lambda x, _: f'{x:.0%}'
ax.xaxis.set_major_formatter(mtick.FuncFormatter(fmt))
ax.set(
    title='Share of Routes per Route Type', 
    xlabel='share of routes', 
    ylabel='route type'
);
[Figure: horizontal bar chart "Share of Routes per Route Type"; x-axis: share of routes, y-axis: route type]
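
The table preview shows `route_type` 3 next to 700, and the label mapping has seven entries for eight distinct codes, so at least one code probably keeps its numeric label in the chart. In the GTFS reference, basic type 3 denotes bus service. The sketch below, on a made-up series with a two-entry subset of the mapping, shows that `replace` leaves unmapped codes untouched and one way to list them before plotting.

```python
import pandas as pd

# Subset of the notebook's mapping, for illustration only.
rename = {700: "Bus Service", 900: "Tram Service"}

# Made-up codes standing in for routes['route_type'].
codes = pd.Series([700, 3, 700, 900, 3], name="route_type")
labelled = codes.replace(rename)

# Codes absent from the mapping pass through replace() unchanged.
unmapped = sorted(int(c) for c in codes.unique() if c not in rename)

print(labelled.tolist())
print("unmapped codes:", unmapped)
```

Running the same check against the full `routes['route_type']` column before the replacement would show which codes still need a label.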