Routes
Downloads + Imports
%run "setup.ipynb"
Read and format data
%time routes = pd.read_csv(zipfile.open('routes.txt'))
routes.tail()
routes.apply(lambda x: x.unique().size, axis=0)
CPU times: user 5.64 ms, sys: 141 µs, total: 5.78 ms
Wall time: 5.25 ms
route_id 1269
agency_id 34
route_short_name 912
route_long_name 2
route_type 8
route_color 4
route_text_color 2
route_desc 2
dtype: int64
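The counts above include NaN as a distinct value, because `Series.unique()` keeps NaN in its result. That is why mostly empty columns such as route_long_name can still report 2. The following is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the feed) suggesting that the built-in `DataFrame.nunique(dropna=False)` gives the same numbers as the lambda, while the default `nunique()` leaves NaN out.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `routes`; these values are illustrative only.
toy = pd.DataFrame({
    "route_id": ["a_3", "a_700", "b_3"],
    "route_long_name": [np.nan, np.nan, "Airport Express"],
})

# Approach used in the cell above: NaN counts as one unique value.
via_lambda = toy.apply(lambda x: x.unique().size, axis=0)

# Built-in equivalent that also counts NaN.
via_nunique = toy.nunique(dropna=False)

# Default behaviour: NaN is dropped before counting.
without_nan = toy.nunique()

print(via_lambda.to_dict())
print(via_nunique.to_dict())
print(without_nan.to_dict())
```

If the NaN-inclusive count is the intended one, `nunique(dropna=False)` states that intent more directly than the lambda.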
routes.head()
|   | route_id | agency_id | route_short_name | route_long_name | route_type | route_color | route_text_color | route_desc |
|---|---|---|---|---|---|---|---|---|
| 0 | 20969_700 | 32 | 823 | NaN | 700 | NaN | NaN | NaN |
| 1 | 15068_3 | 32 | 848 | NaN | 3 | NaN | NaN | NaN |
| 2 | 15068_700 | 32 | 848 | NaN | 700 | NaN | NaN | NaN |
| 3 | 14755_3 | 32 | 834 | NaN | 3 | NaN | NaN | NaN |
| 4 | 14755_700 | 32 | 834 | NaN | 700 | NaN | NaN | NaN |
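In the five rows shown, the part of route_id after the underscore matches route_type (for example 20969_700 has route_type 700). Whether that holds for the whole feed is not established here. The sketch below copies just those five rows into a small frame and checks the pattern; the names `sample`, `suffix` and `matches` are introduced only for illustration.

```python
import pandas as pd

# The five rows displayed by routes.head(), reduced to two columns.
sample = pd.DataFrame({
    "route_id": ["20969_700", "15068_3", "15068_700", "14755_3", "14755_700"],
    "route_type": [700, 3, 700, 3, 700],
})

# Take the text after the last underscore and convert it to an integer.
suffix = sample["route_id"].str.rsplit("_", n=1).str[-1].astype(int)

# True where the suffix equals the route_type column.
matches = suffix.eq(sample["route_type"])
print(matches.all())
```

Running the same check on the full `routes` frame (before route_type is relabelled) would show whether the suffix is a reliable encoding or only a coincidence in these rows.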
Routes per route type
fig, ax = plt.subplots()
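# Map GTFS route_type codes to readable labels.
# Codes not listed here (e.g. 3) are left unchanged by replace().
rename = {
    2: "Intercity Rail Service",
    100: "Railway Service",
    109: "Suburban Railway",
    400: "Urban Railway Service",
    700: "Bus Service",
    900: "Tram Service",
    1000: "Water Transport Service",
}
# Assign the result back instead of using inplace=True on a column selection,
# which is unreliable under pandas copy-on-write.
routes['route_type'] = routes['route_type'].replace(rename)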
(
    routes
    .groupby(['route_type'])
    .agg(n=('route_id', 'count'))
    .reset_index()
    .sort_values('n', ascending=False)
    .assign(share=lambda x: x['n'] / x['n'].sum())
    .pipe(
        (sns.barplot, 'data'),
        x='share',
        y='route_type',
        color=sns_c[2],
        edgecolor=sns_c[2],
        ax=ax
    )
)
# Format x-axis ticks as whole percentages (shares are fractions in [0, 1]).
fmt = lambda x, _: f'{x:.0%}'
ax.xaxis.set_major_formatter(mtick.FuncFormatter(fmt))
ax.set(
    title='Share of Routes per Route Type',
    xlabel='share of routes',
    ylabel='route type'
);
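The chain above counts routes per type and divides by the total to get a share. As a minimal sketch of that computation without the plotting step, the toy frame below (made-up data, not from the feed) runs the same groupby steps and compares them with `Series.value_counts(normalize=True)`, which should give the same shares in one call.

```python
import pandas as pd

# Toy stand-in for `routes`; labels and ids are illustrative only.
toy = pd.DataFrame({
    "route_id": ["r1", "r2", "r3", "r4"],
    "route_type": ["Bus Service", "Bus Service", "Bus Service", "Tram Service"],
})

# Same steps as the notebook cell, minus the barplot.
shares = (
    toy
    .groupby("route_type")
    .agg(n=("route_id", "count"))
    .reset_index()
    .sort_values("n", ascending=False)
    .assign(share=lambda x: x["n"] / x["n"].sum())
)

# One-call alternative: relative frequency of each route_type.
shortcut = toy["route_type"].value_counts(normalize=True)

print(shares)
print(shortcut)
```

The two differ slightly in how they treat missing data: `count` skips rows where route_id is null, while `value_counts` skips rows where route_type is null. For a frame with neither, the shares agree.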
