Trips¶
Downloads + Imports¶
Read and format data¶
%time trips = pd.read_csv(zipfile.open('trips.txt'))
trips.tail()
trips.apply(lambda x: x.unique().size, axis=0)
CPU times: user 218 ms, sys: 28.3 ms, total: 247 ms
Wall time: 246 ms
route_id 1269
service_id 1914
trip_id 220162
trip_headsign 2723
trip_short_name 437
direction_id 2
block_id 11075
shape_id 14735
wheelchair_accessible 2
bikes_allowed 2
dtype: int64
%time routes = pd.read_csv(zipfile.open('routes.txt'))
routes.tail()
trips = trips.join(routes[['route_id','route_type', 'route_short_name']].set_index('route_id'), on='route_id')
trips.head()
CPU times: user 5.51 ms, sys: 142 µs, total: 5.66 ms
Wall time: 4.82 ms
route_id | service_id | trip_id | trip_headsign | trip_short_name | direction_id | block_id | shape_id | wheelchair_accessible | bikes_allowed | route_type | route_short_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20969_700 | 1 | 159960557 | Kremmen, Markt | NaN | 0 | NaN | 373 | 1.0 | NaN | 700 | 823 |
1 | 20969_700 | 1 | 159960556 | Kremmen, Markt | NaN | 0 | NaN | 373 | 1.0 | NaN | 700 | 823 |
2 | 20969_700 | 1 | 159960555 | Kremmen, Markt | NaN | 0 | NaN | 373 | 1.0 | NaN | 700 | 823 |
3 | 20969_700 | 1 | 159960554 | Kremmen, Markt | NaN | 0 | NaN | 373 | 1.0 | NaN | 700 | 823 |
4 | 20969_700 | 1 | 159960553 | Kremmen, Markt | NaN | 0 | NaN | 373 | 1.0 | NaN | 700 | 823 |
fig, ax = plt.subplots()
rename = {2: "Intercity Rail Service", 100: "Railway Service", 109: "Suburban Railway", 400: "Urban Railway Service", 700: "Bus Service", 900: "Tram Service", 1000: "Water Transport Service"}
trips['route_type'].replace(rename, inplace=True)
trips \
.groupby(['route_type']) \
.agg(n=('trip_id', 'count')) \
.reset_index() \
.sort_values('n', ascending=False) \
.assign(share= lambda x: x['n'] / x['n'].sum()) \
.pipe((sns.barplot, 'data'),
x='share',
y='route_type',
color=sns_c[2],
edgecolor=sns_c[2],
ax=ax
)
fmt = lambda y, _: f'{y :0.0%}'
ax.xaxis.set_major_formatter(mtick.FuncFormatter(fmt))
ax.set(
title='Share of Trips per Route Type',
xlabel='share of trips',
ylabel='route type'
);

Stations with most arriving trips¶
trips['trip_headsign'].value_counts().head()
S+U Zoologischer Garten 5410
S+U Rathaus Spandau 3780
S Schöneweide/Sterndamm 2977
S+U Warschauer Str. 2432
U Walther-Schreiber-Platz 2416
Name: trip_headsign, dtype: int64
trips_cleaned = trips.groupby(['trip_headsign', 'route_type']).size().reset_index(name="count")
trips_cleaned['max'] = trips_cleaned.groupby('trip_headsign')['count'].transform('sum')
trips_cleaned = trips_cleaned.sort_values(["max",'trip_headsign',"count"], ascending=False).drop('max', axis=1)
fig, ax = plt.subplots(figsize=(15, 6))
trips_cleaned.head(40).pipe(
(sns.barplot, 'data'),
y='count',
x='trip_headsign',
hue='route_type',
edgecolor='black',
dodge=True,
ax=ax
)
ax.tick_params(axis='x', labelrotation=90)
ax.set(title='number of trips per headsign destination', xlabel='destination', ylabel='number of stops');

Stations with most arriving routes¶
trips.drop_duplicates(subset=['route_id', 'trip_headsign'])['trip_headsign'].value_counts().head()
Cottbus, Hauptbahnhof 44
S Potsdam Hauptbahnhof 32
Pritzwalk, Bahnhof 25
Flughafen BER - Terminal 1-2 23
S Bernau Bhf 22
Name: trip_headsign, dtype: int64
rtrips_cleaned = trips.drop_duplicates(subset=['route_id', 'trip_headsign'])
rtrips_cleaned = rtrips_cleaned.groupby(['trip_headsign', 'route_type']).size().reset_index(name="count")
rtrips_cleaned['max'] = rtrips_cleaned.groupby('trip_headsign')['count'].transform('sum')
rtrips_cleaned = rtrips_cleaned.sort_values(["max",'trip_headsign',"count"], ascending=False).drop('max', axis=1)
fig, ax = plt.subplots(figsize=(15, 6))
rtrips_cleaned.head(40).pipe(
(sns.barplot, 'data'),
y='count',
x='trip_headsign',
hue='route_type',
edgecolor='black',
dodge=True,
ax=ax
)
ax.tick_params(axis='x', labelrotation=90)
ax.set(title='number of routes per headsign destination', xlabel='destination', ylabel='number of routes');
