Metrics
While working on our white paper (PDF, released June 2015), which was originally focused on impact measurement, we were surprised to learn how important quantitative metrics were. More importantly, we came to understand that newsrooms get their numbers from a wide and varied range of sources. Many newsrooms, like NPR, have even developed their own metrics that best reflect their unique understanding of their audience. A principal challenge in building NewsLynx was determining how best to collect, store, analyze, and present metrics in a way that allows newsrooms to focus on the metrics that matter most to them.
The system we devised adheres to the following standards:
- While NewsLynx is an analytics platform, it is not a service for tracking events on sites. NewsLynx exists in between analytics providers and their audiences as a tool which enables the curation and customization of metrics.
- For the reasons above, we do not store timeseries data more granular than one hour. While this is configurable by setting metrics_min_date_unit and metrics_min_date_value in your config.yaml, it’s not encouraged.
- Metrics should be stored in their rawest state but should be automatically summarized and compared. The process from ingestion => summary => comparison should be both invisible to the user and easy to undo.
- While we should make no assumptions about the data source a user prefers, we should not have to constantly compromise the above standards to accommodate one source. With that in mind, we should have a schema for Metrics which, while rigid, is flexible enough to handle most use cases. In cases where this schema does not prove adequate, we should consider whether the particular use case is general enough before modifying the schema.
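The hourly granularity rule can be illustrated with a small Python sketch that rounds a timestamp down to a storage bucket. This is only an illustration, not NewsLynx's implementation: `floor_timestamp` and `_UNIT_SECONDS` are hypothetical names, and the unit names accepted here (`minute`, `hour`, `day`) are assumptions, since the values accepted by metrics_min_date_unit and metrics_min_date_value are not listed in this section.

```python
import math
from datetime import datetime, timezone

# Hypothetical mapping from a unit name to its length in seconds.
# The unit names are assumptions made for this sketch.
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def floor_timestamp(ts: datetime, unit: str = "hour", value: int = 1) -> datetime:
    """Round a timezone-aware timestamp down to a multiple of `value` units (UTC)."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {unit!r}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    step = _UNIT_SECONDS[unit] * value
    epoch = math.floor(ts.timestamp())
    # Drop the remainder so the result sits on a bucket boundary.
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)
```

With the defaults, an event recorded at 14:37:52 UTC would be filed under the 14:00 bucket, so every data point within the same hour lands in one row.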
Creating Metrics
As described in the Sous Chef docs, Metrics are exclusively defined by the Sous Chefs which create them. All Sous Chefs that create metrics must have the following section in their configuration:
metrics:
  metric_name:
    **options
  another_metric_name:
    **options
  ...
Options
All metrics have access to the following options:
display_name
- How should this metric be displayed?
description
- An informative explanation of what this metric represents.
type
- Helps determine how the metric should be interpreted / summarized / presented.
- Can be one of:
count
- A number of something,
- e.g. pageviews, time on page, attention minutes, etc.
- count metrics are also fill-ins for Binary or Boolean fields. Simply store these values as 0 or 1. If you’d like to summarize them as Booleans, set the agg to max.
- Metrics with this type will be assumed to have an agg of sum unless overridden.
cumulative
- A count that increases over time.
- This is a special type of Metric in that we would like to capture its differences over time periods. However, we also want to avoid altering the original source of the data. As a result, a cumulative metric is stored as a cumulative sum, but when queried, is transformed into a count.
- e.g. Twitter shares, Facebook comments, followers, etc.
- Metrics with this type will be assumed to have an agg of sum unless overridden.
median
- The median of a list of metrics.
- e.g. median time on page.
- Metrics with this type will be assumed to have an agg of median unless overridden.
average
- The average of a list of metrics.
- e.g. average time on page.
- Metrics with this type will be assumed to have an agg of avg unless overridden.
percentile
- Usually a number between 0 and 100.
- e.g. percent internal traffic
- Metrics with this type will be assumed to have an agg of avg unless overridden.
min_rank
- A number which should be interpreted as “a lower number is good.”
- e.g. position on homepage.
- Metrics with this type will be assumed to have an agg of min unless overridden.
max_rank
- A number which should be interpreted as “a higher number is good.”
- e.g. position on homepage.
- Metrics with this type will be assumed to have an agg of max unless overridden.
agg
- The function to use when aggregating this metric.
- In practice, these map directly onto postgres functions.
- Can be one of (for now):
sum
avg
median
max
min
content_levels
- This field lets us know that the metric is related to content items and should be stored at the specified level. For more on what this means see metrics-how-does-this-work
- Can be one of:
timeseries
- Accessible via the Content Timeseries API.
summary
- Accessible via the Content Search API endpoints-content-items-search and when retrieving individual Content Items.
comparison
- Accessible via endpoints-content-metrics-get-comparisons
org_levels
- This field lets us know that the metric is linked to an organization as a whole. For more on what this means see metrics-how-does-this-work
- Can be one of:
timeseries
- Accessible via endpoints-org-metrics-get-timeseries
summary
- Accessible via endpoints-org-metrics-get-summary
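Two behaviors described in the options above can be sketched in Python: the default agg implied by each type, and the query-time transformation of a cumulative metric into counts. This is a minimal sketch, not the NewsLynx implementation. `DEFAULT_AGG`, `VALID_AGGS`, `effective_agg`, and `cumulative_to_counts` are hypothetical names, the average type is mapped to the avg name from the agg list, and treating the first period's count as its full cumulative value is an assumption.

```python
from typing import Dict, List

# Default agg per type, following the options described above.
DEFAULT_AGG: Dict[str, str] = {
    "count": "sum",
    "cumulative": "sum",
    "median": "median",
    "average": "avg",
    "percentile": "avg",
    "min_rank": "min",
    "max_rank": "max",
}
VALID_AGGS = {"sum", "avg", "median", "max", "min"}


def effective_agg(options: Dict[str, object]) -> str:
    """Return the explicit agg if one is set, else the default for the type."""
    agg = options.get("agg") or DEFAULT_AGG.get(str(options.get("type")))
    if agg not in VALID_AGGS:
        raise ValueError(f"cannot determine a valid agg for {options!r}")
    return str(agg)


def cumulative_to_counts(values: List[float]) -> List[float]:
    """Difference a cumulative series into per-period counts.

    The input list is not modified, mirroring the idea that the stored
    cumulative values stay untouched. Assumption: the first period has no
    earlier total, so its count is its own value.
    """
    counts: List[float] = []
    previous = 0.0
    for total in values:
        counts.append(total - previous)
        previous = total
    return counts
```

For example, a share count that reads 10, 15, 15, 22 over four hours becomes per-hour counts of 10, 5, 0, 7, which can then be summed over any time range.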
For instance, the Sous Chef google-analytics-to-content-timeseries lists this metric configuration:
metrics:
  ga_pageviews:
    display_name: Pageviews
    description: |
      The number of times this page was opened,
      as reported by Google Analytics.
    type: count
    content_levels:
      - timeseries
      - summary
      - comparison
    org_levels:
      - timeseries
      - summary
  ga_total_time_on_page:
    display_name: Total Time on Page
    description: |
      The total time visitors spent on this page,
      as reported by Google Analytics.
    type: count
    content_levels:
      - timeseries
      - summary
      - comparison
    org_levels:
      - timeseries
      - summary
  ga_avg_time_on_page:
    display_name: Average Time on Page
    type: computed
    content_levels:
      - timeseries
      - summary
      - comparison
    org_levels:
      - timeseries
      - summary
    formula: '{ga_total_time_on_page} / NULLIF({ga_pageviews}, 0)'
    agg: avg
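The formula for ga_avg_time_on_page divides total time by pageviews and uses NULLIF to avoid dividing by zero. How NewsLynx evaluates formulas is not described in this section, so the Python sketch below only mirrors the SQL semantics of that one expression. The helper names `nullif` and `avg_time_on_page` are hypothetical, and None stands in for SQL NULL.

```python
from typing import Dict, Optional


def nullif(value: float, sentinel: float) -> Optional[float]:
    """Mimic SQL NULLIF: return None when value equals sentinel."""
    return None if value == sentinel else value


def avg_time_on_page(metrics: Dict[str, float]) -> Optional[float]:
    """Evaluate {ga_total_time_on_page} / NULLIF({ga_pageviews}, 0).

    In SQL, dividing by NULL yields NULL rather than an error; here
    that outcome is represented by returning None.
    """
    denominator = nullif(metrics["ga_pageviews"], 0)
    if denominator is None:
        return None
    return metrics["ga_total_time_on_page"] / denominator
```

A page with 300 seconds of total time and 4 pageviews yields 75.0, while a page with no pageviews yields None instead of raising a division error.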