FAQs

Trials and Data

Which data delivery methods do you offer? We offer several data delivery methods, including flat files, self-service Dashboard access, and custom reports. Flat files can be delivered through Amazon S3, Snowflake, or Google Cloud Storage (GCS), or via a link to a zipped copy of your flat file. Our most popular delivery method is an Amazon S3 bucket, to which we deliver Parquet or CSV files.

How can I access my Amazon S3 bucket? The first step is to install the AWS Command Line Interface (AWS CLI) on your local machine. AWS's own documentation walks through the installation process. Once you have installed the AWS CLI, use the commands below to access your bucket and your files:

# Enter your AWS access key ID, secret access key, and default region when prompted:
$ aws configure
# List the contents of your S3 bucket:
$ aws s3 ls s3://revelio-client-<client-name>/
# Sync all files from your S3 bucket to the current working directory on your local machine:
$ aws s3 sync s3://revelio-client-<client-name>/ ./
# Sync all files from one folder in your S3 bucket to the current working directory on your local machine:
$ aws s3 sync s3://revelio-client-<client-name>/<folder-name>/ ./

When is your data delivered? And how frequently is it updated? Clients will receive updated data from the previous month on the 15th of each month. For example, December data (including headcounts, inflows, outflows, etc.) would become available by January 15th.

If you would like your data to be updated more frequently, we also offer weekly or daily data delivery for select feeds.

When does your data start? Our workforce dynamics and employee sentiment data starts in 2008. The start date of our job postings data varies by source, with the earliest starting in 2016. Our unified job postings data, COSMOS, starts in 2021.

What are your data sources? Our data comes from a variety of publicly accessible online data sources, including online professional profiles, online employee reviews, job postings from company career pages and aggregator sites, and WARN layoff notices.

How many companies and positions do you cover? Our data covers more than 20 million public and private companies globally, and it includes over 400 million active positions.

Can you cover companies that are not in your sample file? Yes, if there are any specific companies you are interested in tracking that are not included in the standard trial or sample files, we can include them upon request.

How do you treat company subsidiaries? When a company acquires or merges with another company, we include the subsidiary as part of the parent company for the full history, including the period before the acquisition took place. For example, we include employee headcounts of Whole Foods as part of Amazon's headcounts from 2008 to 2016, even though Amazon only acquired Whole Foods in 2017. We treat subsidiaries this way to avoid an artificial jump in headcount when an acquisition or spinoff occurs. However, we can break out data for companies' subsidiaries upon request.

Can you provide data on any additional granularities and outcomes? Yes, we can provide data feeds that include custom granularities or mappings.

Modeling

How do you compensate for some people not having online profiles? Because we collect our data from online professional profiles, our sample is not representative of the underlying population. We apply sampling weights to adjust for roles and locations that are underrepresented in the sample. For example, if an engineer in the US has a 90% chance of having an online profile, we treat every engineer in the US that we see as representing about 1.1 people (1/0.9). If a nurse in Germany has a 25% chance of having an online profile, we treat every nurse in Germany that we see as representing 4 people (1/0.25). This allows us to approximate the true size of the underlying population as closely as possible.
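The two examples above amount to inverse-probability weighting: each observed profile counts as one divided by the probability of having a profile. The following is a minimal sketch of that arithmetic only. The function name and the observed profile counts are hypothetical, and the sketch does not show how the probabilities themselves are estimated.

```python
def sampling_weight(p_profile: float) -> float:
    """Number of people each observed profile represents, given the
    probability (0 < p <= 1) that a person in the group has a profile."""
    if not 0.0 < p_profile <= 1.0:
        raise ValueError("p_profile must be in (0, 1]")
    return 1.0 / p_profile


# Profile probabilities taken from the two examples in the text.
p_profile = {("engineer", "US"): 0.90, ("nurse", "Germany"): 0.25}

# Hypothetical observed profile counts, for illustration only.
observed = {("engineer", "US"): 900, ("nurse", "Germany"): 250}

# Weighted estimate of the underlying population for each group.
estimated = {
    group: n * sampling_weight(p_profile[group])
    for group, n in observed.items()
}

for group, value in estimated.items():
    print(group, round(value, 1))
```

With these made-up counts, both groups are estimated at roughly 1,000 people even though far fewer nurse profiles are observed.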

How are companies being mapped? We map companies to Revelio Labs' proprietary company universe using company identifiers such as company name, ticker, or website. Each company in the Revelio Labs universe has an RCID (Revelio Labs Company Identifier) associated with it. We assign weights to the different identifiers given to us and map them to our internal RCID universe. Then, we give each potential pairing a probability score, with 1 being a definite match and 0 being a definite mismatch. Finally, we choose the highest-scoring pairing as our match.
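As a rough illustration of the weight, score, and pick-the-best flow described above, here is a toy sketch. Everything in it is an assumption: the weights, the exact-match comparison, the candidate records, and the function names are hypothetical, and the score is a simple weighted agreement share rather than the probability score produced by the actual mapping model.

```python
# Hypothetical identifier weights; the real weights are not given in this FAQ.
WEIGHTS = {"name": 0.5, "ticker": 0.3, "website": 0.2}


def match_score(query: dict, candidate: dict) -> float:
    """Weighted share of the provided identifiers that agree, in [0, 1]."""
    provided = [k for k in WEIGHTS if query.get(k)]
    total = sum(WEIGHTS[k] for k in provided)
    if total == 0:
        return 0.0
    matched = sum(
        WEIGHTS[k]
        for k in provided
        if query[k].lower() == (candidate.get(k) or "").lower()
    )
    return matched / total


def best_match(query: dict, candidates: list) -> dict:
    """Return the candidate with the highest score for this query."""
    return max(candidates, key=lambda c: match_score(query, c))


# Made-up candidate universe; the rcid values are placeholders.
candidates = [
    {"rcid": 1, "name": "Acme Corp", "ticker": "ACME", "website": "acme.example"},
    {"rcid": 2, "name": "Acme Holdings", "ticker": "ACMH", "website": "acmeholdings.example"},
]

# A client-supplied record with only a name and a website.
query = {"name": "acme corp", "website": "acme.example"}
best = best_match(query, candidates)
print(best["rcid"], match_score(query, best))
```

Normalizing by the weights of the identifiers actually provided keeps the score between 0 and 1 when some identifiers are missing.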

What is the relationship between count, inflow, and outflow in the Workforce Dynamics dataset? To recreate the count metric for a given granularity g at time t, use the following formula:

\[count_g(t) = count_g(t-1)+ inflow_g(t) - outflow_g(t-1)\]
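The recurrence can be applied month by month to rebuild a count series from a starting value. The sketch below follows the formula exactly as written, including the outflow term indexed at the previous month. The function name and the sample values are hypothetical.

```python
def reconstruct_counts(initial_count, inflows, outflows):
    """Rebuild counts for months 0..T for one granularity.

    initial_count is count(0); inflows[t] and outflows[t] are the flows
    for month t. Applies count(t) = count(t-1) + inflow(t) - outflow(t-1).
    """
    if len(inflows) != len(outflows):
        raise ValueError("inflows and outflows must have the same length")
    counts = [initial_count]
    for t in range(1, len(inflows)):
        counts.append(counts[t - 1] + inflows[t] - outflows[t - 1])
    return counts


# Hypothetical monthly flows for one company, for illustration only.
inflows = [0.0, 5.5, 3.0, 4.0]
outflows = [2.0, 1.5, 6.0, 0.0]

counts = reconstruct_counts(100.0, inflows, outflows)
print(counts)
```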

Why are the counts, inflows, and outflows columns decimals, rather than integers? Our data uses time-scaling and cross-sectional models to adjust for lags in reporting and sampling bias. The weights applied in these models produce non-integer values for counts, inflows, and outflows.

Is it true that the change in counts must be equal to inflows minus outflows? Yes.

In what cases will your employee headcounts differ from employee headcounts in a company's annual report? Our employee headcounts will often differ from those in a company's 10-K because 10-K figures omit contingent workers, who in many cases make up the majority of a company's workforce. Our headcounts cover all portions of a company's workforce, including contingent workers.

How is the Prestige score generated? We take publicly available university rankings to determine a base score for individuals from those universities. We then derive a score for each company based on which universities its employees attended. The company scores and the university scores are used to iteratively generate the Prestige score, filling in information gaps until the algorithm converges. The model is constructed so that someone who attended a low-ranking school but went on to work at a prestigious firm would still receive a high score.

What kinds of positions fall into each Seniority level? Our Seniority Model assigns each position to one of seven seniority levels. These levels are:

1. Entry Level / Intern (Ex. Accounting Intern; Software Engineer Trainee; Paralegal)
2. Junior Level (Ex. Account Receivable Bookkeeper; Junior Software QA Engineer; Legal Adviser)
3. Associate / Analyst Level (Ex. Senior Tax Accountant; Lead Electrical Engineer; Attorney)
4. Manager Level (Ex. Account Manager; Superintendent Engineer; Lead Lawyer)
5. Director Level (Ex. Chief of Accountants; VP Network Engineering; Head of Legal)
6. Executive Level (Ex. Managing Director, Treasury; Director of Engineering, Backend Systems; Attorney, Partner)
7. Senior Executive Level (Ex. CFO; COO; CEO)

The example job titles above are titles that we can expect at each Seniority level. However, depending on the specific characteristics of the company and position, these titles could also appear at slightly higher or lower levels.

How are Sentiment scores generated? Our Sentiment Model uses Natural Language Processing to capture employee sentiment on specific topics across raw user reviews of companies. The model is built from a Transformer architecture and is trained on an entailment task that allows it to predict the probability that a given topic, phrase, or sentence follows the text of interest. The model is then generalized for our uses in a task known as Zero-Shot Classification, where we allow both positive and negative text to be matched to a predefined topic list. We assume that positive reviews classified to a given topic correspond to positive sentiment for that topic and negative reviews correspond to negative sentiment for that topic. For every review, we can then compute a weighted sentiment score based on how relevant a given topic was for the positive or negative portion of the review. To offset any negative bias in the reviews, we normalize the scores by assuming that they are normally distributed and report how many standard deviations away from the mean a given topic of a review is. These scores can then be averaged across any level of granularity in order to produce a final sentiment score.
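The normalization step described above, reporting how many standard deviations a score lies from the mean, is a standard z-score. Below is a minimal sketch of that step only. The raw scores are made up, and using the population standard deviation over a single flat list is an assumption, since the text does not say over which set of reviews the mean and standard deviation are computed.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    if sigma == 0:
        # All scores are identical, so every score sits exactly at the mean.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw topic scores that skew negative, for illustration only.
raw_scores = [-0.6, -0.2, 0.0, 0.4]

z_scores = standardize(raw_scores)
print([round(z, 3) for z in z_scores])
```

After standardization the scores are centered on zero, so a raw score that is negative but above the average shows up as a positive value.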

How are attrition rates, hiring rates, and growth rates calculated? The attrition rate \(a_g(t)\) and hiring rate \(h_g(t)\) are calculated for a chosen granularity level \(g\) and month \(t\). In the formulas below, \(o_g(j)\) and \(i_g(j)\) denote the outflows and inflows at granularity level \(g\) in month \(j\), and \(\bar{c}_g(t)\) denotes the average headcount at that granularity over the 12 months ending in month \(t\). The growth rate is the difference between the hiring rate and the attrition rate.

\[ \begin{aligned} a_g(t) &= 100 \cdot \frac{\sum_{j=t-11}^{t} o_g(j)}{\bar{c}_g(t)} \\ h_g(t) &= 100 \cdot \frac{\sum_{j=t-11}^{t} i_g(j)}{\bar{c}_g(t)} \\ \bar{c}_g(t) &= \frac{1}{12} \sum_{j=t-11}^{t} c_g(j) \end{aligned} \]

In other words, the attrition rate is the 12-month moving sum of outflows divided by the 12-month moving average of headcount, while the hiring rate is the 12-month moving sum of inflows divided by the 12-month moving average of headcount.
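The same calculation can be written as a short function over monthly series for a single granularity. This is a sketch of the formulas above, not the production implementation. The function name and the sample numbers are hypothetical.

```python
def trailing_rates(counts, inflows, outflows, t):
    """Attrition, hiring, and growth rates (in percent) for month index t,
    using the 12 months t-11..t of one granularity's monthly series."""
    if t < 11:
        raise ValueError("at least 12 months of history are required")
    window = slice(t - 11, t + 1)
    avg_count = sum(counts[window]) / 12  # 12-month moving average headcount
    attrition = 100 * sum(outflows[window]) / avg_count
    hiring = 100 * sum(inflows[window]) / avg_count
    return {
        "attrition": attrition,
        "hiring": hiring,
        "growth": hiring - attrition,  # hiring rate minus attrition rate
    }


# Hypothetical flat series: 1,000 employees, 20 hires and 10 exits per month.
counts = [1000.0] * 12
inflows = [20.0] * 12
outflows = [10.0] * 12

rates = trailing_rates(counts, inflows, outflows, t=11)
print(rates)
```

With these made-up inputs, 240 hires and 120 exits over the year against an average headcount of 1,000 give a 24% hiring rate, a 12% attrition rate, and a 12% growth rate.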

How can I calculate average prestige from my workforce dynamics file? Average prestige can be calculated with the total_prestige field and the prestige_weight field using the sample code below.

select
    company,
    month,
    sum(total_prestige)/sum(prestige_weight) as avg_prestige
from wf_table
group by company, month;

Which languages can you translate for job titles and descriptions? We can translate job titles and descriptions from all languages using industry-standard translation software. In addition, for seven languages (Spanish, French, Portuguese, German, Italian, Dutch, and Chinese) we have developed in-house models, built using FastText, that translate text with even higher precision than the standard software.

How can I recreate the plots I see on the Dashboard using the data from my data feed? You can recreate key metrics such as hiring rate, attrition rate, and salary from the data in your data feed with the following sample code:

Hiring rate:

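with step1 as (
    -- Aggregate inflows and headcount to one row per company and month
    select
        company,
        month,
        sum(external_inflow) as inflow_sum,
        sum(count) as count_sum
    from wf_table
    group by company, month
),
step2 as (
    -- 12-month rolling sum of inflows and rolling average of headcount
    select
        company,
        month,
        sum(inflow_sum) over(partition by company order by month rows between 11 preceding and current row) as inflow_rolling_sum,
        avg(count_sum) over(partition by company order by month rows between 11 preceding and current row) as count_rolling_avg
    from step1
)
-- The ratio is a fraction; multiply by 100 to express it as a percentage
select
    company,
    month,
    inflow_rolling_sum,
    count_rolling_avg,
    inflow_rolling_sum/count_rolling_avg as hiring_rate
from step2;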

Attrition rate:

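with step1 as (
    -- Aggregate outflows and headcount to one row per company and month
    select
        company,
        month,
        sum(external_outflow) as outflow_sum,
        sum(count) as count_sum
    from wf_table
    group by company, month
),
step2 as (
    -- 12-month rolling sum of outflows and rolling average of headcount
    select
        company,
        month,
        sum(outflow_sum) over(partition by company order by month rows between 11 preceding and current row) as outflow_rolling_sum,
        avg(count_sum) over(partition by company order by month rows between 11 preceding and current row) as count_rolling_avg
    from step1
)
-- The ratio is a fraction; multiply by 100 to express it as a percentage
select
    company,
    month,
    outflow_rolling_sum,
    count_rolling_avg,
    outflow_rolling_sum/count_rolling_avg as attrition_rate
from step2;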

Salary:

select
    company,
    month,
    sum(salary)/sum(count) as salary
from wf_table
group by company, month;

Why do my plots look different than those on the Dashboard? The Dashboard always reflects our latest available models, so its data may differ from the data in your data feed if the feed was produced with a different model version. More information on the methodology for each model can be found in Methodologies.

Questions? Please feel free to reach out directly: info@reveliolabs.com