Trials and Data

Which data delivery methods do you offer? We offer a variety of data delivery methods, including flat files, an API, self-service dashboard access, and custom reports. Flat files can be delivered through Amazon S3, AWS Data Exchange, or Snowflake, or via a download link to a zipped copy of your files. Our most popular delivery method is an Amazon S3 bucket, to which we deliver Parquet or CSV files.

How can I access my Amazon S3 bucket? First, install the AWS Command Line Interface (AWS CLI) on your local machine; AWS’s documentation describes the installation process. Once the AWS CLI is installed, run the commands below. The first prompts for your AWS access key ID, secret access key, default region, and output format; the second lists the files in your bucket:

aws configure

aws s3 ls s3://revelio-client-<client-name>/

To copy all files from your S3 bucket to the current working directory on your local machine, use the following code:

aws s3 cp s3://revelio-client-<client-name>/ ./ --recursive

To copy all files from a folder in your S3 bucket to the current working directory on your local machine, use the following code:

aws s3 cp s3://revelio-client-<client-name>/<folder-name>/ ./ --recursive

When is your data delivered, and how frequently is it updated? Clients receive the previous month’s data by the 15th of each month. For example, December data (including headcounts, inflows, and outflows) becomes available by January 15.

If you would like your data to be updated more frequently, we also offer a daily data feed. Keep in mind that the daily data feed is not as comprehensive as the monthly updates.
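As a small illustration of the monthly schedule, the sketch below computes the date by which a given month’s data is expected. The function name is hypothetical, and the sketch assumes the release is always due on the 15th of the following month, ignoring weekends, holidays, and the daily feed.

```python
from datetime import date


def availability_date(year: int, month: int) -> date:
    """Date by which monthly data for (year, month) is expected: the 15th of the
    following month (assumes the stated schedule; ignores weekends/holidays)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # December data rolls over into January of the next year.
    if month == 12:
        return date(year + 1, 1, 15)
    return date(year, month + 1, 15)


# December 2021 data is expected by January 15, 2022.
print(availability_date(2021, 12))  # 2022-01-15
```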

When does your data start? Our employee sentiment data dates back to 2007. Workforce dynamics data dates back to 2008. Job postings data dates back to 2019, but employment records that predate 2019 are also available upon request.

What are your data sources? Our data is sourced from a variety of publicly accessible datasets including data from online professional profiles, online employee reviews, H-1B visa filings, job postings on aggregator sites and career pages, and WARN layoff notices.

How many companies do you cover? Our data covers all public and private companies, roughly 2.5 million companies globally.

Can you cover companies that are not in your sample file? Yes, if there are any specific companies you are interested in tracking that are not included in the standard trial or sample files, we can include them upon request.

How do you treat company subsidiaries? When a company acquires or merges with another company, we include the subsidiary as part of the parent company retroactively, including the period before the acquisition took place. For example, we count all Whole Foods employees as part of Amazon from 2008 to 2016, even though Amazon acquired Whole Foods only in 2017. We do this to avoid an artificial jump in headcount when an acquisition or spinoff occurs.

Additionally, we can provide point-in-time mapping, which gives a static view of a company as it existed at a given point in time. For example, with point-in-time mapping, we can provide metrics for Whole Foods in 2017 and distinguish true employee inflows and outflows from those caused by the acquisition.


How do you compensate for some people not having online profiles? Because we collect our data from online professional profiles, our sample is not representative of the underlying population. We apply sampling weights to adjust for roles and locations that are underrepresented in the sample. For example, if an engineer in the US has a 90% chance of having an online profile, we count every US engineer we observe as about 1.1 people (1 / 0.9). If a nurse in Germany has a 25% chance of having an online profile, we count every German nurse we observe as 4 people (1 / 0.25). This lets us approximate the true size of the underlying population as closely as possible.
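The inverse-probability weighting in this answer can be sketched as follows. The probabilities are the illustrative ones from the examples above; the function names are hypothetical, and the sketch leaves out the time-scaling and cross-sectional models that the full weighting involves.

```python
def sampling_weight(p_profile: float) -> float:
    """Number of people each observed profile represents: 1 / p, where p is the
    probability that someone in this role and location has an online profile."""
    if not 0 < p_profile <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / p_profile


def weighted_count(observed: dict, p_profile: dict) -> float:
    """Estimated true headcount: observed profiles scaled by their sampling weights."""
    return sum(n * sampling_weight(p_profile[group]) for group, n in observed.items())


# Probabilities from the examples above.
p = {("engineer", "US"): 0.90, ("nurse", "Germany"): 0.25}
print(round(sampling_weight(p[("engineer", "US")]), 2))  # 1.11
print(sampling_weight(p[("nurse", "Germany")]))          # 4.0
# 90 observed US engineers and 25 observed German nurses -> about 100 of each.
print(weighted_count({("engineer", "US"): 90, ("nurse", "Germany"): 25}, p))
```

Because weights such as 1 / 0.9 are not whole numbers, weighted counts, inflows, and outflows are generally decimals.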

How are companies mapped? Company mapping is a three-step process. The first step matches each target company to a company in our universe using common identifiers such as company name, ticker, website, or ISIN. The second step matches subsidiaries to their parent companies using standardized company lists from sources such as FactSet and Orb Intelligence. The third step collects the set of company names in our data that all point to the same target company URL and maps them to the parent company name. This process identifies all of the colloquial names for each requested company and all of its subsidiaries.
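A minimal sketch of the third step, plus the parent lookup from the second step, might look like the following. The URL normalization rule, the function names, the records, and the subsidiary-to-parent table (a stand-in for standardized lists such as FactSet) are all hypothetical assumptions, not Revelio’s mapping model.

```python
from collections import defaultdict
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Reduce a company URL to a lowercase host without a leading 'www.'."""
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).netloc.lower()
    return host[len("www."):] if host.startswith("www.") else host


def group_names_by_url(records):
    """Collect every raw company name that points to the same company URL."""
    groups = defaultdict(set)
    for raw_name, url in records:
        groups[normalize_url(url)].add(raw_name)
    return dict(groups)


def map_names_to_parent(groups, parent_of):
    """Map each colloquial name to its ultimate parent using a hypothetical
    subsidiary -> parent table (assumed to contain no cycles)."""
    mapping = {}
    for host, names in groups.items():
        parent = host
        while parent in parent_of:  # follow chains such as unit -> subsidiary -> parent
            parent = parent_of[parent]
        for name in names:
            mapping[name] = parent
    return mapping


# Hypothetical raw records: (name as written on profiles, company URL).
records = [
    ("Acme Corp", "https://www.acme.example/careers"),
    ("ACME", "acme.example"),
    ("Acme Robotics", "http://robotics.acme.example"),
]
parent_of = {"robotics.acme.example": "acme.example"}
print(map_names_to_parent(group_names_by_url(records), parent_of))
```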

Why are the counts, inflows, and outflows columns decimals, rather than integers? Our data uses time-scaling and cross-sectional models to adjust for lags in reporting and sampling bias. The weights applied in these models produce non-integer values for counts, inflows, and outflows.

Is it true that the change in counts must equal inflows minus outflows? Yes. The month-over-month change in counts equals that month’s inflows minus its outflows.
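A quick consistency check on delivered data could look like the sketch below. It assumes inflows[t] and outflows[t] are the flows during month t, so the first month’s flows are not checked, and it uses a tolerance because the values are weighted decimals. The function name and sample values are hypothetical.

```python
def flows_consistent(counts, inflows, outflows, tol=1e-6):
    """True if counts[t] - counts[t-1] == inflows[t] - outflows[t] for every t >= 1,
    within `tol` (values are weighted decimals, so exact equality is not expected)."""
    if not len(counts) == len(inflows) == len(outflows):
        raise ValueError("series must have the same length")
    return all(
        abs((counts[t] - counts[t - 1]) - (inflows[t] - outflows[t])) <= tol
        for t in range(1, len(counts))
    )


# Hypothetical three-month series for one company.
print(flows_consistent([100.0, 102.5, 101.25], [0.0, 5.0, 3.0], [0.0, 2.5, 4.25]))  # True
```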

In what cases will your employee headcounts differ from employee headcounts in a company’s annual report? Our employee headcounts often differ from those in a company’s 10-K because 10-K filings omit contingent workers, who in many cases make up the majority of a company’s workforce. Our reporting, by contrast, includes all portions of a company’s workforce, including contingent workers.

How is the Prestige score generated? We use publicly available university rankings to assign a base score to individuals who attended those universities. We then derive a score for each company based on where its employees attended university. The company and university scores are used to iteratively generate the Prestige score, filling in information gaps until the algorithm converges. The model is constructed so that someone who attended a lower-ranked school but went on to work at a prestigious firm still receives a high score.
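The iterative idea can be illustrated with a simple alternating-average scheme, which is an assumption for illustration and not Revelio’s actual model. In this sketch, company scores are the mean score of their employees’ universities, unranked universities take the mean score of their alumni’s employers, the loop repeats until scores stop changing, and each person receives the higher of their school and employer scores. The prior, the max rule, and all names and data are hypothetical.

```python
def prestige(people, base_scores, tol=1e-9, max_iter=1000):
    """Hypothetical alternating-average sketch of an iterative prestige score.

    people: list of (university, company) pairs, one per individual.
    base_scores: university -> base score from public rankings.
    Returns (person_scores, university_scores, company_scores).
    """
    if not base_scores:
        raise ValueError("need at least one ranked university")
    universities = {u for u, _ in people}
    companies = {c for _, c in people}
    unranked = universities - set(base_scores)
    # Unranked universities and all companies start at the mean base score (an assumed prior).
    prior = sum(base_scores.values()) / len(base_scores)
    uni = {u: base_scores.get(u, prior) for u in universities}
    comp = {c: prior for c in companies}
    for _ in range(max_iter):
        # Company score: mean university score of the company's employees.
        new_comp = {}
        for c in companies:
            scores = [uni[u] for u, emp in people if emp == c]
            new_comp[c] = sum(scores) / len(scores)
        # Fill gaps: an unranked university takes the mean score of its alumni's employers.
        new_uni = dict(uni)
        for u in unranked:
            scores = [new_comp[c] for school, c in people if school == u]
            new_uni[u] = sum(scores) / len(scores)
        delta = max(
            [abs(new_comp[c] - comp[c]) for c in companies]
            + [abs(new_uni[u] - uni[u]) for u in unranked]
        )
        uni, comp = new_uni, new_comp
        if delta < tol:  # converged
            break
    # A person's score is the higher of their school's and their employer's score,
    # so working at a prestigious firm lifts someone from a lower-ranked school.
    persons = [max(uni[u], comp[c]) for u, c in people]
    return persons, uni, comp


# Hypothetical example: "Small College" is unranked.
people = [
    ("Top U", "Firm A"),
    ("Top U", "Firm A"),
    ("Small College", "Firm A"),
    ("Small College", "Firm B"),
    ("Mid U", "Firm B"),
]
persons, uni, comp = prestige(people, {"Top U": 0.9, "Mid U": 0.5})
print(round(uni["Small College"], 3), round(persons[2], 3))  # 0.729 0.843
```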

How is the Seniority metric generated? The seniority metric is created using an ensemble model. First, information about an individual’s current job, including their title, company, and industry, is used to generate an initial seniority score. Second, details of the individual’s job history, such as the duration of previous employment and the seniority of previous positions, are used to generate a second seniority score. Finally, the individual’s age is used to generate a third seniority score. These three scores are averaged to produce a continuous seniority metric for the individual.

To convert this continuous seniority metric into an ordinal value, we gather samples of seniority predictions corresponding to recognizable keywords such as “junior”, “senior”, “director”, etc., and map the metric to the most likely bin. This attaches meaning to the raw metric values and lets us bin seniorities into discrete buckets.
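The averaging and binning steps can be sketched as follows. The component scores and keyword samples are hypothetical, and "most likely bin" is modeled here as an assumption: a normal distribution is fitted to each keyword’s sample scores, and a score is assigned to the bin with the highest density (Python’s statistics.NormalDist). The actual bin model may differ.

```python
from statistics import NormalDist, mean


def continuous_seniority(current_job: float, history: float, age: float) -> float:
    """Equal-weight average of the three component model scores."""
    return mean([current_job, history, age])


def fit_bins(keyword_samples: dict) -> dict:
    """Fit a normal distribution to the seniority scores observed for each keyword
    (each list needs at least two distinct values)."""
    return {kw: NormalDist.from_samples(scores) for kw, scores in keyword_samples.items()}


def most_likely_bin(score: float, bins: dict) -> str:
    """Assign the keyword bin whose fitted distribution gives the score the highest density."""
    return max(bins, key=lambda kw: bins[kw].pdf(score))


# Hypothetical calibration samples and component scores.
bins = fit_bins({
    "junior": [1.0, 1.5, 2.0],
    "senior": [3.5, 4.0, 4.5],
    "director": [5.5, 6.0, 6.5],
})
score = continuous_seniority(4.2, 3.9, 4.5)
print(round(score, 2), most_likely_bin(score, bins))  # 4.2 senior
```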

How is the Gender/Ethnicity Entropy metric generated? The Gender/Ethnicity Entropy metric ordinally ranks the diversity of a company’s workforce. It uses a modified version of the Shannon Index to calculate diversity scores relative to a company’s peers, taking occupation and region into account. To calculate the final score, we assign the company the percentile at which it falls relative to all peer companies in the same region and occupation type. This yields a relative score that is easy to interpret.

The Gender/Ethnicity Entropy metric is reported on a linear scale from 1 to 10, where 1 indicates low diversity and 10 indicates high diversity. The scale can also be read as a percentile relative to peers: 1 corresponds to a diversity score between the 0th and 10th percentiles, 2 to a score between the 10th and 20th percentiles, and so on. The metric is available at any level of granularity and can be tuned to be more sensitive to particular groups of people. Gender diversity and ethnicity diversity are reported as separate metrics.
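The pipeline from group counts to a 1 to 10 score can be sketched as below. Because the modification to the Shannon Index is not described, this sketch uses the standard index H = -Σ pᵢ ln pᵢ. It also assumes that a percentile means the share of peers with a strictly lower index, and that the bins are half-open ([0, 10) maps to 1, up to [90, 100] mapping to 10). The headcounts are hypothetical.

```python
import math
from bisect import bisect_left


def shannon_index(group_counts):
    """Standard Shannon index H = -sum(p_i * ln p_i) over groups with p_i > 0."""
    total = sum(group_counts)
    return -sum((n / total) * math.log(n / total) for n in group_counts if n > 0)


def peer_percentile(value, peer_values):
    """Percent of peer companies whose index is strictly below `value`."""
    ranked = sorted(peer_values)
    return 100.0 * bisect_left(ranked, value) / len(ranked)


def entropy_score(percentile):
    """Map a percentile in [0, 100] to 1-10: [0, 10) -> 1, [10, 20) -> 2, ..., [90, 100] -> 10."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be in [0, 100]")
    return min(10, int(percentile // 10) + 1)


# Hypothetical headcounts by group for one company and its peers (same region and occupation).
company = [40, 35, 25]
peers = [[90, 5, 5], [70, 20, 10], [50, 30, 20], [34, 33, 33]]
pct = peer_percentile(shannon_index(company), [shannon_index(p) for p in peers])
print(pct, entropy_score(pct))  # 75.0 8
```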

We have evaluated the accuracy of the gender and ethnicity models that underlie these metrics. We evaluate our Gender Model by comparing its predicted female share with the female share indicated by pronouns in the “recommended” section of profiles (self-reported pronouns are not available on public profiles, but pronouns can appear in the text of recommendations). By this measure, our Gender Model has an accuracy of 96.16%. We validate our Ethnicity Model by comparing the shares of ethnic groups in each US Metropolitan Statistical Area (MSA) reported by official statistics with those predicted by our model. By this measure, our Ethnicity Model has an accuracy of 93.28%.

How are Sentiment scores generated? Our Sentiment Model uses natural language processing to capture employee sentiment on specific topics in raw user reviews of companies. The model is built on a Transformer architecture and trained on an entailment task, which lets it predict the probability that a given topic, phrase, or sentence follows from the text of interest. We then apply the model to zero-shot classification, matching both the positive and the negative text of each review to a predefined list of topics. We assume that positive text classified to a topic reflects positive sentiment on that topic, and that negative text reflects negative sentiment. For every review, we then compute a weighted sentiment score based on how relevant each topic is to the positive or negative portion of the review. To offset any negative bias in the reviews, we normalize the scores, assuming they are normally distributed, and report how many standard deviations a review’s score for a given topic lies from the mean. These scores can then be averaged at any level of granularity to produce a final sentiment score.
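The normalization and aggregation steps can be sketched as follows. The relevance values stand in for zero-shot classifier outputs and are hypothetical. The per-review weighting (positive-side relevance minus negative-side relevance) is also an assumed formula, since the exact weighting is not specified. Standardization is the usual z-score against the mean and standard deviation of all scores for the topic.

```python
from collections import defaultdict
from statistics import mean, pstdev


def topic_score(pos_relevance: float, neg_relevance: float) -> float:
    """Hypothetical weighting: how strongly the topic matches the positive part of a
    review minus how strongly it matches the negative part (each in [0, 1])."""
    return pos_relevance - neg_relevance


def standardize(scores):
    """Number of standard deviations each score lies from the mean of all scores."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def aggregate(keys, z_scores):
    """Average standardized scores at any granularity (one key per review)."""
    groups = defaultdict(list)
    for key, z in zip(keys, z_scores):
        groups[key].append(z)
    return {key: mean(zs) for key, zs in groups.items()}


# Hypothetical relevance of one topic to the positive/negative text of four reviews.
relevance = [(0.9, 0.1), (0.2, 0.7), (0.6, 0.3), (0.1, 0.8)]
companies = ["A", "B", "A", "B"]
z = standardize([topic_score(p, n) for p, n in relevance])
print(aggregate(companies, z))
```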

How are attrition rates, hiring rates, and growth rates calculated? The attrition rate \(a_g(t)\) and hiring rate \(h_g(t)\) are calculated for the chosen granularity level \(g\) and month \(t\). In the formulas below, \(o_g(j)\) and \(i_g(j)\) denote the outflows and inflows, respectively, at granularity level \(g\) in month \(j\); \(c_g(j)\) denotes the headcount in month \(j\); and \(\bar{c}_g(t)\) denotes the average headcount at that granularity over the trailing 12 months. The growth rate is the hiring rate minus the attrition rate.

\[ \begin{aligned}
a_g(t) &= 100 \cdot \frac{\sum_{j=t-11}^{t} o_g(j)}{\bar{c}_g(t)} \\
h_g(t) &= 100 \cdot \frac{\sum_{j=t-11}^{t} i_g(j)}{\bar{c}_g(t)} \\
\bar{c}_g(t) &= \frac{1}{12} \sum_{j=t-11}^{t} c_g(j)
\end{aligned} \]
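A direct translation of these formulas for a single granularity level is sketched below, with months indexed from 0 and a trailing window j = t − 11, …, t. The function name and the sample series are hypothetical.

```python
def trailing_rates(counts, inflows, outflows, t):
    """Attrition, hiring, and growth rates (percent) at month index t, using the
    trailing 12-month window j = t-11..t from the formulas above."""
    if t < 11 or t >= len(counts):
        raise ValueError("t must index a month with 11 earlier months of data")
    window = range(t - 11, t + 1)
    avg_count = sum(counts[j] for j in window) / 12                 # c-bar_g(t)
    attrition = 100 * sum(outflows[j] for j in window) / avg_count  # a_g(t)
    hiring = 100 * sum(inflows[j] for j in window) / avg_count      # h_g(t)
    return attrition, hiring, hiring - attrition                    # growth = hiring - attrition


# Hypothetical group: 2 hires and 1 departure every month, so headcount rises by 1 per month.
counts = [100.0 + j for j in range(12)]
a, h, g = trailing_rates(counts, [2.0] * 12, [1.0] * 12, t=11)
print(round(a, 2), round(h, 2), round(g, 2))  # 11.37 22.75 11.37
```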

Known Issues and Updates

Known Issues

Skill Data Sparsity

Issue: Our profile data combines multiple sources that collect publicly available profiles. Around May 2021, user skills disappeared from the majority of public profiles. Skills are still visible on a minority of the public profiles we collect, but for most existing users we do not see recently added skills, and for most new users we do not see any skills.

Scope: This affects the individual-level skill_file and the workforce dynamics skill_file. The workforce dynamics skill_file still tracks users’ observable skills across different positions.

Solution: We will continue to capture skills when they appear on public profiles and to monitor for any changes in their availability. We have also recently implemented a model that predicts missing skills, which may help fill gaps in the individual-level skill_file and keep the workforce dynamics skill_file more up to date.

Updated: June 9, 2022

Bug Fixes

Highest Degree

We fixed a bug where an individual’s reported highest degree was sometimes not their actual highest degree.

Updated: June 13, 2022


Company Mapping 2.0

We recently released a new company mapping model that maps company entities to Revelio Labs’ proprietary company universe and substantially improves on our previous model.

Updated: June 2, 2022

Please feel free to reach out directly with any questions: info@reveliolabs.com