Datasets

Workforce Dynamics

Download a sample of our Workforce Dynamics files here.

This dataset contains aggregated workforce statistics. Each row is a distinct combination of aggregation levels and month. Generally, the broadest configuration of this dataset is the company-month level, in which each row observes a particular company in a given month. If we add country as a level of aggregation, each row corresponds to a company, country, and month combination. The company-country-month dataset can be aggregated up to create the company-month dataset.

Let’s look at an example output in which the levels of aggregation are company and country, tracked by month, and the outcome of interest is count: the total headcount for each combination of aggregation levels and month, measured at the end of that month:

company	country	month	count
Company A	U.S.	2021-01	10
Company A	U.S.	2021-02	12
Company A	U.S.	2021-03	14
Company A	Canada	2021-01	10
Company A	Canada	2021-02	11
Company A	Canada	2021-03	9

This structure also lets us visualize the table as a graph, with month along the X-axis and count along the Y-axis. In this example, (Company A, U.S.) and (Company A, Canada) are the entities whose count is tracked over time.

Note that it’s easy to compute a broader level of aggregation from a narrower one. To reduce the previous example to the company-month level, we sum count across countries for each company and month:

company	month	count
Company A	2021-01	20 (10+10)
Company A	2021-02	23 (12+11)
Company A	2021-03	23 (14+9)
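Rolling up from a narrower to a broader level is a group-by-and-sum over the additive outcome. The following is a minimal plain-Python sketch using the example rows above; the tuple layout and the `aggregate` helper are illustrative assumptions, not part of the delivered file format.

```python
from collections import defaultdict

# Example rows from the company-country-month table: (company, country, month, count)
rows = [
    ("Company A", "U.S.", "2021-01", 10.0),
    ("Company A", "U.S.", "2021-02", 12.0),
    ("Company A", "U.S.", "2021-03", 14.0),
    ("Company A", "Canada", "2021-01", 10.0),
    ("Company A", "Canada", "2021-02", 11.0),
    ("Company A", "Canada", "2021-03", 9.0),
]


def aggregate(rows, key_fn):
    """Sum the count column over all rows that map to the same key."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)


# Dropping country from the key sums across countries for each company-month
company_month = aggregate(rows, lambda r: (r[0], r[2]))
for (company, month), count in sorted(company_month.items()):
    print(company, month, count)
```

The same sum works for other additive outcomes such as inflows and outflows; averages such as Duration should not be summed this way.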

Levels of Aggregation

We can construct the Workforce Dynamics file across different levels of aggregation, including combinations of the following:

Company (categorical): Revelio Labs’ delivery file can provide insights on more than 20 million companies globally. By default, all subsidiaries of the company are included.
Rcid (categorical): Revelio Labs company ID
Region (categorical): Our broadest geographical level of aggregation is region. We classify locations into 15 distinct geographical regions:
- Northern America
- Central America
- Southern America
- Northern Europe
- Southern Europe
- Eastern Europe
- Western Europe
- Southern Asia
- South-Eastern Asia
- Eastern Asia
- Central and Western Asia
- Pacific Islands
- Arab States
- Northern Africa
- Sub-Saharan Africa
Country (categorical): The granularity can also be specified at the country level, covering 247 distinct countries.
State (categorical): The granularity can be specified at the state level, including international locations.
Metro_area (categorical): Our narrowest level of aggregation for geography is metro area. Employees may be included under a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.
Job Category (categorical): In addition to aggregating by geography, we can also aggregate by occupation or role. Our broadest role classification groups positions into the following 7 job categories:
- Admin
- Engineer
- Finance
- Marketing
- Operations
- Sales
- Scientist
The job role taxonomy is developed by our proprietary representation and clustering algorithms. We build mathematical representations of each job title using the title itself, the text description of the position (written either by individuals describing their own experience or by employers in a job posting), individuals’ skills, associates, and previous experience. Our clustering algorithm belongs to the family of hierarchical (agglomerative) clustering algorithms: we begin with every job title in its own cluster, then iteratively combine clusters based on a set of criteria. This allows complete flexibility in the number of clusters. We update this taxonomy periodically to adjust to the changing occupational landscape. Please see our Methodology section for more details on our job taxonomy.
Role_kn (categorical): Aggregated position role with n discrete levels. We can provide roles at several levels of aggregation, including the following: role_k50, role_k150, role_k300, role_k500, role_k1500. For Workforce Dynamics, the most granular role classification we recommend is role_k150.
Seniority (ordinal): Seniority ranges from 1 to 7. 1 is the most junior, and 7 is the most senior (see the Methodology section for more details). Our seniority model predicts seniority based on the title, company, industry, age, previous seniority, and position history.
Gender (categorical): Gender is calculated as a probability based on the likelihood of the first name being male or female.
Ethnicity (categorical): Ethnicity is estimated based on the likelihood of both the first and last name as well as an individual’s location.
Month (categorical): The month and year of the position are provided in “YYYY-MM” format. Each Workforce Dynamics file contains monthly data up to the previous month’s end.
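To make the bottom-up merging described under Job Category concrete, here is a toy agglomerative clustering sketch in plain Python. It is not Revelio Labs’ proprietary algorithm: the 2-D vectors are made-up stand-ins for title representations, and average linkage with Euclidean distance is an assumed merge criterion.

```python
import math
from itertools import combinations


def agglomerate(vectors, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(vectors))]  # each cluster holds indices into vectors

    def linkage(a, b):
        # Average linkage: mean Euclidean distance between members of the two clusters
        dists = [math.dist(vectors[i], vectors[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # combinations yields (i, j) with i < j, so deleting j leaves index i valid
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical titles with made-up 2-D "representations"
titles = ["software engineer", "backend developer", "data scientist",
          "ml researcher", "account executive", "sales manager"]
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]

clusters = agglomerate(vectors, 3)
for cluster in clusters:
    print([titles[i] for i in cluster])
```

Stopping at a smaller `n_clusters` continues the same merge sequence and yields a coarser grouping, which is the idea behind offering roles at several granularities (role_k50, role_k150, and so on).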

Outcomes

We can include the following outcomes as columns in the Workforce Dynamics file:

Count (float): The total number of employees for a specific level of aggregation for each month. Please note that these counts can be decimals (see our FAQ for more details).
Inflow/Outflow (float): The total inflow and outflow counts of employees at each level of aggregation for a given month.
External Inflow/Outflow (float): Total inflow and outflow counts of employees at each level of aggregation for a given month, excluding internal movements within a company.
Salary (float): Sum of estimated annual salaries of employees at each level of aggregation in a given month, in USD. We predict the salary for each position based on role, seniority, company, and country using a regression-based model. We train this model using over 200 million salaries from job postings and publicly available labor certification applications, and use country-level inflation rates to estimate the change in salary over time. We get an out-of-sample root mean squared error (RMSE) of 14%. The Salary column in Workforce Dynamics is the sum of salaries at a specific level of aggregation; please divide by the Count column to get the average salary of employees in that level.
Total_prestige (float): We can predict the average prestige level of employees at each level of aggregation in a given month. We calculate the prestige score of each position by using world university rankings to set prior values for our base model, then redistributing information among all positions according to the changing networks created by worker inflows and outflows. Total_prestige is the numerator of our prestige score.
Prestige_weight (float): Denominator of our prestige score. To calculate average prestige for a certain level of aggregation, please divide Total_prestige by Prestige_weight.
Duration (float): The average tenure of employees in the specified level of aggregation in years.
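Because Salary and Total_prestige are sums, averages over several rows should be computed by summing numerators and denominators first and dividing once, rather than averaging per-row averages. A plain-Python sketch with made-up numbers; the lowercase column names are assumptions about the delivered file.

```python
# Made-up company-country rows for one company-month; column names are assumed
rows = [
    {"country": "U.S.", "count": 14.0, "salary": 1_400_000.0,
     "total_prestige": 7.0, "prestige_weight": 12.5},
    {"country": "Canada", "count": 9.0, "salary": 630_000.0,
     "total_prestige": 2.7, "prestige_weight": 8.0},
]


def averages(rows):
    """Sum numerators and denominators across rows, then divide once."""
    count = sum(r["count"] for r in rows)
    salary = sum(r["salary"] for r in rows)
    prestige_num = sum(r["total_prestige"] for r in rows)
    prestige_den = sum(r["prestige_weight"] for r in rows)
    return {
        "avg_salary": salary / count if count else None,
        "avg_prestige": prestige_num / prestige_den if prestige_den else None,
    }


result = averages(rows)
print(result)

# Averaging the per-row averages (100,000 and 70,000) gives 85,000 instead,
# which ignores that the U.S. row covers more employees.
naive = sum(r["salary"] / r["count"] for r in rows) / len(rows)
```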

Please see the FAQ section for more information on outcomes and levels of aggregation in our Workforce Dynamics files.

Skill Dynamics

We can also provide a version of the Workforce Dynamics file with skills as a level of aggregation. The skill categories that can be included are:

Skill_k25
Skill_k50
Skill_k75

More information on these skill categories, and our Skills Taxonomy in general, is available in our Methodology section.

Individual employees (users) in our data are associated with sets of skills. In the Skill Dynamics file, the count for each skill_k category is the number of distinct employees in a given level of aggregation and month who have skills in that category. The inflow and outflow columns are the number of employees with skills in each category who entered or exited that level of aggregation in that month.

Please note that because employees can have multiple skills, or may not report skills at all, the counts in the Skill Dynamics file may differ from the headcounts in the Workforce Dynamics file. This is especially true when summing across skill_k categories, since an employee may be counted in more than one category.
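A small sketch of why summed skill counts need not match headcount: each employee adds one to every category they have, and employees without skills add nothing. The employee IDs and category names below are hypothetical placeholders, not real skill_k labels.

```python
from collections import Counter

# Made-up employees in one company-month, each mapped to a set of skill categories
employees = {
    "u1": {"skill_a", "skill_b", "skill_c"},
    "u2": {"skill_a"},
    "u3": set(),  # reports no skills
}

# Each employee adds 1 to every category they have; a set means at most once per category
skill_counts = Counter(category for skills in employees.values() for category in skills)
headcount = len(employees)

print(dict(skill_counts), sum(skill_counts.values()), headcount)
```

Here the category counts sum to 4 while the headcount is 3: u1 is counted three times and u3 not at all.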

Transitions

This dataset contains information on transitions into and out of a set of base companies.

The data consists of two files: Inflows and Outflows. Each row provides data on an individual transition, including the previous and new roles, location, seniority, and salary of individuals leaving or joining the company. The base company in the Inflows file is denoted by the ‘new’ prefix, while the base company in the Outflows file is denoted by ‘prev’.

Download a sample of our Transitions files here.

User_id (categorical): Revelio Labs user ID (unique to each user)

Prev_rcid (categorical): Revelio Labs company ID of previous company

Prev_position_id (categorical): Previous position ID

Prev_company (categorical): Previous company name

Prev_seniority (ordinal): Previous seniority level with 7 discrete levels

Prev_region (categorical): Previous region

Prev_country (categorical): Previous country

Prev_state (categorical): Previous state

Prev_metro_area (categorical): Previous metropolitan area

Prev_jobtitle (categorical): Previous job title

Prev_job_category (categorical): Aggregated previous position role with 7 discrete levels

Prev_role_k50 (categorical): Aggregated previous position role with 50 discrete levels

Prev_role_k150 (categorical): Aggregated previous position role with 150 discrete levels

Prev_enddate (time): End date of previous position

Prev_salary (float): Estimated annual salary of the previous role (in USD)

New_position_id (categorical): New position ID

New_rcid (categorical): Revelio Labs company ID of new company

New_company (categorical): New company name

New_seniority (ordinal): New seniority level with 7 discrete levels

New_region (categorical): New region

New_country (categorical): New country

New_state (categorical): New state

New_metro_area (categorical): New metropolitan area

New_jobtitle (categorical): New job title

New_job_category (categorical): Aggregated new position role with 7 discrete levels

New_role_k50 (categorical): Aggregated new position role with 50 discrete levels

New_role_k150 (categorical): Aggregated new position role with 150 discrete levels

New_startdate (time): Start date of new position

New_salary (float): Estimated annual salary of the new role (in USD)
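As an illustration of working with the Outflows file, where the base company sits in the prev_* columns, the sketch below counts where leavers went and computes the median relative salary change. The rows are made up, and the lowercase column names are an assumption about the delivered file.

```python
from collections import Counter
from statistics import median

# Made-up Outflows rows: the base company appears in the prev_* columns
outflows = [
    {"user_id": 1, "prev_company": "Company A", "new_company": "Company B",
     "prev_salary": 90000.0, "new_salary": 99000.0},
    {"user_id": 2, "prev_company": "Company A", "new_company": "Company B",
     "prev_salary": 80000.0, "new_salary": 84000.0},
    {"user_id": 3, "prev_company": "Company A", "new_company": "Company C",
     "prev_salary": 100000.0, "new_salary": 95000.0},
]

# Most common destinations of employees leaving the base company
destinations = Counter(r["new_company"] for r in outflows)

# Relative salary change per transition; rows missing either estimate are skipped
changes = [
    (r["new_salary"] - r["prev_salary"]) / r["prev_salary"]
    for r in outflows
    if r.get("prev_salary") and r.get("new_salary")
]

print(destinations.most_common(1))
print(median(changes))
```

The same pattern applies to the Inflows file by swapping the roles of the prev_* and new_* columns.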

Job Postings

Revelio Labs provides job postings data in two formats: Job Posting Dynamics (an aggregated time series of monthly job posting statistics), and Individual Job Postings. Our job postings data comes from several sources, including job posting aggregator sites and company websites. We can provide job postings either via COSMOS, our unified job posting dataset, which has been standardized and deduplicated across our different posting sources, or separately by source. The coverage of the data is global.

Job Posting Dynamics

This dataset contains aggregated job posting statistics. Each row is a distinct combination of aggregation levels and month. Generally, the broadest configuration of this dataset is the company-month level, in which each row corresponds to a company and month combination. For more information on the levels of aggregation, please refer to the Workforce Dynamics section.

Download a sample of our Job Posting Dynamics data here.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Country (categorical): Country location of job posting

State (categorical): State location of job posting

Job_category (categorical): Aggregated posting role with 7 discrete levels

Role_k50 (categorical): Aggregated posting role with 50 discrete levels

Role_k150 (categorical): Aggregated posting role with 150 discrete levels

Month (categorical): The month and year provided in “YYYY-MM” format

Active_posting (float): Number of active postings during that month

New_posting (float): Number of new postings during that month

Removed_posting (float): Number of postings removed during that month

Active_salary_avg (float): Average salary for active postings during that month

New_salary_avg (float): Average salary for new postings during that month

Removed_salary_avg (float): Average salary for postings removed during that month

Filling_time_avg (float): Average time to fill, in months

Expected_hires (float): The total number of hires expected for active postings in each level of aggregation and month (COSMOS only)
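Unlike Salary in Workforce Dynamics, the *_salary_avg columns here are averages, so combining rows (for example, several states into one company-month) calls for a posting-weighted mean. A sketch with made-up values; it assumes the lowercase column names shown and that every counted posting contributed to the row’s salary average.

```python
# Made-up Job Posting Dynamics rows for one company-month, split by state
rows = [
    {"state": "New York", "active_posting": 30.0, "active_salary_avg": 120000.0},
    {"state": "Texas", "active_posting": 10.0, "active_salary_avg": 80000.0},
]


def combined_salary_avg(rows, n_col, avg_col):
    """Posting-weighted mean of per-row averages; None if there are no postings."""
    total = sum(r[n_col] for r in rows)
    if not total:
        return None
    return sum(r[n_col] * r[avg_col] for r in rows) / total


print(combined_salary_avg(rows, "active_posting", "active_salary_avg"))
```

The weighted result is 110,000, whereas a plain mean of the two averages would give 100,000. The same helper works for New_salary_avg with New_posting and for Removed_salary_avg with Removed_posting.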

Individual Job Postings

Revelio Labs also provides data on individual job postings. These files contain posting-level information on current and historical job postings such as posting date, location, role, and salary.

Download a sample of our COSMOS Individual Job Postings data here.

Job_id (categorical): Posting key

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Rics_k50 (categorical): Industry of employer with 50 discrete categories (Revelio Labs mapped)

Rics_k200 (categorical): Industry of employer with 200 discrete categories (Revelio Labs mapped)

Rics_k400 (categorical): Industry of employer with 400 discrete categories (Revelio Labs mapped)

Title_raw (categorical): Position title (raw from posting)

Title_translated (categorical): Raw position title translated to English

Job_category (categorical): Aggregated position role with 7 discrete levels

Role_k50 (categorical): Aggregated position role with 50 discrete levels

Role_k150 (categorical): Aggregated position role with 150 discrete levels

Role_k1500 (categorical): Aggregated position role with 1500 discrete levels

State, country (categorical): Listed location for posting

Salary (float): Predicted salary for posting

Post_date (categorical): Date at which the job was posted

Remove_date (categorical): Date at which the job was removed. If null, it hasn’t been removed yet.

Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

Ultimate_parent_company_name (categorical): Name of the parent company

Remote_type (categorical): Type of remote work a job posting offers. If not specified, the job is categorized as “Fully in Office.”

Expected_hires (float): The expected number of hires for each job posting (COSMOS only)

Source_* (boolean): Indicator for whether a job posting was found in each data source (e.g. company websites, LinkedIn, Indeed, etc.) (COSMOS only)
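The sketch below derives monthly new, removed, and active counts from individual postings. The definitions used are assumptions chosen for illustration and may differ from how Revelio Labs builds Job Posting Dynamics: a posting is "new" in the month of Post_date, "removed" in the month of Remove_date, and "active" in any month from posting through removal (a null Remove_date means still open). Dates are assumed to be "YYYY-MM-DD" strings.

```python
from collections import Counter

# Made-up individual postings; dates assumed to be "YYYY-MM-DD" strings
postings = [
    {"job_id": "j1", "post_date": "2021-01-05", "remove_date": "2021-02-10"},
    {"job_id": "j2", "post_date": "2021-01-20", "remove_date": None},  # still open
    {"job_id": "j3", "post_date": "2021-02-03", "remove_date": "2021-02-25"},
]


def month(date_str):
    return date_str[:7]  # "YYYY-MM-DD" -> "YYYY-MM"; such strings sort chronologically


def monthly_stats(postings, months):
    """Return {month: (active, new, removed)} under the assumed definitions."""
    new = Counter(month(p["post_date"]) for p in postings)
    removed = Counter(month(p["remove_date"]) for p in postings if p["remove_date"])
    active = Counter()
    for m in months:
        for p in postings:
            posted_by_m = month(p["post_date"]) <= m
            open_in_m = p["remove_date"] is None or month(p["remove_date"]) >= m
            if posted_by_m and open_in_m:
                active[m] += 1
    return {m: (active[m], new[m], removed[m]) for m in months}


stats = monthly_stats(postings, ["2021-01", "2021-02", "2021-03"])
for m, (a, n, r) in stats.items():
    print(m, "active", a, "new", n, "removed", r)
```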

Sentiment

Download a sample of our Sentiment data here.

Individual Reviews

Revelio Labs provides company review data with the following information. Note that reviewers are not required to fill out every rating field. Also, some ratings (i.e., ‘culture and values’ and ‘diversity and inclusion’) were added more recently.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Review_id (categorical): Review ID

Title_raw (categorical): Reviewer’s raw position title

Location_raw (categorical): Reviewer’s raw location

Region (categorical): Reviewer’s region

Country (categorical): Reviewer’s country

State (categorical): Reviewer’s state

Metro_area (categorical): Reviewer’s metropolitan area. Reviews may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

Review_language_id (categorical): Language of the review

Review_date_time (time): Posting date of the review

Review_iscovid19 (boolean): Indicates whether the review mentions the Covid-19 pandemic

Reviewer_current_job (boolean): Indicates whether the reviewer is a current or former employee

Reviewer_employment_status (categorical): Reviewer’s employment type (freelance, part time, intern, contract, regular)

Reviewer_job_ending_year (integer): Final year of the reviewer’s employment with the company

Reviewer_length_of_employment (integer): Number of years that the reviewer worked at the company

Rating_overall (integer): Reviewer’s overall rating of the company (integer values from 1 to 5, with 5 being the best)

Rating_career_opportunities (float): Reviewer’s rating of the company’s career opportunities (from 1 to 5 in half-point increments, with 5 being the best)

Rating_compensation_and_benefits (float): Reviewer’s rating of the company’s compensation and benefits (from 1 to 5 in half-point increments, with 5 being the best)

Rating_culture_and_values (integer): Reviewer’s rating of the company’s culture and values (integer values from 1 to 5, with 5 being the best)

Rating_diversity_and_inclusion (integer): Reviewer’s rating of the company’s diversity and inclusion (integer values from 1 to 5, with 5 being the best)

Rating_senior_leadership (float): Reviewer’s rating of the company’s senior management (from 1 to 5 in half-point increments, with 5 being the best)

Rating_work_life_balance (float): Reviewer’s rating of the company’s work-life balance (from 1 to 5 in half-point increments, with 5 being the best)

Rating_business_outlook (categorical): Reviewer’s rating of the company’s business outlook (positive, negative, neutral)

Rating_ceo (categorical): Reviewer’s approval rating of the company’s CEO (approve, disapprove, no opinion)

Rating_recommend_to_friend (categorical): Indicates whether the reviewer would recommend the company to a friend (positive, negative)

Review_summary (string): Title of review

Review_pros (string): Reviewer’s positive comments about the company

Review_cons (string): Reviewer’s negative comments about the company

Review_count_helpful (integer): Number of users who found the review helpful

Review_count_not_helpful (integer): Number of users who found the review unhelpful

Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

Ultimate_parent_company_name (categorical): Name of the parent company

Sentiment Scores

This dataset contains employee sentiment scores that were generated using our sentiment model. This model uses Natural Language Processing to capture employee sentiment on specific topics such as management and diversity. For each review, we compute a weighted sentiment score based on how relevant a given topic was for the positive or negative portion of the review, assigning a positive (negative) score to topics that had an overall positive (negative) impact on the review. These scores are then aggregated to arrive at a company-wide sentiment score. Each row contains the sentiment scores for a given company.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Management_sentiment (float): Management sentiment score

Innovative_technology_sentiment (float): Innovative technology sentiment score

Work_life_balance_sentiment (float): Work life balance sentiment score

Mentorship_sentiment (float): Mentorship sentiment score

Career_advancement_sentiment (float): Career advancement sentiment score

Diversity_and_inclusion_sentiment (float): Diversity and inclusion sentiment score

Coworkers_sentiment (float): Coworkers sentiment score

Compensation_sentiment (float): Compensation sentiment score

Culture_sentiment (float): Culture sentiment score

Company_and_division_size_sentiment (float): Company and division size sentiment score

Perks_and_benefits_sentiment (float): Perks and benefits sentiment score

Onboarding_sentiment (float): Onboarding sentiment score

Remote_work_sentiment (float): Remote work sentiment score

Num_reviews (integer): Number of reviews factored into the scores
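The scoring idea above can be illustrated with a hypothetical stand-in: give each topic in a review a signed score equal to its relevance in the pros text minus its relevance in the cons text, then average each topic over the reviews that mention it. This is not Revelio Labs’ sentiment model; the relevance weights are made up and would in practice come from an upstream NLP model, and the simple mean is an assumed aggregation.

```python
from collections import defaultdict

# Hypothetical upstream output: relevance (0 to 1) of each topic to a review's pros and cons text
reviews = [
    {"pros": {"management": 0.8}, "cons": {"compensation": 0.6}},
    {"pros": {"compensation": 0.2}, "cons": {"management": 0.5, "compensation": 0.4}},
]


def review_scores(review):
    """Signed score per topic: relevance in pros minus relevance in cons."""
    topics = set(review["pros"]) | set(review["cons"])
    return {t: review["pros"].get(t, 0.0) - review["cons"].get(t, 0.0) for t in topics}


def company_scores(reviews):
    """Average each topic's signed score over the reviews that mention it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for review in reviews:
        for topic, score in review_scores(review).items():
            sums[topic] += score
            counts[topic] += 1
    scores = {t: sums[t] / counts[t] for t in sums}
    return scores, len(reviews)  # second value plays the role of Num_reviews


scores, num_reviews = company_scores(reviews)
print(scores, num_reviews)
```

In this toy input, management nets out slightly positive (0.15) and compensation negative (-0.4).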

Sentiment Trends

This dataset contains employee review data aggregated to the level of company, region, position, and month. Each row contains aggregated ratings for this level of granularity. All ratings span from 1 to 5, with 5 being the highest rating.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Region (categorical): Region of the company (e.g., Northern Europe, South-Eastern Asia)

Job_category (categorical): Aggregated position role with 7 discrete levels (also available at other levels of aggregation)

Month (categorical): Month and year of the aggregated reviews, provided in “YYYY-MM” format

Rating_overall (float): Aggregated overall rating

Rating_career_opportunities (float): Aggregated career opportunities rating

Rating_compensation_and_benefits (float): Aggregated employee compensation and benefits rating

Rating_culture_and_values (float): Aggregated company culture and values rating

Rating_diversity_and_inclusion (float): Aggregated company diversity and inclusion rating

Rating_senior_leadership (float): Aggregated senior management rating

Rating_work_life_balance (float): Aggregated work-life balance rating

Rating_ceo (float): Aggregated CEO approval rating

Rating_recommend_to_friend (float): Aggregated “recommend to a friend” rating

Rating_business_outlook (float): Aggregated business outlook rating
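Because reviewers may skip rating fields, a blank rating should be excluded from an average rather than counted as zero. The sketch below groups made-up individual reviews by company, region, job category, and month and averages two numeric ratings. Column names, the timestamp format, and the use of a simple mean are assumptions; the categorical ratings (CEO, outlook, recommend) would first need a numeric mapping, which is not documented here.

```python
from collections import defaultdict

# Made-up individual reviews; None marks a rating the reviewer left blank
reviews = [
    {"company": "Company A", "region": "Northern Europe", "job_category": "Engineer",
     "review_date_time": "2021-03-04 10:15:00",
     "rating_overall": 4, "rating_culture_and_values": None},
    {"company": "Company A", "region": "Northern Europe", "job_category": "Engineer",
     "review_date_time": "2021-03-20 08:00:00",
     "rating_overall": 2, "rating_culture_and_values": 5},
]

RATING_COLUMNS = ["rating_overall", "rating_culture_and_values"]


def sentiment_trends(reviews):
    """Mean of each rating per (company, region, job_category, month), skipping blanks."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in reviews:
        month = r["review_date_time"][:7]  # assumes a "YYYY-MM-DD ..." timestamp
        group = (r["company"], r["region"], r["job_category"], month)
        for col in RATING_COLUMNS:
            if r[col] is not None:  # a blank rating is excluded, not counted as zero
                sums[group, col] += r[col]
                counts[group, col] += 1
    return {key: sums[key] / counts[key] for key in sums}


trends = sentiment_trends(reviews)
group = ("Company A", "Northern Europe", "Engineer", "2021-03")
print(trends[group, "rating_overall"], trends[group, "rating_culture_and_values"])
```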

Layoff Notices

Download a sample of our Layoff Notices data here.

We collect WARN layoff data, which a firm files when it plans to lay off a significant portion of its workforce. The WARN Act (Worker Adjustment and Retraining Notification Act) requires covered employers to give advance notice, generally 60 days, of mass layoffs and plant closures. These notices are registered with states and the Department of Labor in advance, which allows for the provision of compliance assistance materials that help workers and employers understand their rights and responsibilities.

We provide the WARN data at the notice level, where each row represents a layoff notice.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Name of company registering layoff (Revelio Labs mapped)

State (categorical): State where layoff is occurring (Revelio Labs mapped)

City (categorical): City where layoff is occurring (Revelio Labs mapped)

Metro_area (categorical): Metro area where layoff is occurring (Revelio Labs mapped). Layoffs may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

Notice_date (categorical): Date of layoff notice

Layoff_date (categorical): Date as of which layoffs will be effective

Layoff_type (categorical): Type of layoff (permanent, temporary, etc.)

Num_employees (integer): Number of employees to be laid off

Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

Ultimate_parent_company_name (categorical): Name of the parent company
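Two common derived metrics from notice-level data are the lead time between Notice_date and Layoff_date and the total number of affected employees by state. A sketch with made-up notices; it assumes the date columns are "YYYY-MM-DD" strings and that column names are lowercase.

```python
from collections import defaultdict
from datetime import date

# Made-up notices; dates assumed to be "YYYY-MM-DD" strings
notices = [
    {"company": "Company A", "state": "California", "notice_date": "2023-01-10",
     "layoff_date": "2023-03-11", "num_employees": 120},
    {"company": "Company B", "state": "California", "notice_date": "2023-02-01",
     "layoff_date": "2023-02-15", "num_employees": 40},
    {"company": "Company C", "state": "Texas", "notice_date": "2023-02-03",
     "layoff_date": "2023-04-04", "num_employees": 75},
]


def lead_time_days(notice):
    """Days between the notice date and the date the layoff takes effect."""
    notified = date.fromisoformat(notice["notice_date"])
    effective = date.fromisoformat(notice["layoff_date"])
    return (effective - notified).days


employees_by_state = defaultdict(int)
for n in notices:
    employees_by_state[n["state"]] += n["num_employees"]

print([lead_time_days(n) for n in notices])
print(dict(employees_by_state))
```

Sorting by lead time is one way to flag notices filed unusually close to their effective date; whether a given notice satisfies the Act depends on exemptions not captured here.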

Individual Level Data

Revelio Labs also provides data on individual professional profiles. These files contain user-level information on current and historical positions, educational history, name, and demographic information.

Download a sample of our Individual data here.

User File

This file contains the individual level user data. Each row is an individual’s public profile.

User_id (categorical): Revelio Labs unique user ID

Firstname (categorical): First name (parsed from fullname)

Lastname (categorical): Last name (parsed from fullname)

Fullname (categorical): Name reported on online profile

F_prob (float): Probability of user being female

M_prob (float): Probability of user being male

Api_prob (float): Probability of user being Asian/Pacific Islander

Black_prob (float): Probability of user being Black or African American

Hispanic_prob (float): Probability of user being Hispanic or Latino

Multiple_prob (float): Probability of user being two or more races

Native_prob (float): Probability of user being American Indian or Alaskan Native

White_prob (float): Probability of user being Non-Hispanic White
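Because gender and ethnicity are provided as probabilities, one option is to sum them into expected (possibly fractional) counts; another is a hard assignment to each user’s most likely category, which discards the uncertainty. A sketch with made-up users; the lowercase column names follow the fields above but are assumptions about the delivered file.

```python
# Made-up users; each user's ethnicity probabilities sum to 1
users = [
    {"user_id": 1, "f_prob": 0.9, "m_prob": 0.1,
     "api_prob": 0.1, "black_prob": 0.05, "hispanic_prob": 0.05,
     "multiple_prob": 0.0, "native_prob": 0.0, "white_prob": 0.8},
    {"user_id": 2, "f_prob": 0.3, "m_prob": 0.7,
     "api_prob": 0.6, "black_prob": 0.1, "hispanic_prob": 0.1,
     "multiple_prob": 0.05, "native_prob": 0.0, "white_prob": 0.15},
]

ETHNICITIES = ["api", "black", "hispanic", "multiple", "native", "white"]

# Expected counts: summing probabilities keeps the uncertainty and can be fractional
expected_female = sum(u["f_prob"] for u in users)
expected_by_ethnicity = {e: sum(u[f"{e}_prob"] for u in users) for e in ETHNICITIES}

# Hard assignment: the single most likely category per user
most_likely = {u["user_id"]: max(ETHNICITIES, key=lambda e: u[f"{e}_prob"]) for u in users}

print(expected_female, expected_by_ethnicity["white"], most_likely)
```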

Position File

This file contains the individual level position data. Each row is a position held by an individual.

Position_id (categorical): Revelio Labs position ID

User_id (categorical): Revelio Labs user ID (unique to each user)

Company_raw (categorical): Company name (raw from online profile)

Company_linkedin_url (categorical): URL for employer (from online profile)

Company_cleaned (categorical): Company name (from online profile, cleaned of special characters)

Location_raw (categorical): Location of position (raw from online profile; if missing, we infer it from the user’s location reported in the User File)

Region (categorical): Region of position (e.g., Southern Asia, Western Europe; imputed from raw location)

Country (categorical): Country of position (imputed from raw location)

State (categorical): State of position (imputed from raw location)

Metro_area (categorical): Metropolitan area of position (imputed from raw location). Positions may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

Startdate (categorical): Position start date if reported, null otherwise

Enddate (categorical): Position end date if reported, null otherwise

Title_raw (categorical): Position title (raw from online profile)

Role_k1500 (categorical): Aggregated position role with 1500 discrete levels (also available at other levels of aggregation)

Job_category (categorical): Aggregated position role with 7 discrete levels

Seniority (ordinal): Seniority level with 7 discrete levels

Salary (float): Modeled annual salary for the position (in USD)

Position_number (integer): Chronological order of a position in a user’s profile

Rcid (categorical): Revelio Labs company ID

Company_name (categorical): Company name (mapped)

Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

Ultimate_parent_company_name (categorical): Name of the parent company
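Position_number makes it possible to order a user’s career and spot moves between companies. The sketch below is an illustration only, not how the Transitions files are built: it assumes a higher position_number means a later position and that positions do not overlap (concurrent roles would need extra handling). Rows and lowercase column names are made up.

```python
from itertools import groupby

# Made-up position rows; assumes a higher position_number means a later position
positions = [
    {"user_id": 7, "position_number": 2, "rcid": 100, "title_raw": "Senior Analyst"},
    {"user_id": 7, "position_number": 1, "rcid": 100, "title_raw": "Analyst"},
    {"user_id": 7, "position_number": 3, "rcid": 200, "title_raw": "Manager"},
    {"user_id": 8, "position_number": 1, "rcid": 300, "title_raw": "Engineer"},
]


def company_moves(positions):
    """Return (user_id, prev_rcid, new_rcid) for consecutive positions at different companies."""
    moves = []
    # groupby needs rows sorted by the grouping key; position_number orders each career
    ordered = sorted(positions, key=lambda p: (p["user_id"], p["position_number"]))
    for user_id, group in groupby(ordered, key=lambda p: p["user_id"]):
        history = list(group)
        for prev, new in zip(history, history[1:]):
            if prev["rcid"] != new["rcid"]:  # skip internal moves within one company
                moves.append((user_id, prev["rcid"], new["rcid"]))
    return moves


print(company_moves(positions))
```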

Education File

This file contains the individual level education data. Each row is an educational record.

User_id (categorical): Revelio Labs user ID (unique to each user)

University_raw (categorical): School name (raw from online profile)

Startdate (categorical): Start date

Enddate (categorical): End date

Degree_raw (categorical): Degree title (raw from online profile)

Field_raw (categorical): Degree field (raw from online profile)

Skill File

This file contains the individual level skills data. Revelio Labs uses proprietary algorithms to cluster the skill universe into distinct clusters of skills. The clustering can be as coarse as 25 groups and as fine as over 20,000 groups. The default skill clustering is done at 50 groups.

User_id (categorical): Revelio Labs user ID (unique to each user)

Skill (categorical): Single skill from profile (raw from online profile)

Skill_mapped (categorical): Skill from profile (Revelio Labs mapped)

Skill_k75 (categorical): Aggregated skill with 75 discrete levels (also available at other levels of aggregation)

Company Reference

This file contains information on companies that are covered by the delivered data and is included with each delivery.

Download a sample of our Company Reference file here.

Rcid (categorical): Revelio Labs company ID

Company (categorical): Company name

Factset_entity_id (categorical): FactSet company ID

Year_founded (categorical): Year in which the company was founded

Ticker (categorical): Ticker of the company

Exchange_name (categorical): The stock exchange that the company is listed on

Sedol (categorical): SEDOL code

Isin (categorical): ISIN code

Cusip (categorical): CUSIP number

Url (categorical): Company’s website URL

Naics_code (categorical): Company’s NAICS industry code

Cik (categorical): CIK number

Lei (categorical): LEI code

Linkedin_url (categorical): Company LinkedIn URL

Child_rcid (categorical): Revelio Labs company ID of largest subsidiary company

Child_company (categorical): Company name of largest subsidiary company

Child_linkedin_url (categorical): Company LinkedIn URL of largest subsidiary company

Ultimate_parent_rcid (categorical): Revelio Labs company ID of ultimate parent company

Ultimate_parent_rcid_name (categorical): Company name of ultimate parent company
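Rcid is the key that links the Company Reference file to the other datasets. A minimal left-join sketch in plain Python that attaches reference fields to Workforce Dynamics rows while keeping rows with no match; the rows and lowercase column names are made up for illustration.

```python
# Made-up reference and Workforce Dynamics rows; column names assumed lowercase
reference = [
    {"rcid": 100, "company": "Company A", "ticker": "AAA", "ultimate_parent_rcid": 100},
    {"rcid": 200, "company": "Company B", "ticker": None, "ultimate_parent_rcid": 100},
]
workforce = [
    {"rcid": 100, "month": "2021-01", "count": 20.0},
    {"rcid": 200, "month": "2021-01", "count": 5.0},
    {"rcid": 300, "month": "2021-01", "count": 1.0},  # no matching reference row
]


def attach_reference(rows, reference, fields):
    """Left join on rcid: every input row is kept; unmatched fields become None."""
    by_rcid = {r["rcid"]: r for r in reference}
    joined = []
    for row in rows:
        ref = by_rcid.get(row["rcid"], {})
        joined.append({**row, **{f: ref.get(f) for f in fields}})
    return joined


joined = attach_reference(workforce, reference, ["company", "ticker", "ultimate_parent_rcid"])
for row in joined:
    print(row)
```

Take care before summing subsidiary rows into their ultimate parent: the Workforce Dynamics company level includes all subsidiaries by default, so the parent’s own rows may already contain them.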