Datasets

Workforce Dynamics

This dataset contains aggregated workforce statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. In that case, every row observes a particular company in a given month. If we include country as a level of aggregation, then each row of the dataset would correspond to a company, country, and month combination. The dataset at the company-country-month level can be aggregated to create the company-month dataset.

Consider an example output in which the levels of aggregation are company and country, tracked by month, and the outcome of interest is count: the total headcount for each combination of aggregation level and month, measured at the end of that month.

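| company   | country | month   | count |
|-----------|---------|---------|-------|
| Company A | U.S.    | 2021-01 | 10    |
| Company A | U.S.    | 2021-02 | 12    |
| Company A | U.S.    | 2021-03 | 14    |
| Company A | Canada  | 2021-01 | 10    |
| Company A | Canada  | 2021-02 | 11    |
| Company A | Canada  | 2021-03 | 9     |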

This also lets us visualize the table as a graph, with month on the X-axis and the outcome count on the Y-axis. In this case, (Company A, U.S.) and (Company A, Canada) can be viewed as entities whose count is tracked over time (month) on the graph.

Note that it’s easy to compute a broader level of aggregation from a narrower level of aggregation. To reduce our previous example to the company and month level, we can sum across the country column to get:

| company   | month   | count      |
|-----------|---------|------------|
| Company A | 2021-01 | 20 (10+10) |
| Company A | 2021-02 | 23 (12+11) |
| Company A | 2021-03 | 23 (14+9)  |
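
The roll-up from the company-country-month level to the company-month level can be sketched in a few lines of Python. This is only an illustration using the example rows above; the tuple layout and the function name `aggregate_to_company_month` are assumptions made for this sketch, not part of any delivered file or API.

```python
from collections import defaultdict

# Rows at the company-country-month level, copied from the example table.
# The (company, country, month, count) tuple layout is an assumption.
rows = [
    ("Company A", "U.S.", "2021-01", 10),
    ("Company A", "U.S.", "2021-02", 12),
    ("Company A", "U.S.", "2021-03", 14),
    ("Company A", "Canada", "2021-01", 10),
    ("Company A", "Canada", "2021-02", 11),
    ("Company A", "Canada", "2021-03", 9),
]


def aggregate_to_company_month(rows):
    """Sum count across countries for each (company, month) pair."""
    totals = defaultdict(int)
    for company, _country, month, count in rows:
        totals[(company, month)] += count
    return dict(totals)


company_month = aggregate_to_company_month(rows)
for (company, month), count in sorted(company_month.items()):
    print(company, month, count)
```

The same pattern applies to any additive outcome (count, inflow, outflow, summed salary). Averages such as prestige or duration cannot be summed this way and would need a weighted average.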

  • Count (float): The total number of employees for a specific level of granularity for each month.

  • Inflow/Outflow (float): We also estimate the total inflow (people joining) and outflow (people leaving) in a given month.

  • Salary (float): We predict the salary for each position based on role, seniority, company, and country using a regression-based model. We train this model using over 200 million salaries from job postings and publicly available labor certification applications, and use country-level inflation rates to estimate the change in salary over time. We get an out-of-sample root mean squared error (RMSE) of 14%. The Salary column in long_file shows the sum of salaries of employees in the particular granularity level.

  • Month (time): The month and year of the position are provided in “YYYY-MM” format. Each deliverable file contains monthly data up to the previous month’s end.

  • Company (categorical): The RL delivery file can provide insights on all public (and many private) companies. By default, companies are defined at the holding company level, where all subsidiaries held by the top parent company are included. The list of parent companies covered by Revelio Labs includes those mapped by FactSet Research Systems Inc., in addition to manually defined companies at the client’s request.

  • Region (categorical): The most coarse geographical granularity can be defined at region level. The 15 region names are as follows:

    • Arab States

    • Northern Africa

    • South-Eastern Asia

    • Central America

    • Northern America

    • Southern Asia

    • Central and Western Asia

    • Northern Europe

    • Southern Europe

    • Eastern Asia

    • Pacific Islands

    • Sub-Saharan Africa

    • Eastern Europe

    • South America

    • Western Europe

  • Country (categorical): The granularity can be specified at the country level for 232 distinct countries.

  • State (categorical): For US and US territories, the granularity can be specified at the state level. This level includes 50 states and 9 territories (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, Virgin Islands, Minor Outlying Islands, Micronesia, Marshall Islands, Palau).

  • MSA (categorical): For US states and US territories, the most granular geography is the Metropolitan Statistical Area (MSA).

  • Job Category (categorical): In addition to geographical granularities, role-level granularities can also be specified. The most basic job category classification groups roles into the following 7 groups:

    • Admin

    • Engineer

    • Finance

    • Marketing

    • Operations

    • Sales

    • Scientist

    The job role taxonomy is developed by our proprietary representation and clustering algorithms. We develop mathematical representations of each job title using the title itself, the text description of the position (from either individuals describing their own experiences or employers on a job posting), individuals’ skills, associates, and previous experience. Our clustering algorithm is in the family of hierarchical/agglomerative clustering algorithms. This means that we begin with every job title occupying its own cluster, then iteratively combine clusters based on a set of criteria. This allows for complete flexibility of the number of clusters. We update this taxonomy periodically to adjust to the changing occupational landscape.

  • Role_kn (categorical): Aggregated position role with n discrete levels. We can provide roles at several levels of aggregation, including the following: role_k50, role_k150, role_k300, role_k500, role_k1250.

  • Seniority (ordinal): Seniority ranges from 1 to 7. 1 is the most junior, and 7 is the most senior. Our seniority model predicts seniority based on the title, company, industry, age, previous seniority, and position history.
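
The bottom-up clustering idea described under Job Category (every title starts in its own cluster, and clusters are merged iteratively) can be illustrated with a toy sketch. This is not the proprietary algorithm: the two-dimensional vectors, the centroid-distance merge rule, and the function names are all assumptions chosen to keep the example small.

```python
import math
from itertools import combinations

# Hypothetical 2-D representations of job titles (real representations
# would be learned from titles, descriptions, skills, and so on).
vectors = {
    "software engineer": (0.90, 0.10),
    "backend developer": (0.85, 0.15),
    "accountant": (0.10, 0.90),
    "financial analyst": (0.20, 0.80),
}


def centroid(cluster, vectors):
    """Mean vector of all titles in a cluster."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[t][d] for t in cluster) / len(cluster) for d in range(dims)]


def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters (by centroid distance) until n_clusters remain."""
    clusters = [[title] for title in vectors]  # every title starts alone
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: math.dist(
                centroid(clusters[p[0]], vectors),
                centroid(clusters[p[1]], vectors),
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


two_groups = agglomerate(vectors, 2)
print(two_groups)
```

Because merging can stop at any point, the same procedure yields any number of clusters, which is consistent with offering several taxonomy sizes (for example role_k50 through role_k1250).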

Statistics that can be included are as follows:

  • Levels of Aggregation:
    • Month (time): The month and year of the position, provided in “YYYY-MM” format

    • Region (categorical): The most coarse geographical granularity with 16 discrete levels

    • Country (categorical): 232 different countries

    • State (categorical): For US and US territories, state level location

    • MSA (categorical): For US states and US territories, metropolitan statistical area

    • Job_category (categorical): Aggregated position role with 7 discrete levels

    • Seniority (ordinal): Seniority level with 7 discrete levels

    • Gender (categorical): Gender is calculated as a probability based on the likelihood of the first name being male or female

    • Ethnicity (categorical): Ethnicity is estimated based on the likelihood of both the first and last name as well as location

  • Outcomes:
    • Count (float): Headcount for a given month

    • Inflow/Outflow (float): Total inflow and outflow of employees at each granularity level

    • Salary (float): Sum of estimated salaries in the particular granularity level

    • Prestige (float): Average prestige score of the specified granularity level

    • Duration (float): Average tenure of employees at a given granularity level in years

    • Hiring (float): Sum of inflows at a given level of granularity over the last year divided by the average counts at that granularity over the last year

    • Attrition (float): Sum of outflows at a given level of granularity over the last year divided by the average counts at that granularity over the last year

See the FAQ section for additional outcomes and granularities.
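
The Hiring and Attrition definitions above share the same shape: a trailing twelve-month sum of flows divided by the trailing twelve-month average count. A minimal sketch follows, assuming monthly values are supplied as plain lists ordered oldest to newest; the function name `trailing_rate` and the handling of short histories are assumptions for illustration.

```python
def trailing_rate(flows, counts, window=12):
    """Sum of the last `window` monthly flows divided by the
    average of the last `window` monthly counts."""
    if len(flows) < window or len(counts) < window:
        raise ValueError("need at least `window` months of data")
    recent_flows = flows[-window:]
    recent_counts = counts[-window:]
    average_count = sum(recent_counts) / window
    if average_count == 0:
        return float("nan")  # rate is undefined for an empty workforce
    return sum(recent_flows) / average_count


# Hypothetical series for one granularity level: a constant headcount of 100
# with 2 joiners and 1 leaver per month.
counts = [100.0] * 12
inflows = [2.0] * 12
outflows = [1.0] * 12

hiring = trailing_rate(inflows, counts)
attrition = trailing_rate(outflows, counts)
print(hiring, attrition)
```

Passing inflows gives the hiring rate and passing outflows gives the attrition rate.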

Transitions

  • User_id: Revelio Labs user id

  • Prev_position_id: Unique position id of previous job

  • Prev_rcid: Revelio Labs company ID of previous company

  • Prev_company: Previous company name

  • Prev_region: Previous region

  • Prev_jobtitle: Previous job title

  • Prev_job_category: Aggregated previous position role with 7 discrete levels

  • Prev_role_k50: Aggregated previous position role with 50 discrete levels

  • Prev_role_k150: Aggregated previous position role with 150 discrete levels

  • Prev_enddate: End date of previous position

  • Prev_seniority: Previous seniority level with 7 discrete levels

  • Prev_salary: Estimated salary of the previous role

  • new_position_id: Unique position id of new job

  • New_rcid: Revelio company ID of new company

  • New_company: New company name

  • New_region: New region

  • New_jobtitle: New job title

  • New_job_category: Aggregated new position role with 7 discrete levels

  • New_role_k50: Aggregated new position role with 50 discrete levels

  • New_role_k150: Aggregated new position role with 150 discrete levels

  • New_enddate: End date of new position

  • New_seniority: New seniority level with 7 discrete levels

  • New_salary: Estimated salary of the new role
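
As an illustration of how the transition fields might be used together, the sketch below counts moves between company pairs and computes the average salary change for one pair. The rows are invented, and the lowercase dictionary keys are an assumption about how the columns are named in a delivered file.

```python
from collections import Counter

# Hypothetical transition rows using a subset of the fields listed above.
transitions = [
    {"user_id": 1, "prev_company": "Company A", "new_company": "Company B",
     "prev_salary": 80000.0, "new_salary": 90000.0},
    {"user_id": 2, "prev_company": "Company A", "new_company": "Company B",
     "prev_salary": 100000.0, "new_salary": 105000.0},
    {"user_id": 3, "prev_company": "Company B", "new_company": "Company A",
     "prev_salary": 70000.0, "new_salary": 70000.0},
]

# Number of moves for each (previous company, new company) pair.
flow_counts = Counter((t["prev_company"], t["new_company"]) for t in transitions)


def mean_salary_change(rows, prev_company, new_company):
    """Average of new_salary - prev_salary for moves between two companies."""
    changes = [
        r["new_salary"] - r["prev_salary"]
        for r in rows
        if r["prev_company"] == prev_company and r["new_company"] == new_company
    ]
    return sum(changes) / len(changes) if changes else None


print(flow_counts[("Company A", "Company B")])
print(mean_salary_change(transitions, "Company A", "Company B"))
```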

Job Posting Dynamics

This dataset contains aggregated job posting statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. Each row would correspond to a company and month combination. For more information on the levels of aggregation, please refer to the Workforce Dynamics section.

  • Rcid: Revelio Labs company ID

  • Company: Company name

  • State: Location of job posting

  • Job_category: Aggregated posting role with 7 discrete levels

  • Role_k150: Aggregated posting role with 150 discrete levels

  • Role_k50: Aggregated posting role with 50 discrete levels

  • Granularity_id: Revelio Labs internal ID

  • Month: Month granularity

  • Active_posting: Number of active postings during that month

  • New_posting: Number of new postings during that month

  • Removed_posting: Number of postings removed during that month

  • Active_salary_avg: Average salary for active postings during that month

  • New_salary_avg: Average salary for new postings during that month

  • Removed_salary_avg: Average salary for postings that got removed during that month

  • Filling_time_avg: Average time to fill

Individual Job Postings

RL also provides individual level job postings data.

  • Job_id: Posting key

  • Rcid: Revelio Labs company ID

  • Company: Name of the company

  • Company_cleaned: Standardized company name

  • Post_date: Date at which the job was posted

  • Remove_date: Date at which the job was removed. If null, it hasn’t been removed yet.

  • Title: Raw job title

  • Title_cleaned: Standardized title

  • Role_k150: Aggregated position role with 150 discrete levels

  • Role_k50: Aggregated position role with 50 discrete levels

  • Job_category: Aggregated position role with 7 discrete levels

  • Status: Discrete posting status includes: open, closed, expired and pending close

  • Salary: Salary information from the posting.

  • Location, city, state, state_long, zip, county, latitude, longitude: Listed location for posting

  • State: State of the posting

  • Industry: Listed industry of posting

  • Industry_cleaned: Standardized listed industry
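
Because a null removal date means the posting is still live, any duration calculation needs a fallback end date. The sketch below shows one way to handle this, assuming dates are parsed into `datetime.date` objects and that an as-of date is supplied by the analyst. The function name `days_open` is hypothetical, and this is not necessarily how Filling_time_avg is computed.

```python
from datetime import date


def days_open(post_date, remove_date, as_of):
    """Days a posting has been open. A null (None) remove_date means the
    posting has not been removed yet, so the as-of date is used instead."""
    end = remove_date if remove_date is not None else as_of
    return (end - post_date).days


# Hypothetical postings: one removed, one still open.
removed = days_open(date(2021, 1, 1), date(2021, 1, 31), as_of=date(2021, 2, 15))
still_open = days_open(date(2021, 2, 1), None, as_of=date(2021, 2, 15))
print(removed, still_open)
```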

Employee Sentiment

RL provides company review data with the following information. Note that not all rating fields are required to be filled out by the reviewer. Also, some ratings (i.e., ‘culture and values’ and ‘diversity and inclusion’) were added more recently.

  • Review_id: Review key

  • Review_language_id: Indicates the language of the review. Most reviews are automatically translated to English. However, some remain in their native language.

  • Rcid: Revelio company ID

  • Company: Name of the company

  • Region (categorical): The most coarse geographical granularity with 16 discrete levels

  • Country (categorical): 232 different countries

  • State (categorical): For US and US territories, state level location

  • MSA (categorical): For US states and US territories, metropolitan statistical area

  • Job_title_raw: Raw position title of the reviewer

  • Review_date_time: Time when review was posted

  • Review_iscovid19: Indicates whether review mentions the Covid-19 pandemic

  • Reviewer_employment_status: Indicates employment type of the reviewer (freelance, part time, intern, contract, regular)

  • Reviewer_job_ending_year: Final year of the reviewer’s employment with the company

  • Reviewer_length_of_employment: Number of years the reviewer worked at the company

  • Reviewer_current_job: Indicates whether the reviewer is a current or former employee

  • Rating_overall: Overall rating of company (integer values from 1 to 5, with 5 being the best)

  • Rating_business_outlook: Business outlook rating (positive, negative, neutral)

  • Rating_career_opportunities: Rating of career opportunities (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_ceo: Approval rating of the CEO (approve, disapprove, no opinion)

  • Rating_compensation_and_benefits: Rating of employee compensation and benefits (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_culture_and_values: Rating of company culture and values (integer values from 1 to 5, with 5 being the best)

  • Rating_diversity_and_inclusion: Rating of company diversity and inclusion (integer values from 1 to 5, with 5 being the best)

  • Rating_recommend_to_friend: Indicates whether the reviewer would recommend the company to a friend (positive, negative)

  • Rating_senior_leadership: Rating of senior management (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_work_life_balance: Rating of work-life balance (from 1 to 5, with half-points awarded, and 5 being the best)

  • Review_summary: Title of review

  • Review_advice: Reviewer’s advice to management

  • Review_pros: Positive review of company

  • Review_cons: Negative review of company

  • Review_count_helpful: Number of users who found the review helpful

  • Review_count_not_helpful: Number of users who found the review unhelpful

  • ultimate_parent_rcid: Revelio Labs unique company ID for parent company

  • ultimate_parent_company_name: Name of the parent company

Employee Sentiment Scores

This dataset contains employee sentiment scores using our sentiment model. This model uses Natural Language Processing to capture employee sentiment on specific topics such as management and diversity. For each review, we compute a weighted sentiment score based on how relevant a given topic was for the positive or negative portion of the review. These scores are then aggregated to arrive at a company-wide sentiment score. Each row contains the sentiment scores for a given company.

  • Rcid: Revelio Labs company ID

  • Company: Name of the company

  • Management_sentiment: Management sentiment score

  • Innovative_technology_sentiment: Innovative technology sentiment score

  • Work_life_balance_sentiment: Work life balance sentiment score

  • Mentorship_sentiment: Mentorship sentiment score

  • Career_advancement_sentiment: Career advancement sentiment score

  • Diversity_and_inclusion_sentiment: Diversity and inclusion sentiment score

  • Coworkers_sentiment: Coworkers sentiment score

  • Compensation_sentiment: Compensation sentiment score

  • Culture_sentiment: Culture sentiment score

  • Company_and_division_size_sentiment: Company and division size sentiment score

  • Perks_and_benefits_sentiment: Perks and benefits sentiment score

  • Onboarding_sentiment: Onboarding sentiment score

  • Remote_work_sentiment: Remote work sentiment score

  • Num_reviews: Number of reviews captured in a given company
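
The exact weighting used by the sentiment model is not specified here, so the following is purely a toy sketch of the general idea described above: score each review by how relevant a topic is to its positive versus negative portion, then aggregate across reviews. The scoring formula, the relevance inputs, and both function names are assumptions for illustration only.

```python
def review_topic_score(pos_relevance, neg_relevance):
    """Toy per-review score in [-1, 1]: +1 if the topic appears only in the
    positive portion, -1 if only in the negative portion.
    Returns None when the topic is not mentioned at all."""
    total = pos_relevance + neg_relevance
    if total == 0:
        return None
    return (pos_relevance - neg_relevance) / total


def company_topic_score(reviews):
    """Average the per-review scores, skipping reviews without the topic."""
    scores = [review_topic_score(p, n) for p, n in reviews]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical (positive relevance, negative relevance) pairs for one topic.
management_reviews = [(0.8, 0.2), (0.0, 0.5), (0.0, 0.0)]
management_sentiment = company_topic_score(management_reviews)
print(management_sentiment)
```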

Layoff Notices

We also collect WARN layoff data, which records when a firm plans to lay off a significant portion of its workforce. The WARN Act (Worker Adjustment and Retraining Notification) ensures that mass layoffs and plant closures are registered with states and the Department of Labor in advance to allow for the provision of compliance assistance materials to help workers and employers understand their rights and responsibilities.

We provide the WARN data at the notice level, where each row represents a layoff notice.

  • Rcid: Revelio company ID

  • Company: Name of company registering layoff

  • State: State where layoff is occurring

  • City: City where layoff is occurring

  • Num_employees: Number of employees to be laid off

  • Layoff_dates: Date as of which layoffs will be effective

Individual Level Data

RL also provides individual level position data. These files contain user-level information on current or historical positions, educational history, name, and demographics information.

position_file

This file contains the individual level position data. Each row is a position held by an individual.

  • Position_id: Revelio Labs job ID

  • User_id: Revelio Labs user ID

  • Region: Region of position (Ex. Southern Asia, Western Europe)

  • Country: Country of position (imputed from location)

  • State: State of position (if missing, we infer it from the user’s current state)

  • Msa: MSA of position (if missing, we infer it from the user’s current location)

  • Rcid: Revelio Labs company ID

  • Company_name: Company name (mapped)

  • Company_raw: Company name (raw from online profile)

  • Company_linkedin_url: URL for employer (from online profile)

  • Company_cleaned: Company name (from online profile, cleaned of special characters)

  • Final_parent_factset_id: The ID of the company’s final parent company

  • final_parent_factset_name: The Revelio Labs mapped name of the final parent company – the top-level company of which this company is a subsidiary. For example, the final parent company for both Google and Waymo is Alphabet.

  • Jobtitle_raw: Position title (raw from online profile)

  • Mapped_role: Position title (Revelio Labs mapped)

  • Seniority: Seniority level with 7 discrete levels

  • Job_category: Aggregated position role with 7 discrete levels (also available at other levels of aggregation)

  • Salary: Modeled salary for the position

  • Description: User reported description of position

  • Startdate: Position start date if reported, null otherwise.

  • Enddate: Position end date if reported, null otherwise.

  • Rn: Chronological order of position in a user’s profile (i.e., 1 corresponds to the earliest position reported)

  • ultimate_parent_rcid: Revelio Labs unique company ID for parent company

  • ultimate_parent_company_name: Name of the parent company
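
To illustrate what a chronological index such as Rn represents, the sketch below orders one user's positions by start date and numbers them from 1. How the delivered Rn treats ties or missing start dates is not stated, so placing null start dates last is an assumption, as are the lowercase keys and the function name `assign_rn`.

```python
def assign_rn(positions):
    """Number one user's positions from 1 (earliest) by start date.
    Positions with a null start date are placed last (an assumption)."""
    ordered = sorted(
        positions,
        key=lambda p: (p["startdate"] is None, p["startdate"] or ""),
    )
    return [dict(p, rn=i) for i, p in enumerate(ordered, start=1)]


# Hypothetical positions for a single user, with ISO-formatted start dates.
positions = [
    {"position_id": 11, "startdate": "2019-06-01"},
    {"position_id": 12, "startdate": "2015-09-01"},
    {"position_id": 13, "startdate": None},
]

ranked = assign_rn(positions)
for p in ranked:
    print(p["rn"], p["position_id"], p["startdate"])
```

ISO-formatted date strings sort correctly as text, which is why no date parsing is needed in this sketch.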

user_file

This file contains the individual level user data. Each row is an individual’s public profile.

  • User_id: Revelio Labs user id

  • Firstname: First name (parsed from fullname)

  • Lastname: Last name (parsed from fullname)

  • Fullname: Name reported on online profile

  • Country: Profile country

  • Title: Current job title reported on online profile

  • Currentindustry: Current industry reported on online profile

  • F_prob: Probability of user being female

  • M_prob: Probability of user being male

  • Api_prob: Probability of user being Asian/Pacific Islander

  • Black_prob: Probability of user being Black or African American

  • Hispanic_prob: Probability of user being Hispanic or Latino

  • Multiple_prob: Probability of user being two or more races

  • Native_prob: Probability of user being American Indian or Alaskan Native

  • White_prob: Probability of user being Non-Hispanic White

  • Highest_degree: The highest level of education reported (Ex. Bachelor, High School)
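
Because gender and ethnicity are delivered as probabilities rather than labels, group sizes are naturally estimated by summing probabilities instead of counting assigned categories. The sketch below shows this for the gender fields; the rows are invented, the lowercase keys are an assumption, and `expected_counts` is a hypothetical helper.

```python
def expected_counts(users, fields):
    """Expected number of users in each group: the sum of the
    per-user probabilities for that group."""
    return {field: sum(u[field] for u in users) for field in fields}


# Hypothetical user rows with gender probabilities.
users = [
    {"user_id": 1, "f_prob": 0.9, "m_prob": 0.1},
    {"user_id": 2, "f_prob": 0.2, "m_prob": 0.8},
    {"user_id": 3, "f_prob": 0.5, "m_prob": 0.5},
]

gender_counts = expected_counts(users, ["f_prob", "m_prob"])
print(gender_counts)
```

The same helper would work for the ethnicity probability columns by passing their field names instead.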

education_file

This file contains the individual level education data. Each row is an educational record.

  • User_id: Revelio Labs user id

  • School: Campus name (university)

  • Startdate: Start date

  • Enddate: End date

  • Degree: Degree title

  • Major: Listed degree type (e.g. Bachelor of Science)

  • Field: Degree Field

  • Specialization: Listed field of study (e.g. Physics)

skill_file

This file contains the individual level skills data. Revelio Labs uses proprietary algorithms to cluster the skill universe into distinct clusters of skills. The clustering can be as coarse as 25 groups and as fine as over 20,000 groups. The default skill clustering is done at 50 groups.

  • User_id: Revelio Labs user id

  • Skill: Single skill from profile

  • Skill_mapped: Skill from profile (Revelio Labs mapped)

  • Skill_k75: Aggregated skill with 75 discrete levels (also available at other levels of aggregation)

company_ref

This file contains information on all of the companies that are covered by the delivered data. We tend to attach this file with our trial data to detail its coverage.

  • Rcid: Revelio Labs company id (parent company)

  • Company: Name of the company (parent company)

  • Factset_entity_id: FactSet company id

  • Year_founded: Year in which the company was founded

  • Ticker: Ticker of the company (if it is public)

  • Exchange_name: The stock exchange that the company is listed on

  • Sedol: SEDOL code

  • Isin: ISIN code

  • Cusip: CUSIP number

  • Url: Company’s website URL

  • Naics_code: Company’s NAICS industry code

  • Cik: CIK number

  • Lei: LEI code

  • Linkedin_url: Company LinkedIn URL (parent company)

  • Child_rcid: Revelio Labs company id (child company)

  • Child_company: Name of the company (child company)

  • Child_linkedin_url: Company LinkedIn URL (child company)
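
One common use of a reference file like this is resolving a child company to its parent. The sketch below assumes one row per parent-child pair and uses invented ID values; the lowercase keys and the lookup names `parent_of` and `children_of` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical company_ref rows (IDs are made up for this example).
company_ref = [
    {"rcid": 1, "company": "Alphabet", "child_rcid": 101, "child_company": "Google"},
    {"rcid": 1, "company": "Alphabet", "child_rcid": 102, "child_company": "Waymo"},
    {"rcid": 2, "company": "Company A", "child_rcid": 201, "child_company": "Company A Canada"},
]

# Child company ID -> parent company ID.
parent_of = {row["child_rcid"]: row["rcid"] for row in company_ref}

# Parent company ID -> list of child company names.
children_of = defaultdict(list)
for row in company_ref:
    children_of[row["rcid"]].append(row["child_company"])

print(parent_of[102])
print(children_of[1])
```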