Datasets

Workforce Dynamics

This dataset contains aggregated workforce statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. In that case, every row observes a particular company in a given month. If we include country as a level of aggregation, then each row of the dataset would correspond to a company, country, and month combination. The dataset at the company-country-month level can be aggregated to create the company-month dataset.

Consider an example in which the levels of aggregation are company and country, tracked by month, and the outcome of interest is count: the total headcount for each company, country, and month combination, measured at the end of that month.

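| company   | country | month   | count |
|-----------|---------|---------|-------|
| Company A | U.S.    | 2021-01 | 10    |
| Company A | U.S.    | 2021-02 | 12    |
| Company A | U.S.    | 2021-03 | 14    |
| Company A | Canada  | 2021-01 | 10    |
| Company A | Canada  | 2021-02 | 11    |
| Company A | Canada  | 2021-03 | 9     |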

The table can also be viewed as a graph, with month along the X-axis and the outcome count along the Y-axis. In this case, (Company A, U.S.) and (Company A, Canada) are the entities for which the outcome count is tracked over time (month).

Note that it’s easy to compute a broader level of aggregation from a narrower level of aggregation. To reduce our previous example to the company and month level, we can sum across the country column to get:

| company   | month   | count      |
|-----------|---------|------------|
| Company A | 2021-01 | 20 (10+10) |
| Company A | 2021-02 | 23 (12+11) |
| Company A | 2021-03 | 23 (14+9)  |
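The roll-up above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The tuple layout and the function name `aggregate_to_company_month` are assumptions made for the example, not part of the deliverable.

```python
from collections import defaultdict

# Rows from the company-country-month example: (company, country, month, count)
rows = [
    ("Company A", "U.S.", "2021-01", 10.0),
    ("Company A", "U.S.", "2021-02", 12.0),
    ("Company A", "U.S.", "2021-03", 14.0),
    ("Company A", "Canada", "2021-01", 10.0),
    ("Company A", "Canada", "2021-02", 11.0),
    ("Company A", "Canada", "2021-03", 9.0),
]


def aggregate_to_company_month(rows):
    """Sum count across countries for each (company, month) pair."""
    totals = defaultdict(float)
    for company, _country, month, count in rows:
        totals[(company, month)] += count
    return dict(totals)


company_month = aggregate_to_company_month(rows)
for (company, month), count in sorted(company_month.items()):
    print(company, month, count)
```

Summing is appropriate for additive outcomes such as count. Outcomes reported as averages (for example, Prestige) would need a weighted average rather than a plain sum.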

  • Count (float): The total number of employees for a specific level of granularity for each month.

  • Inflow/Outflow (float): In addition to the number of employees, we also estimate the total inflow (people joining) and outflow (people leaving) in a given month.

  • Salary (float): We predict the salary for each position based on role, seniority, company, and country using a regression-based model. We train this model using over 200 million salaries from job postings and publicly available labor certification applications, and use country-level inflation rates to estimate the change in salary over time. We get an out-of-sample root mean squared error (RMSE) of 14%. The Salary column in long_file shows the sum of salaries of employees in the particular granularity level.

  • Prestige (float): We generate a prestige score for each university, degree level, company, and individual. We start from publicly available university scores, then incorporate the relationships between universities, individuals, and companies that we observe in our data, iterating until each individual’s prestige score converges.

  • Month (time): The month and year of the position are provided in “YYYY-MM” format. Each deliverable file contains monthly data up to the previous month’s end.

  • Company (categorical): The RL delivery file can provide insights on all public (and many private) companies. By default, companies are defined at the holding company level, where all subsidiaries held by the top parent company are included. The list of parent companies covered by Revelio includes those mapped by FactSet Research Systems Inc., in addition to manually defined companies at the client’s request.

  • Region (categorical): The coarsest geographical granularity is the region level. The 15 region names are as follows:

    • Arab States

    • Northern Africa

    • South-Eastern Asia

    • Central America

    • Northern America

    • Southern Asia

    • Central and Western Asia

    • Northern Europe

    • Southern Europe

    • Eastern Asia

    • Pacific Islands

    • Sub-Saharan Africa

    • Eastern Europe

    • South America

    • Western Europe

  • Country (categorical): The granularity can be specified at the country level for 232 distinct countries.

  • State (categorical): For US and US territories, the granularity can be specified at the state level. This level includes 50 states and 9 territories (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, Virgin Islands, Minor Outlying Islands, Micronesia, Marshall Islands, Palau).

  • MSA (categorical): For US states and US territories, the most granular geography is the Metropolitan Statistical Area (MSA).

  • Job Category (categorical): In addition to geographical granularities, role-level granularities can also be specified. The most basic job category classification groups roles into the following 7 groups:

    • Admin

    • Engineer

    • Finance

    • Marketing

    • Operations

    • Sales

    • Scientist

    The job role taxonomy is developed using our proprietary representation and clustering algorithms. We develop mathematical representations of each job title using the title itself, the text description of the position (from either individuals describing their own experiences or employers on a job posting), and individuals’ skills, associates, and previous experience. Our clustering algorithm belongs to the family of hierarchical/agglomerative clustering algorithms: we begin with every job title occupying its own cluster, then iteratively combine clusters based on a set of criteria. This allows complete flexibility in the number of clusters. We update this taxonomy periodically to adjust to the changing occupational landscape. Aside from the 7-group classification above, the most common job clusterings use 150 groups and 1,000 groups.

  • Seniority (ordinal): Seniority ranges from 1 to 4. 1 is the most junior, and 4 is the most senior. Our seniority model predicts seniority based on the title, accounting for industry and company size. Age and tenure do not directly determine our seniority measure.
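The agglomerative procedure described under Job Category (start with every job title in its own cluster, then merge clusters iteratively) can be illustrated with a toy sketch. The actual representations and merge criteria are proprietary and not described here, so everything specific in this example is an assumption: the 2-D embeddings are made up, and single linkage with Euclidean distance is just one common choice of criterion.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    points: dict mapping a job title to its (hypothetical) embedding vector.
    Uses single linkage: cluster distance = distance of the closest pair.
    """
    clusters = [[name] for name in points]  # every title starts in its own cluster

    def cluster_distance(a, b):
        return min(math.dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up 2-D embeddings, for illustration only.
embeddings = {
    "software engineer": (0.90, 0.10),
    "backend developer": (0.85, 0.15),
    "account executive": (0.10, 0.90),
    "sales representative": (0.15, 0.85),
    "data scientist": (0.70, 0.40),
}

three_groups = agglomerate(embeddings, 3)
two_groups = agglomerate(embeddings, 2)
print(three_groups)
print(two_groups)
```

Because merging simply stops once the requested number of clusters remains, the same procedure can produce a taxonomy at any size, which is the flexibility noted above.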

Statistics that can be included are as follows:

  • Levels of Aggregation:
    • Month (time): The month and year of the position, provided in “YYYY-MM” format

    • Region (categorical): The coarsest geographical granularity, with 15 discrete levels

    • Country (categorical): 232 different countries

    • State (categorical): For US and US territories, state level location

    • MSA (categorical): For US states and US territories, metropolitan statistical area

    • Job_category (categorical): Aggregated position role with 7 discrete levels

    • Seniority (ordinal): Seniority level with 4 discrete levels

    • Highest_degree (categorical): Highest degree attained by workers at the granularity level

    • Gender (categorical): Gender is calculated as a probability based on the likelihood of the first name being male or female

    • Ethnicity (categorical): Ethnicity is estimated based on the likelihood of both the first and last name as well as location

    • Veteran (categorical): Veteran status

  • Outcomes:
    • Count (float): Headcount for a given month

    • Inflow/Outflow (float): Total inflow and outflow of employees at each granularity level

    • Salary (float): Sum of estimated salaries at the given granularity level

    • Prestige (float): Average prestige score of the specified granularity level

    • Remote_suitability (float): Revelio score for suitability of a role for remote work

    • Duration (float): Average tenure of employees at a given granularity level in years

    • Hiring (float): Sum of inflows at a given level of granularity over the last year divided by the average counts at that granularity over the last year

    • Attrition (float): Sum of outflows at a given level of granularity over the last year divided by the average counts at that granularity over the last year

    • Gender_entropy (float): Gender diversity score at a given granularity level

    • Ethnicity_entropy (float): Ethnicity diversity score at a given granularity level
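The Hiring and Attrition definitions above share one formula: the sum of a flow over the last year divided by the average count over the last year. Here is a minimal sketch, assuming that "the last year" means the trailing 12 monthly observations including the current month, and that the inputs are ordered oldest to newest. The function name `trailing_rate` is hypothetical.

```python
def trailing_rate(flows, counts, window=12):
    """Sum of flows over the trailing window divided by the average count
    over the same window. Returns None when the average count is zero.

    flows, counts: monthly values for one granularity level, oldest first.
    """
    if len(flows) != len(counts):
        raise ValueError("flows and counts must cover the same months")
    if not counts:
        return None
    recent_flows = flows[-window:]
    recent_counts = counts[-window:]
    average_count = sum(recent_counts) / len(recent_counts)
    if average_count == 0:
        return None
    return sum(recent_flows) / average_count


# Twelve months of made-up data for a single company.
counts = [100.0] * 12
inflows = [2.0] * 12
outflows = [1.0] * 12

hiring = trailing_rate(inflows, counts)      # uses Inflow as the flow
attrition = trailing_rate(outflows, counts)  # uses Outflow as the flow
print(hiring, attrition)
```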

Transitions

  • User_id: Revelio user id

  • Month: Month

  • Prev_company: Previous company name

  • Prev_sector: Previous sector

  • Prev_industry: Previous industry

  • Prev_region: Previous region

  • Prev_jobtitle: Previous job title

  • Prev_job_category: Aggregated previous position role with 7 discrete levels

  • Prev_role_k50: Aggregated previous position role with 50 discrete levels

  • Prev_role_k150: Aggregated previous position role with 150 discrete levels

  • Prev_seniority: Previous seniority level with 4 discrete levels

  • Prev_enddate: End date of previous position

  • Prev_salary: Estimated salary of the previous role

  • New_company: New company name

  • New_sector: New sector

  • New_industry: New industry

  • New_region: New region

  • New_jobtitle: New job title

  • New_job_category: Aggregated new position role with 7 discrete levels

  • New_role_k50: Aggregated new position role with 50 discrete levels

  • New_role_k150: Aggregated new position role with 150 discrete levels

  • New_seniority: New seniority level with 4 discrete levels

  • New_startdate: Start date of new position

  • New_salary: Estimated salary of the new role

Job Posting Dynamics

This dataset contains aggregated job posting statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. Each row would correspond to a company and month combination. For more information on the levels of aggregation, please refer to the Workforce Dynamics section.

  • Granularity_id: Revelio internal ID

  • Start_date: Month

  • Active_posting: Number of active postings during that month

  • New_posting: Number of new postings during that month

  • Removed_posting: Number of postings removed during that month

  • Active_salary_avg: Average salary for active postings during that month

  • New_salary_avg: Average salary for new postings during that month

  • Removed_salary_avg: Average salary for postings that got removed during that month

  • Filling_time_avg: Average time to fill

  • Company: Company name

  • State: Location of job posting

  • Role_k7: Aggregated posting role with 7 discrete levels

  • Role_k150: Aggregated posting role with 150 discrete levels

  • Seniority: Seniority level with 4 discrete levels

Individual Job Postings

RL also provides individual level job postings data.

  • Job_id: Posting key

  • Company: Name of the company

  • Company_cleaned: Standardized company name

  • Post_date: Date at which the job was posted

  • Remove date: Date at which the job was removed. If null, it hasn’t been removed yet.

  • Title: Raw job title

  • Title_cleaned: Standardized title

  • Role_k150: Aggregated position role with 150 discrete levels

  • Role_k50: Aggregated position role with 50 discrete levels

  • Role_k7: Aggregated position role with 7 discrete levels

  • Status: Discrete posting status, one of: open, closed, expired, or pending close

  • Salary: Salary information from the posting.

  • Location, city, state, state_long, zip, county, latitude, longitude: Listed location for posting

  • Region_state: Metropolitan Statistical Area of the posting

  • Industry: Listed industry of posting

  • Industry_cleaned: Standardized listed industry
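Because a null removal date means the posting has not been removed yet, any date logic over these records needs to treat null as "still open". The sketch below shows one way to test whether a posting was live at any point in a given month. The rule used here (posted on or before the month's last day, and not removed before its first day) is an assumption for illustration, not the documented definition behind Active_posting.

```python
import calendar
from datetime import date


def is_active_in_month(post_date, remove_date, year, month):
    """Return True if the posting was live on at least one day of the month.

    remove_date may be None, meaning the posting has not been removed yet.
    """
    month_start = date(year, month, 1)
    last_day = calendar.monthrange(year, month)[1]
    month_end = date(year, month, last_day)
    posted_in_time = post_date <= month_end
    still_open = remove_date is None or remove_date >= month_start
    return posted_in_time and still_open


# Made-up postings: (post_date, remove_date)
postings = [
    (date(2021, 1, 15), date(2021, 2, 10)),  # removed in February
    (date(2021, 2, 20), None),               # not removed yet
    (date(2021, 3, 5), date(2021, 3, 30)),   # posted after February
]

active_in_feb = [is_active_in_month(p, r, 2021, 2) for p, r in postings]
print(active_in_feb)
```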

Employee Sentiment

RL provides company review data with the following information. Note that reviewers are not required to fill out every rating field. Also, some ratings (e.g., ‘culture and values’ and ‘diversity and inclusion’) were added more recently.

  • Review_id: Review key

  • Review_language_id: Indicates the language of the review. Most reviews are automatically translated to English. However, some remain in their native language.

  • Location, city, state, country: Listed location of the reviewer

  • Job_title_name: Raw position title of the reviewer

  • Review_date_time: Time when review was posted

  • Review_featured: Indicates whether review is featured on the company page

  • Review_iscovid19: Indicates whether review mentions the Covid-19 pandemic

  • Reviewer_employment_status: Indicates employment type of the reviewer (freelance, part time, intern, contract, regular)

  • Reviewer_job_ending_year: Final year of the reviewer’s employment with the company

  • Reviewer_length_of_employment: Number of years the reviewer worked at the company

  • Reviewer_current_job: Indicates whether the reviewer is a current or former employee

  • Rating_overall: Overall rating of company (integer values from 1 to 5, with 5 being the best)

  • Rating_business_outlook: Business outlook rating (positive, negative, neutral)

  • Rating_career_opportunities: Rating of career opportunities (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_ceo: Approval rating of the CEO (approve, disapprove, no opinion)

  • Rating_compensation_and_benefits: Rating of employee compensation and benefits (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_culture_and_values: Rating of company culture and values (integer values from 1 to 5, with 5 being the best)

  • Rating_diversity_and_inclusion: Rating of company diversity and inclusion (integer values from 1 to 5, with 5 being the best)

  • Rating_recommend_to_friend: Indicates whether the reviewer would recommend the company to a friend (positive, negative)

  • Rating_senior_leadership: Rating of senior management (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_work_life_balance: Rating of work-life balance (from 1 to 5, with half-points awarded, and 5 being the best)

  • Review_summary: Title of review

  • Review_advice: Reviewer’s advice to management

  • Review_pros: Positive review of company

  • Review_cons: Negative review of company

  • Review_count_helpful: Number of users who found the review helpful

  • Review_count_not_helpful: Number of users who found the review unhelpful

Layoff Notices

We also collect WARN layoff data, which records when a firm is planning to lay off a significant portion of its workforce. The Worker Adjustment and Retraining Notification (WARN) Act ensures that mass layoffs and plant closures are registered with states and the Department of Labor in advance, which allows compliance assistance materials to be provided to help workers and employers understand their rights and responsibilities.

We provide the WARN data at the notice level, where each row represents a layoff notice.

  • Company_name: Name of company registering layoff

  • State: State where layoff is occurring

  • City: City where layoff is occurring

  • County_or_region: The county or larger region, where applicable, where layoff is occurring

  • Num_employees: Number of employees to be laid off

  • Layoff_start_date: Date as of which layoffs will be effective

  • Layoff_end_date: Date by which all workers laid off in this notice will be laid off

  • Notice_date: Date the notice was filed

  • Layoff_type: The type of layoff occurring (large layoff, closure, etc.)

Individual Level Data

RL also provides individual-level position data. These files contain user-level information on current or historical positions, educational history, name, and demographic information.

position_file

This file contains the individual level position data. Each row is a position held by an individual.

  • Position_id: Revelio job ID

  • User_id: Revelio user ID

  • Location: Job location string from profile

  • Region: Region of position (Ex. Southern Asia, Western Europe)

  • Country: Country of position (imputed from location)

  • State: State of position (if missing, we infer it from the user’s current state)

  • Msa: MSA of position (if missing, we infer it from the user’s current location)

  • Company: Company name (raw from online profile)

  • Companyurl: URL for employer (from online profile)

  • Company_cleaned: Company name (from online profile, cleaned of special characters)

  • Title: Position title (raw from online profile)

  • Mapped_role: Position title (Revelio mapped)

  • Seniority: Seniority level with 4 discrete levels

  • Role_k7: Aggregated position role with 7 discrete levels (also available at other levels of aggregation)

  • Salary: Modeled salary for the position

  • Remote_suitability: Revelio remote suitability score

  • Description: User reported description of position

  • Startdate: Position start date if reported, null otherwise.

  • Enddate: Position end date if reported, null otherwise.

  • Rn: Chronological order of position in a user’s profile (i.e., 1 corresponds to the earliest position reported)

user_file

This file contains the individual level user data. Each row is an individual’s public profile.

  • User_id: Revelio user id

  • Firstname: First name (parsed from fullname)

  • Lastname: Last name (parsed from fullname)

  • Fullname: Name reported on online profile

  • Location: Profile location

  • Country: Profile country

  • Title: Current job title reported on online profile

  • Currentindustry: Current industry reported on online profile

  • Url: URL of profile

  • F_prob: Probability of user being female

  • M_prob: Probability of user being male

  • Api_prob: Probability of user being Asian/Pacific Islander

  • Black_prob: Probability of user being Black or African American

  • Hispanic_prob: Probability of user being Hispanic or Latino

  • Multiple_prob: Probability of user being two or more races

  • Native_prob: Probability of user being American Indian or Alaskan Native

  • White_prob: Probability of user being Non-Hispanic White

  • Highest_degree: The highest level of education reported (Ex. Bachelor, High School)

education_file

This file contains the individual level education data. Each row is an educational record.

  • User_id: Revelio user id

  • Campus: Campus name (university)

  • University_priname_usa: Mapped university name from USA rankings

  • University_priname_world: Mapped university from world rankings

  • University_priname: Mapped university name

  • Universityurl: URL of the online university profile

  • Major: Listed degree type (e.g. Bachelor of Science)

  • Specialization: Listed field of study (e.g. Physics)

  • Startdate: Start date

  • Enddate: End date

  • Sequenceno: Chronological order

  • Degree: Degree title

  • Field: Degree field

  • Degree_level: Code for degree level (0: empty, 1: High School, 2: Associate, 3: Bachelor, 4: Master, 5: MBA, 6: Doctor)
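The Degree_level codes above can be decoded with a simple lookup. The sketch below also picks one label per user by taking the largest code across that user's education records. Treating a larger code as a higher degree is an assumption made for this example, and it is not stated here how the Highest_degree field in user_file is actually derived. The function name `highest_degree_by_user` is hypothetical.

```python
# Code-to-label mapping copied from the Degree_level description.
DEGREE_LEVELS = {
    0: "empty",
    1: "High School",
    2: "Associate",
    3: "Bachelor",
    4: "Master",
    5: "MBA",
    6: "Doctor",
}


def highest_degree_by_user(records):
    """Map each user_id to the label of its largest Degree_level code.

    records: iterable of (user_id, degree_level) pairs.
    Assumes (for illustration) that a larger code means a higher degree.
    """
    best = {}
    for user_id, level in records:
        if level not in DEGREE_LEVELS:
            raise ValueError(f"unknown Degree_level code: {level!r}")
        best[user_id] = max(best.get(user_id, 0), level)
    return {user_id: DEGREE_LEVELS[level] for user_id, level in best.items()}


# Made-up education records: (user_id, degree_level)
records = [(1, 3), (1, 4), (2, 1), (3, 0)]
labels = highest_degree_by_user(records)
print(labels)
```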

skill_file

This file contains the individual level skills data. RL uses proprietary algorithms to cluster the skill universe into distinct clusters of skills. The clustering can be as coarse as 25 groups and as fine as over 20,000 groups. The default skill clustering is done at 50 groups.

  • User_id: Revelio user id

  • Skill: Single skill from profile

  • Skill_k50: Aggregated skill with 50 discrete levels (also available at other levels of aggregation)