Using Python and ChatGPT to Extract Meaningful Insights from Apple App Store Reviews

Your most important job as a product person is to understand your users: their wants, needs, jobs to be done, and especially their pain points. But because customers have more ways than ever to leave feedback (app stores, customer support, and social media, for example), it can be hard to extract meaningful insights from the flood of information.

As a result, much of that customer feedback never surfaces in a way that can influence the product roadmap. That's a huge missed opportunity, but it doesn't have to be that way. Below, I'll show you how to use Python and OpenAI's ChatGPT API to analyze Apple App Store reviews at scale.

(If you'd rather not follow along and just want the code, you can copy and paste it from this notebook.)

Step 1. Set up your environment and find your app ID

Before diving into the code, make sure Python and the following dependencies are installed on your machine. You can install them with:

!pip3 install "openai<1.0"  # the code below uses the pre-1.0 openai.ChatCompletion interface
!pip3 install gspread
!pip3 install requests
!pip3 install app-store-scraper
!pip3 install oauth2client

Next, import the required libraries:

import base64
import openai
import gspread
from gspread.cell import Cell  # gspread >= 5.0; older releases exposed this as gspread.models.Cell
import requests
from oauth2client.service_account import ServiceAccountCredentials
import logging
import json
from datetime import datetime, timedelta, timezone

Then go to the Apple App Store and find your app's ID in the URL of your listing: it is the number that follows "id" (see the image below):
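If you'd rather pull the ID out programmatically, here is a minimal sketch. It assumes the listing URL contains a path segment of the form "/id" followed by digits; extract_app_id is a hypothetical helper, and the URL in the usage line is a made-up placeholder, not a real app.

```python
import re
from typing import Optional


def extract_app_id(app_store_url: str) -> Optional[str]:
    """Return the numeric App Store ID from a listing URL, or None if absent."""
    # Assumed URL shape: ".../app/<slug>/id<digits>", optionally followed by "?query".
    match = re.search(r"/id(\d+)", app_store_url)
    return match.group(1) if match else None


print(extract_app_id("https://apps.apple.com/us/app/example-app/id123456789"))
# prints 123456789
```

The returned string is what goes wherever the later code says {YOUR-APP-ID}.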

Step 2. Get access to Google Sheets with service account credentials

Visit the Google Developers Console.
Create a new project and enable the Google Sheets API.
Go to "Credentials" and create a new service account key.
Download the JSON file and give it a recognizable name, such as "{YOUR-FILE-NAME}.json".
Place the JSON file in your working directory and put its file name in the code below where it says "{FILENAME}.json".
Create a Google Sheet using this schema. The sheet must contain cells titled date, rank, detail, and type, because the code in Step 6 finds those columns by their titles.
Copy your Google Sheet ID (the long string between "/d/" and "/edit" in the sheet's URL) and paste it into the code where it says {GOOGLE-SHEET-ID}.
Share your Google Sheet with the email address listed in your .json file from above and give it edit permissions. The address will look something like "service-name@project-id-101011.iam.gserviceaccount.com".
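To find the address to share the sheet with, you can read it straight from the key file. A minimal sketch, assuming a standard service account key file, which stores the account's address under the "client_email" field; service_account_email is a hypothetical helper name.

```python
import json


def service_account_email(key_file_path: str) -> str:
    """Read the service account's email address from a downloaded JSON key file."""
    with open(key_file_path, encoding="utf-8") as f:
        key_data = json.load(f)
    # Service account key files keep the account address under "client_email".
    return key_data["client_email"]


# Usage (the file name is a placeholder for your own key file):
# print(service_account_email("{FILENAME}.json"))
```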

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/spreadsheets",
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]

credentials = ServiceAccountCredentials.from_json_keyfile_name('{FILENAME}.json', scope) # Best security practice is to create an environment variable instead of hardcoding this value.

# Access Google Sheet 
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key('{GOOGLE-SHEET-ID}').sheet1 # The sheet the top 10 lists are written to; .sheet1 selects its first tab (worksheet)
logger.info("Successfully accessed the Google Sheet.")
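The comment on the credentials line suggests using environment variables instead of hardcoded values. Here is a minimal sketch of that idea; the variable names GOOGLE_SERVICE_ACCOUNT_FILE and GOOGLE_SHEET_ID and the helper load_sheet_config are hypothetical choices, not something gspread or Google requires.

```python
import os
from typing import Tuple

# Hypothetical variable names; rename them to suit your own setup.
REQUIRED_VARS = ("GOOGLE_SERVICE_ACCOUNT_FILE", "GOOGLE_SHEET_ID")


def load_sheet_config() -> Tuple[str, str]:
    """Return (key_file_path, sheet_id) read from environment variables."""
    values = {name: os.environ.get(name, "") for name in REQUIRED_VARS}
    # Fail fast with a clear message instead of a confusing auth error later.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values["GOOGLE_SERVICE_ACCOUNT_FILE"], values["GOOGLE_SHEET_ID"]


# Usage with the code above:
# key_file, sheet_id = load_sheet_config()
# credentials = ServiceAccountCredentials.from_json_keyfile_name(key_file, scope)
# spreadsheet = gspread.authorize(credentials).open_by_key(sheet_id).sheet1
```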

Step 3. Get an OpenAI API key

Next, head over to OpenAI to create an account and generate an API key. I highly recommend upgrading to Plus while you're there 😃 (note that API usage is billed separately from a ChatGPT Plus subscription).

Sign up for an OpenAI account here.
Generate an API key and add it to your script:

openai.api_key = 'OPENAI-API-KEY' # Best practice is to create an environment variable, instead of hard-coding your api key in your code.

Step 4. Write the code to fetch reviews

Now let's pull the App Store reviews. The snippet below reads the reviews feed for a given app ID page by page and keeps the reviews from the last 7 days, up to a maximum of 700. You can change the time window if needed.
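The feed URL below repeats the app ID twice, so it is easy to leave one placeholder unfilled. A minimal sketch of a hypothetical helper that builds the same URL string the function below uses, with the app ID and page number filled in. The country parameter is an assumption on my part: it swaps the "us" storefront code for another one, and the original code only uses "us".

```python
def build_reviews_url(app_id: str, page: int, country: str = "us") -> str:
    """Build the customer-reviews RSS URL for one page, matching the format used below."""
    path = f"/customerreviews/id={app_id}/sortby=mostrecent/json"
    # Same structure as base_url + str(page_number) in fetch_app_store_reviews.
    return f"https://itunes.apple.com/{country}/rss{path}?urlDesc={path}&page={page}"


print(build_reviews_url("123456789", 1))
```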

# Fetch Apple App Store Reviews using RSS Feed
def fetch_app_store_reviews():
    # Base URL for fetching customer reviews for a specific app identified by its ID
    base_url = "https://itunes.apple.com/us/rss/customerreviews/id={YOUR-APP-ID}/sortby=mostrecent/json?urlDesc=/customerreviews/id={YOUR-APP-ID}/sortby=mostrecent/json&page="

    # Get today's date in UTC timezone to compare against review dates
    today = datetime.now(timezone.utc)

    # Initialize an empty list to collect reviews from the last 7 days
    reviews_in_last_7_days = []
    page_number = 1

    # Continue fetching reviews while the number of collected reviews is less than 700
    while len(reviews_in_last_7_days) < 700:
        url = base_url + str(page_number)  # Construct the URL with the current page number
        response = requests.get(url)  # Make a GET request to the URL
        if response.status_code != 200:
            break  # Exit the loop if the HTTP status code is not 200 (success)

        try:
            data = response.json()  # Decode the JSON response
        except json.JSONDecodeError:
            break  # Break the loop if there's a JSON decoding error

        if 'entry' not in data['feed']:
            break  # Break if 'entry' key is missing, indicating no reviews on this page

        # Iterate through the entries (reviews) in the response
        for entry in data['feed']['entry']:
            review_date_str = entry['updated']['label']  # Extract review date as a string
            # Convert the review date string to a datetime object
            review_date = datetime.strptime(review_date_str, '%Y-%m-%dT%H:%M:%S%z')
            # Check if the review is from the last 7 days and if less than 700 reviews have been collected
            if (today - review_date) <= timedelta(days=7) and len(reviews_in_last_7_days) < 700:
                title = entry['title']['label']  # Extract the review title (not used below; only the review body is collected)
                content = entry['content']['label']  # Extract the content of the review
                reviews_in_last_7_days.append(content)  # Append the content to the list

        # Break the loop if there are fewer than 50 entries on the current page, indicating no more pages
        if len(data['feed']['entry']) < 50:
            break

        page_number += 1  # Increment the page number to fetch the next page of reviews

    return reviews_in_last_7_days  # Return the collected reviews
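The date filtering inside the loop is easier to check when it is pulled out into a pure function over one decoded page of the feed. A minimal sketch, assuming each entry has the updated, title, and content fields with a label key, as the code above reads them. Unlike the original, this hypothetical recent_review_texts keeps the title alongside the body, and it takes an optional now so tests can fix the clock.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def recent_review_texts(feed_json: dict, days: int = 7,
                        now: Optional[datetime] = None) -> List[str]:
    """Return 'title: content' strings for entries updated within the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    entries = feed_json.get("feed", {}).get("entry", [])
    if isinstance(entries, dict):  # defensive: treat a lone entry object as a one-item list
        entries = [entries]
    texts = []
    for entry in entries:
        # Same timestamp format as in fetch_app_store_reviews, e.g. "2024-01-09T12:00:00-07:00".
        review_date = datetime.strptime(entry["updated"]["label"], "%Y-%m-%dT%H:%M:%S%z")
        if now - review_date <= timedelta(days=days):
            texts.append(f'{entry["title"]["label"]}: {entry["content"]["label"]}')
    return texts
```

Inside the page loop you would then extend the collected list with recent_review_texts(data) instead of appending entries one by one.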

Step 5. Analyze the reviews and identify the top likes and dislikes

Using OpenAI, process the reviews to identify the top 10 things people like and dislike about the service. You could write a single function that evaluates likes and dislikes at the same time, but I found that two separate functions, one for likes and one for dislikes, produce more accurate results.
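The two functions below differ only in one phrase of the prompt. If you want to keep two separate API calls, as recommended above, while sharing the prompt text, a minimal sketch might look like this. build_messages and PROMPT_TEMPLATE are hypothetical names; the prompt wording is copied from the functions below.

```python
from typing import Dict, List

PROMPT_TEMPLATE = (
    "Given the following app reviews, please generate a numbered list of the top ten "
    "things people {sentiment} about the app, listed from the most prominent / prevalent (1) "
    "to the least prominent (10):\n"
)


def build_messages(reviews: List[str], sentiment: str) -> List[Dict[str, str]]:
    """Build chat messages for one sentiment, e.g. "like or enjoy" or "dislike"."""
    prompt = PROMPT_TEMPLATE.format(sentiment=sentiment) + "\n".join(reviews)
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


# One request per sentiment, each passed as messages=... to openai.ChatCompletion.create:
likes_messages = build_messages(["Love the app", "Crashes on login"], "like or enjoy")
dislikes_messages = build_messages(["Love the app", "Crashes on login"], "dislike")
```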

def generate_top_ten_likes(reviews):
    # Log an informational message to indicate the start of the generation process
    logger.info("Generating top ten likes with ChatGPT API...")

    # Construct a detailed prompt by concatenating review texts for OpenAI GPT model to process
    detailed_prompt = "Given the following app reviews, please generate a numbered list of the top ten things people like or enjoy about the app, listed from the most prominent / prevalent (1) to the least prominent (10):\n" + "\n".join(reviews)

    # Messages array to define the interaction with the OpenAI GPT model
    messages = [{"role": "system", "content": "You are a helpful assistant."}, 
                {"role": "user", "content": detailed_prompt}]

    try:
        # Call OpenAI API to get completion based on the provided messages and model parameters
        response = openai.ChatCompletion.create(
            model="gpt-4-32k", # if you don't have access to gpt-4-32k, use gpt-3.5-turbo-16k
            messages=messages,
            max_tokens=1000, # You likely won't need this many tokens; input whatever value you need, up to the token limit
            temperature=0.5  # temperature controls how deterministic the model is in its response, with 0 being the most deterministic and 2 being least (most creative)
        )

        # Check if the response has 'choices' content
        if 'choices' in response:
            # Extract and split the message content
            content = response["choices"][0]["message"]["content"].strip().split("\n")
            
            # Filter out any empty or whitespace-only entries from the content
            content = [line.strip() for line in content if line.strip()]
            
            # Limit to the top 10 non-empty entries
            content = content[:10]
            
        else:
            # Log an error if no choices were found in the response
            logger.error("No choices found in the response from ChatGPT API.")
            return []

        # Return the top 10 likes
        return content[:10]
    except Exception as e:
        # Log an error if there was any exception during the process
        logger.error(f"Error generating top ten likes: {e}")
        return []

def generate_top_ten_dislikes(reviews):
    # Log an informational message to indicate the start of the generation process
    logger.info("Generating top ten dislikes with ChatGPT API...")

    # Construct a detailed prompt by concatenating review texts for OpenAI GPT model to process
    detailed_prompt = "Given the following app reviews, please generate a numbered list of the top ten things people dislike about the app, listed from the most prominent / prevalent (1) to the least prominent (10):\n" + "\n".join(reviews)

    # Messages array to define the interaction with the OpenAI GPT model
    messages = [{"role": "system", "content": "You are a helpful assistant."}, 
                {"role": "user", "content": detailed_prompt}]

    try:
        # Call OpenAI API to get completion based on the provided messages and model parameters
        response = openai.ChatCompletion.create(
            model="gpt-4-32k", # if you don't have access to gpt-4-32k, use gpt-3.5-turbo-16k
            messages=messages,
            max_tokens=1000, # You likely won't need this many tokens; input whatever value you need, up to the token limit
            temperature=0.5  # temperature controls how deterministic the model is in its response, with 0 being the most deterministic and 2 being least (most creative)
        )

        # Check if the response has 'choices' content
        if 'choices' in response:
            # Extract and split the message content
            content = response["choices"][0]["message"]["content"].strip().split("\n")
            
            # Filter out any empty or whitespace-only entries from the content
            content = [line.strip() for line in content if line.strip()]
            
            # Limit to the top 10 non-empty entries
            content = content[:10]
            
        else:
            # Log an error if no choices were found in the response
            logger.error("No choices found in the response from ChatGPT API.")
            return []

        # Return the top 10 dislikes
        return content[:10]
    except Exception as e:
        # Log an error if there was any exception during the process
        logger.error(f"Error generating top ten dislikes: {e}")
        return []
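Both functions keep the first 10 non-empty lines of the reply. If the model opens with a sentence such as "Here are the top ten...", that sentence takes one of the slots, and the "1." prefixes end up in the detail column next to the separate rank column. A minimal sketch of an alternative parser, assuming the model numbers items as "1." or "1)"; parse_ranked_list is a hypothetical helper.

```python
import re
from typing import List

# Matches lines such as "1. Fast sync" or "2) Clean UI"; captures the item text.
NUMBERED_LINE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)")


def parse_ranked_list(text: str, limit: int = 10) -> List[str]:
    """Return item texts from numbered lines, dropping preamble and the numbers themselves."""
    items = []
    for line in text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            items.append(match.group(2))
    return items[:limit]


# In either function, instead of splitting on newlines:
# content = parse_ranked_list(response["choices"][0]["message"]["content"])
```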

Step 6. Update the Google Sheet with the results

Then compile your results and write them to the Google Sheet for quick access.

def update_google_sheet(timestamp, likes, dislikes):
    try:
        # Locate the columns by their titles for "date", "rank", "detail", and "type" in the Google Sheet.
        date_col = spreadsheet.find("date").col
        rank_col = spreadsheet.find("rank").col
        detail_col = spreadsheet.find("detail").col
        type_col = spreadsheet.find("type").col
        
        # Determine the next available row in the Google Sheet by finding the length of the "date" column and adding 1.
        next_row = len(spreadsheet.col_values(date_col)) + 1
        
        # Initialize an empty list to prepare for a batch update of cells.
        cell_list = []
        
        # Iterate through the likes, and create a new Cell object for each field (timestamp, like detail, type, and rank).
        # Add these cells to the cell_list.
        for index, like in enumerate(likes, start=1):
            cell_list.append(Cell(row=next_row, col=date_col, value=timestamp))
            cell_list.append(Cell(row=next_row, col=detail_col, value=like))
            cell_list.append(Cell(row=next_row, col=type_col, value="like"))
            cell_list.append(Cell(row=next_row, col=rank_col, value=str(index)))  # Rank of the like is updated here
            next_row += 1

        # Iterate through the dislikes, and create a new Cell object for each field (timestamp, dislike detail, type, and rank).
        # Add these cells to the cell_list.
        for index, dislike in enumerate(dislikes, start=1):
            cell_list.append(Cell(row=next_row, col=date_col, value=timestamp))
            cell_list.append(Cell(row=next_row, col=detail_col, value=dislike))
            cell_list.append(Cell(row=next_row, col=type_col, value="dislike"))
            cell_list.append(Cell(row=next_row, col=rank_col, value=str(index)))  # Rank of the dislike is updated here
            next_row += 1
            
        # Use the batch update method to add all the cells in cell_list to the Google Sheet at once.
        spreadsheet.update_cells(cell_list)
        
    except Exception as e:
        # Log an error message if there's any exception during the update process.
        logger.error(f"Error updating Google Sheet: {e}")
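An alternative to building individual Cell objects is to lay out whole rows in plain Python and append them in one call. A minimal sketch, assuming the column titles sit in the sheet's first row; build_rows is a hypothetical helper, while row_values and append_rows are gspread Worksheet methods. Matching values to columns by title keeps the sheet's column order flexible, as the original find() lookups do.

```python
from typing import List


def build_rows(header: List[str], timestamp: str,
               likes: List[str], dislikes: List[str]) -> List[List[str]]:
    """Return one row per like/dislike, with values placed under the matching header titles."""
    rows = []
    for kind, items in (("like", likes), ("dislike", dislikes)):
        for rank, detail in enumerate(items, start=1):
            values = {"date": timestamp, "rank": str(rank), "detail": detail, "type": kind}
            # Any header column other than the four known ones gets an empty string.
            rows.append([values.get(column, "") for column in header])
    return rows


# Usage with gspread (requires the authorized `spreadsheet` worksheet from Step 2):
# header = spreadsheet.row_values(1)
# spreadsheet.append_rows(build_rows(header, timestamp, likes, dislikes))
```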

Step 7. Run the functions in your code

Finally, run everything by calling process_reviews. This function executes all the code you've written: it collects reviews from the App Store, processes them to extract likes and dislikes, and writes the results to your Google Sheet. It also includes logging and print statements to track progress and handle exceptions along the way.

def process_reviews():
    try:
        # Fetches the reviews from the App Store and stores them in the variable `scraped_reviews`
        scraped_reviews = fetch_app_store_reviews()  

        # Constructs a log message with the total number of reviews scraped and logs it
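        log_message = f"Total number of reviews scraped: {len(scraped_reviews)}"
        logger.info(log_message)
        print(log_message) # Also prints the log message to the console

        # Stop early if nothing was fetched, rather than sending an empty prompt to the API
        if not scraped_reviews:
            logger.error("No reviews found in the selected time window.")
            return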

        # Gets the current timestamp in the given format
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

        # Generates the top ten likes and dislikes from the scraped reviews
        likes = generate_top_ten_likes(scraped_reviews)
        dislikes = generate_top_ten_dislikes(scraped_reviews)

        # Constructs a log message with the number of likes and dislikes processed and logs it
        log_message = f"Processed {len(likes)} likes and {len(dislikes)} dislikes."
        logger.info(log_message)
        print(log_message) # Also prints the log message to the console

        # If no likes or dislikes are found, log an error and exit the function
        if len(likes) == 0 and len(dislikes) == 0:
            logger.error("Did not receive any likes or dislikes.")
            print("Error processing likes/dislikes.")
            return

        # Updates a Google Sheet with the timestamp, likes, and dislikes
        update_google_sheet(timestamp, likes, dislikes)

        print("Data updated successfully.") # Prints a success message to the console
    except Exception as e:
        # If an exception occurs, logs and prints an error message with details of the exception
        error_message = f"Error processing reviews: {e}"
        logger.error(error_message)
        print(error_message)

process_reviews() # Calls the process_reviews function
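To try the overall flow without network access or API calls, one option is to pass each step in as a function and substitute fakes while testing. A minimal sketch; run_pipeline is a hypothetical helper, not part of any library, and the lambdas in the usage example are stand-ins for the real functions above.

```python
from typing import Callable, List


def run_pipeline(
    fetch: Callable[[], List[str]],
    summarize: Callable[[List[str], str], List[str]],
    write: Callable[[List[str], List[str]], None],
) -> bool:
    """Run fetch, then summarize likes and dislikes, then write; return True if data was written."""
    reviews = fetch()
    if not reviews:
        return False  # nothing to analyze, so skip the API calls entirely
    likes = summarize(reviews, "like")
    dislikes = summarize(reviews, "dislike")
    if not likes and not dislikes:
        return False
    write(likes, dislikes)
    return True


# Dry run with fake steps:
written = []
ok = run_pipeline(
    fetch=lambda: ["Love the app", "Crashes on login"],
    summarize=lambda reviews, kind: [f"{kind}: {len(reviews)} reviews"],
    write=lambda likes, dislikes: written.append((likes, dislikes)),
)
print(ok, written)  # True [(['like: 2 reviews'], ['dislike: 2 reviews'])]
```

For a real run, fetch would be fetch_app_store_reviews, summarize would call generate_top_ten_likes or generate_top_ten_dislikes depending on kind, and write would call update_google_sheet with a timestamp.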

Results

Now you're probably wondering: are these insights accurate and/or useful? Let's look at the top 10 dislikes the model surfaced for Slack, the popular messaging and collaboration tool.

As a Slack user, I find that many of these pain points resonate with me. If I were a product manager at Slack, this would be a valuable signal that I'd want to dig into further with user interviews and quantitative data analysis.

Conclusion

As I mentioned in the introduction, it's more important than ever for product people to understand their customers. But it can be incredibly hard to sift through the thousands of reviews and support tickets that mid-sized and large businesses receive every day. If you're facing this problem, I hope the approach above helps you make better product decisions and, ultimately, delight the people you serve.

Happy coding! 🚀
