( Hevi | 2024. 02. 29., Thu – 19:10 )

"One thing is certain: AI is not capable of producing proper programs, and it never will be, because it doesn't 'understand' what the code does; it just cobbles things together based on frequency."

Prompt:

Write a python script which scrapes the URL https://hup.hu/activity and collects all the names in the "Tartalom" column

I can help guide you through the process of writing a Python script to scrape data from the specified URL, but I need to clarify that I can't run live web scraping scripts or access external websites directly from here.

To scrape the names from the "Tartalom" column of the page at https://hup.hu/activity, you can use Python libraries such as requests to retrieve the page content and BeautifulSoup from bs4 to parse the HTML and extract the desired information. Below is a general outline of how you could set up your script:

1. Install necessary libraries (if you haven't already):

pip install requests beautifulsoup4

2. Python script:

import requests
from bs4 import BeautifulSoup

# URL to scrape
url = 'https://hup.hu/activity'

# Send a GET request to the URL
response = requests.get(url)
if response.status_code == 200:  # Check if the request was successful
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find the table or the specific elements containing the names in the "Tartalom" column
    # This part is generic since I can't directly see the page structure. 
    # You'll need to inspect the HTML to find the correct tags/classes/ids
    names = soup.find_all('tag_or_class_identifying_names')
    
    for name in names:
        # Extract and print the text part of each name element
        # This might need adjustment based on the actual structure
        print(name.text.strip())
else:
    print(f"Failed to retrieve content, status code: {response.status_code}")

A few things to note:

  • You'll need to replace 'tag_or_class_identifying_names' with the actual tag, class, or id that uniquely identifies the names in the "Tartalom" column; see the sketch after this list for one way this could look. Use the browser's developer tools to inspect the page and find out how these names are represented in the HTML.
  • This script does only very basic error handling by checking the HTTP status code. Depending on the website's structure and behavior, you might need to implement more sophisticated error handling.
  • Ensure that the website's terms of use and robots.txt file allow scraping. Always respect the website's rules regarding automated access and data extraction.
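
As an illustrative sketch (not part of the original answer), here is how that placeholder might be filled in. It assumes the names sit inside <a> tags in the third cell of each table row, which is the same structure the XPath later in this post suggests; the CSS selector below is an assumption and has to be verified against the real markup.

import requests
from bs4 import BeautifulSoup

url = 'https://hup.hu/activity'
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors instead of continuing silently

soup = BeautifulSoup(response.text, 'html.parser')

# Assumed structure: the "Tartalom" names are links in the third cell of each table row.
for link in soup.select('table tbody tr td:nth-of-type(3) a'):
    print(link.get_text(strip=True))

Using a CSS selector keeps the row/column assumption in one place, so it is easy to adjust once you have actually inspected the page.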

This script provides a starting point. You'll need to adjust it based on the specific HTML structure of the website you're trying to scrape.

Prompt:

Give me a solution if the XPath selector is "//*[@id="block-hup-theme-content"]/div/div/div[2]/table/tbody/tr[1]/td[3]/a"

Given that you have an XPath selector, the lxml library becomes a suitable choice for parsing the HTML content, as it provides comprehensive support for XPath expressions. Here's how you can modify the script to use lxml and the provided XPath to scrape names from the specified column:

1. Install necessary libraries (if you haven't already):

pip install requests lxml

2. Python script using lxml:

import requests
from lxml import etree

# URL to scrape
url = 'https://hup.hu/activity'

# Send a GET request to the URL
response = requests.get(url)
if response.status_code == 200:  # Check if the request was successful
    # Parse the HTML content
    html = etree.HTML(response.content)
    
    # Use the provided XPath to select the target elements
    # This XPath gets the first name in the "Tartalom" column, you'll need to adjust it to iterate over all rows
    xpath = '//*[@id="block-hup-theme-content"]/div/div/div[2]/table/tbody/tr[1]/td[3]/a'
    
    # Assuming you want to scrape names from multiple rows, you might need to adjust the XPath
    # For example, to select all 'a' elements in the third column of each row in the tbody:
    adjusted_xpath = '//*[@id="block-hup-theme-content"]/div/div/div[2]/table/tbody/tr/td[3]/a'
    
    # Find all elements matching the adjusted XPath
    names = html.xpath(adjusted_xpath)
    
    for name in names:
        # Print the text content of each element
        print(name.text)
else:
    print(f"Failed to retrieve content, status code: {response.status_code}")

In this script (emphasis mine):

  • The xpath variable initially contains the XPath selector you provided, which points to a specific link in the first row of the "Tartalom" column. However, to scrape names from all rows, the XPath has been adjusted in the adjusted_xpath variable. This assumes that each row in the table follows a similar structure, and all names you're interested in are placed within <a> tags in the third column of the table body (tbody).
  • The lxml library's html.xpath() method is used to select all elements that match the given XPath expression. It returns a list of elements, from which you can extract text using the .text attribute.
  • This example prints the name found in each matching element. Depending on the website's structure, you might need to adjust the XPath expression further to accurately select the elements you're interested in; a row-by-row variant is sketched after this list.
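
As a complementary sketch (again not from the original answer), the same scraping can be written row by row with a relative XPath, which makes the tr[1] to tr generalisation explicit. One caveat worth hedging: XPaths copied from browser developer tools often include a tbody element that the served HTML may not actually contain, so the expression might need trimming.

import requests
from lxml import etree

url = 'https://hup.hu/activity'
response = requests.get(url, timeout=10)
response.raise_for_status()

tree = etree.HTML(response.content)

# Select every row first, then apply a relative XPath inside each row; this keeps
# the row at hand if you later want other columns as well.
rows = tree.xpath('//*[@id="block-hup-theme-content"]/div/div/div[2]/table/tbody/tr')
for row in rows:
    for link in row.xpath('./td[3]/a'):
        # itertext() also collects text wrapped in child tags, where .text alone
        # could be None
        print(''.join(link.itertext()).strip())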

Remember, always ensure your web scraping activities comply with the target website's terms of service and legal considerations around data use.
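
The robots.txt part of that advice can be checked from Python itself using only the standard library; a small illustrative sketch (it does not replace reading the site's terms of use):

from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url('https://hup.hu/robots.txt')
robots.read()

# '*' stands for a generic user agent; substitute your own if you send one.
if robots.can_fetch('*', 'https://hup.hu/activity'):
    print('robots.txt allows fetching /activity')
else:
    print('robots.txt disallows /activity; do not scrape it')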

---

I'd like to draw your attention to the fact that it corrected my code: without knowing the page's source, it recognized that the XPath I supplied points to only a single element, and then it offered the adjusted_xpath variable, which, according to it, will do what I actually want.

I haven't tried the code, but to say that it "doesn't understand" what this is about... weeell...

Plenty of junior programmers would happily have implemented the code based on my original XPath, and then wondered why the script spits out only a single topic title.

And this is only GPT-4, not the newest models.

"And my message to programmers: if your company's management is thinking about introducing AI, run before the ship starts sinking, because that company is going to get badly burned."

It's all the same to me; just don't be surprised if in 3-5 years you happen to be laid off and nobody will hire you, because by avoiding AI your performance won't measure up to that of your colleagues who do use generative AI.