Python:: BeautifulSoup Practice for Web Crawling


박미미의 지식에서 쌓는 즐거움


낑깡좋아 2019. 7. 14. 00:30

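BeautifulSoup is a Python library for parsing HTML and XML, and it is widely used for web crawling.

Let's go through its basic features with some practice.

The BeautifulSoup project site is at the link below. It has explanations and examples, so give it a look. (It is an English-language site, of course;;)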

https://www.crummy.com/software/BeautifulSoup/

First, let's write some simple HTML.

For practice, any markup will do...

<html><head><title>BeautifulSoup 연습</title></head>
<body>파이썬 웹크롤링 연습 one
Python two

<a>Hickory and Lime</a></html>

We will practice the library with this HTML.

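from bs4 import BeautifulSoup

doc = ['<html><head><title>BeautifulSoup 연습</title></head>',
       '<body>파이썬 웹크롤링 연습 one',
       'Python two', '</html>']

# Build an object that is easy to search.
# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup("".join(doc), "html.parser")

# Show the tags in an indented layout.
# prettify is a method, so it has to be called: prettify(), not prettify.
print(soup.prettify())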

Result
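The sample HTML as shown has only plain text inside `<body>`, so several of the searches in the sections below would find nothing in it. As a hedged sketch, the snippet below builds a fuller document to practice on. The `<p>`, `<b>`, and `<a>` tags, the `id` values `first` and `second`, the `align` values, and the `hickory` class are all assumptions chosen to fit the exercises, not markup taken from this post. It also checks that `prettify()` returns an indented string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inline tags and attribute values are assumptions
# chosen to fit the exercises, not the post's original HTML.
doc = [
    '<html><head><title>BeautifulSoup 연습</title></head>',
    '<body><p id="first" align="center">파이썬 웹크롤링 연습 <b>one</b></p>',
    '<p id="second" align="left">Python <b>two</b></p>',
    '<a><b class="hickory">Hickory</b> and <b class="lime">Lime</b></a>',
    '</body></html>',
]

soup = BeautifulSoup("".join(doc), "html.parser")

# prettify() returns an indented, multi-line string
pretty = soup.prettify()
print(pretty)

# The parsed tree can also be navigated directly
print(soup.title.string)
```

Closing every `<p>` explicitly keeps the result the same across parsers, since parsers differ in how they repair unclosed tags.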

1. Use a regular expression to get the tags whose names start with 't'

import re  # regular expressions

tagStartingWithT = soup.find_all(re.compile("^t"))
print([tag.name for tag in tagStartingWithT])

Result

['title']
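To make the role of the `^` anchor easier to see, here is a small sketch on a hypothetical document. The `<table>`, `<tr>`, and `<td>` tags are assumptions added only so that more than one tag name starts with 't'. BeautifulSoup tests a compiled pattern against each tag name with `search()`, so without the anchor the pattern matches anywhere in the name.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the table is an assumption added for illustration.
html = (
    '<html><head><title>BeautifulSoup 연습</title></head>'
    '<body><p>파이썬 웹크롤링 연습 <b>one</b></p>'
    '<table><tr><td>Python</td></tr></table>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Anchored: tag names that START with 't'
starts_with_t = [tag.name for tag in soup.find_all(re.compile("^t"))]
print(starts_with_t)

# Unanchored: tag names that CONTAIN 't' anywhere, so 'html' matches too
contains_t = [tag.name for tag in soup.find_all(re.compile("t"))]
print(contains_t)
```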

2. Get only the 'title' and 'p' tags

print(soup.find_all(['title', 'p']))

Result

[<title>BeautifulSoup 연습</title>, 파이썬 웹크롤링 연습 one, Python two]
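Passing a list to `find_all` matches any tag whose name appears in the list. A minimal sketch, assuming the paragraphs are wrapped in `<p>` tags with a nested `<b>` (hypothetical markup, not taken from this post):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> and <b> tags are assumptions.
html = (
    '<title>BeautifulSoup 연습</title>'
    '<p>파이썬 웹크롤링 연습 <b>one</b></p>'
    '<p>Python <b>two</b></p>'
)
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all(["title", "p"])

# The <b> tags are not in the list, so they are not returned themselves
names = [tag.name for tag in matches]
print(names)

# get_text() still includes the text of nested tags such as <b>
texts = [tag.get_text() for tag in matches]
print(texts)
```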

3. Use a lambda function to print only the tags that have exactly two attributes

print(soup.find_all(lambda tag: len(tag.attrs) == 2))

Result

파이썬 웹크롤링 연습 one, Python two
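`find_all` also accepts any function that takes a tag and returns True or False, and `tag.attrs` is a dict of the tag's attributes. The sketch below uses a named function in place of the lambda. The `id`, `align`, and `class` attributes are hypothetical, chosen so that only the two `<p>` tags carry exactly two attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: attribute names and values are assumptions.
html = (
    '<p id="first" align="center">파이썬 웹크롤링 연습 <b>one</b></p>'
    '<p id="second" align="left">Python <b>two</b></p>'
    '<b class="lime">Lime</b>'
)
soup = BeautifulSoup(html, "html.parser")


def has_two_attrs(tag):
    """Return True for tags that carry exactly two attributes."""
    return len(tag.attrs) == 2


two_attr_tags = soup.find_all(has_two_attrs)

# The nested <b> tags have no attributes and <b class="lime"> has one,
# so only the two <p> tags remain
ids = [tag["id"] for tag in two_attr_tags]
print(ids)
print(two_attr_tags[0].attrs)
```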

4. Find the tags whose id ends with 'rst'

print(soup.find_all(id=re.compile("rst$")))

Result

파이썬 웹크롤링 연습 one
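A keyword argument such as `id=` filters on that attribute's value, and the value can be a plain string or a compiled pattern. A rough sketch, assuming the two paragraphs have the hypothetical ids `first` and `second`:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the id values are assumptions.
html = (
    '<p id="first">파이썬 웹크롤링 연습 <b>one</b></p>'
    '<p id="second">Python <b>two</b></p>'
)
soup = BeautifulSoup(html, "html.parser")

# '$' anchors the pattern at the end, so only "first" matches "rst$"
ends_with_rst = soup.find_all(id=re.compile("rst$"))
matched_ids = [tag["id"] for tag in ends_with_rst]
print(matched_ids)

# A plain string must equal the whole attribute value
second = soup.find(id="second")
print(second.get_text())
```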

5. Find the b tag whose class is lime

print(soup.find("b", attrs={"class": "lime"}))

Result

Lime
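Because `class` is a reserved word in Python, BeautifulSoup also offers the `class_` keyword as a shorthand for the `attrs` dict. The sketch below compares the two forms on a hypothetical fragment (the `<b>` tags and the `hickory` class are assumptions) and shows that `find` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <b> tags and class names are assumptions.
html = '<a><b class="hickory">Hickory</b> and <b class="lime">Lime</b></a>'
soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to filter on the class attribute
lime = soup.find("b", attrs={"class": "lime"})
same = soup.find("b", class_="lime")
print(lime)
print(lime.get_text())

# find() returns None when there is no match, so check before using it
missing = soup.find("b", class_="plum")
print(missing)
```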
