
In summary:

When you use format() in a for loop and want to put a 0 (zero) in front of single-digit numbers,

USE the {:02d} format.

 

 

 

 

When I analyzed the HTML of the page I wanted to crawl, I found that the id values of the list items follow a numbered pattern that runs from 03 to 27.

 

//*[@id='Lu_gov_DG_ctl03_btnGovPartNo'] ~ //*[@id='Lu_gov_DG_ctl27_btnGovPartNo'] 

 

However, if I use format() with a plain {} placeholder in a for loop, nothing puts a 0 in front of the single-digit numbers.

for i in range(3, 13):
    # A plain {} inserts the number as-is, with no padding
    address = "//*[@id='Lu_gov_DG_ctl{}_btnGovPartNo']".format(i)
    print(address)

 

//*[@id='Lu_gov_DG_ctl3_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl4_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl5_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl6_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl7_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl8_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl9_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl10_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl11_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl12_btnGovPartNo']

 

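The first workaround I found was a nested for loop, with one loop per digit.

It does produce the leading zero, but it only covers every number when the ones digit starts at 0. Here the inner loop restarts at 3 for each tens digit, so it skips 10, 11, and 12 when the numbers roll over to two digits. See below.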

 

for i in range(0, 2):        # i is the tens digit
    for j in range(3, 10):   # j is the ones digit
        address = "//*[@id='Lu_gov_DG_ctl{}{}_btnGovPartNo']".format(i, j)
        print(address)

 

//*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl04_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl05_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl06_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl07_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl08_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl09_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl13_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl14_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl15_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl16_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl17_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl18_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl19_btnGovPartNo']

If I added an extra condition to handle the skipped numbers as exceptions, I would get the result I want, but I thought there had to be a better way that needs no additional condition.
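To make the gap concrete, here is a small sketch that compares the numbers the nested loop generates with the numbers actually wanted for this example (3 through 12). The variable names `produced`, `wanted`, `missing`, and `extra` are mine, chosen only for illustration.

```python
# Numbers the nested loop produces: i is the tens digit, j is the ones digit
produced = {10 * i + j for i in range(0, 2) for j in range(3, 10)}

# Numbers wanted in this example: 3 through 12
wanted = set(range(3, 13))

missing = sorted(wanted - produced)  # wanted, but never generated
extra = sorted(produced - wanted)    # generated, but not wanted

print("missing:", missing)  # missing: [10, 11, 12]
print("extra:", extra)      # extra: [13, 14, 15, 16, 17, 18, 19]
```

The inner loop never yields 0, 1, or 2 as the ones digit, which is why 10, 11, and 12 can never appear.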

 

The solution:

USE the {:02d} format.

 

When you need to create (or load) numbered files, you need strings such as '001', '002', and so on. The .format() method can produce them: in {:03d}, the 0 means pad with zeros, the 3 is the minimum width, and the d means a decimal integer.

 

"{:03d}".format(4)
 

Applied in a for loop, it looks like this. See below.

 

for i in range(3, 13):
    # {:02d} pads the number with zeros to a minimum width of 2
    address = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']".format(i)
    print(address)

//*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl04_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl05_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl06_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl07_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl08_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl09_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl10_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl11_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl12_btnGovPartNo']

Clear!
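For reference, the same zero padding can be written in a few other ways in Python. This is only a sketch of equivalents I am confident about (f-strings need Python 3.6 or later); the variable names are arbitrary.

```python
i = 7

a = "{:02d}".format(i)  # str.format with a format spec, as used in this post
b = f"{i:02d}"          # f-string with the same format spec
c = str(i).zfill(2)     # zfill pads a string with leading zeros
d = "%02d" % i          # older printf-style formatting

print(a, b, c, d)  # 07 07 07 07

# The width is a minimum: longer numbers are left unchanged
print("{:02d}".format(12))   # 12
print("{:02d}".format(123))  # 123
```

Because the width is only a minimum, the same {:02d} spec is safe to use across the whole range, whether the number has one digit or two.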

 


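In the web crawling I am working on, I had to run a for loop over the id names,

which go from 03 to 27,

(1) with a 0 in front of the single-digit numbers, and at the same time

(2) starting from 3,

which is a bizarre pattern. (If only it started from 0 or 1...)

 

If I had only needed to print the values, I could have just forced a "0" in by hand, but because I was also using format() inside the for loop, it was quite frustrating. To solve all three of these problems at once,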

 

for i in range(3, 13):
    address = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']".format(i)
    print(address)

Like this: by declaring inside the {} that format() fills in what type the value is and how many digits it should have, I was able to make it work!
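The examples in this post stop at 12, while the pattern on the page runs from 03 to 27. A minimal sketch for the full range might look like the following. The function name `gov_part_xpaths` and its parameters are hypothetical, made up for this illustration.

```python
def gov_part_xpaths(first=3, last=27):
    """Build one XPath per list item (hypothetical helper, not from any library)."""
    template = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']"
    # range() excludes its end value, so add 1 to include `last`
    return [template.format(n) for n in range(first, last + 1)]


xpaths = gov_part_xpaths()
print(len(xpaths))  # 25
print(xpaths[0])    # //*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
print(xpaths[-1])   # //*[@id='Lu_gov_DG_ctl27_btnGovPartNo']
```

Building the list up front keeps the string formatting separate from whatever crawling code later consumes the XPaths.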

 

 
