[Eng/Kor, Python for crawling] How to zero-pad single-digit numbers (01, 001) with format() in a for loop
0_hoonie 2022. 1. 8. 23:54
In summary: when you use format() in a for loop and want a 0 (zero) in front of single-digit numbers,
USE the {:02d} format.
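A quick sketch of what that spec means: in {:02d}, the leading 0 turns on zero padding, 2 is the minimum width, and d means a decimal integer. The snippet compares it with two other standard ways of getting the same string (an f-string and str.zfill). The variable names are only for illustration.

```python
n = 3

padded_format = "{:02d}".format(n)   # str.format with a format spec
padded_fstring = f"{n:02d}"          # same spec inside an f-string (Python 3.6+)
padded_zfill = str(n).zfill(2)       # pad an existing string with zeros

print(padded_format, padded_fstring, padded_zfill)  # 03 03 03

# The width is a minimum, so longer numbers are never truncated.
print("{:02d}".format(123))  # 123
```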
When I analyzed the HTML of the page I wanted to crawl, the id pattern of the list items ran from 03 to 27:
//*[@id='Lu_gov_DG_ctl03_btnGovPartNo'] ~ //*[@id='Lu_gov_DG_ctl27_btnGovPartNo']
However, a plain {} placeholder with format() in a for loop does not put a 0 in front of single-digit numbers.
# A plain {} inserts the number as-is, with no padding.
for i in range(3, 13):
    address = "//*[@id='Lu_gov_DG_ctl{}_btnGovPartNo']".format(i)
    print(address)
//*[@id='Lu_gov_DG_ctl3_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl4_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl5_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl6_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl7_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl8_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl9_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl10_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl11_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl12_btnGovPartNo']
The first workaround I found was a nested for loop that builds the tens digit and the ones digit separately.
It works only when the ones digit runs over the full 0 to 9 range. My IDs start at 03, so the inner range(3,10) also skips 10, 11, and 12 once the tens digit changes. See below.
# i is the tens digit, j is the ones digit.
for i in range(0, 2):
    for j in range(3, 10):
        address = "//*[@id='Lu_gov_DG_ctl{}{}_btnGovPartNo']".format(i, j)
        print(address)
//*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl04_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl05_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl06_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl07_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl08_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl09_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl13_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl14_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl15_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl16_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl17_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl18_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl19_btnGovPartNo']
I could get the result I wanted by adding a condition to handle the special case, but I figured there had to be a better way that needs no extra condition.
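For reference, the conditional workaround might look roughly like this. It is only a sketch: the prefix variable and the addresses list are names I made up for illustration, and the range is the same shortened 3 to 12 used in the earlier examples.

```python
# Manual zero padding with a condition (works, but clumsy).
addresses = []
for i in range(3, 13):
    # Prepend "0" only when the number has a single digit.
    prefix = "0" if i < 10 else ""
    address = "//*[@id='Lu_gov_DG_ctl{}{}_btnGovPartNo']".format(prefix, i)
    addresses.append(address)
    print(address)
```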
The solution!
USE the {:02d} format
When you need to create (or load) numbered files, you need strings
such as '001', '002', etc. You can get them with the .format method.
"{:03d}".format(4)
Applied in a for loop, it looks like this.
# {:02d} pads the number with zeros to at least 2 digits.
for i in range(3, 13):
    address = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']".format(i)
    print(address)
//*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl04_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl05_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl06_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl07_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl08_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl09_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl10_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl11_btnGovPartNo']
//*[@id='Lu_gov_DG_ctl12_btnGovPartNo']
Clear!
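The examples above stop at 12 to keep the output short, but the real IDs run from 03 to 27. A minimal sketch for the full range follows. build_xpaths is a hypothetical helper name I made up, not something from a library.

```python
def build_xpaths(start, stop):
    """Return the button XPaths for ctl numbers start..stop (inclusive)."""
    template = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']"
    # range() excludes its end value, so add 1 to include `stop`.
    return [template.format(i) for i in range(start, stop + 1)]


xpaths = build_xpaths(3, 27)
print(len(xpaths))   # 25
print(xpaths[0])     # //*[@id='Lu_gov_DG_ctl03_btnGovPartNo']
print(xpaths[-1])    # //*[@id='Lu_gov_DG_ctl27_btnGovPartNo']
```

Each string in the list can then be handed to whatever XPath lookup the crawler uses.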
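In the web crawl I am working on, I had to run a for loop over the id names, but they went
from 03 to 27, so
(1) single-digit numbers get a leading 0, and at the same time
(2) the numbering starts from 3,
which is a bizarre pattern. (Why not just start from 0 or 1.. T_T)
If all I needed was a print, I could simply have forced a "0" in by hand, but using format() inside the for loop at the same time made it really frustrating. To solve all three of these problems: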
# {:02d} pads the number with zeros to at least 2 digits.
for i in range(3, 13):
    address = "//*[@id='Lu_gov_DG_ctl{:02d}_btnGovPartNo']".format(i)
    print(address)
By declaring, inside the {} that format() fills in, how many digits to use and in what format, I was able to make it work!
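If the number of digits is not fixed, the width itself can be passed as an argument, because str.format allows a replacement field nested inside the format spec. A small sketch, where zero_pad is a hypothetical helper name used only for illustration:

```python
def zero_pad(number, width):
    """Format `number` as a decimal with at least `width` digits, zero padded."""
    # The inner {w} is substituted first, so w=3 yields the spec "03d".
    return "{n:0{w}d}".format(n=number, w=width)


print(zero_pad(7, 2))   # 07
print(zero_pad(7, 3))   # 007
print(zero_pad(27, 2))  # 27
```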