  • Getting the filename of a download in Python, and character encoding
    Soliloquy 2022. 6. 22. 01:38

     


     

    There are several ways to find out the filename the server intends for a download. You can infer it from the last part of the URL, read the Content-Disposition field of the response headers, or rely on the download attribute of the HTML5 <a> tag, among others.

     

    For example, if an image has the address below, the last part of the URL path, Windows_logo_and_wordmark_-_2021.svg, is a reasonable guess for the filename. The same name also appears in the filename part of the Content-Disposition response header, as shown below.

     

    https://blog.kakaocdn.net/dn/bI5PaW/btrozw3Udj4/o2nI0pKMfL4l5ZiuFIUTpK/Windows_logo_and_wordmark_-_2021.svg?attach=1&knm=tfile.svg

    Example download URL
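Reading the name off the end of the URL can be sketched with the standard library. This is only an illustration: the helper name filename_from_url is hypothetical, and it assumes the last path segment really is the filename, which a server is free not to honor.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Guess a filename from the last segment of the URL path."""
    # urlsplit separates the path from the query string (?attach=1&...),
    # so the query never ends up in the guessed name.
    path = urlsplit(url).path
    # Percent-escapes such as %20 are decoded after taking the last segment.
    return unquote(basename(path))


url = (
    "https://blog.kakaocdn.net/dn/bI5PaW/btrozw3Udj4/o2nI0pKMfL4l5ZiuFIUTpK/"
    "Windows_logo_and_wordmark_-_2021.svg?attach=1&knm=tfile.svg"
)
print(filename_from_url(url))  # Windows_logo_and_wordmark_-_2021.svg
```

A URL whose path ends in something like fileDownload.do gives a useless guess, which is why the response header is usually the better source when it is present.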

     

    Example content-disposition header

    If the filename is available through content-disposition, Python's urllib can fetch it with request.urlopen(url).headers.get_filename().

     

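    from urllib import request

    downloadURL = "https://github.com/notepad-plus-plus/notepad-plus-plus/releases/download/v8.4.2/npp.8.4.2.Installer.arm64.exe"
    # Open the URL; the context manager closes the connection afterwards.
    with request.urlopen(downloadURL) as urlOpen:
        # get_filename() reads the filename parameter of Content-Disposition.
        fileName = urlOpen.headers.get_filename()
    print(fileName)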
    
    > npp.8.4.2.Installer.arm64.exe
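The headers object returned by urlopen is an http.client.HTTPMessage, a subclass of email.message.Message, so the same parsing can be tried offline. The sketch below feeds a hand-written header value (an assumption for illustration, not a captured response) into a plain Message; the helper name filename_from_disposition is hypothetical.

```python
from email.message import Message


def filename_from_disposition(header_value: str):
    """Parse a Content-Disposition value the way headers.get_filename() does."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # Returns None when the header carries no filename parameter.
    return msg.get_filename()


print(filename_from_disposition('attachment; filename="npp.8.4.2.Installer.arm64.exe"'))
# npp.8.4.2.Installer.arm64.exe
print(filename_from_disposition("inline"))
# None
```

Because get_filename() can return None, code that saves the download should be ready to fall back to another source for the name, such as the URL.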

     

    However, the filename returned by headers.get_filename() sometimes comes out garbled. That happened to me when fetching a CSV file from data.go.kr, the Korean public data portal. The data I wanted was the CSV of the Daegu Metropolitan Transit Corporation Line 2 train timetable (대구도시철도공사_2호선 열차시각표), and unlike a web browser, Python's urllib module did not show the filename intact.

     

    Downloading the file in Firefox

    from urllib import request

    downloadURL = "https://www.data.go.kr/cmm/cmm/fileDownload.do?atchFileId=FILE_000000002535606&fileDetailSn=1&insertDataPrcus=N"
    with request.urlopen(downloadURL) as urlOpen:
        # Same call as before, but here the Korean filename comes back garbled.
        fileName = urlOpen.headers.get_filename()
    print(fileName)
    
    ëยŒย€êµ¬ëยย„ìย‹ยœì² ëยย„ê³µìย‚¬_2íย˜¸ìย„  ìย—´ì°¨ìย‹ยœê°ยíย‘ยœ_20220430.csv

    The same download seen from Python: the filename is garbled

     

    I figured the encoding was the problem, so I tried encoding the name as euc-kr, an encoding that used to be widely used in Korea, but that raised an error. So I decided to dig a little further. The header contained Content-Disposition: =?utf-8?b, so I assumed it was UTF-8, but that did not seem to be the case. What on earth is it...

    >>> fileName.encode('euc-kr')
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        fileName.encode('euc-kr')
    UnicodeEncodeError: 'euc_kr' codec can't encode character '\xeb' in position 0: illegal multibyte sequence
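The traceback is itself a clue: the first character is '\xeb', a Latin-range character rather than Hangul, and str.encode() goes from text to bytes, so it cannot undo a wrong decoding on its own. A quick way to look inside such a string is to print its code points. The sample below is a short stand-in typed by hand to resemble the start of the garbled name (an assumption, not the real header value).

```python
# Hand-written stand-in for the first six characters of the garbled filename.
sample = "\xeb\x8c\x80\xea\xb5\xac"

# Every code point is below 0x100, i.e. each character fits in a single byte.
code_points = [hex(ord(ch)) for ch in sample]
print(code_points)

# EUC-KR has no mapping for U+00EB, so encoding fails at position 0.
try:
    sample.encode("euc-kr")
    encodable = True
except UnicodeEncodeError as exc:
    encodable = False
    print(exc.start, repr(exc.object[exc.start]))
```

Code points that all sit below 0x100 suggest the string is really a sequence of bytes that were mapped one-to-one onto characters somewhere along the way.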

     

    Then I found a web page called Universal Cyrillic decoder. The site has a menu that shows the text you enter under many different encodings.[Footnote 1]

     

    It did not, however, say which encoding the text was actually in. I checked the browser's developer tools in case the encoding was listed there, and luckily it was: ISO-8859-1 was the answer. Running .encode('ISO-8859-1').decode('UTF-8') in Python printed the filename I wanted correctly.

     

    The Universal Cyrillic Decoder web page and the developer tools

     

    >>> fileName.encode('ISO-8859-1').decode('UTF-8')
    '대구도시철도공사_2호선 열차시각표_20220430.csv'

     

    Why does the name display properly once it is encoded as ISO-8859-1 and then decoded again? Is it because urllib's default encoding is ISO-8859-1? Or does the server build the filename in ISO-8859-1 before sending it? I'll look into it later if I remember.
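One likely explanation, not verified against this particular server: CPython's http.client decodes raw header bytes as ISO-8859-1, so if a server puts unescaped UTF-8 bytes in the header, each byte becomes one Latin-1 character. Encoding back to ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. The sketch below reproduces that round trip without any network access; repair_header_text is a hypothetical helper name.

```python
def repair_header_text(value: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return the input if it does not apply."""
    try:
        return value.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside Latin-1 (already decoded
        # correctly) or the recovered bytes are not valid UTF-8.
        return value


original = "대구도시철도공사_2호선 열차시각표_20220430.csv"
# Simulate the suspected mistake: UTF-8 bytes interpreted as ISO-8859-1.
as_seen = original.encode("utf-8").decode("iso-8859-1")

print(as_seen == original)                      # False
print(repair_header_text(as_seen) == original)  # True
print(repair_header_text("npp.8.4.2.Installer.arm64.exe"))  # unchanged ASCII name
```

The try/except keeps the helper safe to call on names that were never garbled: plain ASCII passes through unchanged, and a correctly decoded Korean name fails the ISO-8859-1 encode step and is returned as is.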

     

    1. The menu only appears after you press the convert button once.
