Scraping all Spring Boot configuration properties with Python

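Spring Boot has a large number of configuration properties, and thanks to them we write a lot less code. I have long wanted to translate all of the official configuration properties into Chinese and then offer a service for browsing them online.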

Today is the first step: use Python to scrape and parse the documentation, and for now generate Markdown files locally.

Official Spring Boot configuration properties documentation

https://docs.spring.io/spring-boot/docs/current/reference/html/appendix-application-properties.html

Python script

Dependencies: BeautifulSoup and requests

from bs4 import BeautifulSoup
import requests
import os

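# Online documentation page to scrape
doc = 'https://docs.spring.io/spring-boot/docs/current/reference/html/appendix-application-properties.html'

# Directory where the generated files are written (raw string so the backslashes stay literal)
target_dir = r"D:\git\springboot-properties"

# Line separator
new_line = '\n'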

with requests.get(doc) as res:

    html = BeautifulSoup(res.text, 'html.parser')
    
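    # Table-of-contents entries, assumed to be one <li> per section
    tocs = html.find_all('li')
    
    # Property tables, assumed to appear in the same order as the entries
    tableblock = html.find_all('table', class_='tableblock')
    
    # Only pair up as many entries as there are tables
    toc_length = min(len(tocs), len(tableblock))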
    
    with open(os.path.join(target_dir, 'SUMMARY.md'), 'w', encoding='utf-8') as summary:
    
        # Heading of the summary file ("目录" means "Table of contents")
        summary.write('# 目录')
        summary.write(new_line)
        
        for i in range(toc_length):
            # Section title
            title = tocs[i].find('a').get_text().strip()
            
            # Name of the Markdown file for this section
            file_name = title.replace(' properties', '').replace(' ', '')
            
            # Add a link to the section file
            summary.write('* [%s](%s.md)' % (title, file_name))
            summary.write(new_line)
                
            with open(os.path.join(target_dir, file_name + '.md'), 'w', encoding='utf-8') as file:
                
                # Write the section title
                file.write('# %s' % title)
                file.write(new_line)
                
                # Write the table header (columns: property, default value, description)
                file.write('| 配置项 |  默认值 | 说明 |')
                file.write(new_line)
                file.write('| :-----| :---- | :---- |')
                file.write(new_line)
                
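                # Property table for this section
                table = tableblock[i]
                for tr in table.find_all('tr'):
                    # Default value and description stay empty unless found
                    default_value = ''
                    desc = ''
                    
                    # The first <code> is the property key, the second (if any) the default value.
                    # Caveat: with no default, a <code> inside the description is picked up instead.
                    code = tr.find_all('code')
                    
                    # Skip rows without a key, such as the header row
                    if not code:
                        continue
                        
                    key = code[0].get_text().strip()
                    
                    if len(code) > 1:
                        default_value = code[1].get_text().strip()
                    
                    # The description is the last paragraph in the row; get_text() also
                    # handles nested tags, where .string would return None
                    p = tr.find_all('p')
                    if p:
                        desc = ' '.join(p[-1].get_text().split())
        
                    if key:
                        key = '`' + key + '`'
                    if default_value:
                        default_value = '`' + default_value + '`'
                    
                    # Write the table row
                    file.write('| %s | %s | %s |' % (key, default_value, desc))
                    file.write(new_line)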
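
The script writes the scraped text into the table as it is, so a missing value or a `|` inside a description can break a row. Below is a minimal sketch of how the row formatting and the file-name rule could be pulled into small helpers. The names `escape_cell`, `format_row` and `file_name_for` are hypothetical and not part of the script above, and the sample values are only for illustration.

```python
def escape_cell(text):
    """Collapse whitespace and escape pipes so the text fits in one table cell."""
    if text is None:  # e.g. BeautifulSoup's .string on a tag with nested tags
        return ''
    return ' '.join(str(text).split()).replace('|', '\\|')


def format_row(key, default_value=None, desc=None):
    """Build one '| key | default | description |' Markdown row."""
    key = escape_cell(key)
    default_value = escape_cell(default_value)
    # Key and default value are wrapped in backticks, as in the script
    if key:
        key = '`' + key + '`'
    if default_value:
        default_value = '`' + default_value + '`'
    return '| %s | %s | %s |' % (key, default_value, escape_cell(desc))


def file_name_for(title):
    """Same rule as the script: drop ' properties', then remove spaces."""
    return title.replace(' properties', '').replace(' ', '')


print(format_row('debug', 'false', 'Enable debug logs.'))
# 'spring.example.mode' is a made-up key, used only to show the escaping
print(format_row('spring.example.mode', None, 'Either a | b.\n'))
print(file_name_for('1. Core properties') + '.md')
```

In the script, the last `file.write('| %s | %s | %s |' ...)` call could then be replaced by `file.write(format_row(key, default_value, desc))`, provided the backticks are no longer added beforehand.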
                
    

Markdown repository