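As noted in the earlier articles of this series, every osquery table is defined under specs, and every table defined there has a corresponding cpp implementation file. Take os_version as an example: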
    table_name("os_version")
    description("A single row containing the operating system name and version.")
    schema([
        Column("name", TEXT, "Distribution or product name"),
        Column("version", TEXT, "Pretty, suitable for presentation, OS version"),
        Column("major", INTEGER, "Major release version"),
        Column("minor", INTEGER, "Minor release version"),
        Column("patch", INTEGER, "Optional patch release"),
        Column("build", TEXT, "Optional build-specific or variant string"),
        Column("platform", TEXT, "OS Platform or ID"),
        Column("platform_like", TEXT, "Closely related platforms"),
        Column("codename", TEXT, "OS version codename"),
    ])
    extended_schema(WINDOWS, [
        Column("install_date", TEXT, "The install date of the OS."),
    ])
    implementation("system/os_version@genOSVersion")
    fuzz_paths([
        "/System/Library/CoreServices/SystemVersion.plist",
    ])
The table is implemented in osquery/tables/system/linux/os_version.cpp. This raises a question: how is the table definition in os_version.table tied to the implementation file and, ultimately, to the query result? The osquery documentation (ReadTheDocs Wiki) contains the following passage:
Table schema, the osquery user API, is created using the Python-based ".spec" files in ./specs. More documentation on how specs work can be found in the Creating New Tables developer documentation. These files are used to build osquery, but can be parsed to create JSON-based API schema. This JSON is published to the homepage at https://osquery.io/schema/.
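In short, every *.spec file is written in Python syntax, and osquery uses these files at build time to generate the schema of the corresponding table. Next we dig into the details of how osquery gets from a *.table file to the final query.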
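The documentation quoted above says spec files can be parsed into a JSON schema. As a minimal sketch of that idea (not osquery's actual tooling), the snippet below reads an abbreviated os_version spec with Python's ast module, without executing it. The helper name spec_to_json and the JSON layout are assumptions made for illustration.

```python
import ast
import json

# Abbreviated copy of the os_version spec shown earlier.
SPEC = '''
table_name("os_version")
description("A single row containing the operating system name and version.")
schema([
    Column("name", TEXT, "Distribution or product name"),
    Column("major", INTEGER, "Major release version"),
])
implementation("system/os_version@genOSVersion")
'''


def spec_to_json(source):
    """Hypothetical helper: walk the spec's syntax tree and collect its fields."""
    result = {"columns": []}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            continue
        name = node.func.id
        if name in ("table_name", "description", "implementation"):
            # The first positional argument is a string literal.
            result[name] = ast.literal_eval(node.args[0])
        elif name == "Column":
            col_name, col_type, col_desc = node.args[:3]
            result["columns"].append({
                "name": ast.literal_eval(col_name),
                "type": col_type.id,  # TEXT / INTEGER are bare names in the spec
                "description": ast.literal_eval(col_desc),
            })
    return json.dumps(result, indent=2, sort_keys=True)


print(spec_to_json(SPEC))
```

Because the spec is only parsed and never run, this approach needs no definitions for table_name(), Column() and friends.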
CMakeLibs.cmake
When we run make, the build ends up going through CMake/CMakeLibs.cmake. Around line 470 of that file there is a macro that deals with table generation:
    # Find and generate table plugins from .table syntax
    macro(GENERATE_TABLES TABLES_PATH)
      # Get all matching files for all platforms.
      set(TABLES_SPECS "${TABLES_PATH}/specs")
      set(TABLE_CATEGORIES "")
      if(APPLE)
        list(APPEND TABLE_CATEGORIES "darwin" "posix" "macwin")
      elseif(FREEBSD)
        list(APPEND TABLE_CATEGORIES "freebsd" "posix")
      elseif(LINUX)
        list(APPEND TABLE_CATEGORIES "linux" "posix" "linwin")
      elseif(WINDOWS)
        list(APPEND TABLE_CATEGORIES "windows" "macwin" "linwin")
      else()
        message(
          FATAL_ERROR
          "Unknown platform detected, cannot generate tables")
      endif()

      # Features optionally disabled.
      if(NOT SKIP_LLDPD AND NOT WINDOWS)
        list(APPEND TABLE_CATEGORIES "lldpd")
      endif()
      if(NOT SKIP_YARA AND NOT WINDOWS)
        list(APPEND TABLE_CATEGORIES "yara")
      endif()
      if(NOT SKIP_TSK AND NOT WINDOWS)
        list(APPEND TABLE_CATEGORIES "sleuthkit")
      endif()
      if(NOT SKIP_SMART AND NOT WINDOWS)
        list(APPEND TABLE_CATEGORIES "smart")
      endif()

      file(GLOB TABLE_FILES "${TABLES_SPECS}/*.table")
      set(TABLE_FILES_FOREIGN "")
      file(GLOB ALL_CATEGORIES RELATIVE "${TABLES_SPECS}" "${TABLES_SPECS}/*")
      foreach(CATEGORY ${ALL_CATEGORIES})
        if(IS_DIRECTORY "${TABLES_SPECS}/${CATEGORY}" AND NOT "${CATEGORY}" STREQUAL "utility")
          file(GLOB TABLE_FILES_PLATFORM "${TABLES_SPECS}/${CATEGORY}/*.table")
          list(FIND TABLE_CATEGORIES "${CATEGORY}" INDEX)
          if(${INDEX} EQUAL -1)
            # Append inner tables to foreign
            list(APPEND TABLE_FILES_FOREIGN ${TABLE_FILES_PLATFORM})
          else()
            # Append inner tables to TABLE_FILES.
            list(APPEND TABLE_FILES ${TABLE_FILES_PLATFORM})
          endif()
        endif()
      endforeach()

      # Generate a set of targets, comprised of table spec file.
      get_property(TARGETS GLOBAL PROPERTY AMALGAMATE_TARGETS)
      set(NEW_TARGETS "")
      foreach(TABLE_FILE ${TABLE_FILES})
        list(FIND TARGETS "${TABLE_FILE}" INDEX)
        if (${INDEX} EQUAL -1)
          # Do not set duplicate targets.
          list(APPEND NEW_TARGETS "${TABLE_FILE}")
        endif()
      endforeach()
      set_property(GLOBAL PROPERTY AMALGAMATE_TARGETS "${NEW_TARGETS}")
      set_property(GLOBAL PROPERTY AMALGAMATE_FOREIGN_TARGETS "${TABLE_FILES_FOREIGN}")
    endmacro()
Since we build and package on Linux, the walkthrough below uses the Linux platform as its example.
Determining the target platform
    elseif(LINUX)
      list(APPEND TABLE_CATEGORIES "linux" "posix" "linwin")
Determining which tables to compile
    file(GLOB TABLE_FILES "${TABLES_SPECS}/*.table")
    set(TABLE_FILES_FOREIGN "")
    file(GLOB ALL_CATEGORIES RELATIVE "${TABLES_SPECS}" "${TABLES_SPECS}/*")
    foreach(CATEGORY ${ALL_CATEGORIES})
      if(IS_DIRECTORY "${TABLES_SPECS}/${CATEGORY}" AND NOT "${CATEGORY}" STREQUAL "utility")
        file(GLOB TABLE_FILES_PLATFORM "${TABLES_SPECS}/${CATEGORY}/*.table")
        list(FIND TABLE_CATEGORIES "${CATEGORY}" INDEX)
        if(${INDEX} EQUAL -1)
          # Append inner tables to foreign
          list(APPEND TABLE_FILES_FOREIGN ${TABLE_FILES_PLATFORM})
        else()
          # Append inner tables to TABLE_FILES.
          list(APPEND TABLE_FILES ${TABLE_FILES_PLATFORM})
        endif()
      endif()
    endforeach()
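- file(GLOB TABLE_FILES "${TABLES_SPECS}/*.table") collects the table definition files, that is, the .table files sitting directly in the specs directory.
- file(GLOB ALL_CATEGORIES RELATIVE "${TABLES_SPECS}" "${TABLES_SPECS}/*") collects all platforms. As mentioned in the osquery architecture overview article, osquery uses the directory structure to tell platforms apart, so ALL_CATEGORIES ends up containing linux, windows, darwin, macwin and so on.
- The loop then decides which platform each table under specs belongs to:

      list(FIND TABLE_CATEGORIES "${CATEGORY}" INDEX)
      if(${INDEX} EQUAL -1)
        # Append inner tables to foreign
        list(APPEND TABLE_FILES_FOREIGN ${TABLE_FILES_PLATFORM})
      else()
        # Append inner tables to TABLE_FILES.
        list(APPEND TABLE_FILES ${TABLE_FILES_PLATFORM})

  If the category is one of the platforms selected earlier, its .table files are appended to TABLE_FILES; otherwise they are appended to TABLE_FILES_FOREIGN.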
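The selection logic above can be sketched in Python. This is only an illustration of what the CMake loop does, not code from osquery; the function name partition_specs and the file names in the throwaway directory tree are assumptions.

```python
import os
import tempfile


def partition_specs(specs_dir, enabled_categories):
    """Split *.table files into (native, foreign) lists, mirroring GENERATE_TABLES."""
    def table_files(folder):
        return sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if name.endswith(".table")
        )

    native = table_files(specs_dir)  # top-level specs are always built
    foreign = []
    for category in sorted(os.listdir(specs_dir)):
        folder = os.path.join(specs_dir, category)
        if not os.path.isdir(folder) or category == "utility":
            continue
        target = native if category in enabled_categories else foreign
        target.extend(table_files(folder))
    return native, foreign


# Build a throwaway specs tree (file names are illustrative only).
with tempfile.TemporaryDirectory() as specs:
    for rel in ("os_version.table", "linux/iptables.table", "windows/registry.table"):
        path = os.path.join(specs, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    native, foreign = partition_specs(specs, {"linux", "posix", "linwin"})
    native_names = [os.path.relpath(p, specs) for p in native]
    foreign_names = [os.path.relpath(p, specs) for p in foreign]
    print(native_names)
    print(foreign_names)
```

With the Linux category set, the windows spec lands in the foreign list while the top-level and linux specs are treated as native.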
Setting the output directory
    macro(GENERATE_TABLE TABLE_FILE FOREIGN NAME BASE_PATH OUTPUT)
      GET_GENERATION_DEPS(${BASE_PATH})
      set(TABLE_FILE_GEN "${TABLE_FILE}")
      string(REGEX REPLACE
        ".*/specs.*/(.*)\\.table"
        "${CMAKE_BINARY_DIR}/generated/tables_${NAME}/\\1.cpp"
        TABLE_FILE_GEN
        ${TABLE_FILE_GEN}
      )

      add_custom_command(
        OUTPUT "${TABLE_FILE_GEN}"
        COMMAND "${PYTHON_EXECUTABLE}"
          "${BASE_PATH}/tools/codegen/gentable.py"
          "${FOREIGN}"
          "${TABLE_FILE}"
          "${TABLE_FILE_GEN}"
        DEPENDS ${TABLE_FILE} ${GENERATION_DEPENDENCIES}
        WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
      )

      list(APPEND ${OUTPUT} "${TABLE_FILE_GEN}")
    endmacro(GENERATE_TABLE)
    .....
    macro(AMALGAMATE BASE_PATH NAME OUTPUT)
      GET_GENERATION_DEPS(${BASE_PATH})
      if("${NAME}" STREQUAL "foreign")
        get_property(TARGETS GLOBAL PROPERTY AMALGAMATE_FOREIGN_TARGETS)
        set(FOREIGN "--foreign")
      else()
        get_property(TARGETS GLOBAL PROPERTY AMALGAMATE_TARGETS)
      endif()
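- Each table_name.table file gets a corresponding cpp file at /generated/tables_${NAME}/table_name.cpp.
- add_custom_command(...) invokes tools/codegen/gentable.py, so Python code generates the cpp file for each table_name.table.
- list(APPEND ${OUTPUT} "${TABLE_FILE_GEN}") records the path of every generated file in the output list.
- The check if("${NAME}" STREQUAL "foreign") then selects which target list is processed (AMALGAMATE_FOREIGN_TARGETS or AMALGAMATE_TARGETS) and whether --foreign is passed to the generator, which in turn determines the tables_${NAME} directory the generated cpp files end up in.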
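The path rewrite performed by string(REGEX REPLACE ...) can be reproduced with Python's re module. This is a minimal sketch for illustration; the function name generated_path and the source and build directories are made up, while the regular expression is the one from the macro.

```python
import re


def generated_path(table_file, binary_dir, name):
    """Map a spec path to the generated .cpp path, like the REGEX REPLACE in GENERATE_TABLE."""
    return re.sub(
        r".*/specs.*/(.*)\.table",
        lambda m: "%s/generated/tables_%s/%s.cpp" % (binary_dir, name, m.group(1)),
        table_file,
    )


# The source and build directories below are made-up examples.
top_level = generated_path(
    "/src/osquery/specs/os_version.table", "/src/osquery/build", "additional")
nested = generated_path(
    "/src/osquery/specs/linux/iptables.table", "/src/osquery/build", "additional")
print(top_level)
print(nested)
```

Note that the platform subdirectory is dropped: specs from specs/ and specs/linux/ both land flat in tables_additional.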
Merging the generated files
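Every file the current platform needs ends up in AMALGAMATE_TARGETS, which is a CMake global property holding a list of spec files rather than a directory.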
      # Append all of the code to a single amalgamation.
      set(AMALGAMATION_FILE_GEN "${CMAKE_BINARY_DIR}/generated/${NAME}_amalgamation.cpp")
      add_custom_command(
        OUTPUT ${AMALGAMATION_FILE_GEN}
        COMMAND "${PYTHON_EXECUTABLE}"
          "${BASE_PATH}/tools/codegen/amalgamate.py"
          "${FOREIGN}"
          "${BASE_PATH}/tools/codegen/"
          "${CMAKE_BINARY_DIR}/generated"
          "${NAME}"
        DEPENDS ${GENERATED_TARGETS} ${GENERATION_DEPENDENCIES}
        WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
      )
      set(${OUTPUT} ${AMALGAMATION_FILE_GEN})
- set(AMALGAMATION_FILE_GEN "${CMAKE_BINARY_DIR}/generated/${NAME}_amalgamation.cpp") produces a different file for each name. The end result is additional_amalgamation.cpp, foreign_amalgamation.cpp and utils_amalgamation.cpp.
- tools/codegen/amalgamate.py is invoked with the arguments "${FOREIGN}" "${BASE_PATH}/tools/codegen/" "${CMAKE_BINARY_DIR}/generated" "${NAME}", and generates the merged cpp file for each directory.
- set(${OUTPUT} ${AMALGAMATION_FILE_GEN}) hands the final result back to the caller.
The resulting directory layout is shown below:
[Figure: directory layout under generated/]
Table structure
We again use the os_version table as the example. Its definition in os_version.table was shown earlier; the corresponding cpp code is generated into cmake-build-debug/generated/tables_additional/os_version.cpp:
    namespace osquery {

    /// BEGIN[GENTABLE]
    namespace tables {
    osquery::QueryData genOSVersion(QueryContext& context);
    }

    class osVersionTablePlugin : public TablePlugin {
     private:
      TableColumns columns() const override {
        return {
          std::make_tuple("name", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("version", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("major", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("minor", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("patch", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("build", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform_like", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("codename", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("install_date", TEXT_TYPE, ColumnOptions::HIDDEN),
        };
      }

      TableAttributes attributes() const override {
        return TableAttributes::NONE;
      }

      QueryData generate(QueryContext& context) override {
        auto results = tables::genOSVersion(context);
        return results;
      }
    };

    REGISTER(osVersionTablePlugin, "table", "os_version");
    /// END[GENTABLE]

    }
This code is ultimately merged into cmake-build-debug/generated/additional_amalgamation.cpp:
    namespace tables {
    osquery::QueryData genOSVersion(QueryContext& context);
    }

    class osVersionTablePlugin : public TablePlugin {
     private:
      TableColumns columns() const override {
        return {
          std::make_tuple("name", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("version", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("major", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("minor", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("patch", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("build", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform_like", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("codename", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("install_date", TEXT_TYPE, ColumnOptions::HIDDEN),
        };
      }

      TableAttributes attributes() const override {
        return TableAttributes::NONE;
      }

      QueryData generate(QueryContext& context) override {
        auto results = tables::genOSVersion(context);
        return results;
      }
    };

    REGISTER(osVersionTablePlugin, "table", "os_version");
    }
As you can see, the code from os_version.cpp has been merged in full into additional_amalgamation.cpp.
gentable.py implementation
gentable.py lives at tools/codegen/gentable.py. We go through its code step by step to see how os_version.table becomes os_version.cpp.
Main
    def main(argc, argv):
        parser = argparse.ArgumentParser(
            "Generate C++ Table Plugin from specfile.")
        parser.add_argument(
            "--debug", default=False, action="store_true",
            help="Output debug messages (when developing)"
        )
        parser.add_argument("--disable-blacklist", default=False,
                            action="store_true")
        parser.add_argument("--foreign", default=False, action="store_true",
                            help="Generate a foreign table")
        parser.add_argument("--templates", default=SCRIPT_DIR + "/templates",
                            help="Path to codegen output .cpp.in templates")
        parser.add_argument("spec_file", help="Path to input .table spec file")
        parser.add_argument("output", help="Path to output .cpp file")
        args = parser.parse_args()

        if args.debug:
            logging.basicConfig(format=LOG_FORMAT, level=logging.DEBUG)
        else:
            logging.basicConfig(format=LOG_FORMAT, level=logging.INFO)

        filename = args.spec_file
        output = args.output
        if filename.endswith(".table"):
            # Adding a 3rd parameter will enable the blacklist
            setup_templates(args.templates)
            with open(filename, "rU") as file_handle:
                tree = ast.parse(file_handle.read())
                exec(compile(tree, "<string>", "exec"))
                blacklisted = is_blacklisted(table.table_name, path=filename)
                if not args.disable_blacklist and blacklisted:
                    table.blacklist(output)
                else:
                    template_type = "default" if not args.foreign else "foreign"
                    table.generate(output, template=template_type)
The key lines are tree = ast.parse(file_handle.read()) and exec(compile(tree, "<string>", "exec")). Because os_version.table is itself written in Python syntax, tree = ast.parse(file_handle.read()) first produces the syntax tree of the file, and exec(compile(tree, "<string>", "exec")) is called on it afterwards. What do exec() and compile() mean in Python? The stackoverflow question whats-the-difference-between-eval-exec-and-compile covers this. In short, compile() turns source (a string or a syntax tree) into a bytecode object, and exec() executes that object. The example from stackoverflow makes the usage clear:

    >>> eval(compile('42', '<string>', 'exec'))  # code returns None
    >>> eval(compile('42', '<string>', 'eval'))  # code returns 42
    42
    >>> exec(compile('42', '<string>', 'eval'))  # code returns 42,
    >>>                                          # but ignored by exec
So exec(compile(tree, "<string>", "exec")) simply runs the code in the spec. Take table_name("os_version"): when this line is executed, it is a call to the table_name() function with the argument os_version. Likewise, description("A single row containing the operating system name and version.") is a call to description() with the argument A single row.....
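A stripped-down version of this mechanism fits in a few lines. The sketch below is not gentable.py itself: the functions are simplified stand-ins that store their arguments in a dictionary, and an explicit namespace is passed to exec() so the snippet stays self-contained (gentable.py calls exec() without one).

```python
import ast

# State filled in by the spec functions; everything here is a simplified stand-in.
table = {"name": None, "description": None, "columns": []}

TEXT, INTEGER = "TEXT_TYPE", "INTEGER_TYPE"


def table_name(name):
    table["name"] = name


def description(text):
    table["description"] = text


def Column(name, col_type, description=""):
    return (name, col_type, description)


def schema(schema_list):
    table["columns"].extend(schema_list)


SPEC = '''
table_name("os_version")
description("A single row containing the operating system name and version.")
schema([
    Column("name", TEXT, "Distribution or product name"),
    Column("major", INTEGER, "Major release version"),
])
'''

# Names the spec is allowed to see while it runs.
namespace = {
    "table_name": table_name, "description": description,
    "schema": schema, "Column": Column, "TEXT": TEXT, "INTEGER": INTEGER,
}

# Same two steps as in main(): parse to a syntax tree, then compile and execute it.
tree = ast.parse(SPEC)
exec(compile(tree, "<string>", "exec"), namespace)

print(table["name"])
print([column[0] for column in table["columns"]])
```

After exec() returns, the dictionary holds everything the spec declared, which is the state a code generator would then render into C++.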
From the definition in os_version.table we can therefore tell that gentable.py must contain an implementation of each of these functions, for example:
- table_name(name,aliases=[])
- description(text)
- schema(schema_list)
- class Column(object)

      def __init__(self, name, col_type, description="", aliases=[], **kwargs):
          self.name = name
          self.type = col_type
          self.description = description
          self.aliases = aliases
          self.options = kwargs

- extended_schema(check, schema_list)
- implementation(impl_string, generator=False)
- fuzz_paths(paths)
generate
With the key functions of gentable.py covered, we turn to its main job: turning the Python spec into cpp code. The relevant code in main() is:
    output = args.output
    .....
    template_type = "default" if not args.foreign else "foreign"
    table.generate(output, template=template_type)
Stepping into generate(self, path, template="default"):
    def generate(self, path, template="default"):
        .....
        self.impl_content = jinja2.Template(TEMPLATES[template]).render(
            table_name=self.table_name,
            table_name_cc=to_camel_case(self.table_name),
            schema=self.columns(),
            header=self.header,
            impl=self.impl,
            function=self.function,
            class_name=self.class_name,
            attributes=self.attributes,
            examples=self.examples,
            aliases=self.aliases,
            has_options=self.has_options,
            has_column_aliases=self.has_column_aliases,
            generator=self.generator,
            attribute_set=[TABLE_ATTRIBUTES[attr] for attr in self.attributes if attr in TABLE_ATTRIBUTES],
        )
As you can see, rendering is done with jinja2. We use the default.cpp.in template as the example:
    class {{table_name_cc}}TablePlugin : public TablePlugin {
     private:
      TableColumns columns() const override {
        return {
    {% for column in schema %}\
          std::make_tuple("{{column.name}}", {{column.type.affinity}},\
    {% if column.options|length > 0 %} {{column.options_set}}\
    {% else %} ColumnOptions::DEFAULT\
    {% endif %}\
          ),
    {% endfor %}\
        };
      }
    ........
This is a typical jinja2 template. The rendered result is:
    class osVersionTablePlugin : public TablePlugin {
     private:
      TableColumns columns() const override {
        return {
          std::make_tuple("name", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("version", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("major", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("minor", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("patch", INTEGER_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("build", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("platform_like", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("codename", TEXT_TYPE, ColumnOptions::DEFAULT),
          std::make_tuple("install_date", TEXT_TYPE, ColumnOptions::HIDDEN),
        };
      }
    ......
In this way every table ends up with its own cpp file.
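To make the column loop of the template concrete without depending on jinja2, here is a plain-string stand-in written for illustration. It is not the generator's code: render_columns is a hypothetical helper, and to_camel_case is an assumed reimplementation whose behaviour is inferred from the generated class name osVersionTablePlugin.

```python
def to_camel_case(snake_name):
    """Assumed behaviour: os_version -> osVersion (matches the generated class name)."""
    head, *rest = snake_name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def render_columns(table_name, columns):
    """Plain-string stand-in for the jinja2 template's column loop."""
    lines = ["class %sTablePlugin : public TablePlugin {" % to_camel_case(table_name),
             " private:",
             "  TableColumns columns() const override {",
             "    return {"]
    for name, affinity, options in columns:
        # Fall back to ColumnOptions::DEFAULT when a column has no options,
        # like the {% if column.options|length > 0 %} branch in the template.
        lines.append('      std::make_tuple("%s", %s, %s),'
                     % (name, affinity, options or "ColumnOptions::DEFAULT"))
    lines += ["    };", "  }", "};"]
    return "\n".join(lines)


rendered = render_columns("os_version", [
    ("name", "TEXT_TYPE", None),
    ("major", "INTEGER_TYPE", None),
    ("install_date", "TEXT_TYPE", "ColumnOptions::HIDDEN"),
])
print(rendered)
```

A template engine earns its keep once the output grows beyond a loop like this, which is presumably why the real generator renders .cpp.in files instead of concatenating strings.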
amalgamate
The generate stage produced one cpp file per table file. The next step is to merge all of these cpp files into a single additional_amalgamation.cpp, which is the job of codegen/amalgamate.py.
    def main():
        tables_folder = os.path.join(args.generated, "tables_%s" % (args.category))
        for base, _, filenames in os.walk(tables_folder):
            for filename in filenames:
                if filename == args.category:
                    continue
                table_data = genTableData(os.path.join(base, filename))
                if table_data is not None:
                    tables.append(table_data)
        ......
        env = jinja2.Environment(keep_trailing_newline=True)
        amalgamation = env.from_string(template_data).render(tables=tables,foreign=args.foreign)
        output = os.path.join(args.generated, "%s_amalgamation.cpp" % args.category)
        try:
            os.makedirs(os.path.dirname(output))
        except:
            # Generated folder already exists
            pass
        with open(output, "w") as fh:
            fh.write(amalgamation)
The key line is amalgamation = env.from_string(template_data).render(tables=tables,foreign=args.foreign). It renders the contents of all the cpp files that were read into amalgamation, which fh.write(amalgamation) then writes to the output file. The amalgamation.cpp.in template itself is very simple:
    namespace osquery {
    {% if foreign %}
    void registerForeignTables() {
    {% endif %}
    {% for table in tables %}
    {{table}}
    {% endfor %}
    {% if foreign %}
    }
    {% endif %}
    }
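The merge step can be sketched with plain strings. This is an illustration rather than amalgamate.py's code: it assumes that only the text between the /// BEGIN[GENTABLE] and /// END[GENTABLE] markers is copied, which is consistent with the two os_version listings shown earlier (the per-table file has the markers and its own namespace osquery wrapper, the merged excerpt has neither). The helper names are hypothetical.

```python
BEGIN_LINE = "/// BEGIN[GENTABLE]"
END_LINE = "/// END[GENTABLE]"


def extract_table_section(source):
    """Return the text between the GENTABLE markers, or None if they are missing."""
    begin = source.find(BEGIN_LINE)
    end = source.find(END_LINE)
    if begin < 0 or end < 0:
        return None
    return source[begin + len(BEGIN_LINE):end].strip()


def amalgamate(sources, foreign=False):
    """String-based stand-in for rendering amalgamation.cpp.in."""
    tables = [s for s in map(extract_table_section, sources) if s is not None]
    body = "\n".join(tables)
    if foreign:
        body = "void registerForeignTables() {\n%s\n}" % body
    return "namespace osquery {\n%s\n}\n" % body


generated = [
    "namespace osquery {\n/// BEGIN[GENTABLE]\nclass osVersionTablePlugin {};\n/// END[GENTABLE]\n}",
    "namespace osquery {\n/// BEGIN[GENTABLE]\nclass etcHostsTablePlugin {};\n/// END[GENTABLE]\n}",
    "// a file without markers is skipped",
]
merged = amalgamate(generated)
print(merged)
```

Passing foreign=True wraps the same body in registerForeignTables(), matching the {% if foreign %} branches of the template.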
The resulting cmake-build-debug/generated/additional_amalgamation.cpp looks as follows:
[Figure: contents of additional_amalgamation.cpp]
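How osquery obtains the table definition
Everything so far covered how additional_amalgamation.cpp is generated while osquery is being built. The article osquery dynamic debugging and repackaging (https://blog.spoock.com/2019/01/04/osquery-dynamic-debug/) explained that a query eventually calls QueryData generate(QueryContext& context), which runs the cpp code implementing the actual logic. So when is this method of a given class actually called?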
We again use the query select * from hosts; as the example. After we type select * from hosts;, it goes through the same series of steps in the sqlite front-end shell and eventually reaches xCreate() in osquery/sql/virtual_table.cpp.
It then passes through the routeInfo() function in osquery/core/tables.cpp:
[Figure: routeInfo() in osquery/core/tables.cpp]
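As shown, columns() has to be called at this point. But how does osquery know which class's columns() to call? From this->name_.c_str() above we can see that the table is etc_hosts. Every table implementation inherits from the TablePlugin class, so calling columns() here works like polymorphism in Java: at run time the method of the concrete class is executed. In this case that is etcHostsTablePlugin::columns():
[Figure: etcHostsTablePlugin::columns()]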
Everything above is the complete flow for obtaining the definition of the etc_hosts table.
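The combination of name-based registration and virtual dispatch described above can be modelled in a few lines of Python. This is a toy model only: register() stands in for the REGISTER(...) macro, route_info() for the lookup step, and the column names are illustrative rather than copied from the real specs.

```python
class TablePlugin:
    """Base class; each generated plugin overrides columns()."""

    def columns(self):
        raise NotImplementedError


REGISTRY = {}


def register(name):
    """Stand-in for the REGISTER(...) macro: map a table name to a plugin instance."""
    def decorator(cls):
        REGISTRY[name] = cls()
        return cls
    return decorator


@register("etc_hosts")
class EtcHostsTablePlugin(TablePlugin):
    def columns(self):
        # Column names here are illustrative, not copied from the real spec.
        return [("address", "TEXT_TYPE"), ("hostnames", "TEXT_TYPE")]


@register("os_version")
class OsVersionTablePlugin(TablePlugin):
    def columns(self):
        return [("name", "TEXT_TYPE"), ("version", "TEXT_TYPE")]


def route_info(table_name):
    """The caller only knows the TablePlugin interface; the subclass's columns() runs."""
    plugin = REGISTRY[table_name]
    return plugin.columns()


print(route_info("etc_hosts"))
```

The caller never names a concrete class; the table name selects the instance and the override does the rest.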
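How osquery executes the query
- Execution returns to rc = shell_exec(zSql, shell_callback, p, &zErrMsg); in osquery/devtools/shell.cpp
- It then enters rc = sqlite3_step(pStmt);
- The rest of the query flow was already described in the article osquery dynamic debugging and repackaging, so it is not repeated here.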
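Note that the table-definition lookup described above happens only the first time a table is queried. Later queries against the same table do not fetch the definition again, whereas every query goes through the query-execution flow once more.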
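The difference between the one-time definition lookup and the per-query execution can be illustrated with a small counter experiment. The classes below are hypothetical and only model the behaviour described in the text; they do not reflect how osquery or sqlite implement it internally.

```python
class CountingPlugin:
    """Toy plugin that counts how often each method is called."""

    def __init__(self):
        self.columns_calls = 0
        self.generate_calls = 0

    def columns(self):
        self.columns_calls += 1
        return ["address", "hostnames"]

    def generate(self):
        self.generate_calls += 1
        return [{"address": "127.0.0.1", "hostnames": "localhost"}]


class VirtualTable:
    """Hypothetical wrapper: fetch the definition on first use, rows on every query."""

    def __init__(self, plugin):
        self.plugin = plugin
        self.schema = None

    def query(self):
        if self.schema is None:  # only the first query asks for the definition
            self.schema = self.plugin.columns()
        return self.plugin.generate()  # rows are regenerated each time


plugin = CountingPlugin()
virtual_table = VirtualTable(plugin)
for _ in range(3):
    virtual_table.query()
print(plugin.columns_calls, plugin.generate_calls)
```

Three queries result in one columns() call and three generate() calls, which is the pattern the paragraph above describes.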
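Summary
With this analysis, a third-party developer can easily modify an existing table: change the table definition, then adjust the logic in the corresponding implementation cpp, without having to care how osquery's internals work together. Reading the osquery source was also the first time I saw polymorphism in C++ implemented and used in practice.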
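A white hat who can learn quickly cannot afford to have weak spots, only a large number of solid skills and a few outstanding ones.
That's all.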