CCPM & FGCNN: CTR Prediction Models That Use CNNs for Feature Generation

Category: Databases · Published: 6 years ago

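Author: Shen Weichen (沈伟臣), algorithm engineer at Alibaba

Source: published with the author's permission

Community: DataFun

Note: reposting is welcome; please credit the source.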

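Preface

This post walks through two papers to show how CNNs can be applied to prediction tasks on traditional structured data. It tries to explain the main points concisely and provides code implementations and a runnable demo; see the papers for the details.

CIKM'15: A Convolutional Click Prediction Model
WWW'19: Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction

CNNs dominate computer vision and are also widely used in natural language processing. Because click-through rate (CTR) prediction resembles some NLP tasks (both deal with large-scale sparse features), methods from NLP and methods for CTR prediction can often be carried over from one to the other.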

A Convolutional Click Prediction Model

Model structure

[Figure: CCPM model structure]

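Main idea

A kernel of size (width, 1) performs a 2D convolution over the feature embedding matrix, where width means that each convolution step covers width consecutive features. A flexible pooling mechanism then aggregates the features and compresses the representation. After several such layers are stacked, the resulting feature matrix is fed to an MLP, which produces the final prediction.

Two questions are worth explaining here:

1. Why stress that the convolution covers width consecutive features?

CNNs shine in CV largely because of the following properties:

Parameter sharing: a feature detector (such as an edge detector) that is useful in one part of an image is usually useful in other parts as well.
Sparse connectivity: each output of a layer depends on only a small part of the previous layer's input.

In NLP tasks, sentences naturally have sequential dependencies, so a CNN can learn useful feature representations. Can a CNN also extract features in a CTR task?

It can, but the results may not be as good. The problem is that the convolution is computed over width consecutive features, so changing the order of the input features changes the result, whereas in a CTR task the input features have no inherent order.

In effect, this builds in a prior: combinations of width consecutive features are assumed to be more meaningful.

We could borrow the idea of dilated convolution to enlarge the receptive field so that a convolution spans features that are further apart, but which features end up combined is still somewhat arbitrary. So one difficulty of using a CNN for feature extraction in CTR tasks is that it computes local feature combinations and cannot effectively capture global ones.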
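To make the order sensitivity concrete, here is a minimal NumPy sketch (not the DeepCTR implementation). It slides a single (width, 1) kernel over the field axis with zero "same" padding, then feeds the same embeddings in a permuted field order. The helper name conv_over_fields, the kernel values, and the permutation are all made up for illustration.

```python
import numpy as np

def conv_over_fields(emb, kernel):
    """Slide a (width, 1) kernel over the field axis with zero 'same' padding.

    emb:    (n_fields, emb_dim) embedding matrix, one row per feature field
    kernel: (width,) weights, shared across all embedding dimensions
    """
    n_fields = emb.shape[0]
    width = kernel.shape[0]
    pad_top = (width - 1) // 2
    pad_bottom = width - 1 - pad_top
    padded = np.pad(emb, ((pad_top, pad_bottom), (0, 0)))
    # Each output row mixes `width` consecutive input rows (fields).
    return np.stack([
        (padded[i:i + width] * kernel[:, None]).sum(axis=0)
        for i in range(n_fields)
    ])

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 4))          # 5 fields, embedding size 4
kernel = np.array([0.5, -1.0, 2.0])    # illustrative kernel of width 3

out = conv_over_fields(emb, kernel)
perm = np.array([2, 0, 4, 1, 3])       # an arbitrary reordering of the fields
out_perm = conv_over_fields(emb[perm], kernel)

# If the convolution ignored field order, out_perm would equal out[perm].
order_sensitive = not np.allclose(out_perm, out[perm])
# By contrast, a plain sum over fields does not depend on the order.
sum_invariant = np.allclose(emb.sum(axis=0), emb[perm].sum(axis=0))
print(order_sensitive, sum_invariant)
```

Each output row only ever sees its neighbours in the chosen field order, which is exactly the locality prior discussed above.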

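2. What is flexible pooling?

It is essentially max pooling, except that each step keeps the p largest values along a given dimension instead of only the largest one. The value of p is computed adaptively from the index of the current pooling layer and the total number of layers:

$$
p_i =
\begin{cases}
\left(1 - (i/l)^{\,l-i}\right) n, & i = 1, \dots, l-1 \\
3, & i = l
\end{cases}
$$

where i is the index of the current layer, l is the total number of layers, and n is the length of the input (the number of features).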
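As a rough illustration of flexible pooling, the sketch below computes p for each layer with the same expression as the DeepCTR excerpt in this article, and implements a simple k-max pooling in NumPy. The function names are hypothetical, and returning the kept values in descending order is a choice made for this sketch.

```python
import numpy as np

def flexible_p(i, l, n):
    """Number of values kept by pooling layer i (1-based) out of l layers, for input length n."""
    if i == l:
        return 3
    return max(1, int((1 - (i / l) ** (l - i)) * n))

def k_max_pooling(x, k, axis):
    """Keep the k largest values along `axis`, sorted in descending order."""
    sorted_desc = np.flip(np.sort(x, axis=axis), axis=axis)
    return np.take(sorted_desc, np.arange(k), axis=axis)

# Three layers over an input of length 10: early layers keep more, the last keeps 3.
p_per_layer = [flexible_p(i, l=3, n=10) for i in (1, 2, 3)]
print(p_per_layer)  # [8, 3, 3]

x = np.array([[1.0, 9.0],
              [7.0, 2.0],
              [3.0, 8.0],
              [5.0, 4.0]])
pooled = k_max_pooling(x, k=2, axis=0)
print(pooled)  # [[7. 9.] [5. 8.]]
```

The schedule shrinks the feature axis gradually, so deeper layers see an increasingly compressed representation.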

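Core code

Here is just the convolution and pooling part of the code. For the complete code, including the final fully connected layers, see

https://github.com/shenweichen/DeepCTR/blob/master/deepctr/models/ccpm.py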

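# Excerpt: l (number of conv layers), n (input length) and the initial
# pooling_result (the embedding matrix) are defined earlier in the full source.
for i in range(1, l + 1):
    filters = conv_filters[i - 1]
    width = conv_kernel_width[i - 1]
    # flexible pooling size for layer i; the last layer always keeps 3
    k = max(1, int((1 - pow(i / l, l - i)) * n)) if i < l else 3

    conv_result = tf.keras.layers.Conv2D(filters=filters, kernel_size=(width, 1), strides=(1, 1), padding='same',
                                         activation='tanh', use_bias=True, )(pooling_result)
    pooling_result = KMaxPooling(
        k=min(k, conv_result.shape[1].value), axis=1)(conv_result)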

Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction

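The paper makes two main contributions:

Feature generation through a recombination layer alleviates the problem in CCPM that a CNN cannot effectively capture global feature combinations.
As a feature generation method, FGCNN can be combined with any model.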

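Model structure

[Figure: FGCNN model structure]

Grouped embeddings

The raw features serve both as input to the downstream model and as input to the FGCNN module, so a shared set of embedding vectors may suffer from gradient coupling. FGCNN therefore uses a separate set of embedding vectors for the FGCNN module to avoid this problem.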
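A toy NumPy sketch of the idea, assuming one categorical field and two independent embedding tables. The names deep_table and fg_table, the sizes, and the fake update step are all hypothetical; in the real model both tables are trained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim = 100, 8

# Two independent tables for the same categorical field.
deep_table = rng.normal(scale=0.01, size=(vocab_size, emb_dim))  # feeds the downstream model
fg_table = rng.normal(scale=0.01, size=(vocab_size, emb_dim))    # feeds the FGCNN module only

ids = np.array([3, 17, 42])   # a mini-batch of feature ids
deep_emb = deep_table[ids]    # lookup for the downstream model
fg_emb = fg_table[ids]        # lookup for the FGCNN module

# Simulate an update coming from the FGCNN branch only.
before = deep_table.copy()
fg_table[ids] -= 0.1 * np.ones((len(ids), emb_dim))
deep_unchanged = np.array_equal(deep_table, before)
print(deep_unchanged)  # True
```

Because the two lookups read from different parameters, an update driven by the FGCNN branch leaves the downstream model's embeddings untouched.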

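Convolution and pooling layers

Convolution and pooling are similar to CCPM, except that the pooling layer uses ordinary max pooling.

Recombination layer

As noted earlier, one difficulty of using a CNN for feature extraction in CTR tasks is that it computes local feature combinations. The authors therefore propose a recombination mechanism to generate global feature combinations: the pooled feature maps are flattened into a vector, and a single-layer neural network then combines the features. The output dimension is controlled by a hyperparameter.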
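The shapes involved in recombination can be sketched in NumPy as follows. This is only an illustration with random weights: the function name recombine and all sizes are made up, and the tanh activation follows the DeepCTR excerpt shown later in this article.

```python
import numpy as np

def recombine(pooled, weight, bias, new_maps):
    """pooled: (batch, h, emb_dim, filters) -> new features (batch, h * new_maps, emb_dim)."""
    batch, h, emb_dim, _ = pooled.shape
    flat = pooled.reshape(batch, -1)        # flatten all pooled feature maps
    dense = np.tanh(flat @ weight + bias)   # one fully connected layer
    return dense.reshape(batch, h * new_maps, emb_dim)

rng = np.random.default_rng(0)
batch, h, emb_dim, filters, new_maps = 2, 13, 8, 14, 3   # illustrative sizes
pooled = rng.normal(size=(batch, h, emb_dim, filters))
weight = rng.normal(scale=0.01, size=(h * emb_dim * filters, h * emb_dim * new_maps))
bias = np.zeros(h * emb_dim * new_maps)

new_features = recombine(pooled, weight, bias, new_maps)
print(new_features.shape)  # (2, 39, 8)
```

Since the dense layer connects every flattened position to every output, each generated feature can depend on all fields, not only on neighbouring ones.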

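Concatenation layer

After several rounds of recombination, the generated features are concatenated with the raw features to form the new input. Any other method, such as LR, FM, or DeepFM, can then be applied on top.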
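To get a feel for how many features the downstream model receives, here is a small worked calculation. It assumes "same" padding in the convolutions (so only pooling shrinks the field axis) and non-overlapping max pooling that drops any remainder. The helper name and the hyperparameter values (26 fields, pooling width 2, 3 new maps per layer) are illustrative assumptions.

```python
def fgcnn_field_counts(n_fields, pooling_widths, new_maps):
    """Return (new features generated per layer, total fields after concatenation)."""
    per_layer = []
    height = n_fields
    for pool, maps in zip(pooling_widths, new_maps):
        height //= pool                    # max pooling shrinks the field axis
        per_layer.append(height * maps)    # recombination emits height * maps new features
    return per_layer, n_fields + sum(per_layer)

per_layer, total = fgcnn_field_counts(26, pooling_widths=(2, 2, 2, 2), new_maps=(3, 3, 3, 3))
print(per_layer, total)  # [39, 18, 9, 3] 95
```

Under these assumptions, 26 raw fields grow to 95 fields after concatenation, which is the input that the downstream model then works with.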

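Experimental results

Comparison of IPNN-FGCNN with other SOTA models

[Figure: results comparing IPNN-FGCNN with other models]

Effectiveness as a feature generation model

[Figure: results of using FGCNN as a feature generation model]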

Core code

This section has two parts: the feature generation module of FGCNN, and an IPNN that uses FGCNN for feature augmentation.

FGCNN module

embedding_size = inputs.shape[-1].value
# add a channel axis: (batch, fields, embedding_size) -> (batch, fields, embedding_size, 1)
pooling_result = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=3))(inputs)

new_feature_list = []

for i in range(1, len(self.filters) + 1):
    filters = self.filters[i - 1]
    width = self.kernel_width[i - 1]
    new_filters = self.new_maps[i - 1]
    pooling_width = self.pooling_width[i - 1]
    # convolution and max pooling along the field axis
    conv_result = tf.keras.layers.Conv2D(filters=filters, kernel_size=(width, 1), strides=(1, 1),
                                         padding='same', activation='tanh', use_bias=True)(pooling_result)
    pooling_result = tf.keras.layers.MaxPooling2D(pool_size=(pooling_width, 1))(conv_result)
    # recombination: flatten, apply one dense layer, reshape into new feature embeddings
    flatten_result = tf.keras.layers.Flatten()(pooling_result)
    new_result = tf.keras.layers.Dense(pooling_result.shape[1].value * embedding_size * new_filters,
                                       activation='tanh', use_bias=True)(flatten_result)
    new_feature_list.append(tf.keras.layers.Reshape(
        (pooling_result.shape[1].value * new_filters, embedding_size))(new_result))
# concatenate the new features from all layers along the field axis
new_features = concat_fun(new_feature_list, axis=1)

IPNN with FGCNN feature augmentation

For the complete code, see

https://github.com/shenweichen/DeepCTR/blob/master/deepctr/models/fgcnn.py

Grouped feature embeddings

From the input features we obtain deep_emb_list and fg_deep_emb_list; fg_deep_emb_list is the input to the FGCNN module.

deep_emb_list, fg_deep_emb_list, linear_logit, inputs_list = preprocess_input_embedding(
    feature_dim_dict, embedding_size, l2_reg_embedding, l2_reg_linear, init_std, seed, True)

CNN feature generation module: the packaged FGCNNLayer produces the combination features extracted by the CNN, new_features.

new_features = FGCNNLayer(conv_filters, conv_kernel_width, new_maps, pooling_width)(fg_input)

Concatenation layer: the original embedding input and the new features are concatenated to form the combined input, combined_input.

combined_input = concat_fun([origin_input, new_features], axis=1)

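Interaction layer: this part can be viewed as an independent model. The paper uses IPNN, but it can be freely replaced with any structure; most of the layers in deepctr.layers.interaction can be used here.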

inner_product = tf.keras.layers.Flatten()(InnerProductLayer()(
        tf.keras.layers.Lambda(unstack, mask=[None] * combined_input.shape[1].value)(combined_input)))
linear_signal = tf.keras.layers.Flatten()(combined_input)
dnn_input = tf.keras.layers.Concatenate()([linear_signal, inner_product])
dnn_input = tf.keras.layers.Flatten()(dnn_input)

final_logit = DNN(dnn_hidden_units, dropout_rate=dnn_dropout,
                      l2_reg=l2_reg_dnn)(dnn_input)
final_logit = tf.keras.layers.Dense(1, use_bias=False)(final_logit)
output = PredictionLayer(task)(final_logit)

model = tf.keras.models.Model(inputs=inputs_list, outputs=output)
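For intuition about what the interaction part computes, the following NumPy sketch takes the pairwise inner products between field embeddings, which is the core idea behind an inner-product (IPNN-style) layer. It is a simplified stand-in, not DeepCTR's InnerProductLayer, and the function name is hypothetical.

```python
import numpy as np

def pairwise_inner_products(emb):
    """emb: (batch, n_fields, emb_dim) -> (batch, n_fields * (n_fields - 1) // 2)."""
    n_fields = emb.shape[1]
    rows, cols = np.triu_indices(n_fields, k=1)   # every unordered pair (i, j) with i < j
    gram = emb @ emb.transpose(0, 2, 1)           # all inner products at once
    return gram[:, rows, cols]

emb = np.array([[[1.0, 0.0],
                 [2.0, 1.0],
                 [0.0, 3.0]]])                    # 1 sample, 3 fields, embedding size 2
products = pairwise_inner_products(emb)
print(products)  # [[2. 0. 3.]]

# As in the excerpt above, the DNN input joins the flattened embeddings and the products.
dnn_input = np.concatenate([emb.reshape(1, -1), products], axis=1)
print(dnn_input.shape)  # (1, 9)
```

With n fields this yields n(n-1)/2 interaction terms, so the extra features generated by FGCNN also enlarge the set of pairwise interactions.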

Running the example

First make sure your Python version is 2.7, 3.4, 3.5, or 3.6, then run pip install deepctr (if tensorflow-gpu is already installed, use pip install deepctr --no-deps instead). Next, download the demo data ( https://github.com/shenweichen/DeepCTR/blob/master/examples/criteo_sample.txt ) and run the code below.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, roc_auc_score
from deepctr.models import CCPM, FGCNN
from deepctr.utils import SingleFeat

if __name__ == "__main__":
    data = pd.read_csv('./criteo_sample.txt')

    sparse_features = ['C' + str(i) for i in range(1, 27)]
    dense_features = ['I'+str(i) for i in range(1, 14)]

    data[sparse_features] = data[sparse_features].fillna('-1', )
    data[dense_features] = data[dense_features].fillna(0,)
    target = ['label']

    # 1.Label Encoding for sparse features,and do simple Transformation for dense features
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
    mms = MinMaxScaler(feature_range=(0, 1))
    data[dense_features] = mms.fit_transform(data[dense_features])

    # 2.count #unique features for each sparse field,and record dense feature field name

    sparse_feature_list = [SingleFeat(feat, data[feat].nunique())
                           for feat in sparse_features]
    dense_feature_list = [SingleFeat(feat, 0)
                          for feat in dense_features]

    # 3.generate input data for model

    train, test = train_test_split(data, test_size=0.2)
    train_model_input = [train[feat.name].values for feat in sparse_feature_list] + \
        [train[feat.name].values for feat in dense_feature_list]
    test_model_input = [test[feat.name].values for feat in sparse_feature_list] + \
        [test[feat.name].values for feat in dense_feature_list]

    # 4.Define Model,train,predict and evaluate
    model = FGCNN({"sparse": sparse_feature_list,
                    "dense": dense_feature_list})
    #model = CCPM({"sparse": sparse_feature_list, "dense": dense_feature_list})                
    model.compile("adam", "binary_crossentropy",
                  metrics=['binary_crossentropy'], )

    history = model.fit(train_model_input, train[target].values,
                        batch_size=256, epochs=10, verbose=2, validation_split=0.2, )
    pred_ans = model.predict(test_model_input, batch_size=256)
    print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
    print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))