
Why put test[u][0] into the item_index list ?? #11

Closed
BEbillionaireUSD opened this issue Feb 7, 2021 · 8 comments
Labels
documentation Improvements or additions to documentation

Comments

@BEbillionaireUSD

Hi, could you please answer my question?
In the evaluate function, the code builds an item_index list and puts test[u][0] into it.
My understanding is that test[u][0] is the item we want to predict, but this way the model knows it only has to choose among these candidates, which include the very item we want to predict.
Is this a kind of data leakage? Or did I misunderstand something?

```python
for u in users:
    if len(train[u]) < 1 or len(test[u]) < 1: continue

    # build the input sequence: the validation item plus the training history
    seq = np.zeros([args.maxlen], dtype=np.int32)
    idx = args.maxlen - 1
    seq[idx] = valid[u][0]
    idx -= 1
    for i in reversed(train[u]):
        seq[idx] = i
        idx -= 1
        if idx == -1: break

    rated = set(train[u])
    rated.add(0)
    item_idx = [test[u][0]]  ##### WHY???
    # sample 100 negative items the user has not interacted with
    for _ in range(100):
        t = np.random.randint(1, itemnum + 1)
        while t in rated: t = np.random.randint(1, itemnum + 1)
        item_idx.append(t)

    predictions = -model.predict(*[np.array(l) for l in [[u], [seq], item_idx]])
    predictions = predictions[0]  # negated so that argsort sorts descending

    rank = predictions.argsort().argsort()[0].item()
```

My understanding of this phase is: the model randomly chooses 100 candidates from all items (except those in the training sequence) and adds the item it wants to predict to the candidate set. Then it scores these 101 candidates. This logic seems strange to me.

@pmixer
Owner

pmixer commented Feb 7, 2021

@cherylLbt hi, item_index holds the negatively sampled items to be ranked by the model, so it must also contain the real next item.
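For context, here is a minimal, self-contained sketch (my own illustration, not code from this repo) of how the real next item's rank among the sampled candidates, at index 0, turns into HR@k and NDCG@k:

```python
import numpy as np

def sampled_rank_metrics(scores, k=10):
    # scores[0] is the real next item; scores[1:] are sampled negatives.
    # Double argsort of the negated scores yields each item's rank
    # in descending score order (0 = ranked first).
    rank = int((-np.asarray(scores)).argsort().argsort()[0])
    hit = 1.0 if rank < k else 0.0
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return rank, hit, ndcg

# toy scores: the real next item (3.2) beats every sampled negative
scores = [3.2] + list(np.random.uniform(-1.0, 1.0, size=100))
rank, hit, ndcg = sampled_rank_metrics(scores)
# rank == 0, hit == 1.0, ndcg == 1.0 for this toy input
```

If the real next item were not in `scores`, its rank would be undefined and every user would count as a miss, which is why it must be included.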

@BEbillionaireUSD
Author

Thanks for your quick reply. But what if I want to predict the next item when I don't know the real next one?

@pmixer
Owner

pmixer commented Feb 8, 2021

> Thanks for your quick reply. But what if I want to predict the next item when I don't know the real next one?

You are welcome :)

We only need to 'predict' when we do not know something. Please pay attention to the difference between model evaluation (testing: seeing what the model can do when we have sample data with known answers) and model inference (online serving: we do not know the right answer, i.e. the real next item, so we deploy a model to predict it).
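To make the evaluation-vs-inference distinction concrete, here is a hedged sketch of the inference side (names like `score_fn` are my own, not from this repo): at serving time there is no ground truth, so we score every item, mask what the user has already seen, and return the top-k.

```python
import numpy as np

def recommend_top_k(score_fn, seen, itemnum, k=10):
    # Inference/serving: no real next item is known, so every
    # item id in 1..itemnum gets a score from the model.
    scores = np.array([score_fn(i) for i in range(1, itemnum + 1)], dtype=float)
    for i in seen:
        scores[i - 1] = -np.inf  # never re-recommend already-seen items
    top = np.argsort(-scores)[:k] + 1  # convert back to 1-based item ids
    return top.tolist()

# toy scorer standing in for model.predict: the item id itself
recs = recommend_top_k(lambda i: float(i), seen={9, 10}, itemnum=10, k=3)
# with items 9 and 10 masked, the best remaining items are 8, 7, 6
```

Note there is no `test[u][0]` anywhere here: the candidate list is the full catalogue, not "100 negatives plus the answer".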

Moreover, for a recommender system, you have at least two options for model serving:

  1. rank all items every time, as discussed in https://www.kdd.org/kdd2020/accepted-papers/view/on-sampled-metrics-for-item-recommendation
  2. first recall a set of items (for example, 100), then rank the recalled items; this is a common approach in industrial recommender systems
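Option 2 can be sketched as follows; this is a toy illustration under my own assumptions (recall by a simple dot product against item embeddings, re-ranking by an arbitrary `rank_score` callable standing in for the heavier model):

```python
import numpy as np

def recall_then_rank(user_vec, item_vecs, rank_score, n_recall=100, k=10):
    # Stage 1 (recall): a cheap model, here a dot product against
    # item embeddings, shortlists n_recall candidate item indices.
    recall_scores = item_vecs @ user_vec
    candidates = np.argsort(-recall_scores)[:n_recall]
    # Stage 2 (rank): a heavier model re-scores only the shortlist.
    ranked = sorted(candidates.tolist(), key=lambda i: -rank_score(i))
    return ranked[:k]

# toy data: 5 one-dimensional item embeddings; recall keeps the top 3,
# then the ranker (which prefers small indices here) reorders them
user_vec = np.array([1.0])
item_vecs = np.arange(1.0, 6.0).reshape(5, 1)
out = recall_then_rank(user_vec, item_vecs, rank_score=lambda i: -i,
                       n_recall=3, k=2)
# recall keeps indices {4, 3, 2}; ranking by -i puts 2 first, so out == [2, 3]
```

The point of the two stages is cost: the expensive ranker only ever sees `n_recall` candidates instead of the whole catalogue.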

As for "But what if I want to predict the next item when I don't know the real next one?": again, we need to predict precisely because we do not know. If you mean model evaluation, either rank all items (as in https://github.com/pmixer/TiSASRec.debug), or keep the original setting without adding the real next item, in which case evaluation would definitely fail whenever the real next item is not included in the negative sample set.
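The "rank all items" evaluation can be sketched like this (my own minimal version, not the repo's code): the real next item is ranked against every item, typically excluding the user's already-seen training items.

```python
import numpy as np

def full_rank(scores_all, true_item, seen):
    # Full (unsampled) evaluation: rank the real next item against
    # every item id, where scores_all[i - 1] is the score of item i.
    scores = np.array(scores_all, dtype=float)
    for i in seen:
        if i != true_item:
            scores[i - 1] = -np.inf  # exclude already-seen items
    # double argsort gives each item's descending-order rank (0 = best)
    return int((-scores).argsort().argsort()[true_item - 1])

# toy scores for items 1..4; item 2 is seen, the true item is 3
rank = full_rank([0.1, 0.9, 0.5, 0.7], true_item=3, seen={2})
# item 4 (0.7) outranks item 3 (0.5), so the true item's rank is 1
```

Metrics computed this way are usually much lower than the 1-vs-100 sampled numbers, since the true item now competes against the entire catalogue.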

@pmixer pmixer added the documentation Improvements or additions to documentation label Feb 8, 2021
@BEbillionaireUSD
Author

Thanks a lot for your detailed explanation, I understand!!

@BEbillionaireUSD
Author

Hi, sorry to bother you. I trained the model for 800 epochs and ran predictions on the MovieLens-1M dataset (i.e. I masked the last item and fed the preceding sequence into the model; for item_index I put in all items, then computed the scores of all items).
The training Hit Rate is only 0.15 this way. Is that normal?

@pmixer
Owner

pmixer commented Feb 10, 2021

> Hi, sorry to bother you. I trained the model for 800 epochs and ran predictions on the MovieLens-1M dataset (i.e. I masked the last item and fed the preceding sequence into the model; for item_index I put in all items, then computed the scores of all items).
> The training Hit Rate is only 0.15 this way. Is that normal?

As expected. Check https://github.com/pmixer/TiSASRec.debug and #6 if you are interested. Sharing how to solve this issue would be more than welcome.

@BEbillionaireUSD
Author

BEbillionaireUSD commented Feb 10, 2021 via email

@BEbillionaireUSD
Copy link
Author

Thanks a lot. That fully answers my question. I hope this issue gets more attention.

@pmixer pmixer pinned this issue Nov 30, 2021