BERTopic - GitHub Pages BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
Quick Start - BERTopic - GitHub Pages In BERTopic, there are a number of different topic representations that we can choose from. They are all quite different from one another and give interesting perspectives and variations of topic representations.
6B. LLM Generative AI - BERTopic - GitHub Pages Zephyr is a fine-tuned version of Mistral 7B that was trained on a mix of publicly available and synthetic datasets using Direct Preference Optimization (DPO). To use Zephyr in BERTopic, we will first need to install and update a couple of packages that can handle quantized versions of Zephyr:
The Algorithm - BERTopic - GitHub Pages Below, you will find different types of overviews of each step in BERTopic's main algorithm. Each successive overview will be more in-depth than the previous overview.
Tips & Tricks - BERTopic - GitHub Pages Topic-term matrix: Although BERTopic focuses on clustering our documents, the end result does contain a topic-term matrix. This topic-term matrix is calculated using c-TF-IDF, a TF-IDF procedure optimized for class-based analyses. To extract the topic-term matrix (or c-TF-IDF matrix) with the corresponding words, we can simply do the following:
Best Practices - BERTopic - GitHub Pages In BERTopic, you can model many different topic representations simultaneously to test them out and get different perspectives of topic descriptions. This is called multi-aspect topic modeling. Here, we will demonstrate a number of interesting and useful representations in BERTopic: KeyBERTInspired.
FAQ - BERTopic - GitHub Pages Due to the stochastic nature of UMAP, the results from BERTopic might differ even if you run the same code multiple times. Using custom embeddings allows you to try out BERTopic several times until you find the topics that suit you best. You only need to generate the embeddings themselves once and can then run BERTopic several times with different parameters.
5. c-TF-IDF - BERTopic - GitHub Pages This class-based TF-IDF representation is enabled by default in BERTopic. However, we can explicitly pass it to BERTopic through the ctfidf_model parameter, allowing for parameter tuning and the customization of the topic extraction technique:
Guided Topic Modeling - BERTopic - GitHub Pages Guided BERTopic has two main steps: First, we create embeddings for each seeded topic by joining them and passing them through the document embedder. These embeddings will be compared with the existing document embeddings through cosine similarity and assigned a label.
1. Embeddings - BERTopic - GitHub Pages When new state-of-the-art pre-trained embedding models are released, BERTopic will be able to use them. As a result, BERTopic grows with any new models being released. Out of the box, BERTopic supports several embedding techniques. In this section, we will go through several of them and how they can be implemented. Sentence Transformers