Profanity dataset
26 July 2024 · The dataset is free to distribute and falls under CC0, with the underlying comment text being governed by Wikipedia’s CC-SA-3.0. This dataset contains …
We're creating the world's largest profanity dataset, in 20+ languages. Dataset: This repo contains 1600+ popular English profanities and their variations. Columns: text: the …
12 Dec. 2024 · Benchmark dataset for low-resource multiclass classification, with 4,015 training, 500 testing, and 500 validation examples, each labeled across five classes. A sample can belong to more than one class. Collected as tweets and originally used in Livelo & Cheng (2024).
We organized the profanity into the following categories:
1. sexual anatomy / sexual acts (ass kisser, dick, pigfucker)
2. bodily fluids / excrement (shit, cum)
3. sexual orientation / gender (faggot, tranny, bitch, whore)
4. racial / ethnic (chink, n3gro)
5. mental disability (retard, dumbass)
6. physical disability …
This repo contains 1600+ popular English profanities and their variations. Columns:
1. text: the profanity
2. canonical_form_1: the profanity's canonical form
3. …
We'll be adding more languages and profanity annotations (e.g., augmenting each profanity with its severity level, type, and other variations) over time. Check out our …
Need a larger set of expletives and slurs, or a list of swear words in other languages (Spanish, French, German, Japanese, Portuguese, etc.)? We work with top …
Data Exploration: This dataset contains 159,571 comments from Wikipedia. The data consists of one input feature, the string data for the comments, and six labels for different categories of toxic comments: toxic, severe_toxic, obscene, threat, insult, and identity_hate.
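As a rough illustration of the six-label layout described above, the sketch below counts positive examples per label. The column name `comment_text`, the 0/1 encoding, and the three placeholder rows are assumptions made for the example, not details taken from the dataset description.

```python
import csv
import io

# The six label names come from the dataset description above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical stand-in rows (assumed column name `comment_text`, assumed 0/1 flags).
SAMPLE_CSV = """comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
"placeholder clean comment",0,0,0,0,0,0
"placeholder rude comment",1,0,1,0,1,0
"placeholder threatening comment",1,1,0,1,0,0
"""


def label_counts(csv_text):
    """Count positive examples per label, plus comments that carry no label."""
    counts = {label: 0 for label in LABELS}
    unlabeled = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        flags = [int(row[label]) for label in LABELS]
        for label, flag in zip(LABELS, flags):
            counts[label] += flag
        if not any(flags):
            unlabeled += 1
    return counts, unlabeled


counts, unlabeled = label_counts(SAMPLE_CSV)
print(counts)
print("comments with no label:", unlabeled)
```

Because a comment can carry several labels at once, the per-label counts need not sum to the number of comments.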
There are 2 profanity datasets available on data.world. Find open data about profanity contributed by thousands of users and organizations across the world. Linus Torvalds …
Deep-Learning Technology: AI-based image moderation service. Our Automated Intelligent Moderation (AIM) API service offers 24/7 protection from the risks associated with having user-generated content on your brand channels, detecting and removing offensive content and unwanted images in real time.
24 May 2024 · The profanity vector helps improve the language modeling on the data by emphasizing the profane words used in each comment. Before model training and fine-tuning, we first pre-process the code-mixed data to deal with variations in spelling and transliteration.
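The snippet does not say how the profanity vector is built. One plausible reading, sketched here purely as an assumption, is a count vector with one slot per lexicon entry. The lexicon, the tokenizer, and the function name `profanity_vector` are all hypothetical.

```python
import re

# Hypothetical lexicon with mild placeholder entries; a real one would be far larger.
LEXICON = ["damn", "crap", "hell"]


def profanity_vector(comment, lexicon=LEXICON):
    """Return a count vector with one slot per lexicon entry (assumed design)."""
    # Lowercase and split into simple word tokens so that matching is whole-word.
    tokens = re.findall(r"[a-z']+", comment.lower())
    return [tokens.count(word) for word in lexicon]


print(profanity_vector("Damn, this is crap. Damn!"))
```

A vector like this could be concatenated with a comment's other features. For code-mixed text, spelling and transliteration variants would have to be normalized first, or the exact token match would miss them.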
23 May 2024 · profanity-check is anywhere from 300 - 4000 times faster than profanity-filter in this benchmark! Accuracy: This table speaks for itself: See the How section below …
Other important features of our dataset are:
(1) 50 more movies compared to the original dataset
(2) up to 12 most similar movies for each movie in the dataset, along with the MPAA rating of those similar movies
(3) users' votes for MPAA components (violence, profanity, sex, drug/alcohol, frightening)
(4) reason for the MPAA rating published by the …
Multilingual swear profanity. The current dataset consists of swear profanity in six languages: French (fr), Turkish (tr), Italian (it), Russian (ru), Spanish (es), Portuguese (pt). Sources: …
Dataset: Rapid development in technology, where anything is just one click away, connects us globally. Despite all the positive aspects of this modern technology, it also increases security risk. Cybersecurity has become a critical concern.
Use Surge AI’s global data labeling workforce and platform to power your content moderation, sentiment analysis, customer support, GPT-3 fine-tuning, and more.
… hate speech detection datasets for racial biases. We evaluate how classification models trained on these datasets perform in the field, comparing their predictions for tweets written in language used by whites or African-Americans. 3 Research design. 3.1 Hate speech and abusive language datasets. We focus on Twitter, the most widely used data …
Useful Resources from Luis von Ahn's Research Group. Offensive/Profane Word List. Description: A list of 1,300+ English terms that could be found offensive. The list contains some words that many people won't find offensive, but it's a good start for anybody wanting to block offensive or profane terms on their site.
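A word list like this is typically applied as a blocklist. The sketch below shows one minimal way to do that with whole-word, case-insensitive matching. The two-entry blocklist and the helper names are hypothetical placeholders; a real deployment would load the full list from a file.

```python
import re

# Hypothetical blocklist with mild placeholder entries.
BLOCKLIST = {"darn", "heck"}


def build_pattern(words):
    """Compile one case-insensitive regex that matches any listed term as a whole word."""
    # Longest terms first, so multi-word entries win over their shorter prefixes.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(BLOCKLIST)


def contains_blocked(text):
    """Return True if any blocklisted term appears in the text."""
    return PATTERN.search(text) is not None


def mask(text):
    """Replace each blocklisted term with asterisks of the same length."""
    return PATTERN.sub(lambda m: "*" * len(m.group(0)), text)


print(contains_blocked("Well, Heck no"))
print(mask("Well, Heck no"))
print(contains_blocked("a checkered flag"))
```

Matching on word boundaries avoids flagging harmless words that merely contain a listed term, which matters for a list that already includes some entries many people would not find offensive.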