{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.11.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"nvidiaTeslaT4","dataSources":[],"dockerImageVersionId":31193,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":true}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Transformers: The Heart of Modern AI (AI Student Collective Madrid)\n*Lauren Gallego Ropero -- 02/12/2025*\n\nThis notebook contains all the hands-on exercises for the workshop. \nIt presents four example uses of large models (all based on the Transformer architecture) for different tasks. \nThe goal is to show how versatile these models are, and how easy it is to use many open-source models with the \n**Transformers** library from **HuggingFace**.","metadata":{"_kg_hide-output":true}},{"cell_type":"code","source":"import os\nimport math\nimport torch\nimport matplotlib.pyplot as plt\nfrom PIL import Image\nfrom datasets import load_dataset\nfrom sklearn.decomposition import PCA\n\nfrom transformers import (\n    AutoTokenizer,\n    AutoModel,\n    AutoModelForCausalLM,\n    AutoModelForSequenceClassification,\n    AutoModelForSeq2SeqLM,\n    AutoFeatureExtractor,\n    AutoModelForImageClassification,\n    BertForSequenceClassification,\n    BertForQuestionAnswering,\n    CLIPProcessor,\n    CLIPModel,\n    pipeline\n)\nos.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = 
\"3\"","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true,"execution":{"iopub.status.busy":"2025-12-01T17:15:39.760964Z","iopub.execute_input":"2025-12-01T17:15:39.761210Z"}},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"## 1. Embeddings\nIn this first task we will use the pretrained embeddings of BERT, one of the earliest and most influential Transformer models. \nWe will tokenize a sentence, pass its tokens through the model's embedding layer, and visualize each token's representation in two dimensions.","metadata":{}},{"cell_type":"code","source":"model_name = 'bert-base-uncased'\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModel.from_pretrained(model_name)\n\n\ntext = \"The quick brown fox jumps over the lazy dog while the cat sleeps quietly on the sofa.\"\ninputs = tokenizer(text, return_tensors=\"pt\")\n\nwith torch.no_grad():\n    embeddings = model.embeddings(inputs[\"input_ids\"])","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"tokens = tokenizer.convert_ids_to_tokens(inputs[\"input_ids\"][0])\nfor token, embedding in zip(tokens, embeddings[0]):\n    print(f\"{token:12s} -> {embedding[:5]} ...\")","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"# Convert the embeddings to a NumPy array (the model runs on the CPU here)\nemb_np = embeddings[0].detach().numpy() \n\npca = PCA(n_components=2)\nemb_2d = pca.fit_transform(emb_np)","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"plt.figure(figsize=(8,6))\nplt.scatter(emb_2d[:,0], emb_2d[:,1])\n\nfor i, token in enumerate(tokens):\n    plt.text(emb_2d[i,0]+0.01, emb_2d[i,1]+0.01, token, fontsize=12)\n\nplt.title(\"Token embeddings visualized with 
PCA\")\nplt.xlabel(\"PC1\")\nplt.ylabel(\"PC2\")\nplt.grid(True)\nplt.show()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"At the end of each exercise, we delete the model and the tokenizer to free GPU memory before loading the next model.","metadata":{}},{"cell_type":"code","source":"del model\ndel tokenizer\n\ntorch.cuda.empty_cache()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"## 2. Text classification\n### 2.1 Zero-shot \nIn this task we will use an encoder-decoder language model for 'zero-shot classification'. This means that, given a text, \nthe model assigns it to one of a set of candidate categories. What is distinctive is that the model has not been trained on this classification task\nand has never seen a labeled example for these categories; it relies on the general knowledge it acquired in training (here, BART fine-tuned on the MNLI inference dataset).","metadata":{}},{"cell_type":"code","source":"model_name = \"facebook/bart-large-mnli\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name, device_map=\"auto\")\n\n# Use the pipeline function from Transformers\nclassifier = pipeline(\"zero-shot-classification\", model=model, tokenizer=tokenizer)","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"We provide our text and our candidate categories.","metadata":{}},{"cell_type":"code","source":"text = \"In 'Breaking Bad', Walter White, a high school chemistry teacher, turns to cooking methamphetamine after being diagnosed with terminal cancer.\"\ncandidate_labels = [\"crime\", \"love\", \"fiction\"]\n\n\nresult = classifier(text, candidate_labels)\n\n# Visualization\nlabels = result['labels']\nscores = result['scores']\n\nplt.figure(figsize=(8,5))\nplt.bar(labels, scores, color='skyblue')\nplt.ylim(0, 
1)\nplt.ylabel(\"Probability\")\nplt.title(\"Results\")\nplt.show()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"### 2.2 Sentiment analysis","metadata":{}},{"cell_type":"markdown","source":"In the next task we will also classify text, but this time with a model that has been trained for one specific task on task-specific data.\nHere we will do **sentiment analysis**, one of the most popular tasks thanks to its many applications and the large amount of data available online.","metadata":{}},{"cell_type":"code","source":"dataset = load_dataset(\"tweet_eval\", \"sentiment\", split=\"train[:50]\")\n\nmodel_name = \"cardiffnlp/twitter-roberta-base-sentiment\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name, device_map=\"auto\")\n\nsentiment_classifier = pipeline(\"sentiment-analysis\", model=model, tokenizer=tokenizer)\nmodel_labels = {0: \"negative\", 1: \"neutral\", 2: \"positive\"}\n","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"texts = [str(t) for t in dataset[\"text\"]]\nresults = sentiment_classifier(texts, batch_size=16)  \n\n# Map each predicted label (e.g. LABEL_2) to its text name\npredictions = []\nfor res in results:\n    label_id = int(res[\"label\"].split(\"_\")[-1])  \n    predictions.append(model_labels[label_id])\n    \ntrue_labels = [model_labels[l] for l in dataset[\"label\"]]\n\n# Compute accuracy\naccuracy = sum([p == t for p, t in zip(predictions, true_labels)]) / len(true_labels)\nprint(f\"Accuracy on the first 50 tweets: {accuracy*100:.2f}%\")","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"print(\"\\nSome examples:\")\nfor i in range(5):\n    print(f\"Tweet: {dataset[i]['text']}\")\n    print(f\"True label: {true_labels[i]}, Prediction: 
{predictions[i]}\\n\")","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"del model\ndel tokenizer\n\ntorch.cuda.empty_cache()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"## 3. Text generation\nFor the next task we will use a larger language model, TinyLlama-1.1B-Chat-v1.0.\nWe need a decoder-only language model designed for autoregressive generation, that is, repeatedly predicting the next word / token.\n\nYou are probably already familiar with the format of this task: the model receives a text prompt and returns a sequence of text.","metadata":{}},{"cell_type":"code","source":"model_name = \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\")\n\ndef run(prompt):\n    inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n    output = model.generate(**inputs, max_new_tokens=200)\n    print(tokenizer.decode(output[0], skip_special_tokens=True))\n","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"text = \"\"\"The Office\" is an American mockumentary sitcom that depicts the everyday lives of office employees working at Dunder Mifflin Paper Company in Scranton, Pennsylvania. \n          The show features a documentary-style filming with talking-head interviews from the staff. The series is known for its awkward humor, quirky characters, \n          and the often hilarious management style of regional manager Michael Scott. 
Key characters include Jim Halpert, Pam Beesly, Dwight Schrute, and many others who navigate \n          work, friendships, and office antics over the course of the series.\"\"\"\n\nquestion = \"Who is the regional manager of the Scranton branch in The Office?\"\n\n\nprompt = f\"\"\"\nBased on the following text, answer the question.\n\nText: \"{text}\"\n\nQuestion: \"{question}\"\n\nAnswer:\n\"\"\"\n\nrun(prompt)\n","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"del model\ndel tokenizer\n\ntorch.cuda.empty_cache()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"## 4. Image classification\nFor this last task we need a special kind of Transformer that can process images instead of text: the ViT, or Vision Transformer.\nThe model is very similar to the standard Transformer; only the input processing changes (the image is split into fixed-size patches, 16x16 pixels for this model, and each patch becomes a token).","metadata":{}},{"cell_type":"code","source":"model_name = \"nateraw/vit-base-patch16-224-cifar10\"\nmodel = AutoModelForImageClassification.from_pretrained(model_name)\nfeature_extractor = AutoFeatureExtractor.from_pretrained(model_name)\nmodel.eval()","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"labels = [\"airplane\",\"automobile\",\"bird\",\"cat\",\"deer\",\"dog\",\"frog\",\"horse\",\"ship\",\"truck\"]\ncifar = load_dataset(\"cifar10\", split=\"train\").shuffle(seed=33).select(range(200))","metadata":{"trusted":true},"outputs":[],"execution_count":null},{"cell_type":"code","source":"num_images = 9\n\n# Images and ground truth\nimages = [cifar[i][\"img\"].resize((224,224)) for i in range(num_images)]\ngts = [cifar[i][\"label\"] for i in range(num_images)]\n\n\n# Run the images through the model\ninputs = feature_extractor(images=images, return_tensors=\"pt\")\nwith torch.no_grad():\n    logits = model(**inputs).logits\n\ntopk = 
torch.topk(torch.nn.functional.softmax(logits, dim=1), k=3, dim=1)\n\n# Plot\ncols = 3\nrows = math.ceil(num_images / cols)\nplt.figure(figsize=(15,15))\nfor i in range(num_images):\n    ax = plt.subplot(rows, cols, i+1)\n    plt.imshow(images[i])\n    plt.axis(\"off\")\n    pred_text = \"\\n\".join(\n        f\"{labels[idx.item()]} ({score.item():.2f})\"\n        for score, idx in zip(topk.values[i], topk.indices[i])\n    )\n    ax.set_title(f\"GT: {labels[gts[i]]}\\n{pred_text}\", fontsize=9)\n\nplt.tight_layout()\nplt.show()","metadata":{"trusted":true},"outputs":[],"execution_count":null}]}