Vision Language Context-Based Classification

This Deep Learning Package (DLPK) connects ArcGIS Pro to vision-language models developed by OpenAI and Meta, enabling zero-shot object classification on imagery and raster data. Vision-language models combine image understanding with natural language processing, so they can classify objects without a predefined set of categories. Instead, users specify custom class labels at the time of analysis, which makes the package adaptable across domains such as environmental science, urban planning, remote sensing, and disaster response. Integrating these models into ArcGIS Pro lets professionals apply a multimodal AI approach directly to their spatial data.

The package supports models such as OpenAI’s GPT, which requires an internet connection and sends imagery and labels to OpenAI servers for processing. Alternatively, Meta’s Llama Vision model runs entirely on the local machine, so data stays private and no internet access is needed. Note that this DLPK is currently not supported in ArcGIS Online.