Extracting Metadata
Before a file is uploaded, Shrine automatically extracts metadata from it and stores it in the Shrine::UploadedFile object.

    uploaded_file = uploader.upload(file)
    uploaded_file.metadata #=>
    # {
    #   "size" => 345993,
    #   "filename" => "matrix.mp4",
    #   "mime_type" => "video/mp4",
    # }

Under the hood, Shrine#upload calls Shrine#extract_metadata, which you can also use directly to extract metadata from any IO object:

    uploader.extract_metadata(io) #=>
    # {
    #   "size" => 345993,
    #   "filename" => "matrix.mp4",
    #   "mime_type" => "video/mp4",
    # }

The following metadata is extracted by default:
| Key | Default source |
|---|---|
| `filename` | extracted from `io.original_filename` or `io.path` |
| `mime_type` | extracted from `io.content_type` |
| `size` | extracted from `io.size` |
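
To make the mapping in the table concrete, here is a minimal plain-Ruby sketch of how those three values could be read off an IO object. It is not Shrine's actual implementation: `default_metadata` is a hypothetical helper name, and taking the basename of `io.path` is an assumption about how the path is turned into a filename.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper (not part of Shrine) mirroring the table above.
def default_metadata(io)
  filename =
    if io.respond_to?(:original_filename)
      io.original_filename
    elsif io.respond_to?(:path) && io.path
      File.basename(io.path) # assumption: only the basename is kept
    end

  {
    "filename"  => filename,
    "mime_type" => (io.content_type if io.respond_to?(:content_type)),
    "size"      => (io.size if io.respond_to?(:size)),
  }
end

# A StringIO has neither a filename nor a content type, only a size.
default_metadata(StringIO.new("hello"))
#=> { "filename" => nil, "mime_type" => nil, "size" => 5 }

# A File has a path, so a filename can be derived from it.
Tempfile.create(["matrix", ".mp4"]) do |file|
  file.write("video")
  file.flush
  default_metadata(file)["filename"] # a generated name ending in ".mp4"
end
```

Note that a plain File yields no `mime_type` here, which is the gap the MIME type section addresses.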
Accessing metadata
You can access the stored metadata in three ways:

    # via methods (if they're defined)
    uploaded_file.size
    uploaded_file.original_filename
    uploaded_file.mime_type

    # via the metadata hash
    uploaded_file.metadata["size"]
    uploaded_file.metadata["filename"]
    uploaded_file.metadata["mime_type"]

    # via the #[] operator
    uploaded_file["size"]
    uploaded_file["filename"]
    uploaded_file["mime_type"]

Controlling extraction
Shrine#upload takes a :metadata option, which accepts the following values:
- Hash: adds to or overrides the extracted metadata with the given hash

      uploaded_file = uploader.upload(file, metadata: { "filename" => "Matrix[1999].mp4", "foo" => "bar" })
      uploaded_file.original_filename #=> "Matrix[1999].mp4"
      uploaded_file.metadata["foo"]   #=> "bar"

- false: skips metadata extraction (useful in tests)

      uploaded_file = uploader.upload(file, metadata: false)
      uploaded_file.metadata #=> {}

- true: forces metadata extraction when a Shrine::UploadedFile is being uploaded (by default, metadata is simply copied over)

      uploaded_file = uploader.upload(uploaded_file, metadata: true)
      uploaded_file.metadata # re-extracted metadata

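
The decision logic described by the three values above can be sketched in plain Ruby. This is an illustration under stated assumptions, not Shrine's code: `StoredFile`, `extract_size_metadata`, and `resolve_metadata` are hypothetical names, and the stand-in extractor only reads the size.

```ruby
require "stringio"

# Hypothetical stand-in for an already uploaded file with stored metadata.
class StoredFile
  attr_reader :metadata

  def initialize(content, metadata)
    @content  = content
    @metadata = metadata
  end

  def size
    @content.bytesize
  end
end

# Stand-in extractor; real extraction would inspect more than the size.
def extract_size_metadata(io)
  { "size" => io.size }
end

# option: nil (default), a Hash, false, or true.
def resolve_metadata(io, option = nil)
  result =
    if io.is_a?(StoredFile) && option != true
      io.metadata.dup           # already uploaded: copy metadata over
    elsif option != false
      extract_size_metadata(io) # extract from the IO object
    else
      {}                        # extraction skipped
    end

  option.is_a?(Hash) ? result.merge(option) : result
end

io     = StringIO.new("12345")
stored = StoredFile.new("12345", { "size" => 999 })

resolve_metadata(io, { "foo" => "bar" }) #=> { "size" => 5, "foo" => "bar" }
resolve_metadata(io, false)              #=> {}
resolve_metadata(stored)                 #=> { "size" => 999 } (copied)
resolve_metadata(stored, true)           #=> { "size" => 5 }   (re-extracted)
```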
MIME type
By default, the mime_type metadata is copied over from the #content_type attribute of the input file (if present). However, since the #content_type value comes from the Content-Type header of the upload request, it’s not guaranteed to hold the actual MIME type of the file (the browser determines this header based on the file extension).
Moreover, only ActionDispatch::Http::UploadedFile, Shrine::RackFile, and Shrine::DataFile objects have #content_type defined, so when uploading objects such as File, the mime_type value will be nil by default.
To remedy that, Shrine comes with a determine_mime_type plugin, which can extract the MIME type from the IO content:

    # Gemfile
    gem "marcel", "~> 0.3"

    Shrine.plugin :determine_mime_type, analyzer: :marcel

    uploaded_file = uploader.upload(StringIO.new("<?php ... ?>"))
    uploaded_file.mime_type #=> "application/x-php"

You can choose different analyzers, and even mix and match them. See the determine_mime_type plugin docs for more details.
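
To illustrate what detecting the MIME type from content means, as opposed to trusting #content_type, here is a deliberately simplified sketch that checks a few well-known magic-byte signatures. It is not how any of the plugin's analyzers work internally; `sniff_mime_type` and the `MAGIC_BYTES` table are hypothetical, and real analyzers recognise far more formats.

```ruby
require "stringio"

# A handful of well-known file signatures (illustrative, far from complete).
MAGIC_BYTES = {
  "\x89PNG\r\n\x1A\n".b => "image/png",
  "\xFF\xD8\xFF".b      => "image/jpeg",
  "GIF8".b              => "image/gif",
  "%PDF-".b             => "application/pdf",
}.freeze

# Hypothetical helper: returns a MIME type based on the first bytes, or nil.
def sniff_mime_type(io)
  header = io.read(16).to_s.b
  io.rewind # leave the IO ready for the actual upload

  MAGIC_BYTES.each do |signature, mime_type|
    return mime_type if header.start_with?(signature)
  end

  nil
end

# The declared extension or Content-Type header plays no role here.
png = StringIO.new("\x89PNG\r\n\x1A\n".b + "rest of the file")
sniff_mime_type(png) #=> "image/png"
```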
Image dimensions
Shrine comes with a store_dimensions plugin for extracting image dimensions. It adds width and height metadata values, and also adds #width, #height, and #dimensions methods to the Shrine::UploadedFile object.

    # Gemfile
    gem "fastimage" # default analyzer

    Shrine.plugin :store_dimensions

    uploaded_file = uploader.upload(image)
    uploaded_file.metadata["width"]  #=> 1600
    uploaded_file.metadata["height"] #=> 900

    # convenience methods
    uploaded_file.width      #=> 1600
    uploaded_file.height     #=> 900
    uploaded_file.dimensions #=> [1600, 900]

By default, the plugin uses FastImage to analyze dimensions, but you can also have it use MiniMagick or ruby-vips. See the store_dimensions plugin docs for more details.
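
As a rough idea of what a dimensions analyzer does, the sketch below reads the width and height from a PNG header using only the standard library. It handles PNG only and is not how FastImage, MiniMagick, or ruby-vips are implemented; `png_dimensions` is a hypothetical helper.

```ruby
require "stringio"

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: returns [width, height] for a PNG, or nil otherwise.
# Layout: 8-byte signature, 4-byte chunk length, "IHDR", then width and
# height as 32-bit big-endian integers.
def png_dimensions(io)
  header = io.read(24).to_s.b
  io.rewind

  return nil unless header.bytesize == 24
  return nil unless header.start_with?(PNG_SIGNATURE)
  return nil unless header[12, 4] == "IHDR"

  header[16, 8].unpack("NN")
end

# Build just enough of a PNG header to demonstrate the parsing.
sample = PNG_SIGNATURE + [13].pack("N") + "IHDR" + [1600, 900].pack("NN")
png_dimensions(StringIO.new(sample)) #=> [1600, 900]
```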
Custom metadata
In addition to the built-in metadata, Shrine allows you to extract and store any custom metadata, using the add_metadata plugin (which internally extends Shrine#extract_metadata).
For example, you might want to extract EXIF data from images:

    # Gemfile
    gem "exiftool"

    require "exiftool"

    class ImageUploader < Shrine
      plugin :add_metadata

      add_metadata :exif do |io, context|
        Shrine.with_file(io) do |file|
          Exiftool.new(file.path).to_hash
        end
      end
    end

    uploaded_file = uploader.upload(image)
    uploaded_file.metadata["exif"] #=> {...}
    uploaded_file.exif             #=> {...}

Or, if you’re uploading videos, you might want to extract some video-specific metadata:

    # Gemfile
    gem "streamio-ffmpeg"

    require "streamio-ffmpeg"

    class VideoUploader < Shrine
      plugin :add_metadata

      add_metadata do |io, context|
        movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }

        { "duration"   => movie.duration,
          "bitrate"    => movie.bitrate,
          "resolution" => movie.resolution,
          "frame_rate" => movie.frame_rate }
      end
    end

    uploaded_file = uploader.upload(video)
    uploaded_file.metadata #=>
    # {
    #   ...
    #   "duration" => 7.5,
    #   "bitrate" => 481,
    #   "resolution" => "640x480",
    #   "frame_rate" => 16.72
    # }

The yielded io object will not always respond to #path. For example, with the data_uri plugin the io can be a StringIO wrapper, while with the restore_cached_data or refresh_metadata plugins the io might be a Shrine::UploadedFile object. That is why the examples above use Shrine.with_file, which ensures we have a file object.
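
The idea behind such a helper can be sketched in plain Ruby: if the IO already has a path, yield it as is; otherwise copy its content into a temporary file first. This is a simplified illustration rather than Shrine's implementation (it does not cover Shrine::UploadedFile objects, for instance), and `with_file_sketch` is a hypothetical name.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: always yields an object that responds to #path.
def with_file_sketch(io)
  if io.respond_to?(:path) && io.path
    yield io
  else
    # The temporary file is removed once the block returns.
    Tempfile.create("io-copy") do |file|
      file.binmode
      IO.copy_stream(io, file)
      io.rewind   # leave the source IO readable for later steps
      file.flush
      file.rewind
      yield file
    end
  end
end

# A StringIO has no path, so it gets copied to disk before the block runs.
with_file_sketch(StringIO.new("frame data")) do |file|
  File.size(file.path)
end
#=> 10
```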
Adding metadata
If you wish to add metadata to an already attached file, you can do it as follows:

    photo.image_attacher.add_metadata("foo" => "bar")
    photo.image.metadata #=> { ..., "foo" => "bar" }
    photo.save # persist changes

Metadata columns
If you want to write any of the metadata values into a separate database column on the record, you can use the metadata_attributes plugin.

    Shrine.plugin :metadata_attributes, :mime_type => :type

    photo = Photo.new(image: file)
    photo.image_type #=> "image/jpeg"

Direct uploads
When attaching files that were uploaded directly to the cloud or a tus server, Shrine won’t automatically extract metadata from them; instead, it will copy any existing metadata that was set on the client side. This is the default behaviour because metadata extraction requires (at least partially) retrieving file content from the storage, which can be expensive depending on the storage and the type of metadata being extracted.

    # no additional metadata will be extracted in this assignment by default
    photo.image = '{"id":"9e6581a4ea1.jpg","storage":"cache","metadata":{...}}'

Extracting on attachment
If you want metadata to be automatically extracted on assignment (which is useful if you want to validate the extracted metadata or have it immediately available for any other reason), you can load the restore_cached_data plugin:

    Shrine.plugin :restore_cached_data # automatically extract metadata from cached files on assignment

    photo.image = '{"id":"ks9elsd.jpg","storage":"cache","metadata":{}}' # metadata is extracted
    photo.image.metadata #=>
    # {
    #   "size" => 4593484,
    #   "filename" => "nature.jpg",
    #   "mime_type" => "image/jpeg"
    # }

Extracting in the background
A) Extracting with promotion
If you’re using backgrounding, you can extract metadata during background promotion using the refresh_metadata plugin (which the restore_cached_data plugin uses internally):

    Shrine.plugin :refresh_metadata # allow re-extracting metadata
    Shrine.plugin :backgrounding

    Shrine::Attacher.promote_block do
      PromoteJob.perform_async(self.class.name, record.class.name, record.id, name, file_data)
    end

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata! # extract metadata
        attacher.atomic_promote
      end
    end

B) Extracting separately from promotion
You can also extract metadata in the background separately from promotion:

    MetadataJob.perform_async(
      attacher.class.name,
      attacher.record.class.name,
      attacher.record.id,
      attacher.name,
      attacher.file_data,
    )

    class MetadataJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata!
        attacher.atomic_persist
      end
    end

Combining foreground and background
If you have some metadata that you want to extract in the foreground and some that you want to extract in the background, you can use the uploader context:

    class VideoUploader < Shrine
      plugin :add_metadata

      add_metadata do |io, **options|
        next unless options[:background] # proceed only when `background: true` was specified

        # example of metadata extraction
        movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }

        { "duration"   => movie.duration,
          "bitrate"    => movie.bitrate,
          "resolution" => movie.resolution,
          "frame_rate" => movie.frame_rate }
      end
    end

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata!(background: true) # specify the flag
        attacher.atomic_promote
      end
    end

Now, triggering metadata extraction in the controller on attachment (using the restore_cached_data or refresh_metadata plugin) will skip the video metadata block, which will instead run later in the background job.
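
The gating itself is ordinary Ruby and can be tried without Shrine. In the sketch below, `EXTRACTORS` and `run_extractors` are hypothetical stand-ins for the registered metadata blocks and the extraction step, and a SHA-256 checksum plays the role of the expensive video analysis.

```ruby
require "digest"
require "stringio"

EXTRACTORS = [
  # Cheap extractor: always runs.
  ->(io, **_options) { { "size" => io.size } },

  # Expensive stand-in: runs only when `background: true` is passed.
  lambda do |io, **options|
    next unless options[:background]

    checksum = Digest::SHA256.hexdigest(io.read)
    io.rewind
    { "sha256" => checksum }
  end,
].freeze

# Runs every extractor and merges the results, skipping those that return nil.
def run_extractors(io, **options)
  EXTRACTORS.each_with_object({}) do |extractor, metadata|
    result = extractor.call(io, **options)
    metadata.merge!(result) if result
  end
end

run_extractors(StringIO.new("abc")).keys                   #=> ["size"]
run_extractors(StringIO.new("abc"), background: true).keys #=> ["size", "sha256"]
```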
Optimizations
If you want to do both metadata extraction and file processing during promotion, you can wrap both in an UploadedFile#open block to make sure the file content is retrieved from the storage only once.

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)

        attacher.file.open do
          attacher.refresh_metadata!
          attacher.create_derivatives
        end

        attacher.atomic_promote
      end
    end

If you’re dealing with large files and have metadata extractors that use Shrine.with_file, you might want to use the tempfile plugin to make sure the same copy of the uploaded file is reused for both metadata extraction and file processing.

    Shrine.plugin :tempfile # load it globally so that it overrides `Shrine.with_file`

    # ...
    attacher.file.open do
      attacher.refresh_metadata!
      attacher.create_derivatives(attacher.file.tempfile)
    end
    # ...
