Extracting Metadata
Before a file is uploaded, Shrine automatically extracts metadata from it and stores it in the `Shrine::UploadedFile` object.

    uploaded_file = uploader.upload(file)
    uploaded_file.metadata #=>
    # {
    #   "size" => 345993,
    #   "filename" => "matrix.mp4",
    #   "mime_type" => "video/mp4",
    # }

Under the hood, `Shrine#upload` calls `Shrine#extract_metadata`, which you can also use directly to extract metadata from any IO object:

    uploader.extract_metadata(io) #=>
    # {
    #   "size" => 345993,
    #   "filename" => "matrix.mp4",
    #   "mime_type" => "video/mp4",
    # }

The following metadata is extracted by default:
Key | Default source |
---|---|
`filename` | extracted from `io.original_filename` or `io.path` |
`mime_type` | extracted from `io.content_type` |
`size` | extracted from `io.size` |
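
The table above can be read as a small lookup over the IO object. The following is a minimal sketch of that lookup in plain Ruby, not Shrine's actual source: `basic_metadata` and `FakeUpload` are hypothetical names, and taking the basename of `io.path` is an assumption about how the path is turned into a filename.

```ruby
require "stringio"

# Hypothetical helper that mirrors the table above; not Shrine's implementation.
def basic_metadata(io)
  filename =
    if io.respond_to?(:original_filename)
      io.original_filename
    elsif io.respond_to?(:path) && io.path
      File.basename(io.path) # assumption: only the basename is kept
    end

  {
    "filename"  => filename,
    "mime_type" => (io.content_type if io.respond_to?(:content_type)),
    "size"      => (io.size if io.respond_to?(:size)),
  }
end

# Hypothetical stand-in for a framework upload object.
class FakeUpload < StringIO
  attr_reader :original_filename, :content_type

  def initialize(content, original_filename:, content_type:)
    super(content)
    @original_filename = original_filename
    @content_type      = content_type
  end
end

upload = FakeUpload.new("abc", original_filename: "matrix.mp4", content_type: "video/mp4")
p basic_metadata(upload)               # all three keys are populated
p basic_metadata(StringIO.new("abc"))  # only "size" is populated
```

An object that exposes neither `#original_filename`, `#path`, nor `#content_type` ends up with only `size` set, which is why the plugins described later in this guide exist.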
Accessing metadata

You can access the stored metadata in three ways:

    # via methods (if they're defined)
    uploaded_file.size
    uploaded_file.original_filename
    uploaded_file.mime_type

    # via the metadata hash
    uploaded_file.metadata["size"]
    uploaded_file.metadata["filename"]
    uploaded_file.metadata["mime_type"]

    # via the #[] operator
    uploaded_file["size"]
    uploaded_file["filename"]
    uploaded_file["mime_type"]

Controlling extraction

`Shrine#upload` accepts a `:metadata` option, which takes one of the following values:

- `Hash` – adds/overrides extracted metadata with the given hash

      uploaded_file = uploader.upload(file, metadata: { "filename" => "Matrix[1999].mp4", "foo" => "bar" })
      uploaded_file.original_filename #=> "Matrix[1999].mp4"
      uploaded_file.metadata["foo"]   #=> "bar"

- `false` – skips metadata extraction (useful in tests)

      uploaded_file = uploader.upload(file, metadata: false)
      uploaded_file.metadata #=> {}

- `true` – forces metadata extraction when a `Shrine::UploadedFile` is being uploaded (by default metadata is simply copied over)

      uploaded_file = uploader.upload(uploaded_file, metadata: true)
      uploaded_file.metadata # re-extracted metadata

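
The way the three values interact can be sketched as one decision function. This is an illustrative sketch only: `resolve_metadata` and its keyword arguments are hypothetical, and the behaviour when `false` is combined with an already uploaded file (existing metadata is still copied) is an assumption rather than something this guide states.

```ruby
# Hypothetical helper showing how the :metadata option could be resolved.
#   extract:  a callable that performs the (possibly expensive) extraction
#   existing: metadata of an already uploaded file, or nil for a fresh IO
#   option:   the value passed as `metadata:` (nil, Hash, true or false)
def resolve_metadata(extract:, existing: nil, option: nil)
  result =
    if existing && option != true
      existing.dup    # uploaded file: copy metadata instead of re-extracting
    elsif option != false
      extract.call    # fresh IO, or extraction forced with `true`
    else
      {}              # extraction skipped with `false`
    end

  result = result.merge(option) if option.is_a?(Hash) # Hash adds/overrides
  result
end

extract = -> { { "size" => 3, "filename" => "a.txt" } }

p resolve_metadata(extract: extract)
p resolve_metadata(extract: extract, option: false)
p resolve_metadata(extract: extract, option: { "filename" => "b.txt" })
p resolve_metadata(extract: extract, existing: { "size" => 9 })
p resolve_metadata(extract: extract, existing: { "size" => 9 }, option: true)
```

Passing the extraction as a callable keeps it lazy, so the copy and skip paths never touch the file content.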
MIME type

By default, the `mime_type` metadata will be copied over from the `#content_type` attribute of the input file (if present). However, since the `#content_type` value comes from the `Content-Type` header of the upload request, it’s not guaranteed to hold the actual MIME type of the file (the browser determines this header based on the file extension).

Moreover, only `ActionDispatch::Http::UploadedFile`, `Shrine::RackFile`, and `Shrine::DataFile` objects have `#content_type` defined, so when uploading objects such as `File`, the `mime_type` value will be `nil` by default.

To remedy that, Shrine comes with a `determine_mime_type` plugin which is able to extract the MIME type from IO content:

    # Gemfile
    gem "marcel", "~> 0.3"

    Shrine.plugin :determine_mime_type, analyzer: :marcel

    uploaded_file = uploader.upload StringIO.new("<?php ... ?>")
    uploaded_file.mime_type #=> "application/x-php"

You can choose different analyzers, and even mix-and-match them. See the `determine_mime_type` plugin docs for more details.

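
To make "extracting the MIME type from IO content" concrete, here is a minimal plain-Ruby sketch that inspects the leading bytes of a file. It is not how the `determine_mime_type` plugin or Marcel is implemented; `MAGIC_NUMBERS` and `sniff_mime_type` are hypothetical names, and the signature list is deliberately tiny.

```ruby
require "stringio"

# A few well-known file signatures ("magic numbers"); real analyzers know far more.
MAGIC_NUMBERS = {
  "\x89PNG\r\n\x1A\n".b => "image/png",
  "\xFF\xD8\xFF".b      => "image/jpeg",
  "GIF8".b              => "image/gif",
  "%PDF-".b             => "application/pdf",
}.freeze

# Hypothetical helper: returns a MIME type based on content, or nil if unknown.
def sniff_mime_type(io)
  header = io.read(8).to_s.b # read only the first few bytes
  io.rewind                  # leave the IO ready for the actual upload

  MAGIC_NUMBERS.each do |signature, mime_type|
    return mime_type if header.start_with?(signature)
  end

  nil
end

# Content wins regardless of what the filename or Content-Type header claims.
pdf_disguised_as_jpeg = StringIO.new("%PDF-1.7 example body")
p sniff_mime_type(pdf_disguised_as_jpeg)
p sniff_mime_type(StringIO.new("plain text"))
```

Only a short prefix is read, which is also why content-based detection on remote storage needs at least a partial download.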
Image dimensions

Shrine comes with a `store_dimensions` plugin for extracting image dimensions. It adds `width` and `height` metadata values, and also adds `#width`, `#height`, and `#dimensions` methods to the `Shrine::UploadedFile` object.

    # Gemfile
    gem "fastimage" # default analyzer

    Shrine.plugin :store_dimensions

    uploaded_file = uploader.upload(image)
    uploaded_file.metadata["width"]  #=> 1600
    uploaded_file.metadata["height"] #=> 900

    # convenience methods
    uploaded_file.width      #=> 1600
    uploaded_file.height     #=> 900
    uploaded_file.dimensions #=> [1600, 900]

By default, the plugin uses FastImage to analyze dimensions, but you can also have it use MiniMagick or ruby-vips. See the `store_dimensions` plugin docs for more details.

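
Dimension analyzers generally work by reading an image's header rather than decoding the whole file. As an illustration only (this is not FastImage's code, and it handles just the PNG format), the sketch below reads the width and height from a PNG's `IHDR` chunk. `png_dimensions` is a hypothetical name.

```ruby
require "stringio"

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: returns [width, height] for a PNG, or nil otherwise.
# Layout: 8-byte signature, 4-byte chunk length, "IHDR", then two
# big-endian 32-bit integers holding the width and the height.
def png_dimensions(io)
  header = io.read(24).to_s.b
  io.rewind

  return nil unless header.bytesize == 24
  return nil unless header.start_with?(PNG_SIGNATURE) && header[12, 4] == "IHDR"

  header[16, 8].unpack("NN")
end

# Build just enough of a PNG header to exercise the parser.
fake_png = PNG_SIGNATURE + [13].pack("N") + "IHDR" + [1600, 900].pack("NN")
p png_dimensions(StringIO.new(fake_png)) #=> [1600, 900]
```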
Custom metadata

In addition to the built-in metadata, Shrine allows you to extract and store any custom metadata, using the `add_metadata` plugin (which internally extends `Shrine#extract_metadata`).

For example, you might want to extract EXIF data from images:

    # Gemfile
    gem "exiftool"

    require "exiftool"

    class ImageUploader < Shrine
      plugin :add_metadata

      add_metadata :exif do |io, context|
        Shrine.with_file(io) do |file|
          Exiftool.new(file.path).to_hash
        end
      end
    end

    uploaded_file = uploader.upload(image)
    uploaded_file.metadata["exif"] #=> {...}
    uploaded_file.exif             #=> {...}

Or, if you’re uploading videos, you might want to extract some video-specific metadata:

    # Gemfile
    gem "streamio-ffmpeg"

    require "streamio-ffmpeg"

    class VideoUploader < Shrine
      plugin :add_metadata

      add_metadata do |io, context|
        movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }

        { "duration"   => movie.duration,
          "bitrate"    => movie.bitrate,
          "resolution" => movie.resolution,
          "frame_rate" => movie.frame_rate }
      end
    end

    uploaded_file = uploader.upload(video)
    uploaded_file.metadata #=>
    # {
    #   ...
    #   "duration" => 7.5,
    #   "bitrate" => 481,
    #   "resolution" => "640x480",
    #   "frame_rate" => 16.72
    # }

The yielded `io` object will not always respond to `#path`. For example, with the `data_uri` plugin the `io` can be a `StringIO` wrapper, while with the `restore_cached_data` or `refresh_metadata` plugins the `io` might be a `Shrine::UploadedFile` object. That is why the examples above use `Shrine.with_file`, which ensures the block receives a file object.

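
The idea behind such a helper can be sketched in plain Ruby: yield the IO as-is when it already lives on disk, and otherwise copy it into a temporary file first. This is a simplified illustration with a hypothetical name (`with_local_file`), not the implementation of `Shrine.with_file`, and it ignores the `Shrine::UploadedFile` case.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: guarantees the block receives an object with a #path.
def with_local_file(io)
  if io.respond_to?(:path) && io.path
    yield io # already a file on disk, nothing to copy
  else
    # Tempfile.create removes the file once the block returns.
    Tempfile.create("metadata") do |tempfile|
      tempfile.binmode
      IO.copy_stream(io, tempfile)
      io.rewind        # leave the source ready for further reads
      tempfile.flush   # make sure tools reading by path see all content
      tempfile.rewind
      yield tempfile
    end
  end
end

content = with_local_file(StringIO.new("in-memory data")) do |file|
  File.read(file.path) # path-based tools (exiftool, ffmpeg, ...) would go here
end
p content
```

The copy is the cost of using path-based tools on in-memory or remote content, which is what the "Optimizations" section later in this guide tries to minimise.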
Adding metadata

If you wish to add metadata to an already attached file, you can do it as follows:

    photo.image_attacher.add_metadata("foo" => "bar")
    photo.image.metadata #=> { ..., "foo" => "bar" }
    photo.save # persist changes

Metadata columns

If you want to write any of the metadata values into a separate database column on the record, you can use the `metadata_attributes` plugin.

    Shrine.plugin :metadata_attributes, :mime_type => :type

    photo = Photo.new(image: file)
    photo.image_type #=> "image/jpeg"

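
The mapping `:mime_type => :type` on an attachment named `image` results in the `image_type` attribute being written, as the example above shows. A minimal sketch of that naming rule, using a plain `Struct` in place of a database model (`PhotoRecord` and `write_metadata_attributes` are hypothetical names, not part of Shrine):

```ruby
# Hypothetical record with columns for two metadata values.
PhotoRecord = Struct.new(:image_type, :image_size)

# Hypothetical helper: copies selected metadata values into
# "<attachment name>_<column>" attributes on the record.
def write_metadata_attributes(record, name, metadata, mappings)
  mappings.each do |metadata_key, column|
    record.public_send("#{name}_#{column}=", metadata[metadata_key.to_s])
  end
  record
end

photo = PhotoRecord.new
write_metadata_attributes(
  photo,
  :image,
  { "mime_type" => "image/jpeg", "size" => 4593484 },
  { mime_type: :type, size: :size }
)
p photo.image_type
p photo.image_size
```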
Direct uploads

When attaching files that were uploaded directly to the cloud or a tus server, Shrine won’t automatically extract metadata from them; instead, it copies any existing metadata that was set on the client side. This is the default behaviour because metadata extraction requires (at least partially) retrieving file content from the storage, which can be expensive depending on the storage and the type of metadata being extracted.

    # no additional metadata will be extracted in this assignment by default
    photo.image = '{"id":"9e6581a4ea1.jpg","storage":"cache","metadata":{...}}'

Extracting on attachment

If you want metadata to be automatically extracted on assignment (which is useful if you want to validate the extracted metadata or have it immediately available for any other reason), you can load the `restore_cached_data` plugin:

    Shrine.plugin :restore_cached_data # automatically extract metadata from cached files on assignment

    photo.image = '{"id":"ks9elsd.jpg","storage":"cache","metadata":{}}' # metadata is extracted
    photo.image.metadata #=>
    # {
    #   "size" => 4593484,
    #   "filename" => "nature.jpg",
    #   "mime_type" => "image/jpeg"
    # }

Extracting in the background

A) Extracting with promotion

If you’re using backgrounding, you can extract metadata during background promotion using the `refresh_metadata` plugin (which the `restore_cached_data` plugin uses internally):

    Shrine.plugin :refresh_metadata # allow re-extracting metadata
    Shrine.plugin :backgrounding

    Shrine::Attacher.promote_block do
      PromoteJob.perform_async(self.class.name, record.class.name, record.id, name, file_data)
    end

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata! # extract metadata
        attacher.atomic_promote
      end
    end

B) Extracting separately from promotion

You can also extract metadata in the background separately from promotion:

    MetadataJob.perform_async(
      attacher.class.name,
      attacher.record.class.name,
      attacher.record.id,
      attacher.name,
      attacher.file_data,
    )

    class MetadataJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata!
        attacher.atomic_persist
      end
    end

Combining foreground and background

If you have some metadata that you want to extract in the foreground and some that you want to extract in the background, you can use the uploader context:

    class VideoUploader < Shrine
      plugin :add_metadata

      add_metadata do |io, **options|
        next unless options[:background] # proceed only when `background: true` was specified

        # example of metadata extraction
        movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }

        { "duration"   => movie.duration,
          "bitrate"    => movie.bitrate,
          "resolution" => movie.resolution,
          "frame_rate" => movie.frame_rate }
      end
    end

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)
        attacher.refresh_metadata!(background: true) # specify the flag
        attacher.atomic_promote
      end
    end

Now, triggering metadata extraction in the controller on attachment (using the `restore_cached_data` or `refresh_metadata` plugin) will skip the video metadata block, which will run later in the background job.

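
The gating pattern itself does not depend on Shrine. The sketch below reproduces it with plain lambdas: every extractor receives the IO plus arbitrary options, and an extractor that returns `nil` contributes nothing. `EXTRACTORS` and `extract_all` are hypothetical names, and a SHA-256 digest stands in for an expensive extraction such as video analysis.

```ruby
require "digest"
require "stringio"

# Hypothetical registry of extraction blocks; not Shrine's internals.
EXTRACTORS = [
  # Cheap extractor: always runs.
  ->(io, **_options) { { "size" => io.size } },

  # Expensive extractor: runs only when `background: true` is passed.
  ->(io, **options) {
    next unless options[:background] # returns nil, so nothing is merged

    digest = Digest::SHA256.hexdigest(io.read)
    io.rewind
    { "sha256" => digest }
  },
].freeze

def extract_all(io, **options)
  EXTRACTORS.each_with_object({}) do |extractor, metadata|
    result = extractor.call(io, **options)
    metadata.merge!(result) if result
  end
end

p extract_all(StringIO.new("abc")).keys                   # foreground pass
p extract_all(StringIO.new("abc"), background: true).keys # background pass
```

Because skipped extractors return `nil` rather than an empty value, a foreground pass leaves any previously stored background keys for the caller to preserve or overwrite as it sees fit.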
Optimizations

If you want to do both metadata extraction and file processing during promotion, you can wrap both in an `UploadedFile#open` block to make sure the file content is retrieved from the storage only once.

    class PromoteJob
      include Sidekiq::Worker

      def perform(attacher_class, record_class, record_id, name, file_data)
        attacher_class = Object.const_get(attacher_class)
        record         = Object.const_get(record_class).find(record_id) # if using Active Record

        attacher = attacher_class.retrieve(model: record, name: name, file: file_data)

        attacher.file.open do
          attacher.refresh_metadata!
          attacher.create_derivatives
        end

        attacher.atomic_promote
      end
    end

If you’re dealing with large files and have metadata extractors that use `Shrine.with_file`, you might want to use the `tempfile` plugin to make sure the same copy of the uploaded file is reused for both metadata extraction and file processing.

    Shrine.plugin :tempfile # load it globally so that it overrides `Shrine.with_file`

    # ...
    attacher.file.open do
      attacher.refresh_metadata!
      attacher.create_derivatives(attacher.file.tempfile)
    end
    # ...
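
Why wrapping the work in a single `open` block helps can be shown with a toy model that counts storage round trips. `RemoteFile` below is a hypothetical stand-in, not a Shrine class, and its "nested calls reuse the open IO" behaviour is a simplification of what the real `UploadedFile` does.

```ruby
require "stringio"

# Hypothetical stand-in for a file kept on remote storage.
class RemoteFile
  attr_reader :downloads

  def initialize(content)
    @content   = content
    @downloads = 0
    @io        = nil
  end

  # Yields an IO for the content; nested calls reuse the IO that is already open.
  def open
    return yield(@io) if @io

    @downloads += 1 # simulates one retrieval from the storage
    @io = StringIO.new(@content)
    begin
      yield @io
    ensure
      @io = nil
    end
  end
end

# Two independent consumers, standing in for metadata extraction and processing.
def extract_size(file)
  file.open { |io| io.size }
end

def first_line(file)
  file.open { |io| io.rewind; io.gets.to_s.chomp }
end

separate = RemoteFile.new("line one\nline two\n")
extract_size(separate)
first_line(separate)
p separate.downloads # each consumer triggered its own retrieval

shared = RemoteFile.new("line one\nline two\n")
shared.open do
  extract_size(shared)
  first_line(shared)
end
p shared.downloads # both consumers shared a single retrieval
```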