2
lorentz
1y

Consider an API that uses the HTTP path to represent a position in a tree that literally represents a file tree with minimal constraints, and the GET/PUT/DELETE methods to read, write, and destroy nodes. How would you encode read/write operations on per-node metadata? There is a fixed set of about four kinds of metadata, so inventing HTTP verbs for each of them is infeasible, but filtering is not necessary.

Options considered so far:
- top-level resources alongside a namespaced /data, such as /acl and /lock
- magic keywords to the Range header (this is apparently compliant)
- mimetypes such as text/plain+acl
- SETPROP / PROP methods in the spirit of WebDAV
- headers (I worry this may become an immitigable bottleneck really fast)

I'm looking for any kind of suggestion or insight, not perfect answers.

I read the WebDAV specification and I won't even suggest that I'm trying to align with it. The only other protocol I've seen with comparable scope bloat is WebRTC.

Comments
  • 0
    Use custom methods (either GET or POST):

    - POST /data:lock

    - POST /data:rename

    - ...

    (see https://cloud.google.com/apis/...)
  • 1
    So if I get you right, you don't want to put the metadata property's name, and the fact that it is metadata, in the URI, because the URI already represents the node. But maybe at the end, like attributes in XPath?

    PATCH /path/to/node/@meta/name

    And body is the new name.

    Why do you think headers would be a bottleneck?
  • 0
    The methods introduced with WebDAV have some advantages:
    1. No confusing paths.
    2. Publicly known methods: devs who use them can find a reference for what they mean.
    3. Future compatibility: if you decide to implement a subset of WebDAV, you only have to change the content type, or have a wrapper that formats the output based on the requested content type.
  • 0
    "I have a bunch of nails and a hammer. I also need some screws to combine with those nails, so I wonder how I go about adding a screwdriver to my hammer?" You don't, that's how.
  • 1
    Make metadata part of the tree. The most naive variant would have /dir1/children/dir2/children/file/acl pointing at the ACL of /dir1/dir2/file. If you forbid another character in names, you can condense that into /dir1/dir2/file:acl:group:read pointing at the read flag for group in the ACL and the content being represented by /dir1/dir2/file:content or just /dir1/dir2/file.

    The major benefit of a path-based distinction between metadata and primary content is that what is actually being accessed is right in your face, making it easy to spot when the wrong thing is accessed.
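
A minimal sketch of the condensed suffix scheme from the comment above, assuming ':' is forbidden in node names. The names `Target` and `parse_target` are made up for illustration and do not come from the thread.

```python
from typing import NamedTuple, Tuple


class Target(NamedTuple):
    node: str                  # path of the file or directory in the tree
    selector: Tuple[str, ...]  # ("content",) means the file body itself


def parse_target(path: str) -> Target:
    """Split '/dir/file:acl:group:read' into the node and a metadata selector.

    Assumption: ':' never occurs in node names, so the first ':' always
    starts the selector.
    """
    node, *selector = path.split(":")
    if not node.startswith("/"):
        raise ValueError("path must be absolute")
    if any(part == "" for part in selector):
        raise ValueError("empty selector component")
    # A bare path addresses the content, as the comment suggests.
    return Target(node, tuple(selector) or ("content",))
```

With this, `/dir1/dir2/file` and `/dir1/dir2/file:content` resolve to the same target, while `/dir1/dir2/file:acl:group:read` selects one flag of the ACL.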
  • 1
    @Oktokolo This looks very clean, I'm concerned about injection vulnerabilities though. If at any point something has access to an ACL and accepts a path without filtering it for forbidden characters, it may be abused to modify the ACL.
  • 0
    The first option with the children prefix is safe but pretty ugly.
  • 1
    @horus I suspect headers would become a bottleneck because it's possible that users will want to store and retrieve lots of tiny files, where e.g. including the ACL in every request would increase the transmitted data tenfold.
  • 0
    @stop I don't want to support WebDAV because it is built entirely on a version of the XML namespace standard that is no longer in use. Supporting any subset of WebDAV would mean that my brand new API requires setting your XML parser in legacy mode from day 1. I have other complaints about WebDAV's property system too, but this is the main one that forces me to break compat altogether.
  • 0
    It's worth considering designing the protocol in such a way that it doesn't collide with WebDAV so that an implementation can support both on the same URL, but this is not a priority since a sibling path can always be used for WebDAV access.
  • 1
    You cannot use /meta at the end of the URL, since there may be a file with that name. Restricting or reserving characters is also a poor option, as this creates usage limitations.

    Protocols tend to specify metadata at the beginning of the frame, so in your case that would be the beginning of the URL: /meta/stat/path/to/file or /meta/acl/path/to/file or /data/path/to/file. This way you can separately GET/POST a file and its metadata. It creates no limitations and, IMO, is a clean approach.

    Another solution could be through headers: using a header, you could specify that you are aiming for metadata rather than data. To me that would be the second-best option, for when prefixing the request path with a meta path is not possible.
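
The prefix scheme above could be routed roughly as follows. This is a sketch under assumptions: the function name `route` and the set `META_KINDS` are hypothetical, and only the /data, /meta/stat and /meta/acl prefixes come from the comment.

```python
from typing import Tuple

# Metadata kinds mentioned in the comment; a real API may have others.
META_KINDS = {"stat", "acl"}


def route(url_path: str) -> Tuple[str, str]:
    """Return (kind, node_path) for '/data/...' or '/meta/<kind>/...'."""
    parts = url_path.split("/")
    if parts[0] != "":
        raise ValueError("path must be absolute")
    if len(parts) >= 2 and parts[1] == "data":
        return "data", "/" + "/".join(parts[2:])
    if len(parts) >= 3 and parts[1] == "meta" and parts[2] in META_KINDS:
        return parts[2], "/" + "/".join(parts[3:])
    raise ValueError("unknown top-level prefix")
```

Because the kind is decided before the node path starts, a file that happens to be called `meta` or `acl` needs no special treatment: `/data/meta/acl` is simply the content of the node `/meta/acl`.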
  • 0
    @netikras In the case of the header option, do you think a magic mimetype, magic range, or a custom header is preferable?

    1st opt:
    Accept: application/json+meta
    Content-Type: application/json+meta
    2nd opt:
    Range: metadata
    3rd opt:
    X-Meta-Level: metadata (default is data)

    Now thinking about it, 1 and possibly 2 could also be prone to injection attacks since middlemen may allow clients to influence the value of these headers.
  • 1
    @lorentz I don't think Range fits the bill, as per https://developer.mozilla.org/en-US... it should have a strict format.

    Content-type - ... IDK, feels iffy to me.

    If I had to, I'd go with a custom header.
  • 1
    @lorentz I'm wondering what you mean by injection attacks....

    I have a vague idea what you could mean, but I think you're overcomplicating stuff without a clear reason.

    Man in the middle would mean that your API is http only without any form of authentication, hence my question.

    Regarding WebDAV: Don't.

    WebDAV is kind of a universe of its own, as it tries to do many different things. If you want lean and mean, then WebDAV is like the people you see in TV shows like "My life with 300 kilograms".

    The other question I have is what you mean concretely by metadata.

    Headers are not endlessly long, they *should* be in ASCII (though this differs based on webserver implementation)

    Header length varies by server and client implementations, if my brain serves me right, it was 4k to 8k...

    HTTP/2 explicitly states that header names should be *sent* (network side) in lower case; however, many applications (application side) still represent them with upper-case characters.

    So two important things: ASCII and max-length of headers.

    Headers are usually received before the body, which is an often underestimated performance win. Fast evaluation allows ending the connection early if people try funky stuff...

    That is the reason why *security* related information like the host header, content encoding etc. *must* be encoded in the headers.

    So this might be another reason to encode metadata, despite the size limit, inside headers.

    X-Meta-Level.

    I really advise against two things:
    1) Using the X- prefix
    2) "Misuse" of existing headers

    1) See e.g. https://rfc-editor.org/rfc/rfc6648/
    Use a vendor prefix, e.g. lorentz, if it tickles your fetish. Otherwise just name it properly so it is distinct from known header names (see IANA).

    2) Don't. Many web servers and client implementations exist. Unless you can prove that you break none of them and your design doesn't cause side effects (which is impossible)… don't.

    Still would be interested what you mean by metadata?
  • 1
    @netikras My bad, I was going off of https://httpwg.org/specs/... but I misread the rule for extensible ranges. Ranges need a distinguishing name, so a compliant format could be

    Range: meta-level=metadata

    I want to use alternative range specifiers more conservatively elsewhere anyway, because I've been learning about z/OS and I've come to the conclusion that supporting row-based ranges at the API level massively improves the expressiveness of the API calls, ultimately resulting in more efficient operation.

    Range: rows=0-100

    The benefits of this are immediately apparent in very large directory listings, which most remote file access protocols tend to struggle with.
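
A rough sketch of how a server might read a row-based range such as `Range: rows=0-100`. It assumes the custom `rows` unit borrows the inclusive first-last semantics of byte ranges and that only a single range is sent; `parse_int_range` and `slice_rows` are hypothetical names, not part of any library.

```python
import re
from typing import List, Optional, Tuple

# unit "=" first "-" [last]; the unit is an HTTP token.
_RANGE = re.compile(r"^([A-Za-z0-9!#$%&'*+.^_`|~-]+)=(\d+)-(\d*)$")


def parse_int_range(header_value: str) -> Tuple[str, int, Optional[int]]:
    """Parse 'rows=0-100' or 'rows=200-' into (unit, first, last)."""
    match = _RANGE.match(header_value.strip())
    if match is None:
        raise ValueError("unsupported Range value")
    unit, first, last = match.groups()
    start = int(first)
    end = int(last) if last else None  # open-ended range: 'rows=200-'
    if end is not None and end < start:
        raise ValueError("last position precedes first position")
    # Range unit names are case-insensitive, so normalise them.
    return unit.lower(), start, end


def slice_rows(rows: List[str], start: int, end: Optional[int]) -> List[str]:
    """Apply an inclusive range to a directory listing held in memory."""
    return rows[start : None if end is None else end + 1]
```

Under the inclusive assumption, `rows=0-100` selects 101 entries, which mirrors how `bytes=0-100` behaves.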
  • 0
    @IntrusionCM

    On the security risks:
    Consider a third party service which has extensive access to my API and exposes some functionality to its users. By using path postfixes I introduce a risk into their systems that they need to consider and mitigate. This is generally the case when a parameter has a class of low risk values and a high risk value which also matches the most convenient validation rules for the low risk class (the intended input range).

    On the header:
    I forgot that the X prefix is no longer recommended, thanks for the heads up.

    On the metadata:
    ACLs and locked state are the only two metadata fields currently, but I'm not opposed to adding more as use cases develop, with the constraint that metadata managed by the file store must be relevant to the file store and not just to the contents of the files. E.g. the author's identity is NOT relevant, because it can go in the content where this is meaningful at all, and for security the access logs show more than headers could.
  • 0
    @lorentz Injection is always a concern - but not hiding stuff in headers or other obscure places doesn't make injection harder or easier to do or detect.
  • 0
    @Oktokolo No. In this case the problem is specifically with encoding elevated access in the postfix of a string that's otherwise safe to forward, because this makes the obvious solution unsafe and the safe solution tedious. I can't explain how injection pertains to API design any better than I did above.
  • 0
    By now most people have learned not to blindly concat SQL or HTML, but literally every single codebase I've ever worked on concats HTTP paths.
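
To make the concern concrete, here is a hypothetical third-party service that builds request paths from a user-supplied file name. `BASE`, `naive_path` and `escaped_path` are invented for this illustration, and it assumes the suffix scheme where a raw ':' in the path selects metadata.

```python
from urllib.parse import quote

BASE = "/users/alice/uploads/"  # hypothetical directory the service manages


def naive_path(filename: str) -> str:
    # Plain concatenation, as most codebases do it.
    return BASE + filename


def escaped_path(filename: str) -> str:
    # Percent-encode everything outside the unreserved set,
    # including '/' and ':'.
    return BASE + quote(filename, safe="")


user_input = "report.txt:acl"
print(naive_path(user_input))    # raw ':' survives and reaches the API
print(escaped_path(user_input))  # ':' becomes %3A
```

With plain concatenation the request path ends in `:acl`, so under the suffix scheme the service would touch the ACL instead of the file. Escaping only helps if the API treats an encoded colon as part of the name (or rejects it) rather than as the separator, which is an assumption about the server, not something the thread settles.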
  • 1
    @lorentz I fail to understand your argument...

    If you are worried about illegal chars inside a URL path, that's the server's job.

    A (good) server will filter already, many other servers have an explicit strict mode.

    Even if a client managed to somehow bypass the server-side validation, I'd still assume that you validate an incoming query.

    In HTTPS there is no man in the middle.

    With authentication you could be sure that the API is only used by the authenticated users.

    A ':' is a valid character in a path.

    Where is this idea of injection coming from?

    Unless you explicitly allow it in your API / server, I don't know how.

    If the client builds an invalid URL, you should be able to deal with it inside your API and return e.g. a Not Found / Bad Request status?

    If you mean something like HTTP smuggling attacks... or header-based attacks... now that's another topic.
  • 2
    @lorentz There is no access right encoding in the path - it just points at meta data in addition to the file. First parse the request so you know what is to be accessed, then do safety and security checks, then perform the action.
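
The order described in the last comment (parse, then check, then act) might look like the sketch below. Everything in it is an assumption made for illustration: the `Request` type, the `REQUIRED` table and the right names "read", "write" and "admin" do not come from the thread.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Request:
    user: str
    method: str  # "GET", "PUT" or "DELETE"
    node: str    # already-parsed node path
    kind: str    # "content", "acl" or "lock"


# Hypothetical table: which right each (kind, method) pair requires.
REQUIRED = {
    ("content", "GET"): "read",
    ("content", "PUT"): "write",
    ("content", "DELETE"): "write",
    ("acl", "GET"): "read",
    ("acl", "PUT"): "admin",
    ("lock", "GET"): "read",
    ("lock", "PUT"): "write",
}


def authorize(req: Request, rights: Dict[str, Set[str]]) -> bool:
    """Decide from the parsed request, never from the raw path string."""
    needed = REQUIRED.get((req.kind, req.method))
    return needed is not None and needed in rights.get(req.user, set())


def handle(req: Request, rights: Dict[str, Set[str]]) -> int:
    if not authorize(req, rights):
        return 403  # refuse before touching the store
    return 200      # the actual read or write would happen here
```

In this sketch a user holding only "read" and "write" can PUT content but gets a 403 when the parsed target is the ACL, regardless of how the target was spelled in the URL.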